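torch.load() and load_state_dict() Explained

About the author: works on deep learning for images.
Summary link:
The link mainly collects summaries of my own work. Each entry is a commonly used demo whose code can be copied and run directly. It covers:
1. Deep learning scripts commonly used at work
2. Detailed explanations of common torch, numpy, and similar functions
3. OpenCV image and video operations
4. Summaries of projects from my own work (purely practical content)
Link: https://blog.csdn.net/qq_28949847/article/details/128552785
Video walkthroughs: the notes above are also explained in videos on Bilibili and other platforms; search for 'Python图像识别' to watch.
Bilibili: Python图像识别
Douyin: Python图像识别
Xigua Video: Python图像识别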

1. torch.load()

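The function signature is torch.load(f, map_location=None, pickle_module=pickle, **pickle_load_args). In practice, usually only the first two arguments are used.
The map_location parameter: map_location redirects where the saved tensors are placed when they are loaded. For example, if the model parameters were saved on the CPU, we may want to load them onto cuda:0. Or, with several GPUs, a model trained on GPU 1 can be loaded onto GPU 2, which can come up in data-parallel distributed training.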
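The forms that map_location accepts can be tried without a GPU. The sketch below is only an illustration: it saves a tiny dictionary to an in-memory buffer (a stand-in for a real .pth file), and the helper name load_with is hypothetical.

```python
import io

import torch

# Save a small state-dict-like object to an in-memory buffer
# (a stand-in for a real checkpoint file on disk).
buffer = io.BytesIO()
torch.save({"conv1.weight": torch.zeros(2, 3)}, buffer)


def load_with(map_location):
    """Reload the buffer with the given map_location and return the tensor's device."""
    buffer.seek(0)
    weights = torch.load(buffer, map_location=map_location)
    return weights["conv1.weight"].device


# map_location accepts a string, a torch.device, a dict, or a callable.
print(load_with("cpu"))
print(load_with(torch.device("cpu")))
# A callable receives (storage, location); returning the storage keeps it on the CPU.
print(load_with(lambda storage, location: storage))
```

All three calls leave the tensor on the CPU here; on a machine with GPUs, replacing "cpu" with "cuda:0" would place the tensors on that device instead.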

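(1) map_location=None
When map_location is not specified, each tensor is loaded onto the device it was saved from. If the model was trained and saved on cuda:0, torch.load() also places the weights on cuda:0; likewise, a model saved from cuda:1 is loaded onto cuda:1.

    model = HighResolutionNet(base_channel=32, num_joints=17)
    weights_dict = torch.load("./pose_hrnet_w32_256x192.pth")
    # Print the device the model weights live on
    print(weights_dict['conv1.weight'].device)
    print('weights_dict.keys():', weights_dict.keys())

Output:

cuda:0

The result is cuda:0 because the checkpoint was saved from a model trained on cuda:0, so the weights are loaded back onto the same device.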

(2) map_location='cpu'

Load the model parameters onto the CPU.

    model = HighResolutionNet(base_channel=32, num_joints=17)
    print('model:', model)
    weights_dict = torch.load("./pose_hrnet_w32_256x192.pth", map_location='cpu')
    print('weights_dict:', weights_dict)
    # Print the device the model weights live on
    print(weights_dict['conv1.weight'].device)
    print('weights_dict.keys():', weights_dict.keys())

Output:

cpu

The weights have moved from cuda:0 to the CPU.
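A common pattern built on map_location='cpu' is to load the weights onto the CPU, copy them into the model, and only then move the model to whichever device is available. The following is a minimal sketch under assumptions: a small nn.Conv2d stands in for HighResolutionNet, and an in-memory buffer stands in for the checkpoint file.

```python
import io

import torch
import torch.nn as nn

# Hypothetical tiny model standing in for HighResolutionNet.
model = nn.Conv2d(3, 8, kernel_size=3)

# In-memory stand-in for a checkpoint file.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# Pick the GPU when one is present, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Load the tensors onto the CPU, copy them into the model, then move the model.
weights_dict = torch.load(buffer, map_location="cpu")
model.load_state_dict(weights_dict)
model.to(device)

print(weights_dict["weight"].device)          # the loaded tensors stay on the CPU
print(next(model.parameters()).device.type)   # the model lives on the chosen device
```

Loading to the CPU first keeps the script usable on machines without a GPU, even when the checkpoint was saved from one.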

(3) map_location={xx:xx}

    model = HighResolutionNet(base_channel=32, num_joints=17)
    print('model:', model)
    weights_dict = torch.load("./pose_hrnet_w32_256x192.pth", map_location={'cuda:0': 'cuda:1'})
    print('weights_dict:', weights_dict)
    # Print the device the model weights live on
    print(weights_dict['conv1.weight'].device)
    print('weights_dict.keys():', weights_dict.keys())

Output:

cuda:1

The weights have moved from cuda:0 to cuda:1.

    model = HighResolutionNet(base_channel=32, num_joints=17)
    print('model:', model)
    weights_dict = torch.load("./pose_hrnet_w32_256x192.pth", map_location={'cuda:1': 'cuda:2'})
    print('weights_dict:', weights_dict)
    # Print the device the model weights live on
    print(weights_dict['conv1.weight'].device)
    print('weights_dict.keys():', weights_dict.keys())

Output:

cuda:0

The weights are still on cuda:0 and were not moved to cuda:2, because this mapping does not apply: the checkpoint was saved from cuda:0, while the dictionary only maps cuda:1 to cuda:2. When the saved location has no entry in the dictionary, no remapping takes place, so the result is the same as not passing map_location at all.
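The fall-through behaviour of a dictionary without a matching key can be reproduced on a CPU-only machine. This is a small sketch, assuming a tensor saved from the CPU (so its saved location is "cpu"); the helper name device_after_load is hypothetical.

```python
import io

import torch

# Saved from the CPU, so the recorded location of the tensor is "cpu".
buffer = io.BytesIO()
torch.save({"conv1.weight": torch.ones(2, 2)}, buffer)


def device_after_load(mapping):
    """Reload the buffer with a dict map_location and return the tensor's device."""
    buffer.seek(0)
    return torch.load(buffer, map_location=mapping)["conv1.weight"].device


# The dict has no "cpu" key, so nothing is remapped and the tensor stays put.
unmatched = device_after_load({"cuda:1": "cuda:2"})
# A key that matches the saved location is applied (here a no-op remap to the CPU).
matched = device_after_load({"cpu": "cpu"})
print(unmatched, matched)
```

Because the unmatched mapping is never applied, the first call does not need cuda:1 or cuda:2 to exist.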

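2. load_state_dict()

After building a model in PyTorch, we usually need to load the pretrained weights returned by torch.load() into it. This is done with load_state_dict(), which is a method of nn.Module and is called as model.load_state_dict() (there is no torch.load_state_dict() function). It copies the pretrained parameters into the new model, as shown below:

    # Initialize the model
    model = HighResolutionNet(base_channel=32, num_joints=17)
    # Read the official model parameters
    weights_dict = torch.load("./pose_hrnet_w32_256x192.pth", map_location='cpu')
    # Load the official parameters into the model
    model.load_state_dict(weights_dict, strict=False)

The argument to focus on in load_state_dict() is strict. With strict=True (the default), the keys of the pretrained weights must match the parameter names of the newly built model exactly. If some layers of the new model have been modified, the code above raises an error reporting that the keys do not match.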
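The strict=True failure is easy to reproduce with two toy modules. The classes Small and SmallWithHead below are hypothetical and exist only for this sketch; the "pretrained" weights come from the smaller model and lack the extra head layer.

```python
import torch.nn as nn


class Small(nn.Module):
    """Hypothetical 'pretrained' architecture."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 4, kernel_size=1)


class SmallWithHead(Small):
    """Same backbone plus one layer that the pretrained weights do not contain."""

    def __init__(self):
        super().__init__()
        self.head = nn.Linear(4, 2)


pretrained = Small().state_dict()
model = SmallWithHead()

try:
    model.load_state_dict(pretrained, strict=True)
    error_message = ""
except RuntimeError as err:
    # strict=True reports every key that is missing or unexpected.
    error_message = str(err)

print(error_message)
```

The message lists head.weight and head.bias as missing keys, which is the mismatch described above.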

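In that case, passing strict=False avoids the error. Pretrained weights whose keys match the new network are loaded, and layers without a matching key keep their default initialization. Note that strict=False only tolerates missing or unexpected keys: a key that matches but has a different tensor shape still raises an error, so such entries have to be removed from the weights dictionary before loading.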
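With strict=False, load_state_dict() returns the keys it could not match, which is useful for checking what was actually loaded. The sketch below uses a hypothetical two-layer model and a hand-built checkpoint dictionary; neither comes from the HRNet weights.

```python
import torch
import torch.nn as nn

# Hypothetical model: a backbone layer plus a new head.
model = nn.Sequential()
model.add_module("conv1", nn.Conv2d(3, 4, kernel_size=1))
model.add_module("head", nn.Linear(4, 2))

# Hand-built checkpoint: it has conv1, lacks head, and carries an extra fc layer.
checkpoint = {
    "conv1.weight": torch.zeros(4, 3, 1, 1),
    "conv1.bias": torch.zeros(4),
    "fc.weight": torch.zeros(10, 4),
}

result = model.load_state_dict(checkpoint, strict=False)
print(result.missing_keys)      # keys the model expects but the checkpoint lacks
print(result.unexpected_keys)   # keys in the checkpoint the model does not have
```

The matching conv1 entries are copied into the model, while head keeps its random initialization and fc.weight is ignored.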

Full test code:

    import torch
    import torch.nn as nn

    BN_MOMENTUM = 0.1


    class BasicBlock(nn.Module):
        expansion = 1

        def __init__(self, inplanes, planes, stride=1, downsample=None):
            super(BasicBlock, self).__init__()
            self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
            self.relu = nn.ReLU(inplace=True)
            self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
            self.downsample = downsample
            self.stride = stride

        def forward(self, x):
            residual = x

            out = self.conv1(x)
            out = self.bn1(out)
            out = self.relu(out)

            out = self.conv2(out)
            out = self.bn2(out)

            if self.downsample is not None:
                residual = self.downsample(x)

            out += residual
            out = self.relu(out)

            return out


    class Bottleneck(nn.Module):
        expansion = 4

        def __init__(self, inplanes, planes, stride=1, downsample=None):
            super(Bottleneck, self).__init__()
            self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
            self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
            self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
            self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, bias=False)
            self.bn3 = nn.BatchNorm2d(planes * self.expansion, momentum=BN_MOMENTUM)
            self.relu = nn.ReLU(inplace=True)
            self.downsample = downsample
            self.stride = stride

        def forward(self, x):
            residual = x

            out = self.conv1(x)
            out = self.bn1(out)
            out = self.relu(out)

            out = self.conv2(out)
            out = self.bn2(out)
            out = self.relu(out)

            out = self.conv3(out)
            out = self.bn3(out)

            if self.downsample is not None:
                residual = self.downsample(x)

            out += residual
            out = self.relu(out)

            return out


    class StageModule(nn.Module):
        def __init__(self, input_branches, output_branches, c):
            """
            Build one stage, i.e. the module that fuses features across scales.
            :param input_branches: number of input branches, one per scale
            :param output_branches: number of output branches
            :param c: number of channels of the first input branch
            """
            super().__init__()
            self.input_branches = input_branches
            self.output_branches = output_branches

            self.branches = nn.ModuleList()
            for i in range(self.input_branches):  # each branch first goes through 4 BasicBlocks
                w = c * (2 ** i)  # number of channels of the i-th branch
                branch = nn.Sequential(
                    BasicBlock(w, w),
                    BasicBlock(w, w),
                    BasicBlock(w, w),
                    BasicBlock(w, w)
                )
                self.branches.append(branch)

            self.fuse_layers = nn.ModuleList()  # used to fuse the outputs of all branches
            for i in range(self.output_branches):
                self.fuse_layers.append(nn.ModuleList())
                for j in range(self.input_branches):
                    if i == j:
                        # Same input and output branch: nothing to do
                        self.fuse_layers[-1].append(nn.Identity())
                    elif i < j:
                        # Input branch j is below output branch i (the input is downsampled more
                        # than the output), so adjust the channels of branch j and upsample it
                        # so that the features can be added
                        self.fuse_layers[-1].append(
                            nn.Sequential(
                                nn.Conv2d(c * (2 ** j), c * (2 ** i), kernel_size=1, stride=1, bias=False),
                                nn.BatchNorm2d(c * (2 ** i), momentum=BN_MOMENTUM),
                                nn.Upsample(scale_factor=2.0 ** (j - i), mode='nearest')
                            )
                        )
                    else:  # i > j
                        # Input branch j is above output branch i (the input is downsampled less
                        # than the output), so adjust the channels of branch j and downsample it
                        # so that the features can be added.
                        # Each 2x downsampling uses one 3x3 conv layer: two for 4x, three for 8x,
                        # i - j layers in total
                        ops = []
                        # The first i - j - 1 conv layers keep the channels and only downsample
                        for k in range(i - j - 1):
                            ops.append(
                                nn.Sequential(
                                    nn.Conv2d(c * (2 ** j), c * (2 ** j), kernel_size=3, stride=2, padding=1, bias=False),
                                    nn.BatchNorm2d(c * (2 ** j), momentum=BN_MOMENTUM),
                                    nn.ReLU(inplace=True)
                                )
                            )
                        # The last conv layer both adjusts the channels and downsamples
                        ops.append(
                            nn.Sequential(
                                nn.Conv2d(c * (2 ** j), c * (2 ** i), kernel_size=3, stride=2, padding=1, bias=False),
                                nn.BatchNorm2d(c * (2 ** i), momentum=BN_MOMENTUM)
                            )
                        )
                        self.fuse_layers[-1].append(nn.Sequential(*ops))

            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            # Pass each branch through its own blocks
            x = [branch(xi) for branch, xi in zip(self.branches, x)]

            # Then fuse the information from the different resolutions
            x_fused = []
            for i in range(len(self.fuse_layers)):
                x_fused.append(
                    self.relu(
                        sum([self.fuse_layers[i][j](x[j]) for j in range(len(self.branches))])
                    )
                )

            return x_fused


    class HighResolutionNet(nn.Module):
        def __init__(self, base_channel: int = 32, num_joints: int = 17):
            super().__init__()
            # Stem
            self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)
            self.conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)
            self.relu = nn.ReLU(inplace=True)

            # Stage1
            downsample = nn.Sequential(
                nn.Conv2d(64, 256, kernel_size=1, stride=1, bias=False),
                nn.BatchNorm2d(256, momentum=BN_MOMENTUM)
            )
            self.layer1 = nn.Sequential(
                Bottleneck(64, 64, downsample=downsample),
                Bottleneck(256, 64),
                Bottleneck(256, 64),
                Bottleneck(256, 64)
            )

            self.transition1 = nn.ModuleList([
                nn.Sequential(
                    nn.Conv2d(256, base_channel, kernel_size=3, stride=1, padding=1, bias=False),
                    nn.BatchNorm2d(base_channel, momentum=BN_MOMENTUM),
                    nn.ReLU(inplace=True)
                ),
                nn.Sequential(
                    nn.Sequential(  # the extra Sequential matches the weights shipped with the original project
                        nn.Conv2d(256, base_channel * 2, kernel_size=3, stride=2, padding=1, bias=False),
                        nn.BatchNorm2d(base_channel * 2, momentum=BN_MOMENTUM),
                        nn.ReLU(inplace=True)
                    )
                )
            ])

            # Stage2
            self.stage2 = nn.Sequential(
                StageModule(input_branches=2, output_branches=2, c=base_channel)
            )

            # transition2
            self.transition2 = nn.ModuleList([
                nn.Identity(),  # None,  - Used in place of "None" because it is callable
                nn.Identity(),  # None,  - Used in place of "None" because it is callable
                nn.Sequential(
                    nn.Sequential(
                        nn.Conv2d(base_channel * 2, base_channel * 4, kernel_size=3, stride=2, padding=1, bias=False),
                        nn.BatchNorm2d(base_channel * 4, momentum=BN_MOMENTUM),
                        nn.ReLU(inplace=True)
                    )
                )
            ])

            # Stage3
            self.stage3 = nn.Sequential(
                StageModule(input_branches=3, output_branches=3, c=base_channel),
                StageModule(input_branches=3, output_branches=3, c=base_channel),
                StageModule(input_branches=3, output_branches=3, c=base_channel),
                StageModule(input_branches=3, output_branches=3, c=base_channel)
            )

            # transition3
            self.transition3 = nn.ModuleList([
                nn.Identity(),  # None,  - Used in place of "None" because it is callable
                nn.Identity(),  # None,  - Used in place of "None" because it is callable
                nn.Identity(),  # None,  - Used in place of "None" because it is callable
                nn.Sequential(
                    nn.Sequential(
                        nn.Conv2d(base_channel * 4, base_channel * 8, kernel_size=3, stride=2, padding=1, bias=False),
                        nn.BatchNorm2d(base_channel * 8, momentum=BN_MOMENTUM),
                        nn.ReLU(inplace=True)
                    )
                )
            ])

            # Stage4
            # Note: the last StageModule only outputs the highest-resolution feature map
            self.stage4 = nn.Sequential(
                StageModule(input_branches=4, output_branches=4, c=base_channel),
                StageModule(input_branches=4, output_branches=4, c=base_channel),
                StageModule(input_branches=4, output_branches=1, c=base_channel)
            )

            # Final layer
            self.final_layer = nn.Conv2d(base_channel, num_joints, kernel_size=1, stride=1)

        def forward(self, x):
            x = self.conv1(x)
            x = self.bn1(x)
            x = self.relu(x)
            x = self.conv2(x)
            x = self.bn2(x)
            x = self.relu(x)

            x = self.layer1(x)
            x = [trans(x) for trans in self.transition1]  # Since now, x is a list

            x = self.stage2(x)
            x = [
                self.transition2[0](x[0]),
                self.transition2[1](x[1]),
                self.transition2[2](x[-1])
            ]  # New branch derives from the "upper" branch only

            x = self.stage3(x)
            x = [
                self.transition3[0](x[0]),
                self.transition3[1](x[1]),
                self.transition3[2](x[2]),
                self.transition3[3](x[-1]),
            ]  # New branch derives from the "upper" branch only

            x = self.stage4(x)

            x = self.final_layer(x[0])

            return x


    if __name__ == '__main__':
        # Initialize the model
        model = HighResolutionNet(base_channel=32, num_joints=17)
        print('model:', model)
        weights_dict = torch.load("./pose_hrnet_w32_256x192.pth", map_location='cpu')
        print('weights_dict:', weights_dict)
        # Print the device the model weights live on
        print(weights_dict['conv1.weight'].device)
        print('weights_dict.keys():', weights_dict.keys())

        for k in list(weights_dict.keys()):
            # If ImageNet weights were loaded, delete the unused classifier weights
            if "head" in k or "fc" in k:
                del weights_dict[k]

            # If COCO weights were loaded, the final layer should have 17 outputs;
            # delete it when the count differs
            if "final_layer" in k:
                if weights_dict[k].shape[0] != 17:
                    del weights_dict[k]

        missing_keys, unexpected_keys = model.load_state_dict(weights_dict, strict=False)
        if len(missing_keys) != 0:
            print("missing_keys: ", missing_keys)
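The key-deletion loop in the test code is specific to the "head", "fc", and "final_layer" names. One possible generalization, offered only as a sketch, is to compare shapes against the model's own state_dict(). The helper filter_by_shape, the toy model, and the 21-joint checkpoint below are all hypothetical.

```python
import torch
import torch.nn as nn


def filter_by_shape(model, weights_dict):
    """Keep only entries whose key exists in the model with an identical shape."""
    model_state = model.state_dict()
    return {
        k: v for k, v in weights_dict.items()
        if k in model_state and v.shape == model_state[k].shape
    }


# Hypothetical model with a 17-joint output layer.
model = nn.Sequential()
model.add_module("conv1", nn.Conv2d(3, 4, kernel_size=1))
model.add_module("final_layer", nn.Conv2d(4, 17, kernel_size=1))

# Hypothetical checkpoint trained with 21 joints: final_layer shapes disagree.
checkpoint = {
    "conv1.weight": torch.zeros(4, 3, 1, 1),
    "conv1.bias": torch.zeros(4),
    "final_layer.weight": torch.zeros(21, 4, 1, 1),
    "final_layer.bias": torch.zeros(21),
}

filtered = filter_by_shape(model, checkpoint)
missing_keys, unexpected_keys = model.load_state_dict(filtered, strict=False)
print(sorted(filtered))
print(missing_keys)
```

Only the conv1 entries survive the filter, so final_layer is reported as missing and keeps its fresh initialization instead of triggering a size-mismatch error.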
