YOLOv7 Network Structure Diagram Explained

    • yolo.py output structure
    • Overall diagram
    • yolov7.yaml
    • Component structure
      • CBS module
      • ELAN1
      • ELAN2
      • MP1&2
        • MP1
        • MP2
      • SPPCSPC
  • References

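yolo.py output structure

The arguments printed by yolo.py differ from those in the yaml file in one respect: each Conv row (and likewise SPPCSPC and RepConv) gains a leading element, the number of input channels.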

YOLOR v0.1-112-g55b90e1 torch 1.7.0 CUDA:0 (Quadro RTX 4000, 8191.6875MB)

     from                      n    params  module                                arguments
  0  -1                        1       928  models.common.Conv                    [3, 32, 3, 1]
  1  -1                        1     18560  models.common.Conv                    [32, 64, 3, 2]
  2  -1                        1     36992  models.common.Conv                    [64, 64, 3, 1]
  3  -1                        1     73984  models.common.Conv                    [64, 128, 3, 2]
  4  -1                        1      8320  models.common.Conv                    [128, 64, 1, 1]
  5  -2                        1      8320  models.common.Conv                    [128, 64, 1, 1]
  6  -1                        1     36992  models.common.Conv                    [64, 64, 3, 1]
  7  -1                        1     36992  models.common.Conv                    [64, 64, 3, 1]
  8  -1                        1     36992  models.common.Conv                    [64, 64, 3, 1]
  9  -1                        1     36992  models.common.Conv                    [64, 64, 3, 1]
 10  [-1, -3, -5, -6]          1         0  models.common.Concat                  [1]
 11  -1                        1     66048  models.common.Conv                    [256, 256, 1, 1]
 12  -1                        1         0  models.common.MP                      []
 13  -1                        1     33024  models.common.Conv                    [256, 128, 1, 1]
 14  -3                        1     33024  models.common.Conv                    [256, 128, 1, 1]
 15  -1                        1    147712  models.common.Conv                    [128, 128, 3, 2]
 16  [-1, -3]                  1         0  models.common.Concat                  [1]
 17  -1                        1     33024  models.common.Conv                    [256, 128, 1, 1]
 18  -2                        1     33024  models.common.Conv                    [256, 128, 1, 1]
 19  -1                        1    147712  models.common.Conv                    [128, 128, 3, 1]
 20  -1                        1    147712  models.common.Conv                    [128, 128, 3, 1]
 21  -1                        1    147712  models.common.Conv                    [128, 128, 3, 1]
 22  -1                        1    147712  models.common.Conv                    [128, 128, 3, 1]
 23  [-1, -3, -5, -6]          1         0  models.common.Concat                  [1]
 24  -1                        1    263168  models.common.Conv                    [512, 512, 1, 1]
 25  -1                        1         0  models.common.MP                      []
 26  -1                        1    131584  models.common.Conv                    [512, 256, 1, 1]
 27  -3                        1    131584  models.common.Conv                    [512, 256, 1, 1]
 28  -1                        1    590336  models.common.Conv                    [256, 256, 3, 2]
 29  [-1, -3]                  1         0  models.common.Concat                  [1]
 30  -1                        1    131584  models.common.Conv                    [512, 256, 1, 1]
 31  -2                        1    131584  models.common.Conv                    [512, 256, 1, 1]
 32  -1                        1    590336  models.common.Conv                    [256, 256, 3, 1]
 33  -1                        1    590336  models.common.Conv                    [256, 256, 3, 1]
 34  -1                        1    590336  models.common.Conv                    [256, 256, 3, 1]
 35  -1                        1    590336  models.common.Conv                    [256, 256, 3, 1]
 36  [-1, -3, -5, -6]          1         0  models.common.Concat                  [1]
 37  -1                        1   1050624  models.common.Conv                    [1024, 1024, 1, 1]
 38  -1                        1         0  models.common.MP                      []
 39  -1                        1    525312  models.common.Conv                    [1024, 512, 1, 1]
 40  -3                        1    525312  models.common.Conv                    [1024, 512, 1, 1]
 41  -1                        1   2360320  models.common.Conv                    [512, 512, 3, 2]
 42  [-1, -3]                  1         0  models.common.Concat                  [1]
 43  -1                        1    262656  models.common.Conv                    [1024, 256, 1, 1]
 44  -2                        1    262656  models.common.Conv                    [1024, 256, 1, 1]
 45  -1                        1    590336  models.common.Conv                    [256, 256, 3, 1]
 46  -1                        1    590336  models.common.Conv                    [256, 256, 3, 1]
 47  -1                        1    590336  models.common.Conv                    [256, 256, 3, 1]
 48  -1                        1    590336  models.common.Conv                    [256, 256, 3, 1]
 49  [-1, -3, -5, -6]          1         0  models.common.Concat                  [1]
 50  -1                        1   1050624  models.common.Conv                    [1024, 1024, 1, 1]
 51  -1                        1   7609344  models.common.SPPCSPC                 [1024, 512, 1]
 52  -1                        1    131584  models.common.Conv                    [512, 256, 1, 1]
 53  -1                        1         0  torch.nn.modules.upsampling.Upsample  [None, 2, 'nearest']
 54  37                        1    262656  models.common.Conv                    [1024, 256, 1, 1]
 55  [-1, -2]                  1         0  models.common.Concat                  [1]
 56  -1                        1    131584  models.common.Conv                    [512, 256, 1, 1]
 57  -2                        1    131584  models.common.Conv                    [512, 256, 1, 1]
 58  -1                        1    295168  models.common.Conv                    [256, 128, 3, 1]
 59  -1                        1    147712  models.common.Conv                    [128, 128, 3, 1]
 60  -1                        1    147712  models.common.Conv                    [128, 128, 3, 1]
 61  -1                        1    147712  models.common.Conv                    [128, 128, 3, 1]
 62  [-1, -2, -3, -4, -5, -6]  1         0  models.common.Concat                  [1]
 63  -1                        1    262656  models.common.Conv                    [1024, 256, 1, 1]
 64  -1                        1     33024  models.common.Conv                    [256, 128, 1, 1]
 65  -1                        1         0  torch.nn.modules.upsampling.Upsample  [None, 2, 'nearest']
 66  24                        1     65792  models.common.Conv                    [512, 128, 1, 1]
 67  [-1, -2]                  1         0  models.common.Concat                  [1]
 68  -1                        1     33024  models.common.Conv                    [256, 128, 1, 1]
 69  -2                        1     33024  models.common.Conv                    [256, 128, 1, 1]
 70  -1                        1     73856  models.common.Conv                    [128, 64, 3, 1]
 71  -1                        1     36992  models.common.Conv                    [64, 64, 3, 1]
 72  -1                        1     36992  models.common.Conv                    [64, 64, 3, 1]
 73  -1                        1     36992  models.common.Conv                    [64, 64, 3, 1]
 74  [-1, -2, -3, -4, -5, -6]  1         0  models.common.Concat                  [1]
 75  -1                        1     65792  models.common.Conv                    [512, 128, 1, 1]
 76  -1                        1         0  models.common.MP                      []
 77  -1                        1     16640  models.common.Conv                    [128, 128, 1, 1]
 78  -3                        1     16640  models.common.Conv                    [128, 128, 1, 1]
 79  -1                        1    147712  models.common.Conv                    [128, 128, 3, 2]
 80  [-1, -3, 63]              1         0  models.common.Concat                  [1]
 81  -1                        1    131584  models.common.Conv                    [512, 256, 1, 1]
 82  -2                        1    131584  models.common.Conv                    [512, 256, 1, 1]
 83  -1                        1    295168  models.common.Conv                    [256, 128, 3, 1]
 84  -1                        1    147712  models.common.Conv                    [128, 128, 3, 1]
 85  -1                        1    147712  models.common.Conv                    [128, 128, 3, 1]
 86  -1                        1    147712  models.common.Conv                    [128, 128, 3, 1]
 87  [-1, -2, -3, -4, -5, -6]  1         0  models.common.Concat                  [1]
 88  -1                        1    262656  models.common.Conv                    [1024, 256, 1, 1]
 89  -1                        1         0  models.common.MP                      []
 90  -1                        1     66048  models.common.Conv                    [256, 256, 1, 1]
 91  -3                        1     66048  models.common.Conv                    [256, 256, 1, 1]
 92  -1                        1    590336  models.common.Conv                    [256, 256, 3, 2]
 93  [-1, -3, 51]              1         0  models.common.Concat                  [1]
 94  -1                        1    525312  models.common.Conv                    [1024, 512, 1, 1]
 95  -2                        1    525312  models.common.Conv                    [1024, 512, 1, 1]
 96  -1                        1   1180160  models.common.Conv                    [512, 256, 3, 1]
 97  -1                        1    590336  models.common.Conv                    [256, 256, 3, 1]
 98  -1                        1    590336  models.common.Conv                    [256, 256, 3, 1]
 99  -1                        1    590336  models.common.Conv                    [256, 256, 3, 1]
100  [-1, -2, -3, -4, -5, -6]  1         0  models.common.Concat                  [1]
101  -1                        1   1049600  models.common.Conv                    [2048, 512, 1, 1]
102  75                        1    328704  models.common.RepConv                 [128, 256, 3, 1]
103  88                        1   1312768  models.common.RepConv                 [256, 512, 3, 1]
104  101                       1   5246976  models.common.RepConv                 [512, 1024, 3, 1]
105  [102, 103, 104]           1     39550  IDetect                               [2, [[12, 16, 19, 36, 40, 28], [36, 75, 76, 55, 72, 146], [142, 110, 192, 243, 459, 401]], [256, 512, 1024]]

Model Summary: 415 layers, 37201950 parameters, 37201950 gradients, 105.1 GFLOPS
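
The params column of the summary can be cross-checked by hand. The following is a minimal sketch, assuming every Conv row is a bias-free convolution followed by BatchNorm2d (which is what the Conv class in the CBS section does), so the count is c1*c2*k*k weights plus 2*c2 for the BN scale and shift. The helper name conv_params is made up for this illustration.

```python
def conv_params(c1, c2, k, groups=1):
    """Trainable parameters of a bias-free k x k conv followed by BatchNorm."""
    conv_weights = (c1 // groups) * c2 * k * k  # no bias term (bias=False)
    bn_params = 2 * c2                          # BN weight (gamma) and bias (beta)
    return conv_weights + bn_params


# layer index -> ([c1, c2, k, s] from the arguments column, printed params)
rows = {
    0: ([3, 32, 3, 1], 928),
    1: ([32, 64, 3, 2], 18560),
    4: ([128, 64, 1, 1], 8320),
    37: ([1024, 1024, 1, 1], 1050624),
    41: ([512, 512, 3, 2], 2360320),
}

for idx, (args, printed) in rows.items():
    c1, c2, k, _stride = args  # the stride does not affect the parameter count
    print(idx, conv_params(c1, c2, k), printed)
```

Each computed value agrees with the printed one, which also confirms that the first element of the printed arguments is the input channel count.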

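Overall diagram

The overall diagrams are shown below. The first is annotated with the yaml layer indices, the second shows the concrete output of each layer, and the third (from "b导") is more concise. Reading the three diagrams together with the yaml file makes the structure easy to follow.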


yolov7.yaml

In [-1, 1, Conv, [32, 3, 1]], the args [32, 3, 1] mean 32 output channels, a 3*3 kernel, and a stride of 1.
Read the yaml file side by side with the overall network diagram.
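Note:
MP-1 (backbone) vs. MP-2 (head): both halve the spatial size; MP-1 keeps the channel count unchanged, while MP-2 doubles it.
ELAN-1 (backbone) vs. ELAN-2 (head): ELAN-1 doubles the channel count (except the last backbone ELAN, which stays at 1024), while ELAN-2 halves it.

My guess is that ELAN widens the channels in the backbone to extract features better. Inside MP-1 each branch halves the channel count and the concat restores it, so MP-1 can be understood as an improved form of downsampling.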
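
To make the row format [from, number, module, args] concrete, here is a small sketch that walks the first four backbone rows, prepends the input channel count the way the yolo.py summary prints it, and tracks the feature-map size. It assumes a 640*640*3 input and padding of k // 2 (the autopad default); the function names are hypothetical and only sequential Conv rows are handled.

```python
def conv_out_size(size, k, s):
    """Spatial size after a k x k conv with stride s and padding k // 2."""
    p = k // 2
    return (size + 2 * p - k) // s + 1


def trace_convs(rows, c_in=3, size=640):
    """Turn yaml rows into yolo.py-style arguments and track the feature-map size."""
    traced = []
    for _from, _number, module, (c_out, k, s) in rows:
        assert module == "Conv"  # this sketch only handles sequential Conv rows
        size = conv_out_size(size, k, s)
        traced.append(([c_in, c_out, k, s], size))
        c_in = c_out  # the next row reads this output (from = -1)
    return traced


# First four backbone rows of yolov7.yaml
stem = [
    [-1, 1, "Conv", [32, 3, 1]],
    [-1, 1, "Conv", [64, 3, 2]],
    [-1, 1, "Conv", [64, 3, 1]],
    [-1, 1, "Conv", [128, 3, 2]],
]

stem_trace = trace_convs(stem)
for i, (arguments, size) in enumerate(stem_trace):
    print(i, arguments, f"{size}*{size}*{arguments[1]}")
```

The traced sizes go 640, 320, 320, 160, which matches the P1/2 and P2/4 comments in the yaml file.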

# parameters
nc: 2  # number of classes
depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple

# anchors
anchors:
  - [12,16, 19,36, 40,28]  # P3/8
  - [36,75, 76,55, 72,146]  # P4/16
  - [142,110, 192,243, 459,401]  # P5/32

# yolov7 backbone
backbone:
  # [from, number, module, args]  input 640*640*3
  [[-1, 1, Conv, [32, 3, 1]],  # 0  640*640*32
   [-1, 1, Conv, [64, 3, 2]],  # 1-P1/2  320*320*64
   [-1, 1, Conv, [64, 3, 1]],  # 320*320*64
   [-1, 1, Conv, [128, 3, 2]],  # 3-P2/4  160*160*128

   # ELAN1
   [-1, 1, Conv, [64, 1, 1]],
   [-2, 1, Conv, [64, 1, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [[-1, -3, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1]],  # 11  160*160*256

   # MPConv
   [-1, 1, MP, []],
   [-1, 1, Conv, [128, 1, 1]],
   [-3, 1, Conv, [128, 1, 1]],
   [-1, 1, Conv, [128, 3, 2]],
   [[-1, -3], 1, Concat, [1]],  # 16-P3/8  80*80*256

   # ELAN1
   [-1, 1, Conv, [128, 1, 1]],
   [-2, 1, Conv, [128, 1, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [[-1, -3, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [512, 1, 1]],  # 24  80*80*512

   # MPConv
   [-1, 1, MP, []],
   [-1, 1, Conv, [256, 1, 1]],
   [-3, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, -3], 1, Concat, [1]],  # 29-P4/16  40*40*512

   # ELAN1
   [-1, 1, Conv, [256, 1, 1]],
   [-2, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [[-1, -3, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [1024, 1, 1]],  # 37  40*40*1024

   # MPConv
   [-1, 1, MP, []],
   [-1, 1, Conv, [512, 1, 1]],
   [-3, 1, Conv, [512, 1, 1]],
   [-1, 1, Conv, [512, 3, 2]],
   [[-1, -3], 1, Concat, [1]],  # 42-P5/32  20*20*1024

   # ELAN1
   [-1, 1, Conv, [256, 1, 1]],
   [-2, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [[-1, -3, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [1024, 1, 1]],  # 50  20*20*1024
  ]

# yolov7 head
head:
  [[-1, 1, SPPCSPC, [512]],  # 51  20*20*512

   [-1, 1, Conv, [256, 1, 1]],  # 20*20*256
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [37, 1, Conv, [256, 1, 1]],  # route backbone P4  40*40*1024 -> 40*40*256
   [[-1, -2], 1, Concat, [1]],  # 40*40*512

   # ELAN2 (note: the head ELAN differs from the backbone ELAN)
   [-1, 1, Conv, [256, 1, 1]],
   [-2, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1]],  # 63  40*40*256

   [-1, 1, Conv, [128, 1, 1]],  # 40*40*128 (80*80*128 after the upsample below)
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [24, 1, Conv, [128, 1, 1]],  # route backbone P3  80*80*512 -> 80*80*128
   [[-1, -2], 1, Concat, [1]],  # 80*80*256

   # ELAN2
   [-1, 1, Conv, [128, 1, 1]],
   [-2, 1, Conv, [128, 1, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [128, 1, 1]],  # 75  80*80*128

   # MPConv Channel × 2
   [-1, 1, MP, []],
   [-1, 1, Conv, [128, 1, 1]],
   [-3, 1, Conv, [128, 1, 1]],
   [-1, 1, Conv, [128, 3, 2]],
   [[-1, -3, 63], 1, Concat, [1]],  # 40*40*512 (128 + 128 + 256 from layer 63)

   # ELAN2
   [-1, 1, Conv, [256, 1, 1]],
   [-2, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1]],  # 88  40*40*256

   # MPConv Channel × 2
   [-1, 1, MP, []],
   [-1, 1, Conv, [256, 1, 1]],
   [-3, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, -3, 51], 1, Concat, [1]],  # 20*20*1024 (256 + 256 + 512 from layer 51)

   # ELAN2
   [-1, 1, Conv, [512, 1, 1]],
   [-2, 1, Conv, [512, 1, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [512, 1, 1]],  # 101  20*20*512

   [75, 1, RepConv, [256, 3, 1]],  # 102  80*80*256
   [88, 1, RepConv, [512, 3, 1]],  # 103  40*40*512
   [101, 1, RepConv, [1024, 3, 1]],  # 104  20*20*1024

   [[102,103,104], 1, IDetect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]

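Component structure

CBS module

Conv in the yaml file stands for convolution + batch normalization + activation.

As the diagram shows, the CBS module consists of a Conv layer (convolution), a BN layer (batch normalization), and a SiLU layer (the activation function). SiLU, defined as x * sigmoid(x), is a variant of the swish activation: it is swish with beta = 1.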
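
The relationship between SiLU and swish can be checked numerically. This is a plain-Python sketch using only the math module; the functions are written out here for illustration and are not taken from the YOLOv7 code, which uses nn.SiLU.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x * sigmoid(x)


def swish(x, beta=1.0):
    """Swish activation: x * sigmoid(beta * x); beta = 1 gives SiLU."""
    return x * sigmoid(beta * x)


for x in (-2.0, 0.0, 2.0):
    print(x, silu(x), swish(x, beta=1.0))
```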

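import torch.nn as nn


def autopad(k, p=None):  # kernel, padding
    # Helper from models/common.py: pad to 'same' when no padding is given
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]
    return p


class Conv(nn.Module):
    # Standard convolution: Conv2d + BatchNorm2d + SiLU (the CBS block)
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super(Conv, self).__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def fuseforward(self, x):
        # Forward pass without the BN layer
        return self.act(self.conv(x))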
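
fuseforward skips self.bn, which presumably only makes sense once the batch-norm statistics have been folded into the convolution. The sketch below shows that folding for a single scalar weight, under the usual inference-time BN formula; the real fusion works per output channel on tensors, and the function names here are hypothetical.

```python
import math


def batch_norm(z, gamma, beta, mean, var, eps=1e-5):
    """Inference-time BN on a scalar: gamma * (z - mean) / sqrt(var + eps) + beta."""
    return gamma * (z - mean) / math.sqrt(var + eps) + beta


def fuse_conv_bn(w, gamma, beta, mean, var, eps=1e-5):
    """Fold BN into a bias-free conv weight; returns (fused weight, fused bias)."""
    scale = gamma / math.sqrt(var + eps)
    return w * scale, beta - mean * scale


w = 0.5  # stand-in for one conv weight (the conv itself has no bias)
stats = dict(gamma=2.0, beta=0.1, mean=0.3, var=4.0)
w_fused, b_fused = fuse_conv_bn(w, **stats)

for x in (-1.0, 0.0, 3.0):
    unfused = batch_norm(w * x, **stats)  # conv followed by bn, as in forward()
    fused = w_fused * x + b_fused         # single fused conv, as in fuseforward()
    print(x, unfused, fused)
```

Both paths give the same value up to floating-point rounding, which is why the fused model can drop the BN layer at inference time.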

ELAN1

A module built from Conv blocks; in the backbone it doubles the channel count.

# [-1, 1, Conv, [128, 3, 2]],  # 3-P2/4  160*160*128
# ELAN1
[-1, 1, Conv, [64, 1, 1]],
[-2, 1, Conv, [64, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -3, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],  # 11  160*160*256

ELAN2

A module built from Conv blocks; in the head it halves the channel count.

# [[-1, -2], 1, Concat, [1]],  # 55  40*40*512
# ELAN2 (note: the head ELAN differs from the backbone ELAN)
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],  # 63  40*40*256
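
The difference between the two ELAN variants comes down to which earlier outputs the Concat row collects. The following sketch traces channel counts through the ELAN1 and ELAN2 rows quoted above. It is only an illustration: the helper names are made up, and only relative (negative) from indices are supported.

```python
def trace_channels(rows, c_in):
    """Return channel counts for the block input followed by every row's output."""
    outs = [c_in]  # outs[-1] is always "the previous layer", matching from = -1
    for frm, _number, module, args in rows:
        if module == "Conv":
            outs.append(args[0])                    # first arg is the output width
        elif module == "Concat":
            outs.append(sum(outs[f] for f in frm))  # channel-wise concat adds widths
        else:
            raise ValueError(f"unsupported module: {module}")
    return outs


def elan_rows(c_branch, c_hidden, c_out, concat_from):
    """Build the yaml rows of one ELAN block."""
    return (
        [[-1, 1, "Conv", [c_branch, 1, 1]], [-2, 1, "Conv", [c_branch, 1, 1]]]
        + [[-1, 1, "Conv", [c_hidden, 3, 1]] for _ in range(4)]
        + [[concat_from, 1, "Concat", [1]], [-1, 1, "Conv", [c_out, 1, 1]]]
    )


# ELAN1 (backbone layers 4-11) and ELAN2 (head layers 56-63)
elan1 = trace_channels(elan_rows(64, 64, 256, [-1, -3, -5, -6]), c_in=128)
elan2 = trace_channels(elan_rows(256, 128, 256, [-1, -2, -3, -4, -5, -6]), c_in=512)

print("ELAN1:", elan1[0], "->", elan1[-2], "->", elan1[-1])  # ELAN1: 128 -> 256 -> 256
print("ELAN2:", elan2[0], "->", elan2[-2], "->", elan2[-1])  # ELAN2: 512 -> 1024 -> 256
```

ELAN1 concatenates four 64-channel outputs (256 in total), while ELAN2 concatenates all six intermediate outputs (4 * 128 + 2 * 256 = 1024) before the final 1*1 Conv reduces them to 256.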

MP1&2

MP1

[-1, 1, Conv, [128, 3, 2]] means 128 output channels, a 3*3 kernel, and a stride of 2.

# [-1, 1, Conv, [256, 1, 1]],  # 11  160*160*256
# MPConv-1: backbone downsampling, channel count unchanged
[-1, 1, MP, []],
[-1, 1, Conv, [128, 1, 1]],
[-3, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 2]],
[[-1, -3], 1, Concat, [1]],  # 16-P3/8  80*80*256

MP2

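In the head, the spatial size is halved and the channel count is doubled by the two MP branches; the Concat then also merges a routed feature map (layer 63 or 51).

# [-1, 1, Conv, [256, 1, 1]],  # 88  40*40*256
# MPConv Channel × 2
[-1, 1, MP, []],
[-1, 1, Conv, [256, 1, 1]],
[-3, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 2]],
[[-1, -3, 51], 1, Concat, [1]],  # 20*20*1024 (256 + 256 + 512 from layer 51)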
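
The two MP variants can be compared with a little arithmetic. This sketch assumes that MP is a 2*2 max pool with stride 2 (not shown in this article, so treat it as an assumption) and that Conv pads with k // 2; the function names are hypothetical.

```python
def pool_out_size(size, k=2, s=2):
    """Spatial size after an unpadded k x k max pool with stride s."""
    return (size - k) // s + 1


def conv_out_size(size, k, s):
    """Spatial size after a k x k conv with stride s and padding k // 2."""
    return (size + 2 * (k // 2) - k) // s + 1


def mp_block(size, c_branch, c_route=0):
    """Return (output size, output channels) of one MP block."""
    # Branch A: MP (max pool) -> 1x1 Conv to c_branch channels
    size_a = conv_out_size(pool_out_size(size), 1, 1)
    # Branch B: 1x1 Conv to c_branch channels -> 3x3 Conv with stride 2
    size_b = conv_out_size(conv_out_size(size, 1, 1), 3, 2)
    assert size_a == size_b  # both branches must agree before the Concat
    return size_a, 2 * c_branch + c_route


mp1 = mp_block(160, 128)              # layers 12-16, input 160*160*256
mp2 = mp_block(80, 128, c_route=256)  # layers 76-80, input 80*80*128, route = layer 63

print("MP-1:", mp1)  # MP-1: (80, 256)
print("MP-2:", mp2)  # MP-2: (40, 512)
```

In MP-1 each branch halves the 256 input channels, so the Concat returns to 256. In MP-2 each branch keeps the 128 input channels, so the two branches give 256, and the routed layer 63 adds another 256.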

SPPCSPC

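Similar to SPPF in YOLOv5. The difference is that SPPF chains 5×5 max-pooling layers one after another, whereas SPPCSPC applies 5×5, 9×9, and 13×13 max pooling in parallel (stride 1, so the spatial size is preserved) and wraps this in a CSP structure with a bypass branch (cv2).

import torch
import torch.nn as nn


class SPPCSPC(nn.Module):
    # CSP https://github.com/WongKinYiu/CrossStagePartialNetworks
    # Conv is the CBS block defined in the CBS module section
    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5, k=(5, 9, 13)):
        super(SPPCSPC, self).__init__()
        c_ = int(2 * c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(c_, c_, 3, 1)
        self.cv4 = Conv(c_, c_, 1, 1)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])
        self.cv5 = Conv(4 * c_, c_, 1, 1)
        self.cv6 = Conv(c_, c_, 3, 1)
        self.cv7 = Conv(2 * c_, c2, 1, 1)

    def forward(self, x):
        # Main branch: three convs, then the parallel max pools
        x1 = self.cv4(self.cv3(self.cv1(x)))
        y1 = self.cv6(self.cv5(torch.cat([x1] + [m(x1) for m in self.m], 1)))
        # Bypass branch: a single 1x1 conv
        y2 = self.cv2(x)
        return self.cv7(torch.cat((y1, y2), dim=1))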
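
As a sanity check, the layer widths in the class above can be turned into a parameter count and compared with the 7609344 printed for layer 51 in the yolo.py summary. This is a sketch in plain Python that assumes each Conv is a bias-free convolution plus BatchNorm; the helper names are made up for illustration.

```python
def conv_params(c1, c2, k):
    """Bias-free k x k conv weights plus BN weight and bias."""
    return c1 * c2 * k * k + 2 * c2


def sppcspc_params(c1, c2, e=0.5, k=(5, 9, 13)):
    """Return (hidden channels, total parameters) of an SPPCSPC block."""
    c_ = int(2 * c2 * e)  # hidden channels, as in the class above
    layers = {
        "cv1": (c1, c_, 1),
        "cv2": (c1, c_, 1),
        "cv3": (c_, c_, 3),
        "cv4": (c_, c_, 1),
        "cv5": ((len(k) + 1) * c_, c_, 1),  # x1 plus one pooled copy per kernel
        "cv6": (c_, c_, 3),
        "cv7": (2 * c_, c2, 1),
    }
    return c_, sum(conv_params(*spec) for spec in layers.values())


def pool_out_size(size, k):
    """Max pool with stride 1 and padding k // 2 keeps the size for odd k."""
    return (size + 2 * (k // 2) - k) // 1 + 1


hidden, total = sppcspc_params(1024, 512)  # printed arguments: [1024, 512, 1]
print(hidden, total)                                # 512 7609344
print([pool_out_size(20, k) for k in (5, 9, 13)])   # [20, 20, 20]
```

The max pools carry no parameters, so the whole count comes from the seven Conv blocks.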

References

[YOLOv7_0.1] Network structure and source code analysis
https://blog.csdn.net/weixin_43799388/article/details/126164288
YOLOv7 in detail (1): network architecture
https://blog.csdn.net/qq128252/article/details/126673493
Smart object detection 61: building a YoloV7 object detection platform with PyTorch
https://blog.csdn.net/weixin_44791964/article/details/125827160