YOLOv5 Complete Walkthrough ①: Reading the Network Structure Code Line by Line

Source: CSDN Blog | 2022-12-16 11:07:29

Authors | Fengwen, BBuf

The code covered in this tutorial is at:

https://github.com/Oneflow-Inc/one-yolov5

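The tutorial applies equally to Ultralytics/YOLOv5, because One-YOLOv5 only swaps the runtime backend; its computation logic and code are unchanged from Ultralytics/YOLOv5. Stars are welcome. For details, see: "A Faster YOLOv5 Arrives, with a Complete Chinese Walkthrough Tutorial".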

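Introduction

The overall architecture of YOLOv5 is the same across model sizes (n, s, m, l, x). The sizes differ only in the depth and width used in each submodule, which are controlled by the depth_multiple and width_multiple parameters in the yaml file respectively.

Note also that besides n, s, m, l, x, the official release includes n6, s6, m6, l6, x6. The latter target higher-resolution images such as 1280x1280, and their structure differs somewhat: the former downsample by at most 32x and use 3 prediction feature layers, whereas the latter downsample by 64x and use 4 prediction feature layers.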
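To make the difference between the two families concrete, the short sketch below computes the feature-map sizes at each prediction level. The helper name feature_map_sizes is hypothetical, and the stride lists (8, 16, 32 and 8, 16, 32, 64) are assumed from the P3/8 to P5/32 labels in the yaml file plus the 64x downsampling mentioned above.

```python
# Hypothetical helper: feature-map sizes for a square input at the given strides.
def feature_map_sizes(img_size, strides):
    """Return the (height, width) of the feature map at each stride."""
    return [(img_size // s, img_size // s) for s in strides]

# n, s, m, l, x: 3 prediction layers, downsampling up to 32x
p5_sizes = feature_map_sizes(640, [8, 16, 32])
# n6, s6, m6, l6, x6: 4 prediction layers, downsampling up to 64x
p6_sizes = feature_map_sizes(1280, [8, 16, 32, 64])

print(p5_sizes)  # [(80, 80), (40, 40), (20, 20)]
print(p6_sizes)  # [(160, 160), (80, 80), (40, 40), (20, 20)]
```

With these inputs, the coarsest feature map is 20x20 in both families; the *6 models simply add one finer level on top for the larger image.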

This chapter uses YOLOv5s as the example and walks through the source code from the configuration file models/yolov5s.yaml (https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolov5s.yaml) to models/yolo.py (https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolo.py).

Contents of yolov5s.yaml

nc: 80  # number of classes in the dataset
depth_multiple: 0.33  # model depth multiple: layer-count factor (adjusts network depth)
width_multiple: 0.50  # layer channel multiple: channel-count factor (adjusts network width)
# How should depth_multiple and width_multiple be read? They set the depth (number of layers)
# and width (number of channels) of the whole model; the exact mechanism is explained with the backbone code below.
anchors:  # anchor sizes applied to each feature map
  # 9 anchors; P denotes the feature-map level, e.g. P3/8 is the level-3 feature map, scaled to 1/8 of the input
  - [10,13, 16,30, 33,23]  # P3/8: the 3 anchors [10,13], [16,30], [33,23]
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5s v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
  ]

# YOLOv5s v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, "nearest"]],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, "nearest"]],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]

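Understanding anchors

YOLOv5 initializes 9 anchors, used across three feature maps; every grid cell of each feature map predicts with three anchors. The assignment rules are:

The larger a feature map is, the earlier it sits in the network, the smaller its downsampling rate relative to the original image, and the smaller its receptive field. It is therefore suited to predicting smaller objects and is assigned the smaller anchors.

The smaller a feature map is, the later it sits in the network, the larger its downsampling rate relative to the original image, and the larger its receptive field. It is therefore suited to predicting larger objects and is assigned the larger anchors.

In other words, large objects are detected on the small feature map, medium objects on the medium-sized feature map, and small objects on the large feature map.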
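The assignment rule can be checked directly against the anchor rows in yolov5s.yaml. The sketch below groups each flat row into (width, height) pairs and compares the mean anchor area per level; the helper to_pairs and the strides list are illustrative names, not part of the repository.

```python
# Anchor rows copied from the yolov5s.yaml listing; each row is a flat
# [w1, h1, w2, h2, w3, h3] list for one detection level.
anchors = [
    [10, 13, 16, 30, 33, 23],       # P3/8
    [30, 61, 62, 45, 59, 119],      # P4/16
    [116, 90, 156, 198, 373, 326],  # P5/32
]
strides = [8, 16, 32]  # assumed from the P3/8, P4/16, P5/32 labels

def to_pairs(flat):
    """Group a flat [w, h, w, h, ...] list into (w, h) tuples."""
    return list(zip(flat[0::2], flat[1::2]))

mean_areas = []
for stride, row in zip(strides, anchors):
    pairs = to_pairs(row)
    mean_area = sum(w * h for w, h in pairs) / len(pairs)
    mean_areas.append(mean_area)
    print(f"stride {stride:>2}: {pairs}, mean area {mean_area:.0f} px^2")
```

The mean area grows from one level to the next, which matches the rule that the feature map with the larger stride (the smaller map) receives the larger anchors.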

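Understanding backbone & head

The [from, number, module, args] parameters

The four parameters mean the following:

First parameter, from: which layer the input comes from. -1 means the previous layer; [-1, 6] means both the previous layer and layer 6.

Second parameter, number: how many identical modules to stack; 9 means 9 identical modules.

Third parameter, module: the module name. These modules are defined in common.py.

Fourth parameter, args: the initialization arguments of the class, parsed and passed to the module.

Below, the first module, Conv, is used as an example to introduce the modules in common.py.

The Conv module is defined as follows: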

class Conv(nn.Module):
    # Standard convolution
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        """
        @param c1: number of input channels
        @param c2: number of output channels
        @param k : kernel size (kernel_size)
        @param s : convolution stride (stride)
        @param p : feature-map padding width (padding)
        @param g : number of groups; must evenly divide the number of input channels (so the input channels can be grouped correctly)
        """
        super().__init__()
        # https://oneflow.readthedocs.io/en/master/generated/oneflow.nn.Conv2d.html?highlight=Conv
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

    def forward(self, x):
        # conv -> batch norm -> activation
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        # used after BatchNorm has been folded into the conv weights
        return self.act(self.conv(x))
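The forward_fuse method above skips BatchNorm, which only gives the same result once the BatchNorm statistics have been folded into the convolution. The sketch below shows that folding arithmetic for a single channel with plain floats. It is a scalar illustration under the usual eval-mode BatchNorm formula, not the repository's fuse_conv_and_bn; the function name fuse_scalar and all numeric values are made up for the example.

```python
import math

def fuse_scalar(w, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm statistics into a bias-free conv weight (one channel)."""
    scale = gamma / math.sqrt(var + eps)
    return w * scale, beta - mean * scale  # fused weight, fused bias

# Illustrative values only
w, gamma, beta, mean, var = 0.8, 1.5, 0.1, 0.3, 4.0
x = 2.0

# Unfused path: conv (a plain multiply here) followed by BatchNorm in eval mode
y = w * x
bn_out = gamma * (y - mean) / math.sqrt(var + 1e-5) + beta

# Fused path: a single conv with a bias term
w_f, b_f = fuse_scalar(w, gamma, beta, mean, var)
fused_out = w_f * x + b_f

print(math.isclose(bn_out, fused_out))  # True
```

Both paths compute the same affine function of x, so the BatchNorm layer can be dropped at inference time without changing the output.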

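For example, with width_multiple set to 0.5 above, the first args list [64, 6, 2, 2] is parsed into [3, 64*0.5=32, 6, 2, 2], where the leading 3 is the number of input channels (the input is a 3-channel image) and 32 is the number of output channels.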
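The channel scaling can be sketched in a few lines. The make_divisible below is a simplified stand-in that rounds up to a multiple of 8; the repository's own helper is assumed to behave this way but may differ in details. scale_channels is a hypothetical wrapper added for the example.

```python
import math

def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor (simplified sketch)."""
    return math.ceil(x / divisor) * divisor

def scale_channels(c2, width_multiple):
    """Apply the width multiple to a nominal output-channel count."""
    return make_divisible(c2 * width_multiple, 8)

# Nominal channel counts from the backbone in yolov5s.yaml
nominal = [64, 128, 256, 512, 1024]
scaled = [scale_channels(c, 0.50) for c in nominal]
print(scaled)  # [32, 64, 128, 256, 512]

# A count that is not already a multiple of 8 is rounded up (under the assumption above)
print(scale_channels(100, 0.50))  # 56
```

The scaled values agree with the output channels listed for the Conv layers in Table 2.1.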

How the network size is adjusted

Line 256 of yolo.py (https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolo.py) reads nc, depth_multiple and the other parameters from the yaml file:

anchors, nc, gd, gw = d["anchors"], d["nc"], d["depth_multiple"], d["width_multiple"]

The role of "width_multiple" was covered above in the discussion of args. What, then, does "depth_multiple" do?

Line 257 of yolo.py (https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolo.py) defines how it is applied:

n = n_ = max(round(n * gd), 1) if n > 1 else n  # depth gain; this line is referred to as formula (1)

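Here gd is the value of depth_multiple, and n is the second element of each list under backbone.

Formula (1) makes it clear that gd changes n, and therefore the size of the network structure.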
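Formula (1) is easy to try out on the number column of the backbone. The sketch below wraps it in a hypothetical function depth_gain and applies it with the YOLOv5s value gd = 0.33, and with gd = 1.0 purely for contrast.

```python
def depth_gain(n, gd):
    """Formula (1): scale the repeat count n by the depth multiple gd."""
    return max(round(n * gd), 1) if n > 1 else n

# "number" column of the ten backbone rows in yolov5s.yaml
backbone_numbers = [1, 1, 3, 1, 6, 1, 9, 1, 3, 1]

small = [depth_gain(n, 0.33) for n in backbone_numbers]
full = [depth_gain(n, 1.00) for n in backbone_numbers]

print(small)  # [1, 1, 1, 1, 2, 1, 3, 1, 1, 1]
print(full)   # [1, 1, 3, 1, 6, 1, 9, 1, 3, 1]
```

The C3 repeat counts 1, 2, 3, 1 obtained for gd = 0.33 are the same values that appear as the third argument of the backbone C3 rows in Table 2.1. Layers with n = 1 are never scaled, because the formula only applies when n > 1.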

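The number of modules and the size and number of convolution kernels in the subsequent layers change accordingly. Compared with YOLOv5s, YOLOv5l has several times as many trainable parameters, and its depth and width are much larger. As a result YOLOv5l is considerably more accurate than YOLOv5s at inference time, but its inference is slower.

YOLOv5 therefore offers a choice: if inference speed matters most, pick a smaller model such as YOLOv5s or YOLOv5m; if higher accuracy matters more than inference speed, pick one of the two larger models.

As shown in the figure below:

[Figure: YOLOv5 model complexity comparison]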

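Understanding the Conv module

Network structure preview

Below is a simplified diagram of the overall network structure, drawn from yolov5s.yaml (https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolov5s.yaml).

[Figure: overall structure of the yolov5s network]

Detailed network structure diagram: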

https://oneflow-static.oss-cn-beijing.aliyuncs.com/one-yolo/imgs/yolov5s.onnx.png

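The model was exported to ONNX format with export.py, and the image was rendered with https://netron.app/ (model export will be covered separately in a later article of this tutorial).

The values to the right of each module in the diagram give the shape of the feature map; for example, the input image to the first layer (Conv) has shape [3, 640, 640]. These values can be computed by feeding a fixed-size image through the network and applying the model parameters in yolov5s.yaml (https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolov5s.yaml), and they can be printed from the code in models/yolo.py (https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolo.py). See Table 2.1 in the appendix for the full data.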
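The spatial sizes can also be worked out by hand with the standard convolution output-size formula. The sketch below applies it to the five stride-2 Conv layers of the backbone. The autopad here is a simplified assumption (default padding of k // 2 when none is given), since its definition is not shown in this article, and conv_out_size is a hypothetical helper.

```python
def autopad(k, p=None):
    """Assumed behaviour: default to 'same'-style padding k // 2 when p is None."""
    return k // 2 if p is None else p

def conv_out_size(size, k, s, p=None):
    """Output height/width of a convolution: floor((size + 2p - k) / s) + 1."""
    p = autopad(k, p)
    return (size + 2 * p - k) // s + 1

# (kernel, stride, padding) of the stride-2 Conv layers 0, 1, 3, 5, 7 in yolov5s.yaml
stride2_convs = [(6, 2, 2), (3, 2, None), (3, 2, None), (3, 2, None), (3, 2, None)]

size = 640
sizes = []
for k, s, p in stride2_convs:
    size = conv_out_size(size, k, s, p)
    sizes.append(size)

print(sizes)  # [320, 160, 80, 40, 20]
```

Each stride-2 Conv halves the feature map, giving the 320, 160, 80, 40 and 20 pixel sizes listed in Table 2.1.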

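Understanding yolo.py

File location: https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolo.py

The file has three main parts: the Detect class, the Model class, and the parse_model function.

You can run the script with python models/yolo.py --cfg yolov5s.yaml and observe its output.

Understanding the parse_model function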

def parse_model(d, ch):  # model_dict, input_channels(3)
    """Used by the Model class below.
    Parses the model file (as a dict) and builds the network structure.
    For each layer, the function essentially:
        updates the layer's args and computes c2 (the layer's output channels) =>
        builds the layer from those args =>
        appends to layers and save
    @Params d: model_dict, the model file as a dict {dict:7}: the 6 entries of yolov5s.yaml
               (https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolov5s.yaml) + ch
    @Params ch: records the output channels of every layer; initially ch=[3], which is discarded later
    @return nn.Sequential(*layers): the layer structure of the network
    @return sorted(save): the sorted from values that are not -1, [4, 6, 10, 14, 17, 20, 23]
    """
    LOGGER.info(f"\n{'':>3}{'from':>18}{'n':>3}{'params':>10}  {'module':<40}{'arguments':<30}")
    # Read anchors and the parameters (nc, depth_multiple, width_multiple) from dict d
    anchors, nc, gd, gw = d["anchors"], d["nc"], d["depth_multiple"], d["width_multiple"]
    # na: number of anchors on each prediction head = 3
    na = (len(anchors[0]) // 2) if isinstance(anchors, list) else anchors  # number of anchors
    no = na * (nc + 5)  # number of outputs = anchors * (classes + 5): output channels of each prediction head

    # Start building the network
    # layers: holds the module built for each layer
    # save: indices of the layers referenced by a from value other than -1
    # c2: output channels of the current layer
    layers, save, c2 = [], [], ch[-1]  # layers, savelist, ch out
    # enumerate() pairs each item of an iterable with its index; it is typically used in for loops
    for i, (f, n, m, args) in enumerate(d["backbone"] + d["head"]):  # from, number, module, args
        m = eval(m) if isinstance(m, str) else m  # eval strings
        for j, a in enumerate(args):
            # args is a list; evaluate each string entry in turn
            with contextlib.suppress(NameError):
                args[j] = eval(a) if isinstance(a, str) else a  # eval strings

        # Multiply the repeat count by the depth multiple to get the layer depth; the minimum is 1
        n = n_ = max(round(n * gd), 1) if n > 1 else n  # depth gain

        # If module m is one of the module types defined in this project, it is handled here
        if m in (Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv, MixConv2d, Focus, CrossConv,
                 BottleneckCSP, C3, C3TR, C3SPP, C3Ghost, nn.ConvTranspose2d, DWConvTranspose2d, C3x):
            # c1: input channels, c2: output channels
            c1, c2 = ch[f], args[0]
            # If this is not the output layer, multiply the channel count by the width multiple,
            # i.e. the width multiple applies to every layer except the last
            if c2 != no:  # if not output
                # make_divisible rounds the scaled channel count to a multiple of 8,
                # which generally helps parallelism and inference performance
                c2 = make_divisible(c2 * gw, 8)

            # Store the results in args; these are the final arguments passed to the module
            args = [c1, c2, *args[1:]]
            # Handle arguments per module type; see each class's __init__ for the exact parameters
            if m in [BottleneckCSP, C3, C3TR, C3Ghost, C3x]:
                # These modules repeat their inner block n times, where n is the depth computed above
                args.insert(2, n)  # number of repeats
                n = 1
        elif m is nn.BatchNorm2d:
            args = [ch[f]]
        elif m is Concat:
            c2 = sum(ch[x] for x in f)
        elif m is Detect:
            args.append([ch[x] for x in f])
            if isinstance(args[1], int):  # number of anchors
                args[1] = [list(range(args[1] * 2))] * len(f)
        elif m is Contract:
            c2 = ch[f] * args[0] ** 2
        elif m is Expand:
            c2 = ch[f] // args[0] ** 2
        else:
            c2 = ch[f]

        # Build the module for this layer from the repeat count n, the module class and its arguments
        m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
        # Get the module type name, e.g. models.common.Conv, models.common.C3, models.common.SPPF
        # replace() strips the "__main__." prefix; this replacement is not needed in the current project
        t = str(m)[8:-2].replace("__main__.", "")  # module type
        np = sum(x.numel() for x in m_.parameters())  # number params
        m_.i, m_.f, m_.type, m_.np = i, f, t, np  # attach index, "from" index, type, number params
        LOGGER.info(f"{i:>3}{str(f):>18}{n_:>3}{np:10.0f}  {t:<40}{str(args):<30}")  # print
        # If x is not -1, store it in save: that layer's feature map must be kept.
        # For a non-negative x, x % i equals x. For example, in the last layer:
        #     f = [17, 20, 23], i = 24
        #     y = [x % i for x in ([f] if isinstance(f, int) else f) if x != -1]
        #     print(y)  # [17, 20, 23]
        # x % i is probably used because it maps a negative relative index to an absolute one:
        # i - 1 == -1 % i (e.g. with i = 12 and f = [-1], [x % i for x in f] gives [11])
        save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1)  # append to savelist
        layers.append(m_)
        if i == 0:
            # On the first iteration, start a fresh ch (the ch argument is needed to build
            # the first module, so ch is reset only after that module has been created)
            ch = []
        ch.append(c2)
    # Wrap all layers in nn.Sequential and sort the indices of the feature maps to keep
    return nn.Sequential(*layers), sorted(save)
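The save list returned by parse_model can be reproduced without building any layers. The sketch below replays only the save.extend line over the from column of yolov5s.yaml; the list name froms is introduced here for illustration.

```python
# "from" column for layers 0..24 of yolov5s.yaml (backbone + head);
# every layer takes its input from the previous layer unless listed below.
froms = [-1] * 25
froms[12] = [-1, 6]
froms[16] = [-1, 4]
froms[19] = [-1, 14]
froms[22] = [-1, 10]
froms[24] = [17, 20, 23]

save = []
for i, f in enumerate(froms):
    sources = [f] if isinstance(f, int) else f
    # Same expression as in parse_model: skip -1, convert the rest to absolute indices
    save.extend(x % i for x in sources if x != -1)

print(sorted(save))  # [4, 6, 10, 14, 17, 20, 23]

# The modulo turns a negative relative index into an absolute one
print(-2 % 12)  # 10
```

The result is the same list quoted in the parse_model docstring: these are the layers whose outputs must be kept because a later Concat or Detect layer reads them.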

Understanding the Model class

class Model(nn.Module):
    # YOLOv5 model
    def __init__(self, cfg="yolov5s.yaml", ch=3, nc=None, anchors=None):  # model, input channels, number of classes
        super().__init__()
        # If cfg is already a dict, use it directly; otherwise load the file at cfg into a dict and store it in self.yaml
        if isinstance(cfg, dict):
            self.yaml = cfg  # model dict
        else:  # is *.yaml: load with the yaml module
            import yaml  # for flow hub
            self.yaml_file = Path(cfg).name
            with open(cfg, encoding="ascii", errors="ignore") as f:
                self.yaml = yaml.safe_load(f)  # model dict loaded from the yaml file

        # Define model
        # ch: input channels. If self.yaml has the key "ch", use its value; otherwise use the ch argument
        ch = self.yaml["ch"] = self.yaml.get("ch", ch)  # input channels
        # If the nc argument differs from nc in the yaml, override the yaml value
        if nc and nc != self.yaml["nc"]:
            LOGGER.info(f"Overriding model.yaml nc={self.yaml['nc']} with nc={nc}")
            self.yaml["nc"] = nc  # override yaml value
        if anchors:  # anchor (prior box) configuration
            LOGGER.info(f"Overriding model.yaml anchors with anchors={anchors}")
            self.yaml["anchors"] = round(anchors)  # override yaml value
        # Build the model and the list of feature maps to keep
        self.model, self.save = parse_model(deepcopy(self.yaml), ch=[ch])  # model, savelist
        self.names = [str(i) for i in range(self.yaml["nc"])]  # default class names: ["0", "1", "2", ...]
        # self.inplace defaults to True, which saves memory
        self.inplace = self.yaml.get("inplace", True)

        # Build strides, anchors: determine the strides and the anchors matching each stride
        m = self.model[-1]  # Detect()
        if isinstance(m, Detect):  # check that the last layer of the model is a Detect module
            s = 256  # 2x min stride
            m.inplace = self.inplace
            # Compute the downsampling factor of the three feature maps: [8, 16, 32]
            m.stride = flow.tensor([s / x.shape[-2] for x in self.forward(flow.zeros(1, ch, s, s))])  # forward
            # Check that the anchor order matches the stride order (anchors should go from small to large)
            check_anchor_order(m)  # must be in pixel-space (not grid-space)
            # Scale the anchors to feature-map units: the loaded anchors are in pixels of the
            # original image, but the feature maps are smaller after convolution and pooling
            m.anchors /= m.stride.view(-1, 1, 1)
            self.stride = m.stride
            self._initialize_biases()  # only run once: initialize the biases

        # Init weights, biases
        # Calls initialize_weights from oneflow_utils.py to initialize the model weights
        initialize_weights(self)
        self.info()  # print model information
        LOGGER.info("")

    # Dispatches the forward pass
    def forward(self, x, augment=False, profile=False, visualize=False):
        if augment:  # whether to use Test Time Augmentation (TTA)
            return self._forward_augment(x)  # augmented inference, None
        return self._forward_once(x, profile, visualize)  # single-scale inference, train

    # Forward pass with test-time augmentation
    def _forward_augment(self, x):
        img_size = x.shape[-2:]  # height, width
        s = [1, 0.83, 0.67]  # scales
        f = [None, 3, None]  # flips (2-ud, 3-lr)
        y = []  # outputs
        for si, fi in zip(s, f):
            xi = scale_img(x.flip(fi) if fi else x, si, gs=int(self.stride.max()))
            yi = self._forward_once(xi)[0]  # forward
            # cv2.imwrite(f"img_{si}.jpg", 255 * xi[0].cpu().numpy().transpose((1, 2, 0))[:, :, ::-1])  # save
            yi = self._descale_pred(yi, fi, si, img_size)
            y.append(yi)
        y = self._clip_augmented(y)  # clip augmented tails
        return flow.cat(y, 1), None  # augmented inference, train

    # The actual forward pass
    def _forward_once(self, x, profile=False, visualize=False):
        """
        @params x: input image
        @params profile: True enables per-layer performance profiling
        @params visualize: True enables feature visualization
        """
        # y: holds the output of every layer whose index is in self.save,
        #    because the later feature-fusion steps need those feature maps
        y, dt = [], []  # outputs
        # Run each layer in turn.  m.i=index  m.f=from  m.type=class name  m.np=number of params
        for m in self.model:
            # m.f tells which layer's output feeds the current layer; for most layers m.f is -1
            if m.f != -1:  # if not from previous layer
                x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers
            if profile:
                self._profile_one_layer(m, x, dt)
            x = m(x)  # run
            y.append(x if m.i in self.save else None)  # save output
            if visualize:
                feature_visualization(x, m.type, m.i, save_dir=visualize)
        return x

    # Map predictions back to the original image size (inverse operation)
    def _descale_pred(self, p, flips, scale, img_size):
        """Used by _forward_augment above during Test Time Augmentation (TTA).
        de-scale predictions following augmented inference (inverse operation)
        @params p: inference result
        @params flips: flip applied to the input (2 = up-down, 3 = left-right, None = no flip)
        @params scale: scale factor applied to the input
        @params img_size: (height, width) of the original input
        """
        if self.inplace:
            p[..., :4] /= scale  # de-scale
            if flips == 2:
                p[..., 1] = img_size[0] - p[..., 1]  # de-flip ud
            elif flips == 3:
                p[..., 0] = img_size[1] - p[..., 0]  # de-flip lr
        else:
            x, y, wh = p[..., 0:1] / scale, p[..., 1:2] / scale, p[..., 2:4] / scale  # de-scale
            if flips == 2:
                y = img_size[0] - y  # de-flip ud
            elif flips == 3:
                x = img_size[1] - x  # de-flip lr
            p = flow.cat((x, y, wh, p[..., 4:]), -1)
        return p

    # Used during TTA to clip the tails of the augmented inference outputs
    def _clip_augmented(self, y):
        # Clip YOLOv5 augmented inference tails
        nl = self.model[-1].nl  # number of detection layers (P3-P5)
        g = sum(4 ** x for x in range(nl))  # grid points
        e = 1  # exclude layer count
        i = (y[0].shape[1] // g) * sum(4 ** x for x in range(e))  # indices
        y[0] = y[0][:, :-i]  # large
        i = (y[-1].shape[1] // g) * sum(4 ** (nl - 1 - x) for x in range(e))  # indices
        y[-1] = y[-1][:, i:]  # small
        return y

    # Log per-layer forward inference time
    def _profile_one_layer(self, m, x, dt):
        c = isinstance(m, Detect)  # is final layer, copy input as inplace fix
        o = thop.profile(m, inputs=(x.copy() if c else x,), verbose=False)[0] / 1E9 * 2 if thop else 0  # FLOPs
        t = time_sync()
        for _ in range(10):
            m(x.copy() if c else x)
        dt.append((time_sync() - t) * 100)
        if m == self.model[0]:
            LOGGER.info(f"{'time (ms)':>10s} {'GFLOPs':>10s} {'params':>10s}  module")
        LOGGER.info(f"{dt[-1]:10.2f} {o:10.2f} {m.np:10.0f}  {m.type}")
        if c:
            LOGGER.info(f"{sum(dt):10.2f} {'-':>10s} {'-':>10s}  Total")

    # initialize biases into Detect(), cf is class frequency
    def _initialize_biases(self, cf=None):
        # https://arxiv.org/abs/1708.02002 section 3.3
        # cf = flow.bincount(flow.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1.
        m = self.model[-1]  # Detect() module
        for mi, s in zip(m.m, m.stride):  # from
            b = mi.bias.view(m.na, -1).detach()  # conv.bias(255) to (3,85)
            b[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
            b[:, 5:] += math.log(0.6 / (m.nc - 0.999999)) if cf is None else flow.log(cf / cf.sum())  # cls
            mi.bias = flow.nn.Parameter(b.view(-1), requires_grad=True)

    def _print_biases(self):
        """
        Print the biases of the conv layers inside the final Detect module (other layers could be chosen instead)
        """
        m = self.model[-1]  # Detect() module
        for mi in m.m:  # from
            b = mi.bias.detach().view(m.na, -1).T  # conv.bias(255) to (3,85)
            LOGGER.info(
                ("%6g Conv2d.bias:" + "%10.3g" * 6) % (mi.weight.shape[1], *b[:5].mean(1).tolist(), b[5:].mean()))

    def _print_weights(self):
        """
        Print the weights of the Bottleneck layers (other layers could be chosen instead)
        """
        for m in self.model.modules():
            if type(m) is Bottleneck:
                LOGGER.info("%10.3g" % (m.w.detach().sigmoid() * 2))  # shortcut weights

    # fuse() merges conv and bn layers to speed up inference
    def fuse(self):  # fuse model Conv2d() + BatchNorm2d() layers
        """Used in detect.py and val.py.
        fuse model Conv2d() + BatchNorm2d() layers
        Calls fuse_conv_and_bn from oneflow_utils.py and the forward_fuse method of the Conv module in common.py
        """
        LOGGER.info("Fusing layers... ")
        for m in self.model.modules():
            # If the layer is a Conv (or DWConv) with a bn attribute, fuse conv and bn to speed up inference
            if isinstance(m, (Conv, DWConv)) and hasattr(m, "bn"):
                m.conv = fuse_conv_and_bn(m.conv, m.bn)  # update conv
                delattr(m, "bn")  # remove batchnorm
                m.forward = m.forward_fuse  # update forward (backward is irrelevant: fusion is only used for inference)
        self.info()  # print model information after conv+bn fusion
        return self

    # Print model structure information; called at the end of __init__
    def info(self, verbose=False, img_size=640):  # print model information
        model_info(self, verbose, img_size)

    def _apply(self, fn):
        # Apply to(), cpu(), cuda(), half() to model tensors that are not parameters or registered buffers
        self = super()._apply(fn)
        m = self.model[-1]  # Detect()
        if isinstance(m, Detect):
            m.stride = fn(m.stride)
            m.grid = list(map(fn, m.grid))
            if isinstance(m.anchor_grid, list):
                m.anchor_grid = list(map(fn, m.anchor_grid))
        return self
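The stride and anchor bookkeeping in Model.__init__ is plain arithmetic and can be followed without a framework. The sketch below assumes the three Detect inputs for the 256x256 dummy image have heights 32, 16 and 8 (that is, downsampling by 8, 16 and 32), then divides the pixel-space anchors by their stride as m.anchors /= m.stride.view(-1, 1, 1) does.

```python
s = 256  # size of the dummy input used in Model.__init__

# Assumed heights of the three Detect inputs for a 256x256 image
feature_heights = [32, 16, 8]
strides = [s / h for h in feature_heights]
print(strides)  # [8.0, 16.0, 32.0]

# Pixel-space anchors from yolov5s.yaml, one flat row per detection level
anchors = [
    [10, 13, 16, 30, 33, 23],
    [30, 61, 62, 45, 59, 119],
    [116, 90, 156, 198, 373, 326],
]

# Convert each level's anchors from pixels to grid-cell units
grid_anchors = [[a / st for a in row] for row, st in zip(anchors, strides)]
print(grid_anchors[0])  # [1.25, 1.625, 2.0, 3.75, 4.125, 2.875]
```

After this division an anchor width of 1.25 means "1.25 grid cells wide" on the stride-8 feature map. The Detect module multiplies by the stride again when it builds anchor_grid for inference.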

Understanding the Detect class

class Detect(nn.Module):
    """
    The Detect module builds the detection layer: it passes each input feature map through a
    convolution and a set of formulas to produce the desired shape, ready for loss computation
    or NMS post-processing
    """
    stride = None  # strides computed during build
    onnx_dynamic = False  # ONNX export parameter
    export = False  # export mode

    def __init__(self, nc=80, anchors=(), ch=(), inplace=True):  # detection layer
        super().__init__()
        # nc: number of classes
        self.nc = nc  # number of classes
        # no: number of outputs per anchor
        self.no = nc + 5  # number of outputs per anchor
        # nl: number of prediction layers, 3 here
        self.nl = len(anchors)  # number of detection layers
        # na: number of anchors per layer, 3 here
        self.na = len(anchors[0]) // 2  # number of anchors
        # grid: grid coordinate system, top-left (1,1), bottom-right (input.w/stride, input.h/stride)
        self.grid = [flow.zeros(1)] * self.nl  # init grid
        self.anchor_grid = [flow.zeros(1)] * self.nl  # init anchor grid
        # Register as a buffer named anchors
        self.register_buffer("anchors", flow.tensor(anchors).float().view(self.nl, -1, 2))  # shape(nl,na,2)
        # A 1x1 conv maps each input to self.no * self.na channels, playing the role of a fully connected layer
        self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch)  # output conv
        self.inplace = inplace  # use inplace ops (e.g. slice assignment)

    def forward(self, x):
        z = []  # inference output
        for i in range(self.nl):
            x[i] = self.m[i](x[i])  # conv
            bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
            x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

            if not self.training:  # inference
                if self.onnx_dynamic or self.grid[i].shape[2:4] != x[i].shape[2:4]:
                    # During the forward pass, relative coordinates are converted into the absolute grid coordinate system
                    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)

                y = x[i].sigmoid()
                if self.inplace:
                    y[..., 0:2] = (y[..., 0:2] * 2 + self.grid[i]) * self.stride[i]  # xy
                    y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
                else:  # for YOLOv5 on AWS Inferentia https://github.com/ultralytics/yolov5/pull/2953
                    xy, wh, conf = y.split((2, 2, self.nc + 1), 4)  # y.tensor_split((2, 4, 5), 4)
                    xy = (xy * 2 + self.grid[i]) * self.stride[i]  # xy
                    wh = (wh * 2) ** 2 * self.anchor_grid[i]  # wh
                    y = flow.cat((xy, wh, conf), 4)
                z.append(y.view(bs, -1, self.no))

        return x if self.training else (flow.cat(z, 1),) if self.export else (flow.cat(z, 1), x)

    # Convert relative coordinates into the absolute grid coordinate system
    def _make_grid(self, nx=20, ny=20, i=0):
        d = self.anchors[i].device
        t = self.anchors[i].dtype
        shape = 1, self.na, ny, nx, 2  # grid shape
        y, x = flow.arange(ny, device=d, dtype=t), flow.arange(nx, device=d, dtype=t)
        yv, xv = flow.meshgrid(y, x, indexing="ij")
        grid = flow.stack((xv, yv), 2).expand(shape) - 0.5  # add grid offset, i.e. y = 2.0 * x - 0.5
        anchor_grid = (self.anchors[i] * self.stride[i]).view((1, self.na, 1, 1, 2)).expand(shape)
        return grid, anchor_grid
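The decoding formulas in Detect.forward can be followed for a single prediction with scalar arithmetic. The sketch below is a plain-Python illustration rather than the tensor implementation; decode_box and its argument names are hypothetical, and the input values are chosen only to make the result easy to check.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def decode_box(raw, cell_xy, anchor_wh, stride):
    """Decode one raw prediction (tx, ty, tw, th) into pixel-space (x, y, w, h).

    cell_xy is the integer (column, row) of the grid cell; the -0.5 offset
    mirrors the one applied in _make_grid. anchor_wh is given in pixels,
    like the entries of anchor_grid.
    """
    tx, ty, tw, th = (sigmoid(v) for v in raw)
    gx, gy = cell_xy[0] - 0.5, cell_xy[1] - 0.5
    x = (tx * 2 + gx) * stride       # xy: (sigmoid * 2 + grid) * stride
    y = (ty * 2 + gy) * stride
    w = (tw * 2) ** 2 * anchor_wh[0]  # wh: (sigmoid * 2) ** 2 * anchor
    h = (th * 2) ** 2 * anchor_wh[1]
    return x, y, w, h

# All-zero raw outputs: sigmoid gives 0.5, so the box sits at the cell centre
# and has exactly the anchor's size.
box = decode_box((0.0, 0.0, 0.0, 0.0), cell_xy=(10, 5), anchor_wh=(116, 90), stride=32)
print(box)  # (336.0, 176.0, 116.0, 90.0)
```

Because the sigmoid output lies in (0, 1), the centre can move between -0.5 and 1.5 cells relative to the cell's top-left corner, and the width and height are bounded between 0 and 4 times the anchor size.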

Appendix

Table 2.1: yolov5s.yaml parsed layer by layer (https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolov5s.yaml)

Layer	from	module	arguments	input	output
0	-1	Conv	[3, 32, 6, 2, 2]	[3, 640, 640]	[32, 320, 320]
1	-1	Conv	[32, 64, 3, 2]	[32, 320, 320]	[64, 160, 160]
2	-1	C3	[64, 64, 1]	[64, 160, 160]	[64, 160, 160]
3	-1	Conv	[64, 128, 3, 2]	[64, 160, 160]	[128, 80, 80]
4	-1	C3	[128, 128, 2]	[128, 80, 80]	[128, 80, 80]
5	-1	Conv	[128, 256, 3, 2]	[128, 80, 80]	[256, 40, 40]
6	-1	C3	[256, 256, 3]	[256, 40, 40]	[256, 40, 40]
7	-1	Conv	[256, 512, 3, 2]	[256, 40, 40]	[512, 20, 20]
8	-1	C3	[512, 512, 1]	[512, 20, 20]	[512, 20, 20]
9	-1	SPPF	[512, 512, 5]	[512, 20, 20]	[512, 20, 20]
10	-1	Conv	[512, 256, 1, 1]	[512, 20, 20]	[256, 20, 20]
11	-1	Upsample	[None, 2, "nearest"]	[256, 20, 20]	[256, 40, 40]
12	[-1, 6]	Concat	[1]	[1, 256, 40, 40],[1, 256, 40, 40]	[512, 40, 40]
13	-1	C3	[512, 256, 1, False]	[512, 40, 40]	[256, 40, 40]
14	-1	Conv	[256, 128, 1, 1]	[256, 40, 40]	[128, 40, 40]
15	-1	Upsample	[None, 2, "nearest"]	[128, 40, 40]	[128, 80, 80]
16	[-1, 4]	Concat	[1]	[1, 128, 80, 80],[1, 128, 80, 80]	[256, 80, 80]
17	-1	C3	[256, 128, 1, False]	[256, 80, 80]	[128, 80, 80]
18	-1	Conv	[128, 128, 3, 2]	[128, 80, 80]	[128, 40, 40]
19	[-1, 14]	Concat	[1]	[1, 128, 40, 40],[1, 128, 40, 40]	[256, 40, 40]
20	-1	C3	[256, 256, 1, False]	[256, 40, 40]	[256, 40, 40]
21	-1	Conv	[256, 256, 3, 2]	[256, 40, 40]	[256, 20, 20]
22	[-1, 10]	Concat	[1]	[1, 256, 20, 20],[1, 256, 20, 20]	[512, 20, 20]
23	-1	C3	[512, 512, 1, False]	[512, 20, 20]	[512, 20, 20]
24	[17, 20, 23]	Detect	[80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]	[1, 128, 80, 80],[1, 256, 40, 40],[1, 512, 20, 20]	[1, 3, 80, 80, 85],[1, 3, 40, 40, 85],[1, 3, 20, 20, 85]
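The last row of the table fixes the size of the model's raw output. The sketch below, using only the shapes listed in that row, counts the output channels of each Detect convolution and the total number of candidate boxes for a 640x640 input.

```python
na, nc = 3, 80   # anchors per level and number of classes, as in yolov5s.yaml
no = nc + 5      # outputs per anchor: 4 box values + 1 objectness + nc class scores

# Grid sizes of the three Detect outputs from the last table row
grids = [(80, 80), (40, 40), (20, 20)]

per_level = [na * ny * nx for ny, nx in grids]
total = sum(per_level)

print(na * no)           # 255
print(per_level, total)  # [19200, 4800, 1200] 25200
```

So each Detect convolution has 255 output channels, and at inference time the concatenated prediction tensor holds 25200 candidate boxes with 85 values each.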
