By Fengwen, BBuf
The code covered in this tutorial is at:
https://github.com/Oneflow-Inc/one-yolov5
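This tutorial applies equally to Ultralytics/YOLOv5, because One-YOLOv5 only swaps in a different runtime backend; its computation logic and code are unchanged relative to Ultralytics/YOLOv5. Stars are welcome. For details, see: "A Faster YOLOv5 Arrives, with a Comprehensive Chinese Walkthrough".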
1
Introduction
All YOLOv5 sizes (n, s, m, l, x) share the same overall network architecture. They differ only in the depth and width used inside each submodule, which are controlled by the depth_multiple and width_multiple parameters in the yaml file.
Note also that, besides the n, s, m, l and x versions, the official release includes n6, s6, m6, l6 and x6. The latter target higher-resolution images such as 1280x1280 and differ slightly in structure: the former downsample by at most 32x and use 3 prediction feature levels, whereas the latter downsample by 64x and use 4.
This chapter takes YOLOv5s as the example and walks through the source, from the configuration file models/yolov5s.yaml
(https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolov5s.yaml) to models/yolo.py (https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolo.py).
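As a quick check of the numbers above, the sketch below computes the side length of each prediction feature map. The helper name `grid_sizes` and the two stride lists are written out here for illustration only (strides 8/16/32 for the standard models, plus a stride of 64 for the `6` variants, as described above); they are not read from the repository.

```python
def grid_sizes(img_size, strides):
    """Return the side length of each prediction feature map for a square input."""
    return [img_size // s for s in strides]


P5_STRIDES = [8, 16, 32]       # n, s, m, l, x: three prediction levels
P6_STRIDES = [8, 16, 32, 64]   # n6 ... x6: four prediction levels

p5 = grid_sizes(640, P5_STRIDES)
p6 = grid_sizes(1280, P6_STRIDES)
print(p5)  # [80, 40, 20]
print(p6)  # [160, 80, 40, 20]
```

The coarsest map is 20x20 in both cases, which is why the `6` variants need the extra downsampling step to handle the larger input.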
2
Contents of yolov5s.yaml
nc: 80  # number of classes in the dataset
depth_multiple: 0.33  # model depth multiple (scales the depth of the network)
width_multiple: 0.50  # layer channel multiple (scales the width of the network)
# depth_multiple and width_multiple set the depth (layer count) and width (channel count) of the whole model;
# how they are applied is explained together with the backbone code below.
anchors:  # anchor sizes applied to each feature map
# 9 anchors in total. P denotes the feature-map level: P3/8 is the level-3 feature map, downsampled to 1/8 of the input.
  - [10,13, 16,30, 33,23]  # P3/8, i.e. the 3 anchors [10,13], [16,30], [33,23]
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5s v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
  ]

# YOLOv5s v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, "nearest"]],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, "nearest"]],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
3
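Understanding anchors
YOLOv5 initializes 9 anchors, used across three feature maps; every grid cell of every feature map predicts with three anchors. The assignment rules are:
Larger feature maps come earlier in the network. Their downsampling rate relative to the input is smaller and their receptive field is smaller, so they are suited to predicting smaller objects and are assigned the smaller anchors.
Smaller feature maps come later in the network. Their downsampling rate relative to the input is larger and their receptive field is larger, so they are suited to predicting larger objects and are assigned the larger anchors.
In short, large objects are detected on the small feature map, medium objects on the medium-sized feature map, and small objects on the large feature map.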
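The assignment can be made concrete with a small sketch that groups the flat anchor lists from yolov5s.yaml into (width, height) pairs and pairs each group with its stride. The helper `anchor_pairs` and the 640-pixel input size are illustrative assumptions, not code from the repository.

```python
ANCHORS = [
    [10, 13, 16, 30, 33, 23],        # P3/8
    [30, 61, 62, 45, 59, 119],       # P4/16
    [116, 90, 156, 198, 373, 326],   # P5/32
]
STRIDES = [8, 16, 32]


def anchor_pairs(flat):
    """Group a flat [w0, h0, w1, h1, ...] list into (w, h) tuples."""
    return list(zip(flat[0::2], flat[1::2]))


levels = {s: anchor_pairs(a) for s, a in zip(STRIDES, ANCHORS)}
for stride, pairs in levels.items():
    side = 640 // stride  # feature-map side length for a 640x640 input
    print(f"stride {stride:>2}: grid {side}x{side}, anchors {pairs}")
```

The largest grid (stride 8) carries the smallest anchors, and the smallest grid (stride 32) carries the largest ones.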
4
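Understanding backbone & head
The four fields of each row mean:
First field, from: which layer the input comes from. -1 means the previous layer; [-1, 6] means both the previous layer and layer 6.
Second field, number: how many times the module is repeated; 9, for example, means 9 identical modules.
Third field, module: the module name. These modules are defined in common.py.
Fourth field, args: the class's initialization arguments, parsed and passed to the module.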
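To illustrate how the `from` field is read, here is a minimal sketch that unpacks one row and converts its `from` entries into absolute layer indices. `resolve_from` is a hypothetical helper written for this example; the repository resolves these indices inside `parse_model` and the model's forward pass rather than through a function of this name.

```python
def resolve_from(f, i):
    """Turn the `from` field of layer i into a list of absolute layer indices."""
    sources = [f] if isinstance(f, int) else list(f)
    # Negative entries are relative to the current layer: -1 is layer i - 1.
    return [i + x if x < 0 else x for x in sources]


row = [[-1, 6], 1, "Concat", [1]]   # layer 12 in yolov5s.yaml
f, number, module, args = row
inputs = resolve_from(f, 12)
print(module, "reads from layers", inputs)
```

Here the Concat at index 12 joins the output of layer 11 (the upsampled head feature) with the output of layer 6 (the backbone P4 feature).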
The first module, Conv, serves as an example of the modules in common.py.
The Conv module is defined as follows:
class Conv(nn.Module):
    # Standard convolution
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        """
        @param c1: number of input channels
        @param c2: number of output channels
        @param k : kernel size (kernel_size)
        @param s : convolution stride
        @param p : feature-map padding width
        @param g : number of groups; must divide the input channel count so the channels can be grouped correctly
        """
        super().__init__()
        # https://oneflow.readthedocs.io/en/master/generated/oneflow.nn.Conv2d.html?highlight=Conv
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        # Used after Conv and BatchNorm have been fused: the bn step is skipped
        return self.act(self.conv(x))
For example, with width_multiple set to 0.5 as above, the first args list [64, 6, 2, 2] is parsed into [3, 64*0.5=32, 6, 2, 2]. The leading 3 is the number of input channels (the model input, ch=3) and 32 is the number of output channels.
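The width scaling can be reproduced in a few lines. This is a sketch that assumes `make_divisible` rounds up to the next multiple of its divisor; `scaled_channels` is a name introduced here for illustration and does not exist in the repository.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scaled_channels(c2, gw):
    """Apply the width multiple gw to a channel count taken from the yaml file."""
    return make_divisible(c2 * gw, 8)


# Output channels listed in the yolov5s.yaml backbone, scaled by width_multiple = 0.50
widths = [scaled_channels(c, 0.50) for c in (64, 128, 256, 512, 1024)]
print(widths)  # [32, 64, 128, 256, 512]
```

Rounding to a multiple of 8 only matters when the product is not already aligned; for example, `make_divisible(33)` gives 40.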
Line 256 of yolo.py (https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolo.py) reads nc, depth_multiple and the other parameters from the yaml file:
anchors, nc, gd, gw = d["anchors"], d["nc"], d["depth_multiple"], d["width_multiple"]
The role of "width_multiple" was covered above in the discussion of args. What, then, does "depth_multiple" do?
Line 257 of yolo.py (https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolo.py) defines how it is applied:
n = n_ = max(round(n * gd), 1) if n > 1 else n  # depth gain; call this line formula (1) for now
Here gd is the value of depth_multiple, and n is the second field (number) of each row in the backbone list.
Formula (1) shows directly that gd changes n, and therefore the size of the network.
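Formula (1) can be evaluated directly for the four C3 rows of the yolov5s.yaml backbone, whose `number` fields are 3, 6, 9 and 3. The wrapper `depth_gain` below is a name chosen for this sketch; the expression inside it is the one quoted from yolo.py.

```python
def depth_gain(n, gd):
    """Formula (1): scale the repeat count n by the depth multiple gd, with a floor of 1."""
    return max(round(n * gd), 1) if n > 1 else n


yaml_repeats = [3, 6, 9, 3]   # `number` of the four backbone C3 rows
repeats_s = [depth_gain(n, 0.33) for n in yaml_repeats]
print(repeats_s)  # [1, 2, 3, 1]
```

Rows with `number` equal to 1, such as every Conv row, are left untouched by the `if n > 1` guard.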
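The number of repeated modules and the number of convolution kernels (channels) in each layer change accordingly, so YOLOv5l has several times as many trainable parameters as YOLOv5s.
Its depth and width are also much larger, which gives YOLOv5l noticeably better accuracy than YOLOv5s at inference time, at the cost of slower inference.
YOLOv5 therefore offers a range of choices: if inference speed is the priority, pick one of the smaller models such as YOLOv5s or YOLOv5m; if accuracy matters more than speed, pick one of the two larger models.
This is shown in the figure below:
Figure: YOLOv5 model complexity comparison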
5
Conv Module Walkthrough
Below is a simplified diagram of the overall network structure, drawn from yolov5s.yaml
(https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolov5s.yaml).
Figure: overall structure of the yolov5s network
Detailed network structure diagram:
https://oneflow-static.oss-cn-beijing.aliyuncs.com/one-yolo/imgs/yolov5s.onnx.png
This image was produced by exporting the model to ONNX with export.py and rendering it at https://netron.app/ (model export is covered separately in a later article of this tutorial).
The values to the right of each module are the feature-map shapes; for example, the input image to the first layer (Conv) has shape [3, 640, 640]. These values can be computed by feeding one fixed-size image through the network and applying the model parameters in yolov5s.yaml
(https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolov5s.yaml), and they can also be printed from the code in models/yolo.py (https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolo.py). The full data is in Table 2.1 of the appendix.
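The spatial sizes can be checked by hand with the standard convolution output-size formula. The sketch below applies it to the five stride-2 Conv rows of the backbone. It is a simplified illustration: `autopad` here handles only integer kernel sizes, and `conv_out` is a helper introduced for this example rather than a function from the repository.

```python
def autopad(k, p=None):
    """Default to 'same'-style padding of k // 2 when no padding is given."""
    return k // 2 if p is None else p


def conv_out(size, k, s, p=None):
    """Spatial size after a convolution with kernel k, stride s and padding p."""
    return (size + 2 * autopad(k, p) - k) // s + 1


# (kernel, stride, padding) of the five stride-2 Conv rows in the yolov5s.yaml backbone
DOWNSAMPLE = [(6, 2, 2), (3, 2, None), (3, 2, None), (3, 2, None), (3, 2, None)]

size, sizes = 640, []
for k, s, p in DOWNSAMPLE:
    size = conv_out(size, k, s, p)
    sizes.append(size)
print(sizes)  # [320, 160, 80, 40, 20]
```

These are the spatial sizes listed for layers 0, 1, 3, 5 and 7 in Table 2.1.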
6
yolo.py Module Walkthrough
File location (https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolo.py)
The file has three main parts: the Detect class, the Model class, and the parse_model function.
Run the script with python models/yolo.py --cfg yolov5s.yaml to observe its output.
7
parse_model Function Walkthrough
def parse_model(d, ch):  # model_dict, input_channels(3)
    """Used by the Model class below.
    Parses the model file (as a dict) and builds the network structure.
    In essence this function: updates the current layer's args and computes c2 (the current layer's output channels) =>
                              builds the current layer from those args =>
                              produces layers + save
    @param d: model_dict, the model file as a dict {dict:7}: the 6 entries of yolov5s.yaml (https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolov5s.yaml) + ch
    @param ch: records the output channels of every layer; initially ch=[3], which is dropped later
    @return nn.Sequential(*layers): the layer structure of the network
    @return sorted(save): the sorted `from` values that are not -1, [4, 6, 10, 14, 17, 20, 23]
    """
    LOGGER.info(f"\n{'':>3}{'from':>18}{'n':>3}{'params':>10}  {'module':<40}{'arguments':<30}")
    # Read anchors and the parameters (nc, depth_multiple, width_multiple) from dict d
    anchors, nc, gd, gw = d["anchors"], d["nc"], d["depth_multiple"], d["width_multiple"]
    # na: number of anchors on each prediction head = 3
    na = (len(anchors[0]) // 2) if isinstance(anchors, list) else anchors  # number of anchors
    # no: output channels of each prediction head
    no = na * (nc + 5)  # number of outputs = anchors * (classes + 5)

    # Start building the network
    # layers: holds the structure of every layer
    # save: indices of the layers referenced by a `from` value other than -1
    # c2: output channels of the current layer
    layers, save, c2 = [], [], ch[-1]  # layers, savelist, ch out
    # enumerate() yields (index, item) pairs for an iterable; it is typically used in for loops
    for i, (f, n, m, args) in enumerate(d["backbone"] + d["head"]):  # from, number, module, args
        m = eval(m) if isinstance(m, str) else m  # eval strings
        for j, a in enumerate(args):
            # args is a list; evaluate each string entry in it
            with contextlib.suppress(NameError):
                args[j] = eval(a) if isinstance(a, str) else a  # eval strings

        # Multiply the repeat count by the depth multiple to get the layer depth; the minimum is 1
        n = n_ = max(round(n * gd), 1) if n > 1 else n  # depth gain

        # If module m is one of the module types defined in this project, handle it here
        if m in (Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv, MixConv2d, Focus, CrossConv,
                 BottleneckCSP, C3, C3TR, C3SPP, C3Ghost, nn.ConvTranspose2d, DWConvTranspose2d, C3x):
            # c1: input channels, c2: output channels
            c1, c2 = ch[f], args[0]
            # Unless this is the output layer, multiply the channel count by the width multiple;
            # that is, the width multiple applies to every layer except the last one
            if c2 != no:  # if not output
                # make_divisible rounds the scaled channel count up to a multiple of 8,
                # which generally improves parallelism and inference performance
                c2 = make_divisible(c2 * gw, 8)
            # Store the results in args; these are the module's final constructor arguments
            args = [c1, c2, *args[1:]]
            # Handle args per module type; see each class's __init__ for what the arguments mean
            if m in [BottleneckCSP, C3, C3TR, C3Ghost, C3x]:
                # These modules repeat their inner block n times, where n is the depth computed above
                args.insert(2, n)  # number of repeats
                n = 1
        elif m is nn.BatchNorm2d:
            args = [ch[f]]
        elif m is Concat:
            c2 = sum(ch[x] for x in f)
        elif m is Detect:
            args.append([ch[x] for x in f])
            if isinstance(args[1], int):  # number of anchors
                args[1] = [list(range(args[1] * 2))] * len(f)
        elif m is Contract:
            c2 = ch[f] * args[0] ** 2
        elif m is Expand:
            c2 = ch[f] // args[0] ** 2
        else:
            c2 = ch[f]

        # Build the module from its repeat count n, the module class, and its arguments
        m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
        # Module type name, e.g. models.common.Conv, models.common.C3, models.common.SPPF
        # replace() strips the "__main__." prefix; this replacement is not triggered in the current project
        t = str(m)[8:-2].replace("__main__.", "")
        np = sum(x.numel() for x in m_.parameters())  # number params
        m_.i, m_.f, m_.type, m_.np = i, f, t, np  # attach index, "from" index, type, number params
        LOGGER.info(f"{i:>3}{str(f):>18}{n_:>3}{np:10.0f}  {t:<40}{str(args):<30}")  # print
        """
        If x is not -1, store it in the save list: the output of that layer must be kept.
        For non-negative x, x % i equals x. For example, at the last layer:
        f = [17, 20, 23], i = 24
        y = [x % i for x in ([f] if isinstance(f, int) else f) if x != -1]
        print(y)  # [17, 20, 23]
        x % i is probably used because -1 % i == i - 1 (e.g. with f = [-1] and i = 12, [x % i for x in f] gives [11])
        """
        save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1)  # append to savelist
        layers.append(m_)
        if i == 0:
            # On the first iteration, start a fresh ch list. The ch argument is needed to build the
            # first module, so ch is reset only after that module has been created.
            ch = []
        ch.append(c2)
    # Wrap all layers in nn.Sequential and sort the saved layer indices
    return nn.Sequential(*layers), sorted(save)
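The `save` list quoted in the docstring can be reproduced without building any modules. The sketch below copies only the `from` column of yolov5s.yaml into a list (an illustrative constant named `FROM`, not part of the repository) and applies the same `save.extend(...)` expression used in `parse_model`.

```python
# `from` field of every layer in yolov5s.yaml: backbone (0-9), then head (10-24)
FROM = [-1] * 12 + [
    [-1, 6], -1, -1, -1,     # layers 12-15
    [-1, 4], -1, -1,         # layers 16-18
    [-1, 14], -1, -1,        # layers 19-21
    [-1, 10], -1,            # layers 22-23
    [17, 20, 23],            # layer 24 (Detect)
]

save = []
for i, f in enumerate(FROM):
    sources = [f] if isinstance(f, int) else f
    # Same expression as in parse_model: keep every source index that is not -1
    save.extend(x % i for x in sources if x != -1)
save = sorted(save)
print(save)  # [4, 6, 10, 14, 17, 20, 23]
```

Only the outputs of these seven layers have to be kept in memory during the forward pass; every other layer's output is consumed immediately by the next layer.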
8
Model Class Walkthrough
class Model(nn.Module):
    # YOLOv5 model
    def __init__(self, cfg="yolov5s.yaml", ch=3, nc=None, anchors=None):  # model, input channels, number of classes
        super().__init__()
        # If cfg is already a dict, use it directly; otherwise load the file at cfg into self.yaml
        if isinstance(cfg, dict):
            self.yaml = cfg  # model dict
        else:  # is *.yaml
            import yaml  # for flow hub

            self.yaml_file = Path(cfg).name
            with open(cfg, encoding="ascii", errors="ignore") as f:
                self.yaml = yaml.safe_load(f)  # model dict loaded from the yaml file

        # Define model
        # ch: input channels. Use self.yaml["ch"] if the key exists, otherwise the ch argument
        ch = self.yaml["ch"] = self.yaml.get("ch", ch)  # input channels
        # If the nc argument differs from the nc in the yaml file, override the yaml value
        if nc and nc != self.yaml["nc"]:
            LOGGER.info(f"Overriding model.yaml nc={self.yaml['nc']} with nc={nc}")
            self.yaml["nc"] = nc  # override yaml value
        if anchors:  # anchor (prior box) configuration
            LOGGER.info(f"Overriding model.yaml anchors with anchors={anchors}")
            self.yaml["anchors"] = round(anchors)  # override yaml value
        # Build the model and the list of layers whose outputs must be saved
        self.model, self.save = parse_model(deepcopy(self.yaml), ch=[ch])  # model, savelist
        self.names = [str(i) for i in range(self.yaml["nc"])]  # default class names: ["0", "1", "2", ...]
        # self.inplace defaults to True to save memory
        self.inplace = self.yaml.get("inplace", True)

        # Build strides, anchors: determine the strides and the anchors that go with them
        m = self.model[-1]  # Detect()
        if isinstance(m, Detect):  # check that the last layer is a Detect module
            s = 256  # 2x min stride
            m.inplace = self.inplace
            # Downsampling factor of each of the three feature maps: [8, 16, 32]
            m.stride = flow.tensor([s / x.shape[-2] for x in self.forward(flow.zeros(1, ch, s, s))])  # forward
            # Check that the anchor order matches the stride order (anchors should go from small to large)
            check_anchor_order(m)  # must be in pixel-space (not grid-space)
            # Scale the anchors to feature-map units: the anchors loaded from the yaml file are in
            # input-image pixels, but the feature maps are smaller after downsampling
            m.anchors /= m.stride.view(-1, 1, 1)
            self.stride = m.stride
            self._initialize_biases()  # only run once; initializes the biases

        # Init weights, biases
        # Calls initialize_weights from oneflow_utils.py to initialize the model weights
        initialize_weights(self)
        self.info()  # print model information
        LOGGER.info("")

    # Dispatches the forward pass
    def forward(self, x, augment=False, profile=False, visualize=False):
        if augment:  # whether to use Test Time Augmentation (TTA)
            return self._forward_augment(x)  # augmented inference, None
        return self._forward_once(x, profile, visualize)  # single-scale inference, train

    # Forward pass with test-time augmentation
    def _forward_augment(self, x):
        img_size = x.shape[-2:]  # height, width
        s = [1, 0.83, 0.67]  # scales
        f = [None, 3, None]  # flips (2-ud, 3-lr)
        y = []  # outputs
        for si, fi in zip(s, f):
            xi = scale_img(x.flip(fi) if fi else x, si, gs=int(self.stride.max()))
            yi = self._forward_once(xi)[0]  # forward
            # cv2.imwrite(f"img_{si}.jpg", 255 * xi[0].cpu().numpy().transpose((1, 2, 0))[:, :, ::-1])  # save
            yi = self._descale_pred(yi, fi, si, img_size)
            y.append(yi)
        y = self._clip_augmented(y)  # clip augmented tails
        return flow.cat(y, 1), None  # augmented inference, train

    # The actual forward pass
    def _forward_once(self, x, profile=False, visualize=False):
        """
        @param x: input image
        @param profile: True enables per-layer performance profiling
        @param visualize: True enables feature visualization
        """
        # y: outputs of the layers listed in self.save, which later feature-fusion layers need
        y, dt = [], []  # outputs
        # Run every layer in turn. m.i=index, m.f=from, m.type=class name, m.np=number of params
        for m in self.model:
            # m.f says which layer(s) the current layer takes its input from; -1 means the previous layer
            if m.f != -1:  # if not from previous layer
                x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers
            if profile:
                self._profile_one_layer(m, x, dt)
            x = m(x)  # run
            y.append(x if m.i in self.save else None)  # save output
            if visualize:
                feature_visualization(x, m.type, m.i, save_dir=visualize)
        return x

    # Maps predictions back to the original image size (inverse operation)
    def _descale_pred(self, p, flips, scale, img_size):
        """Used by _forward_augment above during Test Time Augmentation (TTA).
        de-scale predictions following augmented inference (inverse operation)
        @param p: predictions
        @param flips: flip mode applied to the input (2-ud, 3-lr, or None)
        @param scale: scale factor applied to the input
        @param img_size: original image size (height, width)
        """
        if self.inplace:
            p[..., :4] /= scale  # de-scale
            if flips == 2:
                p[..., 1] = img_size[0] - p[..., 1]  # de-flip ud
            elif flips == 3:
                p[..., 0] = img_size[1] - p[..., 0]  # de-flip lr
        else:
            x, y, wh = p[..., 0:1] / scale, p[..., 1:2] / scale, p[..., 2:4] / scale  # de-scale
            if flips == 2:
                y = img_size[0] - y  # de-flip ud
            elif flips == 3:
                x = img_size[1] - x  # de-flip lr
            p = flow.cat((x, y, wh, p[..., 4:]), -1)
        return p

    # Clips the tails of the augmented-inference outputs; used only during TTA
    def _clip_augmented(self, y):
        # Clip YOLOv5 augmented inference tails
        nl = self.model[-1].nl  # number of detection layers (P3-P5)
        g = sum(4 ** x for x in range(nl))  # grid points
        e = 1  # exclude layer count
        i = (y[0].shape[1] // g) * sum(4 ** x for x in range(e))  # indices
        y[0] = y[0][:, :-i]  # large
        i = (y[-1].shape[1] // g) * sum(4 ** (nl - 1 - x) for x in range(e))  # indices
        y[-1] = y[-1][:, i:]  # small
        return y

    # Logs the forward time of each layer
    def _profile_one_layer(self, m, x, dt):
        c = isinstance(m, Detect)  # is final layer, copy input as inplace fix
        o = thop.profile(m, inputs=(x.copy() if c else x,), verbose=False)[0] / 1E9 * 2 if thop else 0  # FLOPs
        t = time_sync()
        for _ in range(10):
            m(x.copy() if c else x)
        dt.append((time_sync() - t) * 100)
        if m == self.model[0]:
            LOGGER.info(f"{'time (ms)':>10s} {'GFLOPs':>10s} {'params':>10s}  module")
        LOGGER.info(f"{dt[-1]:10.2f} {o:10.2f} {m.np:10.0f}  {m.type}")
        if c:
            LOGGER.info(f"{sum(dt):10.2f} {'-':>10s} {'-':>10s}  Total")

    # initialize biases into Detect(), cf is class frequency
    def _initialize_biases(self, cf=None):
        # https://arxiv.org/abs/1708.02002 section 3.3
        # cf = flow.bincount(flow.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1.
        m = self.model[-1]  # Detect() module
        for mi, s in zip(m.m, m.stride):  # from
            b = mi.bias.view(m.na, -1).detach()  # conv.bias(255) to (3,85)
            b[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
            b[:, 5:] += math.log(0.6 / (m.nc - 0.999999)) if cf is None else flow.log(cf / cf.sum())  # cls
            mi.bias = flow.nn.Parameter(b.view(-1), requires_grad=True)

    def _print_biases(self):
        """
        Prints the biases of the convolution layers inside the final Detect module
        (other layers can be selected in the same way).
        """
        m = self.model[-1]  # Detect() module
        for mi in m.m:  # from
            b = mi.bias.detach().view(m.na, -1).T  # conv.bias(255) to (3,85)
            LOGGER.info(
                ("%6g Conv2d.bias:" + "%10.3g" * 6) % (mi.weight.shape[1], *b[:5].mean(1).tolist(), b[5:].mean()))

    def _print_weights(self):
        """
        Prints the weights of the Bottleneck layers (other layers can be selected in the same way).
        """
        for m in self.model.modules():
            if type(m) is Bottleneck:
                LOGGER.info("%10.3g" % (m.w.detach().sigmoid() * 2))  # shortcut weights

    # fuse() merges Conv and BatchNorm layers to speed up inference
    def fuse(self):  # fuse model Conv2d() + BatchNorm2d() layers
        """Used in detect.py and val.py.
        fuse model Conv2d() + BatchNorm2d() layers
        Calls fuse_conv_and_bn from oneflow_utils.py and the forward_fuse method of the Conv module in common.py.
        """
        LOGGER.info("Fusing layers... ")
        for m in self.model.modules():
            # If the layer is a Conv (or DWConv) that still has a bn attribute, fuse conv and bn to speed up inference
            if isinstance(m, (Conv, DWConv)) and hasattr(m, "bn"):
                m.conv = fuse_conv_and_bn(m.conv, m.bn)  # update conv
                delattr(m, "bn")  # remove batchnorm
                # update forward (backward need not be considered: fusion is used only at inference time)
                m.forward = m.forward_fuse
        self.info()  # print model information after the conv+bn fusion
        return self

    # Prints the model structure; called at the end of __init__
    def info(self, verbose=False, img_size=640):  # print model information
        model_info(self, verbose, img_size)

    def _apply(self, fn):
        # Apply to(), cpu(), cuda(), half() to model tensors that are not parameters or registered buffers
        self = super()._apply(fn)
        m = self.model[-1]  # Detect()
        if isinstance(m, Detect):
            m.stride = fn(m.stride)
            m.grid = list(map(fn, m.grid))
            if isinstance(m.anchor_grid, list):
                m.anchor_grid = list(map(fn, m.anchor_grid))
        return self
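The stride and anchor bookkeeping in `__init__` comes down to two divisions, sketched below in plain Python without tensors. The function names and the hard-coded feature-map sizes (32, 16 and 8 for a 256x256 dummy input, which follow from the strides [8, 16, 32] noted in the code comment) are assumptions made for this illustration.

```python
def strides_from_dummy(s, feature_sizes):
    """Stride of each level = dummy input size / feature-map size."""
    return [s / fs for fs in feature_sizes]


def to_grid_units(anchors, strides):
    """Divide each pixel-space (w, h) anchor by the stride of its level."""
    return [[(w / st, h / st) for w, h in level] for level, st in zip(anchors, strides)]


# A 256x256 dummy input gives 32x32, 16x16 and 8x8 feature maps
strides = strides_from_dummy(256, [32, 16, 8])
anchors_px = [
    [(10, 13), (16, 30), (33, 23)],
    [(30, 61), (62, 45), (59, 119)],
    [(116, 90), (156, 198), (373, 326)],
]
anchors_grid = to_grid_units(anchors_px, strides)
print(strides)             # [8.0, 16.0, 32.0]
print(anchors_grid[0][0])  # (1.25, 1.625)
```

After the division, an anchor of 10x13 pixels on the stride-8 level spans 1.25x1.625 grid cells.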
9
Detect Class Walkthrough
class Detect(nn.Module):
    """
    Detect builds the detection layer: it passes each input feature map through one convolution
    and a decoding formula to produce the shape needed for loss computation or NMS post-processing.
    """
    stride = None  # strides computed during build
    onnx_dynamic = False  # ONNX export parameter
    export = False  # export mode

    def __init__(self, nc=80, anchors=(), ch=(), inplace=True):  # detection layer
        super().__init__()
        self.nc = nc  # number of classes
        self.no = nc + 5  # number of outputs per anchor
        self.nl = len(anchors)  # number of detection layers, 3 here
        self.na = len(anchors[0]) // 2  # number of anchors per layer, 3 here
        # grid: cell coordinate system, top-left (1,1), bottom-right (input.w/stride, input.h/stride)
        self.grid = [flow.zeros(1)] * self.nl  # init grid
        self.anchor_grid = [flow.zeros(1)] * self.nl  # init anchor grid
        # Register the anchors as a buffer named "anchors"
        self.register_buffer("anchors", flow.tensor(anchors).float().view(self.nl, -1, 2))  # shape(nl,na,2)
        # A 1x1 convolution maps each input to self.no * self.na channels, acting like a fully connected layer
        self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch)  # output conv
        self.inplace = inplace  # use inplace ops (e.g. slice assignment)

    def forward(self, x):
        z = []  # inference output
        for i in range(self.nl):
            x[i] = self.m[i](x[i])  # conv
            bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
            x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

            if not self.training:  # inference
                if self.onnx_dynamic or self.grid[i].shape[2:4] != x[i].shape[2:4]:
                    # At inference, relative coordinates must be converted to the absolute grid coordinate system
                    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)

                y = x[i].sigmoid()
                if self.inplace:
                    y[..., 0:2] = (y[..., 0:2] * 2 + self.grid[i]) * self.stride[i]  # xy
                    y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
                else:  # for YOLOv5 on AWS Inferentia https://github.com/ultralytics/yolov5/pull/2953
                    xy, wh, conf = y.split((2, 2, self.nc + 1), 4)  # y.tensor_split((2, 4, 5), 4)
                    xy = (xy * 2 + self.grid[i]) * self.stride[i]  # xy
                    wh = (wh * 2) ** 2 * self.anchor_grid[i]  # wh
                    y = flow.cat((xy, wh, conf), 4)
                z.append(y.view(bs, -1, self.no))

        return x if self.training else (flow.cat(z, 1),) if self.export else (flow.cat(z, 1), x)

    # Converts relative coordinates to the absolute grid coordinate system
    def _make_grid(self, nx=20, ny=20, i=0):
        d = self.anchors[i].device
        t = self.anchors[i].dtype
        shape = 1, self.na, ny, nx, 2  # grid shape
        y, x = flow.arange(ny, device=d, dtype=t), flow.arange(nx, device=d, dtype=t)
        yv, xv = flow.meshgrid(y, x, indexing="ij")
        grid = flow.stack((xv, yv), 2).expand(shape) - 0.5  # add grid offset, i.e. y = 2.0 * x - 0.5
        anchor_grid = (self.anchors[i] * self.stride[i]).view((1, self.na, 1, 1, 2)).expand(shape)
        return grid, anchor_grid
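The decoding done in `forward` can be followed for a single prediction in plain Python. This sketch applies the same two expressions as the inference branch, `(xy * 2 + grid) * stride` and `(wh * 2) ** 2 * anchor_grid`, with the -0.5 grid offset from `_make_grid`. The function `decode_box` and its argument names are made up for the example, and the anchor is given in pixels, as it is in `anchor_grid`.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(raw, cell, stride, anchor):
    """Decode one raw (tx, ty, tw, th) prediction into pixel-space (x, y, w, h).

    cell is the (column, row) index of the grid cell; anchor is (w, h) in pixels.
    """
    tx, ty, tw, th = (sigmoid(v) for v in raw)
    gx, gy = cell[0] - 0.5, cell[1] - 0.5   # grid offset, as in _make_grid
    x = (tx * 2 + gx) * stride               # centre x in pixels
    y = (ty * 2 + gy) * stride               # centre y in pixels
    w = (tw * 2) ** 2 * anchor[0]            # width: between 0x and 4x the anchor width
    h = (th * 2) ** 2 * anchor[1]            # height: between 0x and 4x the anchor height
    return x, y, w, h


box = decode_box((0.0, 0.0, 0.0, 0.0), cell=(3, 4), stride=8, anchor=(10, 13))
print(box)  # (28.0, 36.0, 10.0, 13.0)
```

With all raw outputs at zero, the sigmoid gives 0.5, so the box sits at the centre of its cell and has exactly the anchor's size.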
10
Appendix
Table 2.1: layer-by-layer parse of yolov5s.yaml
(https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolov5s.yaml)
Layer | from | module | arguments | input | output |
---|---|---|---|---|---|
0 | -1 | Conv | [3, 32, 6, 2, 2] | [3, 640, 640] | [32, 320, 320] |
1 | -1 | Conv | [32, 64, 3, 2] | [32, 320, 320] | [64, 160, 160] |
2 | -1 | C3 | [64, 64, 1] | [64, 160, 160] | [64, 160, 160] |
3 | -1 | Conv | [64, 128, 3, 2] | [64, 160, 160] | [128, 80, 80] |
4 | -1 | C3 | [128, 128, 2] | [128, 80, 80] | [128, 80, 80] |
5 | -1 | Conv | [128, 256, 3, 2] | [128, 80, 80] | [256, 40, 40] |
6 | -1 | C3 | [256, 256, 3] | [256, 40, 40] | [256, 40, 40] |
7 | -1 | Conv | [256, 512, 3, 2] | [256, 40, 40] | [512, 20, 20] |
8 | -1 | C3 | [512, 512, 1] | [512, 20, 20] | [512, 20, 20] |
9 | -1 | SPPF | [512, 512, 5] | [512, 20, 20] | [512, 20, 20] |
10 | -1 | Conv | [512, 256, 1, 1] | [512, 20, 20] | [256, 20, 20] |
11 | -1 | Upsample | [None, 2, "nearest"] | [256, 20, 20] | [256, 40, 40] |
12 | [-1, 6] | Concat | [1] | [1, 256, 40, 40],[1, 256, 40, 40] | [512, 40, 40] |
13 | -1 | C3 | [512, 256, 1, False] | [512, 40, 40] | [256, 40, 40] |
14 | -1 | Conv | [256, 128, 1, 1] | [256, 40, 40] | [128, 40, 40] |
15 | -1 | Upsample | [None, 2, "nearest"] | [128, 40, 40] | [128, 80, 80] |
16 | [-1, 4] | Concat | [1] | [1, 128, 80, 80],[1, 128, 80, 80] | [256, 80, 80] |
17 | -1 | C3 | [256, 128, 1, False] | [256, 80, 80] | [128, 80, 80] |
18 | -1 | Conv | [128, 128, 3, 2] | [128, 80, 80] | [128, 40, 40] |
19 | [-1, 14] | Concat | [1] | [1, 128, 40, 40],[1, 128, 40, 40] | [256, 40, 40] |
20 | -1 | C3 | [256, 256, 1, False] | [256, 40, 40] | [256, 40, 40] |
21 | -1 | Conv | [256, 256, 3, 2] | [256, 40, 40] | [256, 20, 20] |
22 | [-1, 10] | Concat | [1] | [1, 256, 20, 20],[1, 256, 20, 20] | [512, 20, 20] |
23 | -1 | C3 | [512, 512, 1, False] | [512, 20, 20] | [512, 20, 20] |
24 | [17, 20, 23] | Detect | [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]] | [1, 128, 80, 80],[1, 256, 40, 40],[1, 512, 20, 20] | [1, 3, 80, 80, 85],[1, 3, 40, 40, 85],[1, 3, 20, 20, 85] |