├── README.md
├── structure_CN.md
└── tutorials_img
    ├── ADown.svg
    ├── BottleNeck.svg
    ├── GELAN_in_paper.png
    ├── RepNCSP.svg
    ├── RepNCSPELAN4.svg
    ├── SPPELAN.svg
    ├── inference_structure.svg
    ├── structure_in_paper.png
    └── train_structure.svg

/README.md:
--------------------------------------------------------------------------------
# Language 语言

[English](./structure.md) [简体中文](structure_CN.md)

# Paper Summary

* **Auxiliary Reversible Branch (Training only)**

  A reversible architecture preserves complete information, but adding a main branch to a reversible architecture consumes a lot of inference cost.

  'Reversible' is not the only necessary condition in the inference stage.

* **Multi-level Auxiliary Information (Training only)**

  Each feature pyramid should receive information about all target objects. Multi-level auxiliary information then aggregates the gradient information containing all target objects and passes it to the main branch, which uses it to update its parameters.

* **GELAN Block**

  GELAN = CSPNet + ELAN

# Model Structure overview

## Train model structure

This structure is based on `models/detect/yolov9.yaml`.

| train_structure | train_structure |
| :----------------------------------------------------------: | :----------------------------------------------------------: |
| Train model structure | Train model structure (in paper) |

The ***Auxiliary Reversible Branch*** and the ***Multi-level Auxiliary Information*** exist only in training mode; they help the backbone achieve better performance. In this stage, forward propagation produces the outputs [16, 19, 22, 31, 34, 37], which are fed into the Detect head. From the [31, 34, 37] predictions and the GT labels, the model obtains more detailed gradient information that helps the [#5, #7, #9] blocks update their weights.
So although these branches are dropped in inference mode, the backbone ends up with more robust weights.

## Inference model structure

This structure is based on `models/detect/gelan.yaml`. In fact, this model is derived by pruning the training model (`models/detect/yolov9.yaml`).

```python
# Note:
# models/detect/gelan.yaml <---> models/detect/yolov9.yaml
# models/detect/gelan-c.yaml <---> models/detect/yolov9-c.yaml
# models/detect/gelan-e.yaml <---> models/detect/yolov9-e.yaml
```

![inference_structure](tutorials_img/inference_structure.svg)

In inference mode, the model structure is similar to previous YOLO versions. Note the re-parameterization and the GELAN blocks.

The Detect head (mainly NMS and some other operations) then produces the object detection results.

## Blocks detail

* **Silence** `models.common.Silence`: Does nothing. It is only used to provide the source input data for the Auxiliary Reversible Branch.

* **CBS** `models.common.Conv`: Conv2d + BatchNorm2d + SiLU (default activation)

  Note: The BN layer can be re-parameterized (fused into the convolution) at inference. (ref: [RepVGG](https://openaccess.thecvf.com/content/CVPR2021/papers/Ding_RepVGG_Making_VGG-Style_ConvNets_Great_Again_CVPR_2021_paper.pdf))

* **ELAN** `models.common.RepNCSPELAN4`:

  | train_structure | image-20240229151320013 |
  | :----------------------------------------------------------: | :----------------------------------------------------------: |
  | RepNCSPELAN4 Block | RepNCSPELAN4 (GELAN in paper) |
  | train_structure | ![train_structure](tutorials_img/BottleNeck.svg) |
  | RepNCSP Block | RepNBottleNeck |

* **ELAN-SPP** `models.common.SPPELAN`:

* **ADown** `models.common.ADown`:

  This block replaces some of the `CBS` blocks in `yolov9-c.yaml` and `yolov9-e.yaml`.
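
The CBS note above says the BN layer can be re-parameterized at inference. The arithmetic behind that can be sketched as follows. This is a minimal pure-Python illustration, not the repository's implementation: the helper names `conv1x1`, `batchnorm`, and `fuse_conv_bn` and all numbers are made up for the example, and the convolution is reduced to a single pixel of a 1x1 convolution so that no framework is needed.

```python
import math

def conv1x1(x, weight, bias):
    """Apply a 1x1 convolution to one pixel: x holds C_in values, weight is C_out x C_in."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm with fixed running statistics, one entry per channel."""
    return [g * (v - m) / math.sqrt(s + eps) + bt
            for v, g, bt, m, s in zip(y, gamma, beta, mean, var)]

def fuse_conv_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold the BatchNorm scale and shift into the convolution weight and bias."""
    fused_weight, fused_bias = [], []
    for row, b, g, bt, m, s in zip(weight, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(s + eps)           # per-output-channel scale
        fused_weight.append([w * scale for w in row])
        fused_bias.append(bt + (b - m) * scale)  # shifted and rescaled bias
    return fused_weight, fused_bias

# Hypothetical numbers: 3 input channels, 2 output channels.
x = [0.5, -1.0, 2.0]
weight = [[0.2, -0.1, 0.4], [1.0, 0.3, -0.5]]
bias = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.05, -0.3]
mean, var = [0.2, -0.1], [0.9, 1.6]

# Two-step path (conv, then BN) versus the single fused convolution.
reference = batchnorm(conv1x1(x, weight, bias), gamma, beta, mean, var)
fused_weight, fused_bias = fuse_conv_bn(weight, bias, gamma, beta, mean, var)
fused_out = conv1x1(x, fused_weight, fused_bias)

max_diff = max(abs(a - b) for a, b in zip(reference, fused_out))
print("max difference:", max_diff)
```

Both paths give the same result up to floating-point rounding, so the BN layer can be removed at inference without changing the output. The SiLU activation that follows in CBS is applied after this step and is not affected by the fusion.
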
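
---

If you find any mistakes, please tell me: divided.by.07@gmail.com

--------------------------------------------------------------------------------
/structure_CN.md:
--------------------------------------------------------------------------------
# 语言 Language

[English](./structure.md) [简体中文](structure_CN.md)

# Paper Summary

The paper proposes PGI (Programmable Gradient Information), the idea that the loss of gradient information during backpropagation needs to be addressed. It introduces three key components:

* **Auxiliary Reversible Branch**

  A reversible structure is introduced to preserve complete information, but increasing the backbone's parameter count within a reversible structure incurs a large inference cost. The authors argue that "reversible" is not the only necessary condition in the inference stage, so they design an auxiliary reversible branch. During training it helps the backbone obtain richer returned gradient information, which gives the backbone better performance; at inference the branch is discarded, so inference incurs no extra time cost. This module is used only in **training mode**.

* **Multi-level Auxiliary Information**

  Each feature pyramid should receive gradient information about all target objects. The gradient information containing all target objects is then aggregated as multi-level auxiliary information and passed to the main branch to update the weights. This module is used only in **training mode**, because the gradients it returns are obtained from the auxiliary reversible branch.

* **GELAN block**

  The GELAN block mainly combines the CSPNet and ELAN structures and also draws on the re-parameterization method.

  GELAN = CSPNet + ELAN

# Model Structure Overview

## Train model structure

This structure is based on `models/detect/yolov9.yaml`.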

| train_structure | train_structure |
| :----------------------------------------------------------: | :----------------------------------------------------------: |
| Train model structure | Train model structure (in paper) |

The ***Auxiliary Reversible Branch*** and the ***Multi-level Auxiliary Information*** exist only in training mode and help the backbone achieve better performance. In the training stage there are six output feature maps, [16, 19, 22, 31, 34, 37] in the figure above; feeding these six feature maps into the Detect head yields the predicted labels. Compared with earlier YOLO versions, the additional predictions from the extra [31, 34, 37] outputs are used to compute a loss against the GT labels, and the resulting gradient information is passed more effectively through the auxiliary reversible branch into the [#5, #7, #9] blocks to update the backbone weights.

## Inference model structure

This structure is based on `models/detect/gelan.yaml`. In fact, this model is obtained from `models/detect/yolov9.yaml` by removing the auxiliary branches from the structure.

```python
# Note:
# models/detect/gelan.yaml <---> models/detect/yolov9.yaml
# models/detect/gelan-c.yaml <---> models/detect/yolov9-c.yaml
# models/detect/gelan-e.yaml <---> models/detect/yolov9-e.yaml
```

![inference_structure](tutorials_img/inference_structure.svg)

In inference mode, the model structure is similar to previous YOLO versions. Note the re-parameterization and the GELAN blocks.

The Detect head (mainly NMS and some other operations) produces the object detection results.

### Blocks detail

* **Silence** `models.common.Silence`: The output of this module equals its input, i.e. it does nothing. Its purpose is to let the auxiliary reversible branch access the original image information.

* **CBS** `models.common.Conv`: Conv2d + BatchNorm2d + SiLU (default activation)

  Note: At inference, the BN layer's parameters can be fused into the convolution layer. In the yolov9 code, look for the keyword 'fuse', which is generally related to re-parameterization, for example `models.common.Conv.forward_fuse`. (ref: [RepVGG](https://openaccess.thecvf.com/content/CVPR2021/papers/Ding_RepVGG_Making_VGG-Style_ConvNets_Great_Again_CVPR_2021_paper.pdf))

* **ELAN** `models.common.RepNCSPELAN4`:

  As the module name suggests, its core is Re-parameter + CSPNet + ELAN.

  | train_structure | image-20240229151320013 |
  | :----------------------------------------------------------: | :----------------------------------------------------------: |
  | RepNCSPELAN4 Block | RepNCSPELAN4 (GELAN in paper) |
  | train_structure | ![train_structure](tutorials_img/BottleNeck.svg) |
  | RepNCSP Block | RepNBottleNeck |

* **ELAN-SPP**
`models.common.SPPELAN`:

  This block is essentially the same as the SPPF structure in earlier YOLO versions, as shown in the figure below.

* **ADown** `models.common.ADown`:

  This block appears in `yolov9-c.yaml` and `yolov9-e.yaml`, where it replaces some of the `CBS` blocks.

--------------------------------------------------------------------------------
/tutorials_img/ADown.svg:
--------------------------------------------------------------------------------
[SVG diagram; text labels: Input, shape=[B,C,H,W]; Avgpool2d; CBS; Concat (dim=1); Maxpool2d; Chunk(2, 1); CBS]

--------------------------------------------------------------------------------
/tutorials_img/BottleNeck.svg:
--------------------------------------------------------------------------------
[SVG diagram; text labels: CV2; CBS; CV1; Add; Input, shape=[B,C,H,W]; CB; CB; S (SiLU)]

--------------------------------------------------------------------------------
/tutorials_img/GELAN_in_paper.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/divided7/yolov9_structure_graph/68cc7dccd13dc8e4d380ad0e3d2b27a97fd66a91/tutorials_img/GELAN_in_paper.png

--------------------------------------------------------------------------------
/tutorials_img/RepNCSP.svg:
--------------------------------------------------------------------------------
[SVG diagram; text labels: Input, shape=[B,C,H,W]; CV1; CV2; CV3; CBS; RepNBottleNeck; Concat (dim=1); × n]

--------------------------------------------------------------------------------
/tutorials_img/RepNCSPELAN4.svg:
--------------------------------------------------------------------------------
[SVG diagram; text labels: Input, shape=[B,C,H,W]; CBS; Chunk(2, 1); RepNCSP; Concat (dim=1); CV1; CV2; CV3; CV4]

--------------------------------------------------------------------------------
/tutorials_img/SPPELAN.svg:
--------------------------------------------------------------------------------
[SVG diagram; text labels: Input, shape=[B,C,H,W]; CV1; Maxpool2d (three times); CBS; Concat (dim=1); CV2; CV3; CV4; CV5]

--------------------------------------------------------------------------------
/tutorials_img/structure_in_paper.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/divided7/yolov9_structure_graph/68cc7dccd13dc8e4d380ad0e3d2b27a97fd66a91/tutorials_img/structure_in_paper.png
--------------------------------------------------------------------------------