├── README.md
├── README.pdf
├── README_YOLOv4.md
├── cfg
│   ├── wei_score
│   │   └── yolov4-pacsp-x-mish.cfg
│   ├── yolov4-pacsp-mish.cfg
│   ├── yolov4-pacsp-s-mish.cfg
│   ├── yolov4-pacsp-s.cfg
│   ├── yolov4-pacsp-x-mish.cfg
│   ├── yolov4-pacsp-x.cfg
│   ├── yolov4-pacsp.cfg
│   ├── yolov4-paspp.cfg
│   └── yolov4-tiny.cfg
├── data
│   ├── coco.data
│   ├── coco.names
│   ├── coco1.data
│   ├── coco1.txt
│   ├── coco16.data
│   ├── coco16.txt
│   ├── coco1cls.data
│   ├── coco1cls.txt
│   ├── coco2014.data
│   ├── coco2017.data
│   ├── coco64.data
│   ├── coco64.txt
│   ├── coco_paper.names
│   ├── get_coco2014.sh
│   ├── get_coco2017.sh
│   ├── myData.data
│   ├── myData.names
│   └── myData
│       └── score
│           ├── images
│           │   ├── train
│           │   │   └── readme
│           │   └── val
│           │       └── readme
│           └── labels
│               ├── train
│               │   └── readme
│               └── val
│                   └── readme
├── detect.py
├── experiments.md
├── images
│   └── scalingCSP.png
├── models.py
├── pic
│   ├── p0.png
│   ├── p1.png
│   ├── p2.png
│   ├── p3.png
│   ├── p4.png
│   ├── p5.png
│   ├── test1.jpg
│   └── test2.jpg
├── requirements.txt
├── results_yolov4-pacsp-x-mish.txt
├── runs
│   └── readme
├── test.py
├── test_half.py
├── train.py
├── utils
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── __init__.cpython-35.pyc
│   │   ├── datasets.cpython-35.pyc
│   │   ├── google_utils.cpython-35.pyc
│   │   ├── layers.cpython-35.pyc
│   │   ├── parse_config.cpython-35.pyc
│   │   ├── torch_utils.cpython-35.pyc
│   │   └── utils.cpython-35.pyc
│   ├── adabound.py
│   ├── datasets.py
│   ├── evolve.sh
│   ├── gcp.sh
│   ├── google_utils.py
│   ├── layers.py
│   ├── parse_config.py
│   ├── torch_utils.py
│   └── utils.py
└── weights
    └── put your weights file here.txt

/README.md:
--------------------------------------------------------------------------------

## Training your own dataset with [Pytorch-YOLO v4](https://github.com/WongKinYiu/PyTorch_YOLOv4)

This reproduction was written by **Chien-Yao Wang**, the second author of YOLOv4 and the first author of CSPNet; it is worth noting that both YOLOv4 and YOLOv5 use CSPNet. This PyTorch version of YOLOv4 is built on top of ultralytics' YOLOv3, which is arguably the strongest YOLOv3 PyTorch implementation: https://github.com/ultralytics/yolov3. We will use this version of YOLOv4 to train on our own dataset and walk through the required code changes and the complete training and testing process.

![](pic/p0.png)

### 1. Data preparation

The dataset is built as follows.

**1. Convert the data to darknet format.**

After annotating the data with LabelImg or Labelbox, convert it to darknet format. The images and labels live in two sibling folders, with one label file per image (an image without annotations simply has no label file). Each label file satisfies:

+ one line per annotated box
+ each line contains: class, x_center, y_center, width, height
+ box coordinates are normalized to (0-1)
+ class indices start from 0

Each image and its label file are stored with the following correspondence (a minimal conversion sketch is given after step 3):

```
../coco/images/train2017/000000109622.jpg  # image
../coco/labels/train2017/000000109622.txt  # label
```

Here is an example of a label file containing 5 objects of the person class (class=0):

![](pic/p2.png)

**2. Create train and test \*.txt files.**

They store the paths of the train and test images, for example:

![](pic/p3.png)

**3. Create a new \*.names file**

It stores the class names. For example, create `myData.names` (3 classes):

```
class_1
class_2
class_3
```

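Pulling steps 1–3 together, here is a minimal sketch (not part of this repository) of the conversion. It assumes your annotations are already available per image as `(class_id, xmin, ymin, xmax, ymax)` boxes in pixel coordinates, e.g. exported from LabelImg or Labelbox; the function names `box_to_darknet` and `write_labels_and_list` are illustrative only.

```python
import os
from PIL import Image  # pillow is already in requirements.txt


def box_to_darknet(size, box):
    """(xmin, ymin, xmax, ymax) in pixels -> normalized (x_center, y_center, width, height)."""
    img_w, img_h = size
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2.0 / img_w,
            (ymin + ymax) / 2.0 / img_h,
            (xmax - xmin) / float(img_w),
            (ymax - ymin) / float(img_h))


def write_labels_and_list(annotations, label_dir, list_file):
    """annotations maps an image path to its boxes: [(class_id, xmin, ymin, xmax, ymax), ...]."""
    os.makedirs(label_dir, exist_ok=True)
    with open(list_file, "w") as lf:
        for img_path, boxes in annotations.items():
            lf.write(img_path + "\n")                      # step 2: one image path per line
            with Image.open(img_path) as im:
                size = im.size                             # (width, height)
            name = os.path.splitext(os.path.basename(img_path))[0]
            label_path = os.path.join(label_dir, name + ".txt")
            with open(label_path, "w") as f:               # step 1: one label file per image
                for class_id, *corners in boxes:
                    x_c, y_c, w, h = box_to_darknet(size, corners)
                    f.write(f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}\n")
```

With the folder layout shown in the tree above, the training labels would be written to `data/myData/score/labels/train` (mirroring `images/train`) and the list file to `data/myData/myData_train.txt`, i.e. the paths referenced by the `.data` file in step 4 below.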
**4. Create a new \*.data file**

Create `myData.data`:

```
classes=3
train=data/myData/myData_train.txt
valid=data/myData/myData_val.txt
names=data/myData.names
```


### 2. Environment setup

Required packages:

```
numpy == 1.17
opencv-python >= 4.1
torch==1.3.0
torchvision==0.4.1
matplotlib
pycocotools
tqdm
pillow
tensorboard >= 1.14
```
※ Running the Mish models requires installing https://github.com/thomasbrandon/mish-cuda

```
sudo pip3 install git+https://github.com/thomasbrandon/mish-cuda.git
```




### 3. Modifying the model configuration file

The configuration file is modified in the same way as for the darknet versions of YOLOv3 and YOLOv4, so their documentation can be used as a reference; the changes mainly concern a few training hyperparameters and the network parameters. In particular, for a custom dataset set `classes` in every `[yolo]` layer to your number of classes, and set `filters` in the `[convolutional]` layer directly before each `[yolo]` layer to `(classes + 5) * 3` (24 for the 3-class example above; the stock cfgs use 255 for the 80 COCO classes).

![](pic/p4.png)

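Since a mismatched `classes`/`filters` edit is the most common cfg mistake, here is a small sketch (not part of this repository, and independent of `utils/parse_config.py`; the names `parse_cfg` and `check_cfg` are illustrative) that parses a cfg in the darknet format shown later in this repo and checks every `[yolo]` head:

```python
import sys


def parse_cfg(path):
    """Parse a darknet cfg into an ordered list of {'type': section, key: value, ...} blocks."""
    blocks = []
    with open(path) as f:
        for line in f:
            line = line.split("#")[0].strip()      # drop comments and surrounding whitespace
            if not line:
                continue
            if line.startswith("["):               # new section, e.g. [convolutional] or [yolo]
                blocks.append({"type": line[1:-1]})
            else:
                key, value = (s.strip() for s in line.split("=", 1))
                blocks[-1][key] = value
    return blocks


def check_cfg(path):
    """Return True if every [yolo] head satisfies filters == (classes + 5) * len(mask)."""
    blocks = parse_cfg(path)
    ok = True
    for i, block in enumerate(blocks):
        if block["type"] != "yolo":
            continue
        classes = int(block["classes"])
        n_anchors = len(block["mask"].split(","))  # anchors assigned to this head
        conv = blocks[i - 1]                       # the detection conv sits right before [yolo]
        expected = (classes + 5) * n_anchors
        if int(conv["filters"]) != expected:
            ok = False
            print(f"[yolo] block #{i}: filters={conv['filters']}, expected {expected}")
    return ok


if __name__ == "__main__":
    print("OK" if check_cfg(sys.argv[1]) else "cfg needs fixing")
```

Saved as, say, `check_cfg.py`, it can be run after editing with `python3 check_cfg.py cfg/wei_score/yolov4-pacsp-x-mish.cfg`.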
### 4. Downloading pretrained models

Download the pretrained models:

Baidu Netdisk link: https://pan.baidu.com/s/1nyQlH-GHrmddCEkuv-VmAg
Extraction code: 78bg


### 5. Model training

```
python3 train.py --data data/myData.data --cfg cfg/wei_score/yolov4-pacsp-x-mish.cfg --weights './weights/yolov4-pacsp-x-mish.pt' --name yolov4-pacsp-x-mish --img 640 640 640
```



### 6. Model inference

**1. Evaluation on the validation set**

`test_half.py` runs the same evaluation as `test.py`, but in half precision (FP16).

```shell
python3 test_half.py --data data/myData.data \
                     --cfg cfg/wei_score/yolov4-pacsp-x-mish.cfg \
                     --weights weights/best_yolov4-pacsp-x-mish.pt \
                     --img 640 \
                     --iou-thr 0.6 \
                     --conf-thres 0.5 \
                     --batch-size 1
```

```shell
python3 test.py --data data/myData.data \
                --cfg cfg/wei_score/yolov4-pacsp-x-mish.cfg \
                --weights weights/best_yolov4-pacsp-x-mish.pt \
                --img 640 \
                --iou-thr 0.6 \
                --conf-thres 0.5 \
                --batch-size 1
```

```shell
Model Summary: 408 layers, 9.92329e+07 parameters, 9.92329e+07 gradients
Fusing layers...
Model Summary: 274 layers, 9.91849e+07 parameters, 9.91849e+07 gradients
Caching labels (285 found, 0 missing, 0 empty, 0 duplicate, for 285 images): 100%|██████████| 285/285 [00:00<00:00, 8858.32it/s]
               Class    Images   Targets         P         R   mAP@0.5        F1: 100%|██████████| 285/285 [00:17<00:00, 16.44it/s]
                 all       285       645     0.847      0.66     0.623      0.74
                  QP       285       175     0.856     0.611     0.586     0.713
                  NY       285       289     0.894     0.671     0.647     0.767
                  QG       285       181     0.792     0.696     0.638     0.741
Speed: 23.4/1.1/24.5 ms inference/NMS/total per 640x640 image at batch-size 1

```

**2. Inference on single images or videos**

```shell
python3 detect.py --cfg cfg/wei_score/yolov4-pacsp-x-mish.cfg \
                  --names data/myData.names \
                  --weights weights/best_yolov4-pacsp-x-mish.pt \
                  --source data/myData/score/images/val \
                  --img-size 640 \
                  --conf-thres 0.3 \
                  --iou-thres 0.2 \
                  --device 0
```



Training logs can be visualized with TensorBoard:

```shell
tensorboard --logdir=runs
```

![](pic/p5.png)

### 7. Demo

![](pic/test2.jpg)

![](pic/test1.jpg)

### 8. TensorRT-accelerated inference

**TODO**

--------------------------------------------------------------------------------
/README.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/README.pdf
--------------------------------------------------------------------------------
/README_YOLOv4.md:
--------------------------------------------------------------------------------

# YOLOv4

This is a PyTorch implementation of [YOLOv4](https://github.com/AlexeyAB/darknet), based on [ultralytics/yolov3](https://github.com/ultralytics/yolov3).

* [[original Darknet implementation of YOLOv4]](https://github.com/AlexeyAB/darknet)

* [[ultralytics/yolov5 based PyTorch implementation of YOLOv4]](https://github.com/WongKinYiu/PyTorch_YOLOv4/tree/u5_preview)

### development log

* `2020-07-23` - support CUDA accelerated Mish activation function.
* `2020-07-19` - support and train tiny YOLOv4. [`yolov4-tiny`]()
* `2020-07-15` - design and train conditional YOLOv4. [`yolov4-pacsp-conditional`]()
* `2020-07-13` - support MixUp data augmentation.
* `2020-07-03` - design new stem layers.
* `2020-06-16` - support FP16 (half-precision) GPU inference.
* `2020-06-14` - convert .pt to .weights for darknet fine-tuning.
* `2020-06-13` - update multi-scale training strategy.
* `2020-06-12` - design scaled YOLOv4 following [ultralytics](https://github.com/ultralytics/yolov5). [`yolov4-pacsp-s`]() [`yolov4-pacsp-m`]() [`yolov4-pacsp-l`]() [`yolov4-pacsp-x`]()
* `2020-06-07` - design [scaling methods](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/images/scalingCSP.png) for CSP-based models. [`yolov4-pacsp-25`]() [`yolov4-pacsp-75`]()
* `2020-06-03` - update COCO2014 to COCO2017.
* `2020-05-30` - update FPN neck to CSPFPN. [`yolov4-yocsp`]() [`yolov4-yocsp-mish`]()
* `2020-05-24` - update neck of YOLOv4 to CSPPAN. [`yolov4-pacsp`]() [`yolov4-pacsp-mish`]()
* `2020-05-15` - training YOLOv4 with Mish activation function. [`yolov4-yospp-mish`]() [`yolov4-paspp-mish`]()
* `2020-05-08` - design and train YOLOv4 with FPN neck. [`yolov4-yospp`]()
* `2020-05-01` - training YOLOv4 with Leaky activation function using PyTorch. [`yolov4-paspp`]()

31 | 32 | ## Pretrained Models & Comparison 33 | 34 | | Model | Test Size | APval | AP50val | AP75val | APSval | APMval | APLval | cfg | weights | 35 | | :-- | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | 36 | | **YOLOv4**paspp | 736 | 45.7% | 64.2% | 50.3% | 27.4% | 51.3% | 58.6% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-paspp.cfg) | [weights](https://drive.google.com/file/d/1FraA4vmlBh5RoQB7ZGVc01UyCgxSlbpO/view?usp=sharing) | 37 | | **YOLOv4**pacsp-s | 736 | 36.0% | 54.2% | 39.4% | 18.7% | 41.2% | 48.0% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-s.cfg) | [weights](https://drive.google.com/file/d/1saE6CEvNDPA_Xv34RdxYT4BbCtozuTta/view?usp=sharing) | 38 | | **YOLOv4**pacsp | 736 | 46.4% | 64.8% | 51.0% | 28.5% | 51.9% | 59.5% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp.cfg) | [weights](https://drive.google.com/file/d/1SPCjPnMgA8jlfIGsAnFsMPdJU8dJeo7E/view?usp=sharing) | 39 | | **YOLOv4**pacsp-x | 736 | **47.6%** | **66.1%** | **52.2%** | **29.9%** | **53.3%** | **61.5%** | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-x.cfg) | [weights](https://drive.google.com/file/d/1MtwO5tvXvvyloc12-wZ2lMBzGKd9hsof/view?usp=sharing) | 40 | | | | | | | | | 41 | | **YOLOv4**pacsp-s-mish | 736 | 37.4% | 56.3% | 40.0% | 20.9% | 43.0% | 49.3% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-s-mish.cfg) | [weights](https://drive.google.com/file/d/1Gmy2Q6af1DQ5CAb6415cVFkIgtOIt9xs/view?usp=sharing) | 42 | | **YOLOv4**pacsp-mish | 736 | 46.5% | 65.7% | 50.2% | 30.0% | 52.0% | 59.4% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-mish.cfg) | [weights](https://drive.google.com/file/d/10pw28weUtOceEexRQQrdpOjxBb79sk3u/view?usp=sharing) | 43 | | **YOLOv4**pacsp-x-mish | 736 | **48.5%** | **67.4%** | **52.7%** | **30.9%** | **54.0%** | **62.0%** | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-x-mish.cfg) | [weights](https://drive.google.com/file/d/1GsLaQLfl54Qt2C07mya00S0_FTpcXBdy/view?usp=sharing) | 44 | | | | | | | | | 45 | | **YOLOv4**tiny | 416 | **22.5%** | **39.3%** | **22.5%** | **7.4%** | **26.3%** | **34.8%** | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-tiny.cfg) | [weights](https://drive.google.com/file/d/1aQKcCvTAl1uOWzzHVE9Z8Ixgikc3AuYQ/view?usp=sharing) | 46 | | | | | | | | | 47 | 48 | ## Requirements 49 | 50 | ``` 51 | pip install -r requirements.txt 52 | ``` 53 | ※ For running Mish models, please install https://github.com/thomasbrandon/mish-cuda 54 | 55 | ## Training 56 | 57 | ``` 58 | python train.py --data coco2017.data --cfg yolov4-pacsp.cfg --weights '' --name yolov4-pacsp --img 640 640 640 59 | ``` 60 | 61 | ## Testing 62 | 63 | ``` 64 | python test_half.py --data coco2017.data --cfg yolov4-pacsp.cfg --weights yolov4-pacsp.pt --img 736 --iou-thr 0.7 --batch-size 8 65 | ``` 66 | 67 | ## Citation 68 | 69 | ``` 70 | @article{bochkovskiy2020yolov4, 71 | title={{YOLOv4}: Optimal Speed and Accuracy of Object Detection}, 72 | author={Bochkovskiy, Alexey and Wang, Chien-Yao and Liao, Hong-Yuan Mark}, 73 | journal={arXiv preprint arXiv:2004.10934}, 74 | year={2020} 75 | } 76 | ``` 77 | 78 | ``` 79 | @inproceedings{wang2020cspnet, 80 | title={{CSPNet}: A New Backbone That Can Enhance Learning Capability of {CNN}}, 81 | author={Wang, Chien-Yao and Mark Liao, Hong-Yuan and Wu, Yueh-Hua and Chen, Ping-Yang and Hsieh, Jun-Wei 
and Yeh, I-Hau}, 82 | booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops}, 83 | pages={390--391}, 84 | year={2020} 85 | } 86 | ``` 87 | 88 | ## Acknowledgements 89 | 90 | * [https://github.com/AlexeyAB/darknet](https://github.com/AlexeyAB/darknet) 91 | * [https://github.com/ultralytics/yolov3](https://github.com/ultralytics/yolov3) 92 | * [https://github.com/ultralytics/yolov5](https://github.com/ultralytics/yolov5) 93 | -------------------------------------------------------------------------------- /cfg/yolov4-pacsp-mish.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=64 7 | subdivisions=8 8 | width=640 9 | height=640 10 | channels=3 11 | momentum=0.949 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.00261 19 | burn_in=1000 20 | max_batches = 500500 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | #cutmix=1 26 | mosaic=1 27 | 28 | [convolutional] 29 | batch_normalize=1 30 | filters=32 31 | size=3 32 | stride=1 33 | pad=1 34 | activation=mish 35 | 36 | # Downsample 37 | 38 | [convolutional] 39 | batch_normalize=1 40 | filters=64 41 | size=3 42 | stride=2 43 | pad=1 44 | activation=mish 45 | 46 | #[convolutional] 47 | #batch_normalize=1 48 | #filters=64 49 | #size=1 50 | #stride=1 51 | #pad=1 52 | #activation=mish 53 | 54 | #[route] 55 | #layers = -2 56 | 57 | #[convolutional] 58 | #batch_normalize=1 59 | #filters=64 60 | #size=1 61 | #stride=1 62 | #pad=1 63 | #activation=mish 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=32 68 | size=1 69 | stride=1 70 | pad=1 71 | activation=mish 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=3 77 | stride=1 78 | pad=1 79 | activation=mish 80 | 81 | [shortcut] 82 | from=-3 83 | activation=linear 84 | 85 | #[convolutional] 86 | #batch_normalize=1 87 | #filters=64 88 | #size=1 89 | #stride=1 90 | #pad=1 91 | #activation=mish 92 | 93 | #[route] 94 | #layers = -1,-7 95 | 96 | #[convolutional] 97 | #batch_normalize=1 98 | #filters=64 99 | #size=1 100 | #stride=1 101 | #pad=1 102 | #activation=mish 103 | 104 | # Downsample 105 | 106 | [convolutional] 107 | batch_normalize=1 108 | filters=128 109 | size=3 110 | stride=2 111 | pad=1 112 | activation=mish 113 | 114 | [convolutional] 115 | batch_normalize=1 116 | filters=64 117 | size=1 118 | stride=1 119 | pad=1 120 | activation=mish 121 | 122 | [route] 123 | layers = -2 124 | 125 | [convolutional] 126 | batch_normalize=1 127 | filters=64 128 | size=1 129 | stride=1 130 | pad=1 131 | activation=mish 132 | 133 | [convolutional] 134 | batch_normalize=1 135 | filters=64 136 | size=1 137 | stride=1 138 | pad=1 139 | activation=mish 140 | 141 | [convolutional] 142 | batch_normalize=1 143 | filters=64 144 | size=3 145 | stride=1 146 | pad=1 147 | activation=mish 148 | 149 | [shortcut] 150 | from=-3 151 | activation=linear 152 | 153 | [convolutional] 154 | batch_normalize=1 155 | filters=64 156 | size=1 157 | stride=1 158 | pad=1 159 | activation=mish 160 | 161 | [convolutional] 162 | batch_normalize=1 163 | filters=64 164 | size=3 165 | stride=1 166 | pad=1 167 | activation=mish 168 | 169 | [shortcut] 170 | from=-3 171 | activation=linear 172 | 173 | [convolutional] 174 | batch_normalize=1 175 | filters=64 176 | size=1 177 | stride=1 178 | pad=1 179 | activation=mish 180 | 181 | [route] 182 | layers = -1,-10 183 | 184 | 
[convolutional] 185 | batch_normalize=1 186 | filters=128 187 | size=1 188 | stride=1 189 | pad=1 190 | activation=mish 191 | 192 | # Downsample 193 | 194 | [convolutional] 195 | batch_normalize=1 196 | filters=256 197 | size=3 198 | stride=2 199 | pad=1 200 | activation=mish 201 | 202 | [convolutional] 203 | batch_normalize=1 204 | filters=128 205 | size=1 206 | stride=1 207 | pad=1 208 | activation=mish 209 | 210 | [route] 211 | layers = -2 212 | 213 | [convolutional] 214 | batch_normalize=1 215 | filters=128 216 | size=1 217 | stride=1 218 | pad=1 219 | activation=mish 220 | 221 | [convolutional] 222 | batch_normalize=1 223 | filters=128 224 | size=1 225 | stride=1 226 | pad=1 227 | activation=mish 228 | 229 | [convolutional] 230 | batch_normalize=1 231 | filters=128 232 | size=3 233 | stride=1 234 | pad=1 235 | activation=mish 236 | 237 | [shortcut] 238 | from=-3 239 | activation=linear 240 | 241 | [convolutional] 242 | batch_normalize=1 243 | filters=128 244 | size=1 245 | stride=1 246 | pad=1 247 | activation=mish 248 | 249 | [convolutional] 250 | batch_normalize=1 251 | filters=128 252 | size=3 253 | stride=1 254 | pad=1 255 | activation=mish 256 | 257 | [shortcut] 258 | from=-3 259 | activation=linear 260 | 261 | [convolutional] 262 | batch_normalize=1 263 | filters=128 264 | size=1 265 | stride=1 266 | pad=1 267 | activation=mish 268 | 269 | [convolutional] 270 | batch_normalize=1 271 | filters=128 272 | size=3 273 | stride=1 274 | pad=1 275 | activation=mish 276 | 277 | [shortcut] 278 | from=-3 279 | activation=linear 280 | 281 | [convolutional] 282 | batch_normalize=1 283 | filters=128 284 | size=1 285 | stride=1 286 | pad=1 287 | activation=mish 288 | 289 | [convolutional] 290 | batch_normalize=1 291 | filters=128 292 | size=3 293 | stride=1 294 | pad=1 295 | activation=mish 296 | 297 | [shortcut] 298 | from=-3 299 | activation=linear 300 | 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=128 305 | size=1 306 | stride=1 307 | pad=1 308 | activation=mish 309 | 310 | [convolutional] 311 | batch_normalize=1 312 | filters=128 313 | size=3 314 | stride=1 315 | pad=1 316 | activation=mish 317 | 318 | [shortcut] 319 | from=-3 320 | activation=linear 321 | 322 | [convolutional] 323 | batch_normalize=1 324 | filters=128 325 | size=1 326 | stride=1 327 | pad=1 328 | activation=mish 329 | 330 | [convolutional] 331 | batch_normalize=1 332 | filters=128 333 | size=3 334 | stride=1 335 | pad=1 336 | activation=mish 337 | 338 | [shortcut] 339 | from=-3 340 | activation=linear 341 | 342 | [convolutional] 343 | batch_normalize=1 344 | filters=128 345 | size=1 346 | stride=1 347 | pad=1 348 | activation=mish 349 | 350 | [convolutional] 351 | batch_normalize=1 352 | filters=128 353 | size=3 354 | stride=1 355 | pad=1 356 | activation=mish 357 | 358 | [shortcut] 359 | from=-3 360 | activation=linear 361 | 362 | [convolutional] 363 | batch_normalize=1 364 | filters=128 365 | size=1 366 | stride=1 367 | pad=1 368 | activation=mish 369 | 370 | [convolutional] 371 | batch_normalize=1 372 | filters=128 373 | size=3 374 | stride=1 375 | pad=1 376 | activation=mish 377 | 378 | [shortcut] 379 | from=-3 380 | activation=linear 381 | 382 | [convolutional] 383 | batch_normalize=1 384 | filters=128 385 | size=1 386 | stride=1 387 | pad=1 388 | activation=mish 389 | 390 | [route] 391 | layers = -1,-28 392 | 393 | [convolutional] 394 | batch_normalize=1 395 | filters=256 396 | size=1 397 | stride=1 398 | pad=1 399 | activation=mish 400 | 401 | # Downsample 402 | 403 | [convolutional] 404 | 
batch_normalize=1 405 | filters=512 406 | size=3 407 | stride=2 408 | pad=1 409 | activation=mish 410 | 411 | [convolutional] 412 | batch_normalize=1 413 | filters=256 414 | size=1 415 | stride=1 416 | pad=1 417 | activation=mish 418 | 419 | [route] 420 | layers = -2 421 | 422 | [convolutional] 423 | batch_normalize=1 424 | filters=256 425 | size=1 426 | stride=1 427 | pad=1 428 | activation=mish 429 | 430 | [convolutional] 431 | batch_normalize=1 432 | filters=256 433 | size=1 434 | stride=1 435 | pad=1 436 | activation=mish 437 | 438 | [convolutional] 439 | batch_normalize=1 440 | filters=256 441 | size=3 442 | stride=1 443 | pad=1 444 | activation=mish 445 | 446 | [shortcut] 447 | from=-3 448 | activation=linear 449 | 450 | 451 | [convolutional] 452 | batch_normalize=1 453 | filters=256 454 | size=1 455 | stride=1 456 | pad=1 457 | activation=mish 458 | 459 | [convolutional] 460 | batch_normalize=1 461 | filters=256 462 | size=3 463 | stride=1 464 | pad=1 465 | activation=mish 466 | 467 | [shortcut] 468 | from=-3 469 | activation=linear 470 | 471 | 472 | [convolutional] 473 | batch_normalize=1 474 | filters=256 475 | size=1 476 | stride=1 477 | pad=1 478 | activation=mish 479 | 480 | [convolutional] 481 | batch_normalize=1 482 | filters=256 483 | size=3 484 | stride=1 485 | pad=1 486 | activation=mish 487 | 488 | [shortcut] 489 | from=-3 490 | activation=linear 491 | 492 | 493 | [convolutional] 494 | batch_normalize=1 495 | filters=256 496 | size=1 497 | stride=1 498 | pad=1 499 | activation=mish 500 | 501 | [convolutional] 502 | batch_normalize=1 503 | filters=256 504 | size=3 505 | stride=1 506 | pad=1 507 | activation=mish 508 | 509 | [shortcut] 510 | from=-3 511 | activation=linear 512 | 513 | 514 | [convolutional] 515 | batch_normalize=1 516 | filters=256 517 | size=1 518 | stride=1 519 | pad=1 520 | activation=mish 521 | 522 | [convolutional] 523 | batch_normalize=1 524 | filters=256 525 | size=3 526 | stride=1 527 | pad=1 528 | activation=mish 529 | 530 | [shortcut] 531 | from=-3 532 | activation=linear 533 | 534 | 535 | [convolutional] 536 | batch_normalize=1 537 | filters=256 538 | size=1 539 | stride=1 540 | pad=1 541 | activation=mish 542 | 543 | [convolutional] 544 | batch_normalize=1 545 | filters=256 546 | size=3 547 | stride=1 548 | pad=1 549 | activation=mish 550 | 551 | [shortcut] 552 | from=-3 553 | activation=linear 554 | 555 | 556 | [convolutional] 557 | batch_normalize=1 558 | filters=256 559 | size=1 560 | stride=1 561 | pad=1 562 | activation=mish 563 | 564 | [convolutional] 565 | batch_normalize=1 566 | filters=256 567 | size=3 568 | stride=1 569 | pad=1 570 | activation=mish 571 | 572 | [shortcut] 573 | from=-3 574 | activation=linear 575 | 576 | [convolutional] 577 | batch_normalize=1 578 | filters=256 579 | size=1 580 | stride=1 581 | pad=1 582 | activation=mish 583 | 584 | [convolutional] 585 | batch_normalize=1 586 | filters=256 587 | size=3 588 | stride=1 589 | pad=1 590 | activation=mish 591 | 592 | [shortcut] 593 | from=-3 594 | activation=linear 595 | 596 | [convolutional] 597 | batch_normalize=1 598 | filters=256 599 | size=1 600 | stride=1 601 | pad=1 602 | activation=mish 603 | 604 | [route] 605 | layers = -1,-28 606 | 607 | [convolutional] 608 | batch_normalize=1 609 | filters=512 610 | size=1 611 | stride=1 612 | pad=1 613 | activation=mish 614 | 615 | # Downsample 616 | 617 | [convolutional] 618 | batch_normalize=1 619 | filters=1024 620 | size=3 621 | stride=2 622 | pad=1 623 | activation=mish 624 | 625 | [convolutional] 626 | batch_normalize=1 
627 | filters=512 628 | size=1 629 | stride=1 630 | pad=1 631 | activation=mish 632 | 633 | [route] 634 | layers = -2 635 | 636 | [convolutional] 637 | batch_normalize=1 638 | filters=512 639 | size=1 640 | stride=1 641 | pad=1 642 | activation=mish 643 | 644 | [convolutional] 645 | batch_normalize=1 646 | filters=512 647 | size=1 648 | stride=1 649 | pad=1 650 | activation=mish 651 | 652 | [convolutional] 653 | batch_normalize=1 654 | filters=512 655 | size=3 656 | stride=1 657 | pad=1 658 | activation=mish 659 | 660 | [shortcut] 661 | from=-3 662 | activation=linear 663 | 664 | [convolutional] 665 | batch_normalize=1 666 | filters=512 667 | size=1 668 | stride=1 669 | pad=1 670 | activation=mish 671 | 672 | [convolutional] 673 | batch_normalize=1 674 | filters=512 675 | size=3 676 | stride=1 677 | pad=1 678 | activation=mish 679 | 680 | [shortcut] 681 | from=-3 682 | activation=linear 683 | 684 | [convolutional] 685 | batch_normalize=1 686 | filters=512 687 | size=1 688 | stride=1 689 | pad=1 690 | activation=mish 691 | 692 | [convolutional] 693 | batch_normalize=1 694 | filters=512 695 | size=3 696 | stride=1 697 | pad=1 698 | activation=mish 699 | 700 | [shortcut] 701 | from=-3 702 | activation=linear 703 | 704 | [convolutional] 705 | batch_normalize=1 706 | filters=512 707 | size=1 708 | stride=1 709 | pad=1 710 | activation=mish 711 | 712 | [convolutional] 713 | batch_normalize=1 714 | filters=512 715 | size=3 716 | stride=1 717 | pad=1 718 | activation=mish 719 | 720 | [shortcut] 721 | from=-3 722 | activation=linear 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=512 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=mish 731 | 732 | [route] 733 | layers = -1,-16 734 | 735 | [convolutional] 736 | batch_normalize=1 737 | filters=1024 738 | size=1 739 | stride=1 740 | pad=1 741 | activation=mish 742 | 743 | ########################## 744 | 745 | [convolutional] 746 | batch_normalize=1 747 | filters=512 748 | size=1 749 | stride=1 750 | pad=1 751 | activation=mish 752 | 753 | [route] 754 | layers = -2 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=512 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=mish 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=512 770 | activation=mish 771 | 772 | [convolutional] 773 | batch_normalize=1 774 | filters=512 775 | size=1 776 | stride=1 777 | pad=1 778 | activation=mish 779 | 780 | ### SPP ### 781 | [maxpool] 782 | stride=1 783 | size=5 784 | 785 | [route] 786 | layers=-2 787 | 788 | [maxpool] 789 | stride=1 790 | size=9 791 | 792 | [route] 793 | layers=-4 794 | 795 | [maxpool] 796 | stride=1 797 | size=13 798 | 799 | [route] 800 | layers=-1,-3,-5,-6 801 | ### End SPP ### 802 | 803 | [convolutional] 804 | batch_normalize=1 805 | filters=512 806 | size=1 807 | stride=1 808 | pad=1 809 | activation=mish 810 | 811 | [convolutional] 812 | batch_normalize=1 813 | size=3 814 | stride=1 815 | pad=1 816 | filters=512 817 | activation=mish 818 | 819 | [route] 820 | layers = -1, -13 821 | 822 | [convolutional] 823 | batch_normalize=1 824 | filters=512 825 | size=1 826 | stride=1 827 | pad=1 828 | activation=mish 829 | 830 | [convolutional] 831 | batch_normalize=1 832 | filters=256 833 | size=1 834 | stride=1 835 | pad=1 836 | activation=mish 837 | 838 | [upsample] 839 | stride=2 840 | 841 | [route] 842 | layers = 79 843 | 844 | [convolutional] 845 | batch_normalize=1 846 | filters=256 847 | size=1 848 | stride=1 849 | pad=1 850 | activation=mish 851 | 
852 | [route] 853 | layers = -1, -3 854 | 855 | [convolutional] 856 | batch_normalize=1 857 | filters=256 858 | size=1 859 | stride=1 860 | pad=1 861 | activation=mish 862 | 863 | [convolutional] 864 | batch_normalize=1 865 | filters=256 866 | size=1 867 | stride=1 868 | pad=1 869 | activation=mish 870 | 871 | [route] 872 | layers = -2 873 | 874 | [convolutional] 875 | batch_normalize=1 876 | filters=256 877 | size=1 878 | stride=1 879 | pad=1 880 | activation=mish 881 | 882 | [convolutional] 883 | batch_normalize=1 884 | size=3 885 | stride=1 886 | pad=1 887 | filters=256 888 | activation=mish 889 | 890 | [convolutional] 891 | batch_normalize=1 892 | filters=256 893 | size=1 894 | stride=1 895 | pad=1 896 | activation=mish 897 | 898 | [convolutional] 899 | batch_normalize=1 900 | size=3 901 | stride=1 902 | pad=1 903 | filters=256 904 | activation=mish 905 | 906 | [route] 907 | layers = -1, -6 908 | 909 | [convolutional] 910 | batch_normalize=1 911 | filters=256 912 | size=1 913 | stride=1 914 | pad=1 915 | activation=mish 916 | 917 | [convolutional] 918 | batch_normalize=1 919 | filters=128 920 | size=1 921 | stride=1 922 | pad=1 923 | activation=mish 924 | 925 | [upsample] 926 | stride=2 927 | 928 | [route] 929 | layers = 48 930 | 931 | [convolutional] 932 | batch_normalize=1 933 | filters=128 934 | size=1 935 | stride=1 936 | pad=1 937 | activation=mish 938 | 939 | [route] 940 | layers = -1, -3 941 | 942 | [convolutional] 943 | batch_normalize=1 944 | filters=128 945 | size=1 946 | stride=1 947 | pad=1 948 | activation=mish 949 | 950 | [convolutional] 951 | batch_normalize=1 952 | filters=128 953 | size=1 954 | stride=1 955 | pad=1 956 | activation=mish 957 | 958 | [route] 959 | layers = -2 960 | 961 | [convolutional] 962 | batch_normalize=1 963 | filters=128 964 | size=1 965 | stride=1 966 | pad=1 967 | activation=mish 968 | 969 | [convolutional] 970 | batch_normalize=1 971 | size=3 972 | stride=1 973 | pad=1 974 | filters=128 975 | activation=mish 976 | 977 | [convolutional] 978 | batch_normalize=1 979 | filters=128 980 | size=1 981 | stride=1 982 | pad=1 983 | activation=mish 984 | 985 | [convolutional] 986 | batch_normalize=1 987 | size=3 988 | stride=1 989 | pad=1 990 | filters=128 991 | activation=mish 992 | 993 | [route] 994 | layers = -1, -6 995 | 996 | [convolutional] 997 | batch_normalize=1 998 | filters=128 999 | size=1 1000 | stride=1 1001 | pad=1 1002 | activation=mish 1003 | 1004 | ########################## 1005 | 1006 | [convolutional] 1007 | batch_normalize=1 1008 | size=3 1009 | stride=1 1010 | pad=1 1011 | filters=256 1012 | activation=mish 1013 | 1014 | [convolutional] 1015 | size=1 1016 | stride=1 1017 | pad=1 1018 | filters=255 1019 | activation=linear 1020 | 1021 | 1022 | [yolo] 1023 | mask = 0,1,2 1024 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 1025 | classes=80 1026 | num=9 1027 | jitter=.3 1028 | ignore_thresh = .7 1029 | truth_thresh = 1 1030 | random=1 1031 | scale_x_y = 1.05 1032 | iou_thresh=0.213 1033 | cls_normalizer=1.0 1034 | iou_normalizer=0.07 1035 | iou_loss=ciou 1036 | nms_kind=greedynms 1037 | beta_nms=0.6 1038 | 1039 | [route] 1040 | layers = -4 1041 | 1042 | [convolutional] 1043 | batch_normalize=1 1044 | size=3 1045 | stride=2 1046 | pad=1 1047 | filters=256 1048 | activation=mish 1049 | 1050 | [route] 1051 | layers = -1, -20 1052 | 1053 | [convolutional] 1054 | batch_normalize=1 1055 | filters=256 1056 | size=1 1057 | stride=1 1058 | pad=1 1059 | activation=mish 1060 | 1061 | [convolutional] 1062 
| batch_normalize=1 1063 | filters=256 1064 | size=1 1065 | stride=1 1066 | pad=1 1067 | activation=mish 1068 | 1069 | [route] 1070 | layers = -2 1071 | 1072 | [convolutional] 1073 | batch_normalize=1 1074 | filters=256 1075 | size=1 1076 | stride=1 1077 | pad=1 1078 | activation=mish 1079 | 1080 | [convolutional] 1081 | batch_normalize=1 1082 | size=3 1083 | stride=1 1084 | pad=1 1085 | filters=256 1086 | activation=mish 1087 | 1088 | [convolutional] 1089 | batch_normalize=1 1090 | filters=256 1091 | size=1 1092 | stride=1 1093 | pad=1 1094 | activation=mish 1095 | 1096 | [convolutional] 1097 | batch_normalize=1 1098 | size=3 1099 | stride=1 1100 | pad=1 1101 | filters=256 1102 | activation=mish 1103 | 1104 | [route] 1105 | layers = -1,-6 1106 | 1107 | [convolutional] 1108 | batch_normalize=1 1109 | filters=256 1110 | size=1 1111 | stride=1 1112 | pad=1 1113 | activation=mish 1114 | 1115 | [convolutional] 1116 | batch_normalize=1 1117 | size=3 1118 | stride=1 1119 | pad=1 1120 | filters=512 1121 | activation=mish 1122 | 1123 | [convolutional] 1124 | size=1 1125 | stride=1 1126 | pad=1 1127 | filters=255 1128 | activation=linear 1129 | 1130 | 1131 | [yolo] 1132 | mask = 3,4,5 1133 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 1134 | classes=80 1135 | num=9 1136 | jitter=.3 1137 | ignore_thresh = .7 1138 | truth_thresh = 1 1139 | random=1 1140 | scale_x_y = 1.05 1141 | iou_thresh=0.213 1142 | cls_normalizer=1.0 1143 | iou_normalizer=0.07 1144 | iou_loss=ciou 1145 | nms_kind=greedynms 1146 | beta_nms=0.6 1147 | 1148 | [route] 1149 | layers = -4 1150 | 1151 | [convolutional] 1152 | batch_normalize=1 1153 | size=3 1154 | stride=2 1155 | pad=1 1156 | filters=512 1157 | activation=mish 1158 | 1159 | [route] 1160 | layers = -1, -49 1161 | 1162 | [convolutional] 1163 | batch_normalize=1 1164 | filters=512 1165 | size=1 1166 | stride=1 1167 | pad=1 1168 | activation=mish 1169 | 1170 | [convolutional] 1171 | batch_normalize=1 1172 | filters=512 1173 | size=1 1174 | stride=1 1175 | pad=1 1176 | activation=mish 1177 | 1178 | [route] 1179 | layers = -2 1180 | 1181 | [convolutional] 1182 | batch_normalize=1 1183 | filters=512 1184 | size=1 1185 | stride=1 1186 | pad=1 1187 | activation=mish 1188 | 1189 | [convolutional] 1190 | batch_normalize=1 1191 | size=3 1192 | stride=1 1193 | pad=1 1194 | filters=512 1195 | activation=mish 1196 | 1197 | [convolutional] 1198 | batch_normalize=1 1199 | filters=512 1200 | size=1 1201 | stride=1 1202 | pad=1 1203 | activation=mish 1204 | 1205 | [convolutional] 1206 | batch_normalize=1 1207 | size=3 1208 | stride=1 1209 | pad=1 1210 | filters=512 1211 | activation=mish 1212 | 1213 | [route] 1214 | layers = -1,-6 1215 | 1216 | [convolutional] 1217 | batch_normalize=1 1218 | filters=512 1219 | size=1 1220 | stride=1 1221 | pad=1 1222 | activation=mish 1223 | 1224 | [convolutional] 1225 | batch_normalize=1 1226 | size=3 1227 | stride=1 1228 | pad=1 1229 | filters=1024 1230 | activation=mish 1231 | 1232 | [convolutional] 1233 | size=1 1234 | stride=1 1235 | pad=1 1236 | filters=255 1237 | activation=linear 1238 | 1239 | 1240 | [yolo] 1241 | mask = 6,7,8 1242 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 1243 | classes=80 1244 | num=9 1245 | jitter=.3 1246 | ignore_thresh = .7 1247 | truth_thresh = 1 1248 | random=1 1249 | scale_x_y = 1.05 1250 | iou_thresh=0.213 1251 | cls_normalizer=1.0 1252 | iou_normalizer=0.07 1253 | iou_loss=ciou 1254 | nms_kind=greedynms 1255 | beta_nms=0.6 1256 | 
-------------------------------------------------------------------------------- /cfg/yolov4-pacsp-s-mish.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=64 7 | subdivisions=8 8 | width=512 9 | height=512 10 | channels=3 11 | momentum=0.949 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.00261 19 | burn_in=1000 20 | max_batches = 500500 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | mosaic=1 26 | 27 | [convolutional] 28 | batch_normalize=1 29 | filters=32 30 | size=3 31 | stride=1 32 | pad=1 33 | activation=mish 34 | 35 | # Downsample 36 | 37 | [convolutional] 38 | batch_normalize=1 39 | filters=32 40 | size=3 41 | stride=2 42 | pad=1 43 | activation=mish 44 | 45 | [convolutional] 46 | batch_normalize=1 47 | filters=32 48 | size=1 49 | stride=1 50 | pad=1 51 | activation=mish 52 | 53 | [convolutional] 54 | batch_normalize=1 55 | filters=32 56 | size=3 57 | stride=1 58 | pad=1 59 | activation=mish 60 | 61 | [shortcut] 62 | from=-3 63 | activation=linear 64 | 65 | # Downsample 66 | 67 | [convolutional] 68 | batch_normalize=1 69 | filters=64 70 | size=3 71 | stride=2 72 | pad=1 73 | activation=mish 74 | 75 | [convolutional] 76 | batch_normalize=1 77 | filters=32 78 | size=1 79 | stride=1 80 | pad=1 81 | activation=mish 82 | 83 | [route] 84 | layers = -2 85 | 86 | [convolutional] 87 | batch_normalize=1 88 | filters=32 89 | size=1 90 | stride=1 91 | pad=1 92 | activation=mish 93 | 94 | [convolutional] 95 | batch_normalize=1 96 | filters=32 97 | size=1 98 | stride=1 99 | pad=1 100 | activation=mish 101 | 102 | [convolutional] 103 | batch_normalize=1 104 | filters=32 105 | size=3 106 | stride=1 107 | pad=1 108 | activation=mish 109 | 110 | [shortcut] 111 | from=-3 112 | activation=linear 113 | 114 | [convolutional] 115 | batch_normalize=1 116 | filters=32 117 | size=1 118 | stride=1 119 | pad=1 120 | activation=mish 121 | 122 | [route] 123 | layers = -1,-7 124 | 125 | [convolutional] 126 | batch_normalize=1 127 | filters=64 128 | size=1 129 | stride=1 130 | pad=1 131 | activation=mish 132 | 133 | # Downsample 134 | 135 | [convolutional] 136 | batch_normalize=1 137 | filters=128 138 | size=3 139 | stride=2 140 | pad=1 141 | activation=mish 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=64 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=mish 150 | 151 | [route] 152 | layers = -2 153 | 154 | [convolutional] 155 | batch_normalize=1 156 | filters=64 157 | size=1 158 | stride=1 159 | pad=1 160 | activation=mish 161 | 162 | [convolutional] 163 | batch_normalize=1 164 | filters=64 165 | size=1 166 | stride=1 167 | pad=1 168 | activation=mish 169 | 170 | [convolutional] 171 | batch_normalize=1 172 | filters=64 173 | size=3 174 | stride=1 175 | pad=1 176 | activation=mish 177 | 178 | [shortcut] 179 | from=-3 180 | activation=linear 181 | 182 | [convolutional] 183 | batch_normalize=1 184 | filters=64 185 | size=1 186 | stride=1 187 | pad=1 188 | activation=mish 189 | 190 | [route] 191 | layers = -1,-7 192 | 193 | [convolutional] 194 | batch_normalize=1 195 | filters=128 196 | size=1 197 | stride=1 198 | pad=1 199 | activation=mish 200 | 201 | # Downsample 202 | 203 | [convolutional] 204 | batch_normalize=1 205 | filters=256 206 | size=3 207 | stride=2 208 | pad=1 209 | activation=mish 210 | 211 | [convolutional] 212 | batch_normalize=1 213 | filters=128 214 | size=1 215 | 
stride=1 216 | pad=1 217 | activation=mish 218 | 219 | [route] 220 | layers = -2 221 | 222 | [convolutional] 223 | batch_normalize=1 224 | filters=128 225 | size=1 226 | stride=1 227 | pad=1 228 | activation=mish 229 | 230 | [convolutional] 231 | batch_normalize=1 232 | filters=128 233 | size=1 234 | stride=1 235 | pad=1 236 | activation=mish 237 | 238 | [convolutional] 239 | batch_normalize=1 240 | filters=128 241 | size=3 242 | stride=1 243 | pad=1 244 | activation=mish 245 | 246 | [shortcut] 247 | from=-3 248 | activation=linear 249 | 250 | [convolutional] 251 | batch_normalize=1 252 | filters=128 253 | size=1 254 | stride=1 255 | pad=1 256 | activation=mish 257 | 258 | [route] 259 | layers = -1,-7 260 | 261 | [convolutional] 262 | batch_normalize=1 263 | filters=256 264 | size=1 265 | stride=1 266 | pad=1 267 | activation=mish 268 | 269 | # Downsample 270 | 271 | [convolutional] 272 | batch_normalize=1 273 | filters=512 274 | size=3 275 | stride=2 276 | pad=1 277 | activation=mish 278 | 279 | [convolutional] 280 | batch_normalize=1 281 | filters=256 282 | size=1 283 | stride=1 284 | pad=1 285 | activation=mish 286 | 287 | [route] 288 | layers = -2 289 | 290 | [convolutional] 291 | batch_normalize=1 292 | filters=256 293 | size=1 294 | stride=1 295 | pad=1 296 | activation=mish 297 | 298 | [convolutional] 299 | batch_normalize=1 300 | filters=256 301 | size=1 302 | stride=1 303 | pad=1 304 | activation=mish 305 | 306 | [convolutional] 307 | batch_normalize=1 308 | filters=256 309 | size=3 310 | stride=1 311 | pad=1 312 | activation=mish 313 | 314 | [shortcut] 315 | from=-3 316 | activation=linear 317 | 318 | [convolutional] 319 | batch_normalize=1 320 | filters=256 321 | size=1 322 | stride=1 323 | pad=1 324 | activation=mish 325 | 326 | [route] 327 | layers = -1,-7 328 | 329 | [convolutional] 330 | batch_normalize=1 331 | filters=512 332 | size=1 333 | stride=1 334 | pad=1 335 | activation=mish 336 | 337 | ########################## 338 | 339 | [convolutional] 340 | batch_normalize=1 341 | filters=256 342 | size=1 343 | stride=1 344 | pad=1 345 | activation=mish 346 | 347 | [route] 348 | layers = -2 349 | 350 | [convolutional] 351 | batch_normalize=1 352 | filters=256 353 | size=1 354 | stride=1 355 | pad=1 356 | activation=mish 357 | 358 | ### SPP ### 359 | [maxpool] 360 | stride=1 361 | size=5 362 | 363 | [route] 364 | layers=-2 365 | 366 | [maxpool] 367 | stride=1 368 | size=9 369 | 370 | [route] 371 | layers=-4 372 | 373 | [maxpool] 374 | stride=1 375 | size=13 376 | 377 | [route] 378 | layers=-1,-3,-5,-6 379 | ### End SPP ### 380 | 381 | [convolutional] 382 | batch_normalize=1 383 | filters=256 384 | size=1 385 | stride=1 386 | pad=1 387 | activation=mish 388 | 389 | [convolutional] 390 | batch_normalize=1 391 | size=3 392 | stride=1 393 | pad=1 394 | filters=256 395 | activation=mish 396 | 397 | [route] 398 | layers = -1, -11 399 | 400 | [convolutional] 401 | batch_normalize=1 402 | filters=256 403 | size=1 404 | stride=1 405 | pad=1 406 | activation=mish 407 | 408 | [convolutional] 409 | batch_normalize=1 410 | filters=128 411 | size=1 412 | stride=1 413 | pad=1 414 | activation=mish 415 | 416 | [upsample] 417 | stride=2 418 | 419 | [route] 420 | layers = 34 421 | 422 | [convolutional] 423 | batch_normalize=1 424 | filters=128 425 | size=1 426 | stride=1 427 | pad=1 428 | activation=mish 429 | 430 | [route] 431 | layers = -1, -3 432 | 433 | [convolutional] 434 | batch_normalize=1 435 | filters=128 436 | size=1 437 | stride=1 438 | pad=1 439 | activation=mish 440 | 441 | 
[convolutional] 442 | batch_normalize=1 443 | filters=128 444 | size=1 445 | stride=1 446 | pad=1 447 | activation=mish 448 | 449 | [route] 450 | layers = -2 451 | 452 | [convolutional] 453 | batch_normalize=1 454 | filters=128 455 | size=1 456 | stride=1 457 | pad=1 458 | activation=mish 459 | 460 | [convolutional] 461 | batch_normalize=1 462 | size=3 463 | stride=1 464 | pad=1 465 | filters=128 466 | activation=mish 467 | 468 | [route] 469 | layers = -1, -4 470 | 471 | [convolutional] 472 | batch_normalize=1 473 | filters=128 474 | size=1 475 | stride=1 476 | pad=1 477 | activation=mish 478 | 479 | [convolutional] 480 | batch_normalize=1 481 | filters=64 482 | size=1 483 | stride=1 484 | pad=1 485 | activation=mish 486 | 487 | [upsample] 488 | stride=2 489 | 490 | [route] 491 | layers = 24 492 | 493 | [convolutional] 494 | batch_normalize=1 495 | filters=64 496 | size=1 497 | stride=1 498 | pad=1 499 | activation=mish 500 | 501 | [route] 502 | layers = -1, -3 503 | 504 | [convolutional] 505 | batch_normalize=1 506 | filters=64 507 | size=1 508 | stride=1 509 | pad=1 510 | activation=mish 511 | 512 | [convolutional] 513 | batch_normalize=1 514 | filters=64 515 | size=1 516 | stride=1 517 | pad=1 518 | activation=mish 519 | 520 | [route] 521 | layers = -2 522 | 523 | [convolutional] 524 | batch_normalize=1 525 | filters=64 526 | size=1 527 | stride=1 528 | pad=1 529 | activation=mish 530 | 531 | [convolutional] 532 | batch_normalize=1 533 | size=3 534 | stride=1 535 | pad=1 536 | filters=64 537 | activation=mish 538 | 539 | [route] 540 | layers = -1, -4 541 | 542 | [convolutional] 543 | batch_normalize=1 544 | filters=64 545 | size=1 546 | stride=1 547 | pad=1 548 | activation=mish 549 | 550 | ########################## 551 | 552 | [convolutional] 553 | batch_normalize=1 554 | size=3 555 | stride=1 556 | pad=1 557 | filters=128 558 | activation=mish 559 | 560 | [convolutional] 561 | size=1 562 | stride=1 563 | pad=1 564 | filters=255 565 | activation=linear 566 | 567 | 568 | [yolo] 569 | mask = 0,1,2 570 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 571 | classes=80 572 | num=9 573 | jitter=.3 574 | ignore_thresh = .7 575 | truth_thresh = 1 576 | random=1 577 | scale_x_y = 1.05 578 | iou_thresh=0.213 579 | cls_normalizer=1.0 580 | iou_normalizer=0.07 581 | iou_loss=ciou 582 | nms_kind=greedynms 583 | beta_nms=0.6 584 | 585 | [route] 586 | layers = -4 587 | 588 | [convolutional] 589 | batch_normalize=1 590 | size=3 591 | stride=2 592 | pad=1 593 | filters=128 594 | activation=mish 595 | 596 | [route] 597 | layers = -1, -18 598 | 599 | [convolutional] 600 | batch_normalize=1 601 | filters=128 602 | size=1 603 | stride=1 604 | pad=1 605 | activation=mish 606 | 607 | [convolutional] 608 | batch_normalize=1 609 | filters=128 610 | size=1 611 | stride=1 612 | pad=1 613 | activation=mish 614 | 615 | [route] 616 | layers = -2 617 | 618 | [convolutional] 619 | batch_normalize=1 620 | filters=128 621 | size=1 622 | stride=1 623 | pad=1 624 | activation=mish 625 | 626 | [convolutional] 627 | batch_normalize=1 628 | size=3 629 | stride=1 630 | pad=1 631 | filters=128 632 | activation=mish 633 | 634 | [route] 635 | layers = -1,-4 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=128 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=mish 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=256 651 | activation=mish 652 | 653 | [convolutional] 654 | size=1 655 | stride=1 656 | pad=1 657 
| filters=255 658 | activation=linear 659 | 660 | 661 | [yolo] 662 | mask = 3,4,5 663 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 664 | classes=80 665 | num=9 666 | jitter=.3 667 | ignore_thresh = .7 668 | truth_thresh = 1 669 | random=1 670 | scale_x_y = 1.05 671 | iou_thresh=0.213 672 | cls_normalizer=1.0 673 | iou_normalizer=0.07 674 | iou_loss=ciou 675 | nms_kind=greedynms 676 | beta_nms=0.6 677 | 678 | [route] 679 | layers = -4 680 | 681 | [convolutional] 682 | batch_normalize=1 683 | size=3 684 | stride=2 685 | pad=1 686 | filters=256 687 | activation=mish 688 | 689 | [route] 690 | layers = -1, -43 691 | 692 | [convolutional] 693 | batch_normalize=1 694 | filters=256 695 | size=1 696 | stride=1 697 | pad=1 698 | activation=mish 699 | 700 | [convolutional] 701 | batch_normalize=1 702 | filters=256 703 | size=1 704 | stride=1 705 | pad=1 706 | activation=mish 707 | 708 | [route] 709 | layers = -2 710 | 711 | [convolutional] 712 | batch_normalize=1 713 | filters=256 714 | size=1 715 | stride=1 716 | pad=1 717 | activation=mish 718 | 719 | [convolutional] 720 | batch_normalize=1 721 | size=3 722 | stride=1 723 | pad=1 724 | filters=256 725 | activation=mish 726 | 727 | [route] 728 | layers = -1,-4 729 | 730 | [convolutional] 731 | batch_normalize=1 732 | filters=256 733 | size=1 734 | stride=1 735 | pad=1 736 | activation=mish 737 | 738 | [convolutional] 739 | batch_normalize=1 740 | size=3 741 | stride=1 742 | pad=1 743 | filters=512 744 | activation=mish 745 | 746 | [convolutional] 747 | size=1 748 | stride=1 749 | pad=1 750 | filters=255 751 | activation=linear 752 | 753 | 754 | [yolo] 755 | mask = 6,7,8 756 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 757 | classes=80 758 | num=9 759 | jitter=.3 760 | ignore_thresh = .7 761 | truth_thresh = 1 762 | random=1 763 | scale_x_y = 1.05 764 | iou_thresh=0.213 765 | cls_normalizer=1.0 766 | iou_normalizer=0.07 767 | iou_loss=ciou 768 | nms_kind=greedynms 769 | beta_nms=0.6 770 | -------------------------------------------------------------------------------- /cfg/yolov4-pacsp-s.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=64 7 | subdivisions=8 8 | width=512 9 | height=512 10 | channels=3 11 | momentum=0.949 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.00261 19 | burn_in=1000 20 | max_batches = 500500 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | mosaic=1 26 | 27 | [convolutional] 28 | batch_normalize=1 29 | filters=32 30 | size=3 31 | stride=1 32 | pad=1 33 | activation=leaky 34 | 35 | # Downsample 36 | 37 | [convolutional] 38 | batch_normalize=1 39 | filters=32 40 | size=3 41 | stride=2 42 | pad=1 43 | activation=leaky 44 | 45 | [convolutional] 46 | batch_normalize=1 47 | filters=32 48 | size=1 49 | stride=1 50 | pad=1 51 | activation=leaky 52 | 53 | [convolutional] 54 | batch_normalize=1 55 | filters=32 56 | size=3 57 | stride=1 58 | pad=1 59 | activation=leaky 60 | 61 | [shortcut] 62 | from=-3 63 | activation=linear 64 | 65 | # Downsample 66 | 67 | [convolutional] 68 | batch_normalize=1 69 | filters=64 70 | size=3 71 | stride=2 72 | pad=1 73 | activation=leaky 74 | 75 | [convolutional] 76 | batch_normalize=1 77 | filters=32 78 | size=1 79 | stride=1 80 | pad=1 81 | activation=leaky 82 | 83 | [route] 84 | layers = -2 85 | 86 | 
[convolutional] 87 | batch_normalize=1 88 | filters=32 89 | size=1 90 | stride=1 91 | pad=1 92 | activation=leaky 93 | 94 | [convolutional] 95 | batch_normalize=1 96 | filters=32 97 | size=1 98 | stride=1 99 | pad=1 100 | activation=leaky 101 | 102 | [convolutional] 103 | batch_normalize=1 104 | filters=32 105 | size=3 106 | stride=1 107 | pad=1 108 | activation=leaky 109 | 110 | [shortcut] 111 | from=-3 112 | activation=linear 113 | 114 | [convolutional] 115 | batch_normalize=1 116 | filters=32 117 | size=1 118 | stride=1 119 | pad=1 120 | activation=leaky 121 | 122 | [route] 123 | layers = -1,-7 124 | 125 | [convolutional] 126 | batch_normalize=1 127 | filters=64 128 | size=1 129 | stride=1 130 | pad=1 131 | activation=leaky 132 | 133 | # Downsample 134 | 135 | [convolutional] 136 | batch_normalize=1 137 | filters=128 138 | size=3 139 | stride=2 140 | pad=1 141 | activation=leaky 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=64 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [route] 152 | layers = -2 153 | 154 | [convolutional] 155 | batch_normalize=1 156 | filters=64 157 | size=1 158 | stride=1 159 | pad=1 160 | activation=leaky 161 | 162 | [convolutional] 163 | batch_normalize=1 164 | filters=64 165 | size=1 166 | stride=1 167 | pad=1 168 | activation=leaky 169 | 170 | [convolutional] 171 | batch_normalize=1 172 | filters=64 173 | size=3 174 | stride=1 175 | pad=1 176 | activation=leaky 177 | 178 | [shortcut] 179 | from=-3 180 | activation=linear 181 | 182 | [convolutional] 183 | batch_normalize=1 184 | filters=64 185 | size=1 186 | stride=1 187 | pad=1 188 | activation=leaky 189 | 190 | [route] 191 | layers = -1,-7 192 | 193 | [convolutional] 194 | batch_normalize=1 195 | filters=128 196 | size=1 197 | stride=1 198 | pad=1 199 | activation=leaky 200 | 201 | # Downsample 202 | 203 | [convolutional] 204 | batch_normalize=1 205 | filters=256 206 | size=3 207 | stride=2 208 | pad=1 209 | activation=leaky 210 | 211 | [convolutional] 212 | batch_normalize=1 213 | filters=128 214 | size=1 215 | stride=1 216 | pad=1 217 | activation=leaky 218 | 219 | [route] 220 | layers = -2 221 | 222 | [convolutional] 223 | batch_normalize=1 224 | filters=128 225 | size=1 226 | stride=1 227 | pad=1 228 | activation=leaky 229 | 230 | [convolutional] 231 | batch_normalize=1 232 | filters=128 233 | size=1 234 | stride=1 235 | pad=1 236 | activation=leaky 237 | 238 | [convolutional] 239 | batch_normalize=1 240 | filters=128 241 | size=3 242 | stride=1 243 | pad=1 244 | activation=leaky 245 | 246 | [shortcut] 247 | from=-3 248 | activation=linear 249 | 250 | [convolutional] 251 | batch_normalize=1 252 | filters=128 253 | size=1 254 | stride=1 255 | pad=1 256 | activation=leaky 257 | 258 | [route] 259 | layers = -1,-7 260 | 261 | [convolutional] 262 | batch_normalize=1 263 | filters=256 264 | size=1 265 | stride=1 266 | pad=1 267 | activation=leaky 268 | 269 | # Downsample 270 | 271 | [convolutional] 272 | batch_normalize=1 273 | filters=512 274 | size=3 275 | stride=2 276 | pad=1 277 | activation=leaky 278 | 279 | [convolutional] 280 | batch_normalize=1 281 | filters=256 282 | size=1 283 | stride=1 284 | pad=1 285 | activation=leaky 286 | 287 | [route] 288 | layers = -2 289 | 290 | [convolutional] 291 | batch_normalize=1 292 | filters=256 293 | size=1 294 | stride=1 295 | pad=1 296 | activation=leaky 297 | 298 | [convolutional] 299 | batch_normalize=1 300 | filters=256 301 | size=1 302 | stride=1 303 | pad=1 304 | activation=leaky 305 | 306 | [convolutional] 307 | 
batch_normalize=1 308 | filters=256 309 | size=3 310 | stride=1 311 | pad=1 312 | activation=leaky 313 | 314 | [shortcut] 315 | from=-3 316 | activation=linear 317 | 318 | [convolutional] 319 | batch_normalize=1 320 | filters=256 321 | size=1 322 | stride=1 323 | pad=1 324 | activation=leaky 325 | 326 | [route] 327 | layers = -1,-7 328 | 329 | [convolutional] 330 | batch_normalize=1 331 | filters=512 332 | size=1 333 | stride=1 334 | pad=1 335 | activation=leaky 336 | 337 | ########################## 338 | 339 | [convolutional] 340 | batch_normalize=1 341 | filters=256 342 | size=1 343 | stride=1 344 | pad=1 345 | activation=leaky 346 | 347 | [route] 348 | layers = -2 349 | 350 | [convolutional] 351 | batch_normalize=1 352 | filters=256 353 | size=1 354 | stride=1 355 | pad=1 356 | activation=leaky 357 | 358 | ### SPP ### 359 | [maxpool] 360 | stride=1 361 | size=5 362 | 363 | [route] 364 | layers=-2 365 | 366 | [maxpool] 367 | stride=1 368 | size=9 369 | 370 | [route] 371 | layers=-4 372 | 373 | [maxpool] 374 | stride=1 375 | size=13 376 | 377 | [route] 378 | layers=-1,-3,-5,-6 379 | ### End SPP ### 380 | 381 | [convolutional] 382 | batch_normalize=1 383 | filters=256 384 | size=1 385 | stride=1 386 | pad=1 387 | activation=leaky 388 | 389 | [convolutional] 390 | batch_normalize=1 391 | size=3 392 | stride=1 393 | pad=1 394 | filters=256 395 | activation=leaky 396 | 397 | [route] 398 | layers = -1, -11 399 | 400 | [convolutional] 401 | batch_normalize=1 402 | filters=256 403 | size=1 404 | stride=1 405 | pad=1 406 | activation=leaky 407 | 408 | [convolutional] 409 | batch_normalize=1 410 | filters=128 411 | size=1 412 | stride=1 413 | pad=1 414 | activation=leaky 415 | 416 | [upsample] 417 | stride=2 418 | 419 | [route] 420 | layers = 34 421 | 422 | [convolutional] 423 | batch_normalize=1 424 | filters=128 425 | size=1 426 | stride=1 427 | pad=1 428 | activation=leaky 429 | 430 | [route] 431 | layers = -1, -3 432 | 433 | [convolutional] 434 | batch_normalize=1 435 | filters=128 436 | size=1 437 | stride=1 438 | pad=1 439 | activation=leaky 440 | 441 | [convolutional] 442 | batch_normalize=1 443 | filters=128 444 | size=1 445 | stride=1 446 | pad=1 447 | activation=leaky 448 | 449 | [route] 450 | layers = -2 451 | 452 | [convolutional] 453 | batch_normalize=1 454 | filters=128 455 | size=1 456 | stride=1 457 | pad=1 458 | activation=leaky 459 | 460 | [convolutional] 461 | batch_normalize=1 462 | size=3 463 | stride=1 464 | pad=1 465 | filters=128 466 | activation=leaky 467 | 468 | [route] 469 | layers = -1, -4 470 | 471 | [convolutional] 472 | batch_normalize=1 473 | filters=128 474 | size=1 475 | stride=1 476 | pad=1 477 | activation=leaky 478 | 479 | [convolutional] 480 | batch_normalize=1 481 | filters=64 482 | size=1 483 | stride=1 484 | pad=1 485 | activation=leaky 486 | 487 | [upsample] 488 | stride=2 489 | 490 | [route] 491 | layers = 24 492 | 493 | [convolutional] 494 | batch_normalize=1 495 | filters=64 496 | size=1 497 | stride=1 498 | pad=1 499 | activation=leaky 500 | 501 | [route] 502 | layers = -1, -3 503 | 504 | [convolutional] 505 | batch_normalize=1 506 | filters=64 507 | size=1 508 | stride=1 509 | pad=1 510 | activation=leaky 511 | 512 | [convolutional] 513 | batch_normalize=1 514 | filters=64 515 | size=1 516 | stride=1 517 | pad=1 518 | activation=leaky 519 | 520 | [route] 521 | layers = -2 522 | 523 | [convolutional] 524 | batch_normalize=1 525 | filters=64 526 | size=1 527 | stride=1 528 | pad=1 529 | activation=leaky 530 | 531 | [convolutional] 532 | 
batch_normalize=1 533 | size=3 534 | stride=1 535 | pad=1 536 | filters=64 537 | activation=leaky 538 | 539 | [route] 540 | layers = -1, -4 541 | 542 | [convolutional] 543 | batch_normalize=1 544 | filters=64 545 | size=1 546 | stride=1 547 | pad=1 548 | activation=leaky 549 | 550 | ########################## 551 | 552 | [convolutional] 553 | batch_normalize=1 554 | size=3 555 | stride=1 556 | pad=1 557 | filters=128 558 | activation=leaky 559 | 560 | [convolutional] 561 | size=1 562 | stride=1 563 | pad=1 564 | filters=255 565 | activation=linear 566 | 567 | 568 | [yolo] 569 | mask = 0,1,2 570 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 571 | classes=80 572 | num=9 573 | jitter=.3 574 | ignore_thresh = .7 575 | truth_thresh = 1 576 | random=1 577 | scale_x_y = 1.05 578 | iou_thresh=0.213 579 | cls_normalizer=1.0 580 | iou_normalizer=0.07 581 | iou_loss=ciou 582 | nms_kind=greedynms 583 | beta_nms=0.6 584 | 585 | [route] 586 | layers = -4 587 | 588 | [convolutional] 589 | batch_normalize=1 590 | size=3 591 | stride=2 592 | pad=1 593 | filters=128 594 | activation=leaky 595 | 596 | [route] 597 | layers = -1, -18 598 | 599 | [convolutional] 600 | batch_normalize=1 601 | filters=128 602 | size=1 603 | stride=1 604 | pad=1 605 | activation=leaky 606 | 607 | [convolutional] 608 | batch_normalize=1 609 | filters=128 610 | size=1 611 | stride=1 612 | pad=1 613 | activation=leaky 614 | 615 | [route] 616 | layers = -2 617 | 618 | [convolutional] 619 | batch_normalize=1 620 | filters=128 621 | size=1 622 | stride=1 623 | pad=1 624 | activation=leaky 625 | 626 | [convolutional] 627 | batch_normalize=1 628 | size=3 629 | stride=1 630 | pad=1 631 | filters=128 632 | activation=leaky 633 | 634 | [route] 635 | layers = -1,-4 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=128 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=256 651 | activation=leaky 652 | 653 | [convolutional] 654 | size=1 655 | stride=1 656 | pad=1 657 | filters=255 658 | activation=linear 659 | 660 | 661 | [yolo] 662 | mask = 3,4,5 663 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 664 | classes=80 665 | num=9 666 | jitter=.3 667 | ignore_thresh = .7 668 | truth_thresh = 1 669 | random=1 670 | scale_x_y = 1.05 671 | iou_thresh=0.213 672 | cls_normalizer=1.0 673 | iou_normalizer=0.07 674 | iou_loss=ciou 675 | nms_kind=greedynms 676 | beta_nms=0.6 677 | 678 | [route] 679 | layers = -4 680 | 681 | [convolutional] 682 | batch_normalize=1 683 | size=3 684 | stride=2 685 | pad=1 686 | filters=256 687 | activation=leaky 688 | 689 | [route] 690 | layers = -1, -43 691 | 692 | [convolutional] 693 | batch_normalize=1 694 | filters=256 695 | size=1 696 | stride=1 697 | pad=1 698 | activation=leaky 699 | 700 | [convolutional] 701 | batch_normalize=1 702 | filters=256 703 | size=1 704 | stride=1 705 | pad=1 706 | activation=leaky 707 | 708 | [route] 709 | layers = -2 710 | 711 | [convolutional] 712 | batch_normalize=1 713 | filters=256 714 | size=1 715 | stride=1 716 | pad=1 717 | activation=leaky 718 | 719 | [convolutional] 720 | batch_normalize=1 721 | size=3 722 | stride=1 723 | pad=1 724 | filters=256 725 | activation=leaky 726 | 727 | [route] 728 | layers = -1,-4 729 | 730 | [convolutional] 731 | batch_normalize=1 732 | filters=256 733 | size=1 734 | stride=1 735 | pad=1 736 | activation=leaky 737 | 738 | [convolutional] 
739 | batch_normalize=1 740 | size=3 741 | stride=1 742 | pad=1 743 | filters=512 744 | activation=leaky 745 | 746 | [convolutional] 747 | size=1 748 | stride=1 749 | pad=1 750 | filters=255 751 | activation=linear 752 | 753 | 754 | [yolo] 755 | mask = 6,7,8 756 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 757 | classes=80 758 | num=9 759 | jitter=.3 760 | ignore_thresh = .7 761 | truth_thresh = 1 762 | random=1 763 | scale_x_y = 1.05 764 | iou_thresh=0.213 765 | cls_normalizer=1.0 766 | iou_normalizer=0.07 767 | iou_loss=ciou 768 | nms_kind=greedynms 769 | beta_nms=0.6 770 | -------------------------------------------------------------------------------- /cfg/yolov4-pacsp.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=64 7 | subdivisions=8 8 | width=512 9 | height=512 10 | channels=3 11 | momentum=0.949 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.00261 19 | burn_in=1000 20 | max_batches = 500500 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | #cutmix=1 26 | mosaic=1 27 | 28 | #23:104x104 54:52x52 85:26x26 104:13x13 for 416 29 | 30 | 31 | 32 | [convolutional] 33 | batch_normalize=1 34 | filters=32 35 | size=3 36 | stride=1 37 | pad=1 38 | activation=leaky 39 | 40 | # Downsample 41 | 42 | [convolutional] 43 | batch_normalize=1 44 | filters=64 45 | size=3 46 | stride=2 47 | pad=1 48 | activation=leaky 49 | 50 | #[convolutional] 51 | #batch_normalize=1 52 | #filters=64 53 | #size=1 54 | #stride=1 55 | #pad=1 56 | #activation=leaky 57 | 58 | #[route] 59 | #layers = -2 60 | 61 | #[convolutional] 62 | #batch_normalize=1 63 | #filters=64 64 | #size=1 65 | #stride=1 66 | #pad=1 67 | #activation=leaky 68 | 69 | [convolutional] 70 | batch_normalize=1 71 | filters=32 72 | size=1 73 | stride=1 74 | pad=1 75 | activation=leaky 76 | 77 | [convolutional] 78 | batch_normalize=1 79 | filters=64 80 | size=3 81 | stride=1 82 | pad=1 83 | activation=leaky 84 | 85 | [shortcut] 86 | from=-3 87 | activation=linear 88 | 89 | #[convolutional] 90 | #batch_normalize=1 91 | #filters=64 92 | #size=1 93 | #stride=1 94 | #pad=1 95 | #activation=leaky 96 | 97 | #[route] 98 | #layers = -1,-7 99 | 100 | #[convolutional] 101 | #batch_normalize=1 102 | #filters=64 103 | #size=1 104 | #stride=1 105 | #pad=1 106 | #activation=leaky 107 | 108 | # Downsample 109 | 110 | [convolutional] 111 | batch_normalize=1 112 | filters=128 113 | size=3 114 | stride=2 115 | pad=1 116 | activation=leaky 117 | 118 | [convolutional] 119 | batch_normalize=1 120 | filters=64 121 | size=1 122 | stride=1 123 | pad=1 124 | activation=leaky 125 | 126 | [route] 127 | layers = -2 128 | 129 | [convolutional] 130 | batch_normalize=1 131 | filters=64 132 | size=1 133 | stride=1 134 | pad=1 135 | activation=leaky 136 | 137 | [convolutional] 138 | batch_normalize=1 139 | filters=64 140 | size=1 141 | stride=1 142 | pad=1 143 | activation=leaky 144 | 145 | [convolutional] 146 | batch_normalize=1 147 | filters=64 148 | size=3 149 | stride=1 150 | pad=1 151 | activation=leaky 152 | 153 | [shortcut] 154 | from=-3 155 | activation=linear 156 | 157 | [convolutional] 158 | batch_normalize=1 159 | filters=64 160 | size=1 161 | stride=1 162 | pad=1 163 | activation=leaky 164 | 165 | [convolutional] 166 | batch_normalize=1 167 | filters=64 168 | size=3 169 | stride=1 170 | pad=1 171 | activation=leaky 172 | 173 | 
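# (annotation added for this write-up; not part of the upstream cfg)
# The 1x1 and 3x3 convolutions above plus the [shortcut] from=-3 below form one residual
# unit. The earlier [route] layers=-2 splits the feature map and a later [route] merges it
# back in, which is the cross-stage-partial (CSP) pattern used throughout this backbone.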
[shortcut] 174 | from=-3 175 | activation=linear 176 | 177 | [convolutional] 178 | batch_normalize=1 179 | filters=64 180 | size=1 181 | stride=1 182 | pad=1 183 | activation=leaky 184 | 185 | [route] 186 | layers = -1,-10 187 | 188 | [convolutional] 189 | batch_normalize=1 190 | filters=128 191 | size=1 192 | stride=1 193 | pad=1 194 | activation=leaky 195 | 196 | # Downsample 197 | 198 | [convolutional] 199 | batch_normalize=1 200 | filters=256 201 | size=3 202 | stride=2 203 | pad=1 204 | activation=leaky 205 | 206 | [convolutional] 207 | batch_normalize=1 208 | filters=128 209 | size=1 210 | stride=1 211 | pad=1 212 | activation=leaky 213 | 214 | [route] 215 | layers = -2 216 | 217 | [convolutional] 218 | batch_normalize=1 219 | filters=128 220 | size=1 221 | stride=1 222 | pad=1 223 | activation=leaky 224 | 225 | [convolutional] 226 | batch_normalize=1 227 | filters=128 228 | size=1 229 | stride=1 230 | pad=1 231 | activation=leaky 232 | 233 | [convolutional] 234 | batch_normalize=1 235 | filters=128 236 | size=3 237 | stride=1 238 | pad=1 239 | activation=leaky 240 | 241 | [shortcut] 242 | from=-3 243 | activation=linear 244 | 245 | [convolutional] 246 | batch_normalize=1 247 | filters=128 248 | size=1 249 | stride=1 250 | pad=1 251 | activation=leaky 252 | 253 | [convolutional] 254 | batch_normalize=1 255 | filters=128 256 | size=3 257 | stride=1 258 | pad=1 259 | activation=leaky 260 | 261 | [shortcut] 262 | from=-3 263 | activation=linear 264 | 265 | [convolutional] 266 | batch_normalize=1 267 | filters=128 268 | size=1 269 | stride=1 270 | pad=1 271 | activation=leaky 272 | 273 | [convolutional] 274 | batch_normalize=1 275 | filters=128 276 | size=3 277 | stride=1 278 | pad=1 279 | activation=leaky 280 | 281 | [shortcut] 282 | from=-3 283 | activation=linear 284 | 285 | [convolutional] 286 | batch_normalize=1 287 | filters=128 288 | size=1 289 | stride=1 290 | pad=1 291 | activation=leaky 292 | 293 | [convolutional] 294 | batch_normalize=1 295 | filters=128 296 | size=3 297 | stride=1 298 | pad=1 299 | activation=leaky 300 | 301 | [shortcut] 302 | from=-3 303 | activation=linear 304 | 305 | 306 | [convolutional] 307 | batch_normalize=1 308 | filters=128 309 | size=1 310 | stride=1 311 | pad=1 312 | activation=leaky 313 | 314 | [convolutional] 315 | batch_normalize=1 316 | filters=128 317 | size=3 318 | stride=1 319 | pad=1 320 | activation=leaky 321 | 322 | [shortcut] 323 | from=-3 324 | activation=linear 325 | 326 | [convolutional] 327 | batch_normalize=1 328 | filters=128 329 | size=1 330 | stride=1 331 | pad=1 332 | activation=leaky 333 | 334 | [convolutional] 335 | batch_normalize=1 336 | filters=128 337 | size=3 338 | stride=1 339 | pad=1 340 | activation=leaky 341 | 342 | [shortcut] 343 | from=-3 344 | activation=linear 345 | 346 | [convolutional] 347 | batch_normalize=1 348 | filters=128 349 | size=1 350 | stride=1 351 | pad=1 352 | activation=leaky 353 | 354 | [convolutional] 355 | batch_normalize=1 356 | filters=128 357 | size=3 358 | stride=1 359 | pad=1 360 | activation=leaky 361 | 362 | [shortcut] 363 | from=-3 364 | activation=linear 365 | 366 | [convolutional] 367 | batch_normalize=1 368 | filters=128 369 | size=1 370 | stride=1 371 | pad=1 372 | activation=leaky 373 | 374 | [convolutional] 375 | batch_normalize=1 376 | filters=128 377 | size=3 378 | stride=1 379 | pad=1 380 | activation=leaky 381 | 382 | [shortcut] 383 | from=-3 384 | activation=linear 385 | 386 | [convolutional] 387 | batch_normalize=1 388 | filters=128 389 | size=1 390 | stride=1 391 | pad=1 392 
| activation=leaky 393 | 394 | [route] 395 | layers = -1,-28 396 | 397 | [convolutional] 398 | batch_normalize=1 399 | filters=256 400 | size=1 401 | stride=1 402 | pad=1 403 | activation=leaky 404 | 405 | # Downsample 406 | 407 | [convolutional] 408 | batch_normalize=1 409 | filters=512 410 | size=3 411 | stride=2 412 | pad=1 413 | activation=leaky 414 | 415 | [convolutional] 416 | batch_normalize=1 417 | filters=256 418 | size=1 419 | stride=1 420 | pad=1 421 | activation=leaky 422 | 423 | [route] 424 | layers = -2 425 | 426 | [convolutional] 427 | batch_normalize=1 428 | filters=256 429 | size=1 430 | stride=1 431 | pad=1 432 | activation=leaky 433 | 434 | [convolutional] 435 | batch_normalize=1 436 | filters=256 437 | size=1 438 | stride=1 439 | pad=1 440 | activation=leaky 441 | 442 | [convolutional] 443 | batch_normalize=1 444 | filters=256 445 | size=3 446 | stride=1 447 | pad=1 448 | activation=leaky 449 | 450 | [shortcut] 451 | from=-3 452 | activation=linear 453 | 454 | 455 | [convolutional] 456 | batch_normalize=1 457 | filters=256 458 | size=1 459 | stride=1 460 | pad=1 461 | activation=leaky 462 | 463 | [convolutional] 464 | batch_normalize=1 465 | filters=256 466 | size=3 467 | stride=1 468 | pad=1 469 | activation=leaky 470 | 471 | [shortcut] 472 | from=-3 473 | activation=linear 474 | 475 | 476 | [convolutional] 477 | batch_normalize=1 478 | filters=256 479 | size=1 480 | stride=1 481 | pad=1 482 | activation=leaky 483 | 484 | [convolutional] 485 | batch_normalize=1 486 | filters=256 487 | size=3 488 | stride=1 489 | pad=1 490 | activation=leaky 491 | 492 | [shortcut] 493 | from=-3 494 | activation=linear 495 | 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=256 500 | size=1 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [convolutional] 506 | batch_normalize=1 507 | filters=256 508 | size=3 509 | stride=1 510 | pad=1 511 | activation=leaky 512 | 513 | [shortcut] 514 | from=-3 515 | activation=linear 516 | 517 | 518 | [convolutional] 519 | batch_normalize=1 520 | filters=256 521 | size=1 522 | stride=1 523 | pad=1 524 | activation=leaky 525 | 526 | [convolutional] 527 | batch_normalize=1 528 | filters=256 529 | size=3 530 | stride=1 531 | pad=1 532 | activation=leaky 533 | 534 | [shortcut] 535 | from=-3 536 | activation=linear 537 | 538 | 539 | [convolutional] 540 | batch_normalize=1 541 | filters=256 542 | size=1 543 | stride=1 544 | pad=1 545 | activation=leaky 546 | 547 | [convolutional] 548 | batch_normalize=1 549 | filters=256 550 | size=3 551 | stride=1 552 | pad=1 553 | activation=leaky 554 | 555 | [shortcut] 556 | from=-3 557 | activation=linear 558 | 559 | 560 | [convolutional] 561 | batch_normalize=1 562 | filters=256 563 | size=1 564 | stride=1 565 | pad=1 566 | activation=leaky 567 | 568 | [convolutional] 569 | batch_normalize=1 570 | filters=256 571 | size=3 572 | stride=1 573 | pad=1 574 | activation=leaky 575 | 576 | [shortcut] 577 | from=-3 578 | activation=linear 579 | 580 | [convolutional] 581 | batch_normalize=1 582 | filters=256 583 | size=1 584 | stride=1 585 | pad=1 586 | activation=leaky 587 | 588 | [convolutional] 589 | batch_normalize=1 590 | filters=256 591 | size=3 592 | stride=1 593 | pad=1 594 | activation=leaky 595 | 596 | [shortcut] 597 | from=-3 598 | activation=linear 599 | 600 | [convolutional] 601 | batch_normalize=1 602 | filters=256 603 | size=1 604 | stride=1 605 | pad=1 606 | activation=leaky 607 | 608 | [route] 609 | layers = -1,-28 610 | 611 | [convolutional] 612 | batch_normalize=1 613 | filters=512 
614 | size=1 615 | stride=1 616 | pad=1 617 | activation=leaky 618 | 619 | # Downsample 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=1024 624 | size=3 625 | stride=2 626 | pad=1 627 | activation=leaky 628 | 629 | [convolutional] 630 | batch_normalize=1 631 | filters=512 632 | size=1 633 | stride=1 634 | pad=1 635 | activation=leaky 636 | 637 | [route] 638 | layers = -2 639 | 640 | [convolutional] 641 | batch_normalize=1 642 | filters=512 643 | size=1 644 | stride=1 645 | pad=1 646 | activation=leaky 647 | 648 | [convolutional] 649 | batch_normalize=1 650 | filters=512 651 | size=1 652 | stride=1 653 | pad=1 654 | activation=leaky 655 | 656 | [convolutional] 657 | batch_normalize=1 658 | filters=512 659 | size=3 660 | stride=1 661 | pad=1 662 | activation=leaky 663 | 664 | [shortcut] 665 | from=-3 666 | activation=linear 667 | 668 | [convolutional] 669 | batch_normalize=1 670 | filters=512 671 | size=1 672 | stride=1 673 | pad=1 674 | activation=leaky 675 | 676 | [convolutional] 677 | batch_normalize=1 678 | filters=512 679 | size=3 680 | stride=1 681 | pad=1 682 | activation=leaky 683 | 684 | [shortcut] 685 | from=-3 686 | activation=linear 687 | 688 | [convolutional] 689 | batch_normalize=1 690 | filters=512 691 | size=1 692 | stride=1 693 | pad=1 694 | activation=leaky 695 | 696 | [convolutional] 697 | batch_normalize=1 698 | filters=512 699 | size=3 700 | stride=1 701 | pad=1 702 | activation=leaky 703 | 704 | [shortcut] 705 | from=-3 706 | activation=linear 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=512 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [convolutional] 717 | batch_normalize=1 718 | filters=512 719 | size=3 720 | stride=1 721 | pad=1 722 | activation=leaky 723 | 724 | [shortcut] 725 | from=-3 726 | activation=linear 727 | 728 | [convolutional] 729 | batch_normalize=1 730 | filters=512 731 | size=1 732 | stride=1 733 | pad=1 734 | activation=leaky 735 | 736 | [route] 737 | layers = -1,-16 738 | 739 | [convolutional] 740 | batch_normalize=1 741 | filters=1024 742 | size=1 743 | stride=1 744 | pad=1 745 | activation=leaky 746 | 747 | ########################## 748 | 749 | [convolutional] 750 | batch_normalize=1 751 | filters=512 752 | size=1 753 | stride=1 754 | pad=1 755 | activation=leaky 756 | 757 | [route] 758 | layers = -2 759 | 760 | [convolutional] 761 | batch_normalize=1 762 | filters=512 763 | size=1 764 | stride=1 765 | pad=1 766 | activation=leaky 767 | 768 | [convolutional] 769 | batch_normalize=1 770 | size=3 771 | stride=1 772 | pad=1 773 | filters=512 774 | activation=leaky 775 | 776 | [convolutional] 777 | batch_normalize=1 778 | filters=512 779 | size=1 780 | stride=1 781 | pad=1 782 | activation=leaky 783 | 784 | ### SPP ### 785 | [maxpool] 786 | stride=1 787 | size=5 788 | 789 | [route] 790 | layers=-2 791 | 792 | [maxpool] 793 | stride=1 794 | size=9 795 | 796 | [route] 797 | layers=-4 798 | 799 | [maxpool] 800 | stride=1 801 | size=13 802 | 803 | [route] 804 | layers=-1,-3,-5,-6 805 | ### End SPP ### 806 | 807 | [convolutional] 808 | batch_normalize=1 809 | filters=512 810 | size=1 811 | stride=1 812 | pad=1 813 | activation=leaky 814 | 815 | [convolutional] 816 | batch_normalize=1 817 | size=3 818 | stride=1 819 | pad=1 820 | filters=512 821 | activation=leaky 822 | 823 | [route] 824 | layers = -1, -13 825 | 826 | [convolutional] 827 | batch_normalize=1 828 | filters=512 829 | size=1 830 | stride=1 831 | pad=1 832 | activation=leaky 833 | 834 | [convolutional] 835 | batch_normalize=1 
836 | filters=256 837 | size=1 838 | stride=1 839 | pad=1 840 | activation=leaky 841 | 842 | [upsample] 843 | stride=2 844 | 845 | [route] 846 | layers = 79 847 | 848 | [convolutional] 849 | batch_normalize=1 850 | filters=256 851 | size=1 852 | stride=1 853 | pad=1 854 | activation=leaky 855 | 856 | [route] 857 | layers = -1, -3 858 | 859 | [convolutional] 860 | batch_normalize=1 861 | filters=256 862 | size=1 863 | stride=1 864 | pad=1 865 | activation=leaky 866 | 867 | [convolutional] 868 | batch_normalize=1 869 | filters=256 870 | size=1 871 | stride=1 872 | pad=1 873 | activation=leaky 874 | 875 | [route] 876 | layers = -2 877 | 878 | [convolutional] 879 | batch_normalize=1 880 | filters=256 881 | size=1 882 | stride=1 883 | pad=1 884 | activation=leaky 885 | 886 | [convolutional] 887 | batch_normalize=1 888 | size=3 889 | stride=1 890 | pad=1 891 | filters=256 892 | activation=leaky 893 | 894 | [convolutional] 895 | batch_normalize=1 896 | filters=256 897 | size=1 898 | stride=1 899 | pad=1 900 | activation=leaky 901 | 902 | [convolutional] 903 | batch_normalize=1 904 | size=3 905 | stride=1 906 | pad=1 907 | filters=256 908 | activation=leaky 909 | 910 | [route] 911 | layers = -1, -6 912 | 913 | [convolutional] 914 | batch_normalize=1 915 | filters=256 916 | size=1 917 | stride=1 918 | pad=1 919 | activation=leaky 920 | 921 | [convolutional] 922 | batch_normalize=1 923 | filters=128 924 | size=1 925 | stride=1 926 | pad=1 927 | activation=leaky 928 | 929 | [upsample] 930 | stride=2 931 | 932 | [route] 933 | layers = 48 934 | 935 | [convolutional] 936 | batch_normalize=1 937 | filters=128 938 | size=1 939 | stride=1 940 | pad=1 941 | activation=leaky 942 | 943 | [route] 944 | layers = -1, -3 945 | 946 | [convolutional] 947 | batch_normalize=1 948 | filters=128 949 | size=1 950 | stride=1 951 | pad=1 952 | activation=leaky 953 | 954 | [convolutional] 955 | batch_normalize=1 956 | filters=128 957 | size=1 958 | stride=1 959 | pad=1 960 | activation=leaky 961 | 962 | [route] 963 | layers = -2 964 | 965 | [convolutional] 966 | batch_normalize=1 967 | filters=128 968 | size=1 969 | stride=1 970 | pad=1 971 | activation=leaky 972 | 973 | [convolutional] 974 | batch_normalize=1 975 | size=3 976 | stride=1 977 | pad=1 978 | filters=128 979 | activation=leaky 980 | 981 | [convolutional] 982 | batch_normalize=1 983 | filters=128 984 | size=1 985 | stride=1 986 | pad=1 987 | activation=leaky 988 | 989 | [convolutional] 990 | batch_normalize=1 991 | size=3 992 | stride=1 993 | pad=1 994 | filters=128 995 | activation=leaky 996 | 997 | [route] 998 | layers = -1, -6 999 | 1000 | [convolutional] 1001 | batch_normalize=1 1002 | filters=128 1003 | size=1 1004 | stride=1 1005 | pad=1 1006 | activation=leaky 1007 | 1008 | ########################## 1009 | 1010 | [convolutional] 1011 | batch_normalize=1 1012 | size=3 1013 | stride=1 1014 | pad=1 1015 | filters=256 1016 | activation=leaky 1017 | 1018 | [convolutional] 1019 | size=1 1020 | stride=1 1021 | pad=1 1022 | filters=255 1023 | activation=linear 1024 | 1025 | 1026 | [yolo] 1027 | mask = 0,1,2 1028 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 1029 | classes=80 1030 | num=9 1031 | jitter=.3 1032 | ignore_thresh = .7 1033 | truth_thresh = 1 1034 | random=1 1035 | scale_x_y = 1.05 1036 | iou_thresh=0.213 1037 | cls_normalizer=1.0 1038 | iou_normalizer=0.07 1039 | iou_loss=ciou 1040 | nms_kind=greedynms 1041 | beta_nms=0.6 1042 | 1043 | [route] 1044 | layers = -4 1045 | 1046 | [convolutional] 1047 | 
batch_normalize=1 1048 | size=3 1049 | stride=2 1050 | pad=1 1051 | filters=256 1052 | activation=leaky 1053 | 1054 | [route] 1055 | layers = -1, -20 1056 | 1057 | [convolutional] 1058 | batch_normalize=1 1059 | filters=256 1060 | size=1 1061 | stride=1 1062 | pad=1 1063 | activation=leaky 1064 | 1065 | [convolutional] 1066 | batch_normalize=1 1067 | filters=256 1068 | size=1 1069 | stride=1 1070 | pad=1 1071 | activation=leaky 1072 | 1073 | [route] 1074 | layers = -2 1075 | 1076 | [convolutional] 1077 | batch_normalize=1 1078 | filters=256 1079 | size=1 1080 | stride=1 1081 | pad=1 1082 | activation=leaky 1083 | 1084 | [convolutional] 1085 | batch_normalize=1 1086 | size=3 1087 | stride=1 1088 | pad=1 1089 | filters=256 1090 | activation=leaky 1091 | 1092 | [convolutional] 1093 | batch_normalize=1 1094 | filters=256 1095 | size=1 1096 | stride=1 1097 | pad=1 1098 | activation=leaky 1099 | 1100 | [convolutional] 1101 | batch_normalize=1 1102 | size=3 1103 | stride=1 1104 | pad=1 1105 | filters=256 1106 | activation=leaky 1107 | 1108 | [route] 1109 | layers = -1,-6 1110 | 1111 | [convolutional] 1112 | batch_normalize=1 1113 | filters=256 1114 | size=1 1115 | stride=1 1116 | pad=1 1117 | activation=leaky 1118 | 1119 | [convolutional] 1120 | batch_normalize=1 1121 | size=3 1122 | stride=1 1123 | pad=1 1124 | filters=512 1125 | activation=leaky 1126 | 1127 | [convolutional] 1128 | size=1 1129 | stride=1 1130 | pad=1 1131 | filters=255 1132 | activation=linear 1133 | 1134 | 1135 | [yolo] 1136 | mask = 3,4,5 1137 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 1138 | classes=80 1139 | num=9 1140 | jitter=.3 1141 | ignore_thresh = .7 1142 | truth_thresh = 1 1143 | random=1 1144 | scale_x_y = 1.05 1145 | iou_thresh=0.213 1146 | cls_normalizer=1.0 1147 | iou_normalizer=0.07 1148 | iou_loss=ciou 1149 | nms_kind=greedynms 1150 | beta_nms=0.6 1151 | 1152 | [route] 1153 | layers = -4 1154 | 1155 | [convolutional] 1156 | batch_normalize=1 1157 | size=3 1158 | stride=2 1159 | pad=1 1160 | filters=512 1161 | activation=leaky 1162 | 1163 | [route] 1164 | layers = -1, -49 1165 | 1166 | [convolutional] 1167 | batch_normalize=1 1168 | filters=512 1169 | size=1 1170 | stride=1 1171 | pad=1 1172 | activation=leaky 1173 | 1174 | [convolutional] 1175 | batch_normalize=1 1176 | filters=512 1177 | size=1 1178 | stride=1 1179 | pad=1 1180 | activation=leaky 1181 | 1182 | [route] 1183 | layers = -2 1184 | 1185 | [convolutional] 1186 | batch_normalize=1 1187 | filters=512 1188 | size=1 1189 | stride=1 1190 | pad=1 1191 | activation=leaky 1192 | 1193 | [convolutional] 1194 | batch_normalize=1 1195 | size=3 1196 | stride=1 1197 | pad=1 1198 | filters=512 1199 | activation=leaky 1200 | 1201 | [convolutional] 1202 | batch_normalize=1 1203 | filters=512 1204 | size=1 1205 | stride=1 1206 | pad=1 1207 | activation=leaky 1208 | 1209 | [convolutional] 1210 | batch_normalize=1 1211 | size=3 1212 | stride=1 1213 | pad=1 1214 | filters=512 1215 | activation=leaky 1216 | 1217 | [route] 1218 | layers = -1,-6 1219 | 1220 | [convolutional] 1221 | batch_normalize=1 1222 | filters=512 1223 | size=1 1224 | stride=1 1225 | pad=1 1226 | activation=leaky 1227 | 1228 | [convolutional] 1229 | batch_normalize=1 1230 | size=3 1231 | stride=1 1232 | pad=1 1233 | filters=1024 1234 | activation=leaky 1235 | 1236 | [convolutional] 1237 | size=1 1238 | stride=1 1239 | pad=1 1240 | filters=255 1241 | activation=linear 1242 | 1243 | 1244 | [yolo] 1245 | mask = 6,7,8 1246 | anchors = 12, 16, 19, 36, 40, 28, 
36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 1247 | classes=80 1248 | num=9 1249 | jitter=.3 1250 | ignore_thresh = .7 1251 | truth_thresh = 1 1252 | random=1 1253 | scale_x_y = 1.05 1254 | iou_thresh=0.213 1255 | cls_normalizer=1.0 1256 | iou_normalizer=0.07 1257 | iou_loss=ciou 1258 | nms_kind=greedynms 1259 | beta_nms=0.6 1260 | -------------------------------------------------------------------------------- /cfg/yolov4-paspp.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=64 7 | subdivisions=16 8 | width=512 9 | height=512 10 | channels=3 11 | momentum=0.949 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.0013 19 | burn_in=1000 20 | max_batches = 500500 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | #cutmix=1 26 | mosaic=1 27 | 28 | #:104x104 54:52x52 85:26x26 104:13x13 for 416 29 | 30 | [convolutional] 31 | batch_normalize=1 32 | filters=32 33 | size=3 34 | stride=1 35 | pad=1 36 | activation=leaky 37 | 38 | # Downsample 39 | 40 | [convolutional] 41 | batch_normalize=1 42 | filters=64 43 | size=3 44 | stride=2 45 | pad=1 46 | activation=leaky 47 | 48 | [convolutional] 49 | batch_normalize=1 50 | filters=64 51 | size=1 52 | stride=1 53 | pad=1 54 | activation=leaky 55 | 56 | [route] 57 | layers = -2 58 | 59 | [convolutional] 60 | batch_normalize=1 61 | filters=64 62 | size=1 63 | stride=1 64 | pad=1 65 | activation=leaky 66 | 67 | [convolutional] 68 | batch_normalize=1 69 | filters=32 70 | size=1 71 | stride=1 72 | pad=1 73 | activation=leaky 74 | 75 | [convolutional] 76 | batch_normalize=1 77 | filters=64 78 | size=3 79 | stride=1 80 | pad=1 81 | activation=leaky 82 | 83 | [shortcut] 84 | from=-3 85 | activation=linear 86 | 87 | [convolutional] 88 | batch_normalize=1 89 | filters=64 90 | size=1 91 | stride=1 92 | pad=1 93 | activation=leaky 94 | 95 | [route] 96 | layers = -1,-7 97 | 98 | [convolutional] 99 | batch_normalize=1 100 | filters=64 101 | size=1 102 | stride=1 103 | pad=1 104 | activation=leaky 105 | 106 | # Downsample 107 | 108 | [convolutional] 109 | batch_normalize=1 110 | filters=128 111 | size=3 112 | stride=2 113 | pad=1 114 | activation=leaky 115 | 116 | [convolutional] 117 | batch_normalize=1 118 | filters=64 119 | size=1 120 | stride=1 121 | pad=1 122 | activation=leaky 123 | 124 | [route] 125 | layers = -2 126 | 127 | [convolutional] 128 | batch_normalize=1 129 | filters=64 130 | size=1 131 | stride=1 132 | pad=1 133 | activation=leaky 134 | 135 | [convolutional] 136 | batch_normalize=1 137 | filters=64 138 | size=1 139 | stride=1 140 | pad=1 141 | activation=leaky 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=64 146 | size=3 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [shortcut] 152 | from=-3 153 | activation=linear 154 | 155 | [convolutional] 156 | batch_normalize=1 157 | filters=64 158 | size=1 159 | stride=1 160 | pad=1 161 | activation=leaky 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=64 166 | size=3 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [shortcut] 172 | from=-3 173 | activation=linear 174 | 175 | [convolutional] 176 | batch_normalize=1 177 | filters=64 178 | size=1 179 | stride=1 180 | pad=1 181 | activation=leaky 182 | 183 | [route] 184 | layers = -1,-10 185 | 186 | [convolutional] 187 | batch_normalize=1 188 | filters=128 189 | size=1 190 | stride=1 191 | pad=1 
192 | activation=leaky 193 | 194 | # Downsample 195 | 196 | [convolutional] 197 | batch_normalize=1 198 | filters=256 199 | size=3 200 | stride=2 201 | pad=1 202 | activation=leaky 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [route] 213 | layers = -2 214 | 215 | [convolutional] 216 | batch_normalize=1 217 | filters=128 218 | size=1 219 | stride=1 220 | pad=1 221 | activation=leaky 222 | 223 | [convolutional] 224 | batch_normalize=1 225 | filters=128 226 | size=1 227 | stride=1 228 | pad=1 229 | activation=leaky 230 | 231 | [convolutional] 232 | batch_normalize=1 233 | filters=128 234 | size=3 235 | stride=1 236 | pad=1 237 | activation=leaky 238 | 239 | [shortcut] 240 | from=-3 241 | activation=linear 242 | 243 | [convolutional] 244 | batch_normalize=1 245 | filters=128 246 | size=1 247 | stride=1 248 | pad=1 249 | activation=leaky 250 | 251 | [convolutional] 252 | batch_normalize=1 253 | filters=128 254 | size=3 255 | stride=1 256 | pad=1 257 | activation=leaky 258 | 259 | [shortcut] 260 | from=-3 261 | activation=linear 262 | 263 | [convolutional] 264 | batch_normalize=1 265 | filters=128 266 | size=1 267 | stride=1 268 | pad=1 269 | activation=leaky 270 | 271 | [convolutional] 272 | batch_normalize=1 273 | filters=128 274 | size=3 275 | stride=1 276 | pad=1 277 | activation=leaky 278 | 279 | [shortcut] 280 | from=-3 281 | activation=linear 282 | 283 | [convolutional] 284 | batch_normalize=1 285 | filters=128 286 | size=1 287 | stride=1 288 | pad=1 289 | activation=leaky 290 | 291 | [convolutional] 292 | batch_normalize=1 293 | filters=128 294 | size=3 295 | stride=1 296 | pad=1 297 | activation=leaky 298 | 299 | [shortcut] 300 | from=-3 301 | activation=linear 302 | 303 | 304 | [convolutional] 305 | batch_normalize=1 306 | filters=128 307 | size=1 308 | stride=1 309 | pad=1 310 | activation=leaky 311 | 312 | [convolutional] 313 | batch_normalize=1 314 | filters=128 315 | size=3 316 | stride=1 317 | pad=1 318 | activation=leaky 319 | 320 | [shortcut] 321 | from=-3 322 | activation=linear 323 | 324 | [convolutional] 325 | batch_normalize=1 326 | filters=128 327 | size=1 328 | stride=1 329 | pad=1 330 | activation=leaky 331 | 332 | [convolutional] 333 | batch_normalize=1 334 | filters=128 335 | size=3 336 | stride=1 337 | pad=1 338 | activation=leaky 339 | 340 | [shortcut] 341 | from=-3 342 | activation=linear 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=128 347 | size=1 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [convolutional] 353 | batch_normalize=1 354 | filters=128 355 | size=3 356 | stride=1 357 | pad=1 358 | activation=leaky 359 | 360 | [shortcut] 361 | from=-3 362 | activation=linear 363 | 364 | [convolutional] 365 | batch_normalize=1 366 | filters=128 367 | size=1 368 | stride=1 369 | pad=1 370 | activation=leaky 371 | 372 | [convolutional] 373 | batch_normalize=1 374 | filters=128 375 | size=3 376 | stride=1 377 | pad=1 378 | activation=leaky 379 | 380 | [shortcut] 381 | from=-3 382 | activation=linear 383 | 384 | [convolutional] 385 | batch_normalize=1 386 | filters=128 387 | size=1 388 | stride=1 389 | pad=1 390 | activation=leaky 391 | 392 | [route] 393 | layers = -1,-28 394 | 395 | [convolutional] 396 | batch_normalize=1 397 | filters=256 398 | size=1 399 | stride=1 400 | pad=1 401 | activation=leaky 402 | 403 | # Downsample 404 | 405 | [convolutional] 406 | batch_normalize=1 407 | filters=512 408 | size=3 409 | stride=2 410 | pad=1 411 
| activation=leaky 412 | 413 | [convolutional] 414 | batch_normalize=1 415 | filters=256 416 | size=1 417 | stride=1 418 | pad=1 419 | activation=leaky 420 | 421 | [route] 422 | layers = -2 423 | 424 | [convolutional] 425 | batch_normalize=1 426 | filters=256 427 | size=1 428 | stride=1 429 | pad=1 430 | activation=leaky 431 | 432 | [convolutional] 433 | batch_normalize=1 434 | filters=256 435 | size=1 436 | stride=1 437 | pad=1 438 | activation=leaky 439 | 440 | [convolutional] 441 | batch_normalize=1 442 | filters=256 443 | size=3 444 | stride=1 445 | pad=1 446 | activation=leaky 447 | 448 | [shortcut] 449 | from=-3 450 | activation=linear 451 | 452 | 453 | [convolutional] 454 | batch_normalize=1 455 | filters=256 456 | size=1 457 | stride=1 458 | pad=1 459 | activation=leaky 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=256 464 | size=3 465 | stride=1 466 | pad=1 467 | activation=leaky 468 | 469 | [shortcut] 470 | from=-3 471 | activation=linear 472 | 473 | 474 | [convolutional] 475 | batch_normalize=1 476 | filters=256 477 | size=1 478 | stride=1 479 | pad=1 480 | activation=leaky 481 | 482 | [convolutional] 483 | batch_normalize=1 484 | filters=256 485 | size=3 486 | stride=1 487 | pad=1 488 | activation=leaky 489 | 490 | [shortcut] 491 | from=-3 492 | activation=linear 493 | 494 | 495 | [convolutional] 496 | batch_normalize=1 497 | filters=256 498 | size=1 499 | stride=1 500 | pad=1 501 | activation=leaky 502 | 503 | [convolutional] 504 | batch_normalize=1 505 | filters=256 506 | size=3 507 | stride=1 508 | pad=1 509 | activation=leaky 510 | 511 | [shortcut] 512 | from=-3 513 | activation=linear 514 | 515 | 516 | [convolutional] 517 | batch_normalize=1 518 | filters=256 519 | size=1 520 | stride=1 521 | pad=1 522 | activation=leaky 523 | 524 | [convolutional] 525 | batch_normalize=1 526 | filters=256 527 | size=3 528 | stride=1 529 | pad=1 530 | activation=leaky 531 | 532 | [shortcut] 533 | from=-3 534 | activation=linear 535 | 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=256 540 | size=1 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [convolutional] 546 | batch_normalize=1 547 | filters=256 548 | size=3 549 | stride=1 550 | pad=1 551 | activation=leaky 552 | 553 | [shortcut] 554 | from=-3 555 | activation=linear 556 | 557 | 558 | [convolutional] 559 | batch_normalize=1 560 | filters=256 561 | size=1 562 | stride=1 563 | pad=1 564 | activation=leaky 565 | 566 | [convolutional] 567 | batch_normalize=1 568 | filters=256 569 | size=3 570 | stride=1 571 | pad=1 572 | activation=leaky 573 | 574 | [shortcut] 575 | from=-3 576 | activation=linear 577 | 578 | [convolutional] 579 | batch_normalize=1 580 | filters=256 581 | size=1 582 | stride=1 583 | pad=1 584 | activation=leaky 585 | 586 | [convolutional] 587 | batch_normalize=1 588 | filters=256 589 | size=3 590 | stride=1 591 | pad=1 592 | activation=leaky 593 | 594 | [shortcut] 595 | from=-3 596 | activation=linear 597 | 598 | [convolutional] 599 | batch_normalize=1 600 | filters=256 601 | size=1 602 | stride=1 603 | pad=1 604 | activation=leaky 605 | 606 | [route] 607 | layers = -1,-28 608 | 609 | [convolutional] 610 | batch_normalize=1 611 | filters=512 612 | size=1 613 | stride=1 614 | pad=1 615 | activation=leaky 616 | 617 | # Downsample 618 | 619 | [convolutional] 620 | batch_normalize=1 621 | filters=1024 622 | size=3 623 | stride=2 624 | pad=1 625 | activation=leaky 626 | 627 | [convolutional] 628 | batch_normalize=1 629 | filters=512 630 | size=1 631 | stride=1 632 | pad=1 
633 | activation=leaky 634 | 635 | [route] 636 | layers = -2 637 | 638 | [convolutional] 639 | batch_normalize=1 640 | filters=512 641 | size=1 642 | stride=1 643 | pad=1 644 | activation=leaky 645 | 646 | [convolutional] 647 | batch_normalize=1 648 | filters=512 649 | size=1 650 | stride=1 651 | pad=1 652 | activation=leaky 653 | 654 | [convolutional] 655 | batch_normalize=1 656 | filters=512 657 | size=3 658 | stride=1 659 | pad=1 660 | activation=leaky 661 | 662 | [shortcut] 663 | from=-3 664 | activation=linear 665 | 666 | [convolutional] 667 | batch_normalize=1 668 | filters=512 669 | size=1 670 | stride=1 671 | pad=1 672 | activation=leaky 673 | 674 | [convolutional] 675 | batch_normalize=1 676 | filters=512 677 | size=3 678 | stride=1 679 | pad=1 680 | activation=leaky 681 | 682 | [shortcut] 683 | from=-3 684 | activation=linear 685 | 686 | [convolutional] 687 | batch_normalize=1 688 | filters=512 689 | size=1 690 | stride=1 691 | pad=1 692 | activation=leaky 693 | 694 | [convolutional] 695 | batch_normalize=1 696 | filters=512 697 | size=3 698 | stride=1 699 | pad=1 700 | activation=leaky 701 | 702 | [shortcut] 703 | from=-3 704 | activation=linear 705 | 706 | [convolutional] 707 | batch_normalize=1 708 | filters=512 709 | size=1 710 | stride=1 711 | pad=1 712 | activation=leaky 713 | 714 | [convolutional] 715 | batch_normalize=1 716 | filters=512 717 | size=3 718 | stride=1 719 | pad=1 720 | activation=leaky 721 | 722 | [shortcut] 723 | from=-3 724 | activation=linear 725 | 726 | [convolutional] 727 | batch_normalize=1 728 | filters=512 729 | size=1 730 | stride=1 731 | pad=1 732 | activation=leaky 733 | 734 | [route] 735 | layers = -1,-16 736 | 737 | [convolutional] 738 | batch_normalize=1 739 | filters=1024 740 | size=1 741 | stride=1 742 | pad=1 743 | activation=leaky 744 | 745 | ########################## 746 | 747 | [convolutional] 748 | batch_normalize=1 749 | filters=512 750 | size=1 751 | stride=1 752 | pad=1 753 | activation=leaky 754 | 755 | [convolutional] 756 | batch_normalize=1 757 | size=3 758 | stride=1 759 | pad=1 760 | filters=1024 761 | activation=leaky 762 | 763 | [convolutional] 764 | batch_normalize=1 765 | filters=512 766 | size=1 767 | stride=1 768 | pad=1 769 | activation=leaky 770 | 771 | ### SPP ### 772 | [maxpool] 773 | stride=1 774 | size=5 775 | 776 | [route] 777 | layers=-2 778 | 779 | [maxpool] 780 | stride=1 781 | size=9 782 | 783 | [route] 784 | layers=-4 785 | 786 | [maxpool] 787 | stride=1 788 | size=13 789 | 790 | [route] 791 | layers=-1,-3,-5,-6 792 | ### End SPP ### 793 | 794 | [convolutional] 795 | batch_normalize=1 796 | filters=512 797 | size=1 798 | stride=1 799 | pad=1 800 | activation=leaky 801 | 802 | [convolutional] 803 | batch_normalize=1 804 | size=3 805 | stride=1 806 | pad=1 807 | filters=1024 808 | activation=leaky 809 | 810 | [convolutional] 811 | batch_normalize=1 812 | filters=512 813 | size=1 814 | stride=1 815 | pad=1 816 | activation=leaky 817 | 818 | [convolutional] 819 | batch_normalize=1 820 | filters=256 821 | size=1 822 | stride=1 823 | pad=1 824 | activation=leaky 825 | 826 | [upsample] 827 | stride=2 828 | 829 | [route] 830 | layers = 85 831 | 832 | [convolutional] 833 | batch_normalize=1 834 | filters=256 835 | size=1 836 | stride=1 837 | pad=1 838 | activation=leaky 839 | 840 | [route] 841 | layers = -1, -3 842 | 843 | [convolutional] 844 | batch_normalize=1 845 | filters=256 846 | size=1 847 | stride=1 848 | pad=1 849 | activation=leaky 850 | 851 | [convolutional] 852 | batch_normalize=1 853 | size=3 854 | stride=1 
855 | pad=1 856 | filters=512 857 | activation=leaky 858 | 859 | [convolutional] 860 | batch_normalize=1 861 | filters=256 862 | size=1 863 | stride=1 864 | pad=1 865 | activation=leaky 866 | 867 | [convolutional] 868 | batch_normalize=1 869 | size=3 870 | stride=1 871 | pad=1 872 | filters=512 873 | activation=leaky 874 | 875 | [convolutional] 876 | batch_normalize=1 877 | filters=256 878 | size=1 879 | stride=1 880 | pad=1 881 | activation=leaky 882 | 883 | [convolutional] 884 | batch_normalize=1 885 | filters=128 886 | size=1 887 | stride=1 888 | pad=1 889 | activation=leaky 890 | 891 | [upsample] 892 | stride=2 893 | 894 | [route] 895 | layers = 54 896 | 897 | [convolutional] 898 | batch_normalize=1 899 | filters=128 900 | size=1 901 | stride=1 902 | pad=1 903 | activation=leaky 904 | 905 | [route] 906 | layers = -1, -3 907 | 908 | [convolutional] 909 | batch_normalize=1 910 | filters=128 911 | size=1 912 | stride=1 913 | pad=1 914 | activation=leaky 915 | 916 | [convolutional] 917 | batch_normalize=1 918 | size=3 919 | stride=1 920 | pad=1 921 | filters=256 922 | activation=leaky 923 | 924 | [convolutional] 925 | batch_normalize=1 926 | filters=128 927 | size=1 928 | stride=1 929 | pad=1 930 | activation=leaky 931 | 932 | [convolutional] 933 | batch_normalize=1 934 | size=3 935 | stride=1 936 | pad=1 937 | filters=256 938 | activation=leaky 939 | 940 | [convolutional] 941 | batch_normalize=1 942 | filters=128 943 | size=1 944 | stride=1 945 | pad=1 946 | activation=leaky 947 | 948 | ########################## 949 | 950 | [convolutional] 951 | batch_normalize=1 952 | size=3 953 | stride=1 954 | pad=1 955 | filters=256 956 | activation=leaky 957 | 958 | [convolutional] 959 | size=1 960 | stride=1 961 | pad=1 962 | filters=255 963 | activation=linear 964 | 965 | 966 | [yolo] 967 | mask = 0,1,2 968 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 969 | classes=80 970 | num=9 971 | jitter=.3 972 | ignore_thresh = .7 973 | truth_thresh = 1 974 | scale_x_y = 1.2 975 | iou_thresh=0.213 976 | cls_normalizer=1.0 977 | iou_normalizer=0.07 978 | iou_loss=ciou 979 | nms_kind=greedynms 980 | beta_nms=0.6 981 | 982 | 983 | [route] 984 | layers = -4 985 | 986 | [convolutional] 987 | batch_normalize=1 988 | size=3 989 | stride=2 990 | pad=1 991 | filters=256 992 | activation=leaky 993 | 994 | [route] 995 | layers = -1, -16 996 | 997 | [convolutional] 998 | batch_normalize=1 999 | filters=256 1000 | size=1 1001 | stride=1 1002 | pad=1 1003 | activation=leaky 1004 | 1005 | [convolutional] 1006 | batch_normalize=1 1007 | size=3 1008 | stride=1 1009 | pad=1 1010 | filters=512 1011 | activation=leaky 1012 | 1013 | [convolutional] 1014 | batch_normalize=1 1015 | filters=256 1016 | size=1 1017 | stride=1 1018 | pad=1 1019 | activation=leaky 1020 | 1021 | [convolutional] 1022 | batch_normalize=1 1023 | size=3 1024 | stride=1 1025 | pad=1 1026 | filters=512 1027 | activation=leaky 1028 | 1029 | [convolutional] 1030 | batch_normalize=1 1031 | filters=256 1032 | size=1 1033 | stride=1 1034 | pad=1 1035 | activation=leaky 1036 | 1037 | [convolutional] 1038 | batch_normalize=1 1039 | size=3 1040 | stride=1 1041 | pad=1 1042 | filters=512 1043 | activation=leaky 1044 | 1045 | [convolutional] 1046 | size=1 1047 | stride=1 1048 | pad=1 1049 | filters=255 1050 | activation=linear 1051 | 1052 | 1053 | [yolo] 1054 | mask = 3,4,5 1055 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 1056 | classes=80 1057 | num=9 1058 | jitter=.3 1059 | 
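# (annotation added for this write-up; not part of the upstream cfg)
# When adapting any of these cfg files to a custom dataset: classes= in every [yolo] block
# must equal the class count declared in the *.data file, and filters= of the
# [convolutional] layer immediately before each [yolo] must be (classes + 5) * number of
# mask entries, i.e. (80 + 5) * 3 = 255 here, or (3 + 5) * 3 = 24 for a 3-class dataset.
# num= is the total number of anchors listed; mask= picks the 3 used by this detection head.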
ignore_thresh = .7 1060 | truth_thresh = 1 1061 | scale_x_y = 1.1 1062 | iou_thresh=0.213 1063 | cls_normalizer=1.0 1064 | iou_normalizer=0.07 1065 | iou_loss=ciou 1066 | nms_kind=greedynms 1067 | beta_nms=0.6 1068 | 1069 | 1070 | [route] 1071 | layers = -4 1072 | 1073 | [convolutional] 1074 | batch_normalize=1 1075 | size=3 1076 | stride=2 1077 | pad=1 1078 | filters=512 1079 | activation=leaky 1080 | 1081 | [route] 1082 | layers = -1, -37 1083 | 1084 | [convolutional] 1085 | batch_normalize=1 1086 | filters=512 1087 | size=1 1088 | stride=1 1089 | pad=1 1090 | activation=leaky 1091 | 1092 | [convolutional] 1093 | batch_normalize=1 1094 | size=3 1095 | stride=1 1096 | pad=1 1097 | filters=1024 1098 | activation=leaky 1099 | 1100 | [convolutional] 1101 | batch_normalize=1 1102 | filters=512 1103 | size=1 1104 | stride=1 1105 | pad=1 1106 | activation=leaky 1107 | 1108 | [convolutional] 1109 | batch_normalize=1 1110 | size=3 1111 | stride=1 1112 | pad=1 1113 | filters=1024 1114 | activation=leaky 1115 | 1116 | [convolutional] 1117 | batch_normalize=1 1118 | filters=512 1119 | size=1 1120 | stride=1 1121 | pad=1 1122 | activation=leaky 1123 | 1124 | [convolutional] 1125 | batch_normalize=1 1126 | size=3 1127 | stride=1 1128 | pad=1 1129 | filters=1024 1130 | activation=leaky 1131 | 1132 | [convolutional] 1133 | size=1 1134 | stride=1 1135 | pad=1 1136 | filters=255 1137 | activation=linear 1138 | 1139 | 1140 | [yolo] 1141 | mask = 6,7,8 1142 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 1143 | classes=80 1144 | num=9 1145 | jitter=.3 1146 | ignore_thresh = .7 1147 | truth_thresh = 1 1148 | random=1 1149 | scale_x_y = 1.05 1150 | iou_thresh=0.213 1151 | cls_normalizer=1.0 1152 | iou_normalizer=0.07 1153 | iou_loss=ciou 1154 | nms_kind=greedynms 1155 | beta_nms=0.6 1156 | -------------------------------------------------------------------------------- /cfg/yolov4-tiny.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=64 7 | subdivisions=1 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.00261 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=2 30 | pad=1 31 | activation=leaky 32 | 33 | [convolutional] 34 | batch_normalize=1 35 | filters=64 36 | size=3 37 | stride=2 38 | pad=1 39 | activation=leaky 40 | 41 | [convolutional] 42 | batch_normalize=1 43 | filters=64 44 | size=3 45 | stride=1 46 | pad=1 47 | activation=leaky 48 | 49 | [route_lhalf] 50 | layers=-1 51 | 52 | [convolutional] 53 | batch_normalize=1 54 | filters=32 55 | size=3 56 | stride=1 57 | pad=1 58 | activation=leaky 59 | 60 | [convolutional] 61 | batch_normalize=1 62 | filters=32 63 | size=3 64 | stride=1 65 | pad=1 66 | activation=leaky 67 | 68 | [route] 69 | layers = -1,-2 70 | 71 | [convolutional] 72 | batch_normalize=1 73 | filters=64 74 | size=1 75 | stride=1 76 | pad=1 77 | activation=leaky 78 | 79 | [route] 80 | layers = -6,-1 81 | 82 | [maxpool] 83 | size=2 84 | stride=2 85 | 86 | [convolutional] 87 | batch_normalize=1 88 | filters=128 89 | size=3 90 | stride=1 91 | pad=1 92 | activation=leaky 93 | 94 | [route_lhalf] 95 | layers=-1 96 | 97 | [convolutional] 98 | batch_normalize=1 99 | filters=64 100 
| size=3 101 | stride=1 102 | pad=1 103 | activation=leaky 104 | 105 | [convolutional] 106 | batch_normalize=1 107 | filters=64 108 | size=3 109 | stride=1 110 | pad=1 111 | activation=leaky 112 | 113 | [route] 114 | layers = -1,-2 115 | 116 | [convolutional] 117 | batch_normalize=1 118 | filters=128 119 | size=1 120 | stride=1 121 | pad=1 122 | activation=leaky 123 | 124 | [route] 125 | layers = -6,-1 126 | 127 | [maxpool] 128 | size=2 129 | stride=2 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [route_lhalf] 140 | layers=-1 141 | 142 | [convolutional] 143 | batch_normalize=1 144 | filters=128 145 | size=3 146 | stride=1 147 | pad=1 148 | activation=leaky 149 | 150 | [convolutional] 151 | batch_normalize=1 152 | filters=128 153 | size=3 154 | stride=1 155 | pad=1 156 | activation=leaky 157 | 158 | [route] 159 | layers = -1,-2 160 | 161 | [convolutional] 162 | batch_normalize=1 163 | filters=256 164 | size=1 165 | stride=1 166 | pad=1 167 | activation=leaky 168 | 169 | [route] 170 | layers = -6,-1 171 | 172 | [maxpool] 173 | size=2 174 | stride=2 175 | 176 | [convolutional] 177 | batch_normalize=1 178 | filters=512 179 | size=3 180 | stride=1 181 | pad=1 182 | activation=leaky 183 | 184 | ################################## 185 | 186 | [convolutional] 187 | batch_normalize=1 188 | filters=256 189 | size=1 190 | stride=1 191 | pad=1 192 | activation=leaky 193 | 194 | [convolutional] 195 | batch_normalize=1 196 | filters=512 197 | size=3 198 | stride=1 199 | pad=1 200 | activation=leaky 201 | 202 | [convolutional] 203 | size=1 204 | stride=1 205 | pad=1 206 | filters=255 207 | activation=linear 208 | 209 | 210 | 211 | [yolo] 212 | mask = 3,4,5 213 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 214 | classes=80 215 | num=6 216 | jitter=.3 217 | scale_x_y = 1.05 218 | cls_normalizer=1.0 219 | iou_normalizer=0.07 220 | iou_loss=ciou 221 | ignore_thresh = .7 222 | truth_thresh = 1 223 | random=0 224 | nms_kind=greedynms 225 | beta_nms=0.6 226 | 227 | [route] 228 | layers = -4 229 | 230 | [convolutional] 231 | batch_normalize=1 232 | filters=128 233 | size=1 234 | stride=1 235 | pad=1 236 | activation=leaky 237 | 238 | [upsample] 239 | stride=2 240 | 241 | [route] 242 | layers = -1, 23 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=256 247 | size=3 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | size=1 254 | stride=1 255 | pad=1 256 | filters=255 257 | activation=linear 258 | 259 | [yolo] 260 | mask = 1,2,3 261 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 262 | classes=80 263 | num=6 264 | jitter=.3 265 | scale_x_y = 1.05 266 | cls_normalizer=1.0 267 | iou_normalizer=0.07 268 | iou_loss=ciou 269 | ignore_thresh = .7 270 | truth_thresh = 1 271 | random=0 272 | nms_kind=greedynms 273 | beta_nms=0.6 274 | -------------------------------------------------------------------------------- /data/coco.data: -------------------------------------------------------------------------------- 1 | classes=80 2 | train=../coco/train2017.txt 3 | valid=../coco/testdev2017.txt 4 | names=data/coco.names 5 | -------------------------------------------------------------------------------- /data/coco.names: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorcycle 5 | airplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | stop sign 13 | parking 
meter 14 | bench 15 | bird 16 | cat 17 | dog 18 | horse 19 | sheep 20 | cow 21 | elephant 22 | bear 23 | zebra 24 | giraffe 25 | backpack 26 | umbrella 27 | handbag 28 | tie 29 | suitcase 30 | frisbee 31 | skis 32 | snowboard 33 | sports ball 34 | kite 35 | baseball bat 36 | baseball glove 37 | skateboard 38 | surfboard 39 | tennis racket 40 | bottle 41 | wine glass 42 | cup 43 | fork 44 | knife 45 | spoon 46 | bowl 47 | banana 48 | apple 49 | sandwich 50 | orange 51 | broccoli 52 | carrot 53 | hot dog 54 | pizza 55 | donut 56 | cake 57 | chair 58 | couch 59 | potted plant 60 | bed 61 | dining table 62 | toilet 63 | tv 64 | laptop 65 | mouse 66 | remote 67 | keyboard 68 | cell phone 69 | microwave 70 | oven 71 | toaster 72 | sink 73 | refrigerator 74 | book 75 | clock 76 | vase 77 | scissors 78 | teddy bear 79 | hair drier 80 | toothbrush 81 | -------------------------------------------------------------------------------- /data/coco1.data: -------------------------------------------------------------------------------- 1 | classes=80 2 | train=data/coco1.txt 3 | valid=data/coco1.txt 4 | names=data/coco.names 5 | -------------------------------------------------------------------------------- /data/coco1.txt: -------------------------------------------------------------------------------- 1 | ../coco/images/train2017/000000109622.jpg 2 | -------------------------------------------------------------------------------- /data/coco16.data: -------------------------------------------------------------------------------- 1 | classes=80 2 | train=data/coco16.txt 3 | valid=data/coco16.txt 4 | names=data/coco.names 5 | -------------------------------------------------------------------------------- /data/coco16.txt: -------------------------------------------------------------------------------- 1 | ../coco/images/train2017/000000109622.jpg 2 | ../coco/images/train2017/000000160694.jpg 3 | ../coco/images/train2017/000000308590.jpg 4 | ../coco/images/train2017/000000327573.jpg 5 | ../coco/images/train2017/000000062929.jpg 6 | ../coco/images/train2017/000000512793.jpg 7 | ../coco/images/train2017/000000371735.jpg 8 | ../coco/images/train2017/000000148118.jpg 9 | ../coco/images/train2017/000000309856.jpg 10 | ../coco/images/train2017/000000141882.jpg 11 | ../coco/images/train2017/000000318783.jpg 12 | ../coco/images/train2017/000000337760.jpg 13 | ../coco/images/train2017/000000298197.jpg 14 | ../coco/images/train2017/000000042421.jpg 15 | ../coco/images/train2017/000000328898.jpg 16 | ../coco/images/train2017/000000458856.jpg 17 | -------------------------------------------------------------------------------- /data/coco1cls.data: -------------------------------------------------------------------------------- 1 | classes=1 2 | train=data/coco1cls.txt 3 | valid=data/coco1cls.txt 4 | names=data/coco.names 5 | -------------------------------------------------------------------------------- /data/coco1cls.txt: -------------------------------------------------------------------------------- 1 | ../coco/images/train2017/000000000901.jpg 2 | ../coco/images/train2017/000000001464.jpg 3 | ../coco/images/train2017/000000003220.jpg 4 | ../coco/images/train2017/000000003365.jpg 5 | ../coco/images/train2017/000000004772.jpg 6 | ../coco/images/train2017/000000009987.jpg 7 | ../coco/images/train2017/000000010498.jpg 8 | ../coco/images/train2017/000000012455.jpg 9 | ../coco/images/train2017/000000013992.jpg 10 | ../coco/images/train2017/000000014125.jpg 11 | ../coco/images/train2017/000000016314.jpg 12 | 
../coco/images/train2017/000000016670.jpg 13 | ../coco/images/train2017/000000018412.jpg 14 | ../coco/images/train2017/000000021212.jpg 15 | ../coco/images/train2017/000000021826.jpg 16 | ../coco/images/train2017/000000030566.jpg 17 | -------------------------------------------------------------------------------- /data/coco2014.data: -------------------------------------------------------------------------------- 1 | classes=80 2 | train=../coco/trainvalno5k.txt 3 | valid=../coco/5k.txt 4 | names=data/coco.names 5 | -------------------------------------------------------------------------------- /data/coco2017.data: -------------------------------------------------------------------------------- 1 | classes=80 2 | train=../coco/train2017.txt 3 | valid=../coco/val2017.txt 4 | names=data/coco.names 5 | -------------------------------------------------------------------------------- /data/coco64.data: -------------------------------------------------------------------------------- 1 | classes=80 2 | train=data/coco64.txt 3 | valid=data/coco64.txt 4 | names=data/coco.names 5 | -------------------------------------------------------------------------------- /data/coco64.txt: -------------------------------------------------------------------------------- 1 | ../coco/images/train2017/000000109622.jpg 2 | ../coco/images/train2017/000000160694.jpg 3 | ../coco/images/train2017/000000308590.jpg 4 | ../coco/images/train2017/000000327573.jpg 5 | ../coco/images/train2017/000000062929.jpg 6 | ../coco/images/train2017/000000512793.jpg 7 | ../coco/images/train2017/000000371735.jpg 8 | ../coco/images/train2017/000000148118.jpg 9 | ../coco/images/train2017/000000309856.jpg 10 | ../coco/images/train2017/000000141882.jpg 11 | ../coco/images/train2017/000000318783.jpg 12 | ../coco/images/train2017/000000337760.jpg 13 | ../coco/images/train2017/000000298197.jpg 14 | ../coco/images/train2017/000000042421.jpg 15 | ../coco/images/train2017/000000328898.jpg 16 | ../coco/images/train2017/000000458856.jpg 17 | ../coco/images/train2017/000000073824.jpg 18 | ../coco/images/train2017/000000252846.jpg 19 | ../coco/images/train2017/000000459590.jpg 20 | ../coco/images/train2017/000000273650.jpg 21 | ../coco/images/train2017/000000331311.jpg 22 | ../coco/images/train2017/000000156326.jpg 23 | ../coco/images/train2017/000000262985.jpg 24 | ../coco/images/train2017/000000253580.jpg 25 | ../coco/images/train2017/000000447976.jpg 26 | ../coco/images/train2017/000000378077.jpg 27 | ../coco/images/train2017/000000259913.jpg 28 | ../coco/images/train2017/000000424553.jpg 29 | ../coco/images/train2017/000000000612.jpg 30 | ../coco/images/train2017/000000267625.jpg 31 | ../coco/images/train2017/000000566012.jpg 32 | ../coco/images/train2017/000000196664.jpg 33 | ../coco/images/train2017/000000363331.jpg 34 | ../coco/images/train2017/000000057992.jpg 35 | ../coco/images/train2017/000000520047.jpg 36 | ../coco/images/train2017/000000453903.jpg 37 | ../coco/images/train2017/000000162083.jpg 38 | ../coco/images/train2017/000000268516.jpg 39 | ../coco/images/train2017/000000277436.jpg 40 | ../coco/images/train2017/000000189744.jpg 41 | ../coco/images/train2017/000000041128.jpg 42 | ../coco/images/train2017/000000527728.jpg 43 | ../coco/images/train2017/000000465269.jpg 44 | ../coco/images/train2017/000000246833.jpg 45 | ../coco/images/train2017/000000076784.jpg 46 | ../coco/images/train2017/000000323715.jpg 47 | ../coco/images/train2017/000000560463.jpg 48 | ../coco/images/train2017/000000006263.jpg 49 | 
../coco/images/train2017/000000094701.jpg 50 | ../coco/images/train2017/000000521359.jpg 51 | ../coco/images/train2017/000000302903.jpg 52 | ../coco/images/train2017/000000047559.jpg 53 | ../coco/images/train2017/000000480583.jpg 54 | ../coco/images/train2017/000000050025.jpg 55 | ../coco/images/train2017/000000084512.jpg 56 | ../coco/images/train2017/000000508913.jpg 57 | ../coco/images/train2017/000000093708.jpg 58 | ../coco/images/train2017/000000070493.jpg 59 | ../coco/images/train2017/000000539270.jpg 60 | ../coco/images/train2017/000000474402.jpg 61 | ../coco/images/train2017/000000209842.jpg 62 | ../coco/images/train2017/000000028820.jpg 63 | ../coco/images/train2017/000000154257.jpg 64 | ../coco/images/train2017/000000342499.jpg 65 | -------------------------------------------------------------------------------- /data/coco_paper.names: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorcycle 5 | airplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | street sign 13 | stop sign 14 | parking meter 15 | bench 16 | bird 17 | cat 18 | dog 19 | horse 20 | sheep 21 | cow 22 | elephant 23 | bear 24 | zebra 25 | giraffe 26 | hat 27 | backpack 28 | umbrella 29 | shoe 30 | eye glasses 31 | handbag 32 | tie 33 | suitcase 34 | frisbee 35 | skis 36 | snowboard 37 | sports ball 38 | kite 39 | baseball bat 40 | baseball glove 41 | skateboard 42 | surfboard 43 | tennis racket 44 | bottle 45 | plate 46 | wine glass 47 | cup 48 | fork 49 | knife 50 | spoon 51 | bowl 52 | banana 53 | apple 54 | sandwich 55 | orange 56 | broccoli 57 | carrot 58 | hot dog 59 | pizza 60 | donut 61 | cake 62 | chair 63 | couch 64 | potted plant 65 | bed 66 | mirror 67 | dining table 68 | window 69 | desk 70 | toilet 71 | door 72 | tv 73 | laptop 74 | mouse 75 | remote 76 | keyboard 77 | cell phone 78 | microwave 79 | oven 80 | toaster 81 | sink 82 | refrigerator 83 | blender 84 | book 85 | clock 86 | vase 87 | scissors 88 | teddy bear 89 | hair drier 90 | toothbrush 91 | hair brush -------------------------------------------------------------------------------- /data/get_coco2014.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Zip coco folder 3 | # zip -r coco.zip coco 4 | # tar -czvf coco.tar.gz coco 5 | 6 | # Download labels from Google Drive, accepting presented query 7 | filename="coco2014labels.zip" 8 | fileid="1s6-CmF5_SElM28r52P1OUrCcuXZN-SFo" 9 | curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id=${fileid}" > /dev/null 10 | curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=${fileid}" -o ${filename} 11 | rm ./cookie 12 | 13 | # Unzip labels 14 | unzip -q ${filename} # for coco.zip 15 | # tar -xzf ${filename} # for coco.tar.gz 16 | rm ${filename} 17 | 18 | # Download and unzip images 19 | cd coco/images 20 | f="train2014.zip" && curl http://images.cocodataset.org/zips/$f -o $f && unzip -q $f && rm $f 21 | f="val2014.zip" && curl http://images.cocodataset.org/zips/$f -o $f && unzip -q $f && rm $f 22 | 23 | # cd out 24 | cd ../.. 
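# Note (added for this write-up; not part of the upstream script): the repo's *.data files
# reference the dataset as ../coco/... (e.g. train=../coco/train2017.txt), so the coco/
# folder created here is expected to sit next to the repository directory. Running this
# script from the repo's parent directory, or moving coco/ there afterwards, keeps those
# relative paths valid.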
25 | -------------------------------------------------------------------------------- /data/get_coco2017.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Zip coco folder 3 | # zip -r coco.zip coco 4 | # tar -czvf coco.tar.gz coco 5 | 6 | # Download labels from Google Drive, accepting presented query 7 | filename="coco2017labels.zip" 8 | fileid="1cXZR_ckHki6nddOmcysCuuJFM--T-Q6L" 9 | curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id=${fileid}" > /dev/null 10 | curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=${fileid}" -o ${filename} 11 | rm ./cookie 12 | 13 | # Unzip labels 14 | unzip -q ${filename} # for coco.zip 15 | # tar -xzf ${filename} # for coco.tar.gz 16 | rm ${filename} 17 | 18 | # Download and unzip images 19 | cd coco/images 20 | f="train2017.zip" && curl http://images.cocodataset.org/zips/$f -o $f && unzip -q $f && rm $f 21 | f="val2017.zip" && curl http://images.cocodataset.org/zips/$f -o $f && unzip -q $f && rm $f 22 | 23 | # cd out 24 | cd ../.. 25 | -------------------------------------------------------------------------------- /data/myData.data: -------------------------------------------------------------------------------- 1 | classes=3 2 | train=data/myData/myData_train.txt 3 | valid=data/myData/myData_val.txt 4 | names=data/myData.names 5 | -------------------------------------------------------------------------------- /data/myData.names: -------------------------------------------------------------------------------- 1 | QP 2 | NY 3 | QG 4 | -------------------------------------------------------------------------------- /data/myData/score/images/train/readme: -------------------------------------------------------------------------------- 1 | 此处存放train的image -------------------------------------------------------------------------------- /data/myData/score/images/val/readme: -------------------------------------------------------------------------------- 1 | 此处存放val的image -------------------------------------------------------------------------------- /data/myData/score/labels/train/readme: -------------------------------------------------------------------------------- 1 | 此处存放train的标注 2 | xxx.txt 3 | xxxx.txt 4 | xxxxx.txt -------------------------------------------------------------------------------- /data/myData/score/labels/val/readme: -------------------------------------------------------------------------------- 1 | 此处存放val的标注 2 | 3 | xxx.txt 4 | xxxx.txt 5 | xxxxx.txt -------------------------------------------------------------------------------- /detect.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | from sys import platform 3 | 4 | from models import * # set ONNX_EXPORT in models.py 5 | from utils.datasets import * 6 | from utils.utils import * 7 | 8 | 9 | def detect(save_img=False): 10 | img_size = (320, 192) if ONNX_EXPORT else opt.img_size # (320, 192) or (416, 256) or (608, 352) for (height, width) 11 | out, source, weights, half, view_img, save_txt = opt.output, opt.source, opt.weights, opt.half, opt.view_img, opt.save_txt 12 | webcam = source == '0' or source.startswith('rtsp') or source.startswith('http') or source.endswith('.txt') 13 | 14 | # Initialize 15 | device = torch_utils.select_device(device='cpu' if ONNX_EXPORT else opt.device) 16 | if os.path.exists(out): 17 | shutil.rmtree(out) # delete output folder 18 | os.makedirs(out) # make new 
output folder 19 | 20 | # Initialize model 21 | model = Darknet(opt.cfg, img_size) 22 | 23 | # Load weights 24 | attempt_download(weights) 25 | if weights.endswith('.pt'): # pytorch format 26 | model.load_state_dict(torch.load(weights, map_location=device)['model']) 27 | else: # darknet format 28 | load_darknet_weights(model, weights) 29 | 30 | # Second-stage classifier 31 | classify = False 32 | if classify: 33 | modelc = torch_utils.load_classifier(name='resnet101', n=2) # initialize 34 | modelc.load_state_dict(torch.load('weights/resnet101.pt', map_location=device)['model']) # load weights 35 | modelc.to(device).eval() 36 | 37 | # Eval mode 38 | model.to(device).eval() 39 | 40 | # Fuse Conv2d + BatchNorm2d layers 41 | # model.fuse() 42 | 43 | # Export mode 44 | if ONNX_EXPORT: 45 | model.fuse() 46 | img = torch.zeros((1, 3) + img_size) # (1, 3, 320, 192) 47 | f = opt.weights.replace(opt.weights.split('.')[-1], 'onnx') # *.onnx filename 48 | torch.onnx.export(model, img, f, verbose=False, opset_version=11, 49 | input_names=['images'], output_names=['classes', 'boxes']) 50 | 51 | # Validate exported model 52 | import onnx 53 | model = onnx.load(f) # Load the ONNX model 54 | onnx.checker.check_model(model) # Check that the IR is well formed 55 | print(onnx.helper.printable_graph(model.graph)) # Print a human readable representation of the graph 56 | return 57 | 58 | # Half precision 59 | half = half and device.type != 'cpu' # half precision only supported on CUDA 60 | if half: 61 | model.half() 62 | 63 | # Set Dataloader 64 | vid_path, vid_writer = None, None 65 | if webcam: 66 | view_img = True 67 | torch.backends.cudnn.benchmark = True # set True to speed up constant image size inference 68 | dataset = LoadStreams(source, img_size=img_size) 69 | else: 70 | save_img = True 71 | dataset = LoadImages(source, img_size=img_size) 72 | 73 | # Get names and colors 74 | names = load_classes(opt.names) 75 | colors = [[random.randint(0, 255) for _ in range(3)] for _ in range(len(names))] 76 | 77 | # Run inference 78 | t0 = time.time() 79 | img = torch.zeros((1, 3, img_size, img_size), device=device) # init img 80 | _ = model(img.half() if half else img.float()) if device.type != 'cpu' else None # run once 81 | for path, img, im0s, vid_cap in dataset: 82 | img = torch.from_numpy(img).to(device) 83 | img = img.half() if half else img.float() # uint8 to fp16/32 84 | img /= 255.0 # 0 - 255 to 0.0 - 1.0 85 | if img.ndimension() == 3: 86 | img = img.unsqueeze(0) 87 | 88 | # Inference 89 | t1 = torch_utils.time_synchronized() 90 | pred = model(img, augment=opt.augment)[0] 91 | t2 = torch_utils.time_synchronized() 92 | 93 | # to float 94 | if half: 95 | pred = pred.float() 96 | 97 | # Apply NMS 98 | pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres, 99 | merge=False, classes=opt.classes, agnostic=opt.agnostic_nms) 100 | 101 | # Apply Classifier 102 | if classify: 103 | pred = apply_classifier(pred, modelc, img, im0s) 104 | 105 | # Process detections 106 | for i, det in enumerate(pred): # detections per image 107 | if webcam: # batch_size >= 1 108 | p, s, im0 = path[i], '%g: ' % i, im0s[i] 109 | else: 110 | p, s, im0 = path, '', im0s 111 | 112 | save_path = str(Path(out) / Path(p).name) 113 | s += '%gx%g ' % img.shape[2:] # print string 114 | if det is not None and len(det): 115 | # Rescale boxes from img_size to im0 size 116 | det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round() 117 | 118 | # Print results 119 | for c in det[:, -1].unique(): 120 | n = (det[:, -1] == c).sum() 
# detections per class 121 | s += '%g %ss, ' % (n, names[int(c)]) # add to string 122 | 123 | # Write results 124 | for *xyxy, conf, cls in det: 125 | if save_txt: # Write to file 126 | with open(save_path + '.txt', 'a') as file: 127 | file.write(('%g ' * 6 + '\n') % (*xyxy, cls, conf)) 128 | 129 | if save_img or view_img: # Add bbox to image 130 | label = '%s %.2f' % (names[int(cls)], conf) 131 | plot_one_box(xyxy, im0, label=label, color=colors[int(cls)]) 132 | 133 | # Print time (inference + NMS) 134 | print('%sDone. (%.3fs)' % (s, t2 - t1)) 135 | 136 | # Stream results 137 | if view_img: 138 | cv2.imshow(p, im0) 139 | if cv2.waitKey(1) == ord('q'): # q to quit 140 | raise StopIteration 141 | 142 | # Save results (image with detections) 143 | if save_img: 144 | if dataset.mode == 'images': 145 | cv2.imwrite(save_path, im0) 146 | else: 147 | if vid_path != save_path: # new video 148 | vid_path = save_path 149 | if isinstance(vid_writer, cv2.VideoWriter): 150 | vid_writer.release() # release previous video writer 151 | 152 | fps = vid_cap.get(cv2.CAP_PROP_FPS) 153 | w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH)) 154 | h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) 155 | vid_writer = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*opt.fourcc), fps, (w, h)) 156 | vid_writer.write(im0) 157 | 158 | if save_txt or save_img: 159 | print('Results saved to %s' % os.getcwd() + os.sep + out) 160 | if platform == 'darwin': # MacOS 161 | os.system('open ' + save_path) 162 | 163 | print('Done. (%.3fs)' % (time.time() - t0)) 164 | 165 | 166 | if __name__ == '__main__': 167 | parser = argparse.ArgumentParser() 168 | parser.add_argument('--cfg', type=str, default='cfg/yolov4-pacsp.cfg', help='*.cfg path') 169 | parser.add_argument('--names', type=str, default='data/coco.names', help='*.names path') 170 | parser.add_argument('--weights', type=str, default='weights/yolov4-pacsp.pt', help='weights path') 171 | parser.add_argument('--source', type=str, default='data/samples', help='source') # input file/folder, 0 for webcam 172 | parser.add_argument('--output', type=str, default='output', help='output folder') # output folder 173 | parser.add_argument('--img-size', type=int, default=512, help='inference size (pixels)') 174 | parser.add_argument('--conf-thres', type=float, default=0.3, help='object confidence threshold') 175 | parser.add_argument('--iou-thres', type=float, default=0.6, help='IOU threshold for NMS') 176 | parser.add_argument('--fourcc', type=str, default='mp4v', help='output video codec (verify ffmpeg support)') 177 | parser.add_argument('--half', action='store_true', help='half precision FP16 inference') 178 | parser.add_argument('--device', default='', help='device id (i.e. 
0 or 0,1) or cpu') 179 | parser.add_argument('--view-img', action='store_true', help='display results') 180 | parser.add_argument('--save-txt', action='store_true', help='save results to *.txt') 181 | parser.add_argument('--classes', nargs='+', type=int, help='filter by class') 182 | parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS') 183 | parser.add_argument('--augment', action='store_true', help='augmented inference') 184 | opt = parser.parse_args() 185 | print(opt) 186 | 187 | with torch.no_grad(): 188 | detect() 189 | -------------------------------------------------------------------------------- /experiments.md: -------------------------------------------------------------------------------- 1 | ## Experimental results on MSCOCO 2017 test-dev set 2 | 3 | | Model | Test Size | APtest | AP50test | AP75test | APStest | APMtest | APLtest | 4 | | :-- | :-: | :-: | :-: | :-: | :-: | :-: | :-: | 5 | | **YOLOv4**paspp | 608 | | | | | | | 6 | | **YOLOv4**pacsp-s | 608 | | | | | | | 7 | | **YOLOv4**pacsp | 608 | 45.9% | 64.3% | 50.4% | 25.4% | 50.6% | 59.0% | 8 | | **YOLOv4**pacsp-x | 608 | 47.7% | 66.0% | 52.2% | 27.4% | 52.3% | 61.0% | 9 | | | | | | | | | 10 | | **YOLOv4**pacsp-s-mish | 608 | | | | | | | 11 | | **YOLOv4**pacsp-mish | 608 | 46.6% | 64.9% | 51.0% | 26.1% | 51.3% | 59.6% | 12 | | **YOLOv4**pacsp-x-mish | 608 | **48.5%** | **66.6%** | **53.2%** | **28.4%** | **53.2%** | **61.7%** | 13 | | | | | | | | | 14 | | **YOLOv4**tiny | 416 | 22.6% | 38.7% | 23.2% | 6.6% | 25.9% | 33.3% | 15 | | | | | | | | | 16 | -------------------------------------------------------------------------------- /images/scalingCSP.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/images/scalingCSP.png -------------------------------------------------------------------------------- /models.py: -------------------------------------------------------------------------------- 1 | from utils.google_utils import * 2 | from utils.layers import * 3 | from utils.parse_config import * 4 | 5 | ONNX_EXPORT = False 6 | 7 | 8 | def create_modules(module_defs, img_size, cfg): 9 | # Constructs module list of layer blocks from module configuration in module_defs 10 | 11 | img_size = [img_size] * 2 if isinstance(img_size, int) else img_size # expand if necessary 12 | _ = module_defs.pop(0) # cfg training hyperparams (unused) 13 | output_filters = [3] # input channels 14 | module_list = nn.ModuleList() 15 | routs = [] # list of layers which rout to deeper layers 16 | yolo_index = -1 17 | 18 | for i, mdef in enumerate(module_defs): 19 | modules = nn.Sequential() 20 | 21 | if mdef['type'] == 'convolutional': 22 | bn = mdef['batch_normalize'] 23 | filters = mdef['filters'] 24 | k = mdef['size'] # kernel size 25 | stride = mdef['stride'] if 'stride' in mdef else (mdef['stride_y'], mdef['stride_x']) 26 | if isinstance(k, int): # single-size conv 27 | modules.add_module('Conv2d', nn.Conv2d(in_channels=output_filters[-1], 28 | out_channels=filters, 29 | kernel_size=k, 30 | stride=stride, 31 | padding=k // 2 if mdef['pad'] else 0, 32 | groups=mdef['groups'] if 'groups' in mdef else 1, 33 | bias=not bn)) 34 | else: # multiple-size conv 35 | modules.add_module('MixConv2d', MixConv2d(in_ch=output_filters[-1], 36 | out_ch=filters, 37 | k=k, 38 | stride=stride, 39 | bias=not bn)) 40 | 41 | if bn: 42 | modules.add_module('BatchNorm2d', nn.BatchNorm2d(filters, 
momentum=0.03, eps=1E-4)) 43 | else: 44 | routs.append(i) # detection output (goes into yolo layer) 45 | 46 | if mdef['activation'] == 'leaky': # activation study https://github.com/ultralytics/yolov3/issues/441 47 | modules.add_module('activation', nn.LeakyReLU(0.1, inplace=True)) 48 | elif mdef['activation'] == 'swish': 49 | modules.add_module('activation', Swish()) 50 | elif mdef['activation'] == 'mish': 51 | modules.add_module('activation', Mish()) 52 | 53 | elif mdef['type'] == 'BatchNorm2d': 54 | filters = output_filters[-1] 55 | modules = nn.BatchNorm2d(filters, momentum=0.03, eps=1E-4) 56 | if i == 0 and filters == 3: # normalize RGB image 57 | # imagenet mean and var https://pytorch.org/docs/stable/torchvision/models.html#classification 58 | modules.running_mean = torch.tensor([0.485, 0.456, 0.406]) 59 | modules.running_var = torch.tensor([0.0524, 0.0502, 0.0506]) 60 | 61 | elif mdef['type'] == 'maxpool': 62 | k = mdef['size'] # kernel size 63 | stride = mdef['stride'] 64 | maxpool = nn.MaxPool2d(kernel_size=k, stride=stride, padding=(k - 1) // 2) 65 | if k == 2 and stride == 1: # yolov3-tiny 66 | modules.add_module('ZeroPad2d', nn.ZeroPad2d((0, 1, 0, 1))) 67 | modules.add_module('MaxPool2d', maxpool) 68 | else: 69 | modules = maxpool 70 | 71 | elif mdef['type'] == 'upsample': 72 | if ONNX_EXPORT: # explicitly state size, avoid scale_factor 73 | g = (yolo_index + 1) * 2 / 32 # gain 74 | modules = nn.Upsample(size=tuple(int(x * g) for x in img_size)) # img_size = (320, 192) 75 | else: 76 | modules = nn.Upsample(scale_factor=mdef['stride']) 77 | 78 | elif mdef['type'] == 'route': # nn.Sequential() placeholder for 'route' layer 79 | layers = mdef['layers'] 80 | filters = sum([output_filters[l + 1 if l > 0 else l] for l in layers]) 81 | routs.extend([i + l if l < 0 else l for l in layers]) 82 | modules = FeatureConcat(layers=layers) 83 | 84 | elif mdef['type'] == 'route_lhalf': # nn.Sequential() placeholder for 'route' layer 85 | layers = mdef['layers'] 86 | filters = sum([output_filters[l + 1 if l > 0 else l] for l in layers])//2 87 | routs.extend([i + l if l < 0 else l for l in layers]) 88 | modules = FeatureConcat_l(layers=layers) 89 | 90 | elif mdef['type'] == 'shortcut': # nn.Sequential() placeholder for 'shortcut' layer 91 | layers = mdef['from'] 92 | filters = output_filters[-1] 93 | routs.extend([i + l if l < 0 else l for l in layers]) 94 | modules = WeightedFeatureFusion(layers=layers, weight='weights_type' in mdef) 95 | 96 | elif mdef['type'] == 'reorg3d': # yolov3-spp-pan-scale 97 | pass 98 | 99 | elif mdef['type'] == 'yolo': 100 | yolo_index += 1 101 | stride = [8, 16, 32] # P5, P4, P3 strides 102 | if any(x in cfg for x in ['yolov4-tiny']): # stride order reversed 103 | stride = [32, 16, 8] 104 | layers = mdef['from'] if 'from' in mdef else [] 105 | modules = YOLOLayer(anchors=mdef['anchors'][mdef['mask']], # anchor list 106 | nc=mdef['classes'], # number of classes 107 | img_size=img_size, # (416, 416) 108 | yolo_index=yolo_index, # 0, 1, 2... 
109 | layers=layers, # output layers 110 | stride=stride[yolo_index]) 111 | 112 | # Initialize preceding Conv2d() bias (https://arxiv.org/pdf/1708.02002.pdf section 3.3) 113 | try: 114 | j = layers[yolo_index] if 'from' in mdef else -1 115 | bias_ = module_list[j][0].bias # shape(255,) 116 | bias = bias_[:modules.no * modules.na].view(modules.na, -1) # shape(3,85) 117 | bias[:, 4] += -4.5 # obj 118 | bias[:, 5:] += math.log(0.6 / (modules.nc - 0.99)) # cls (sigmoid(p) = 1/nc) 119 | module_list[j][0].bias = torch.nn.Parameter(bias_, requires_grad=bias_.requires_grad) 120 | except: 121 | print('WARNING: smart bias initialization failure.') 122 | 123 | else: 124 | print('Warning: Unrecognized Layer Type: ' + mdef['type']) 125 | 126 | # Register module list and number of output filters 127 | module_list.append(modules) 128 | output_filters.append(filters) 129 | 130 | routs_binary = [False] * (i + 1) 131 | for i in routs: 132 | routs_binary[i] = True 133 | return module_list, routs_binary 134 | 135 | 136 | class YOLOLayer(nn.Module): 137 | def __init__(self, anchors, nc, img_size, yolo_index, layers, stride): 138 | super(YOLOLayer, self).__init__() 139 | self.anchors = torch.Tensor(anchors) 140 | self.index = yolo_index # index of this layer in layers 141 | self.layers = layers # model output layer indices 142 | self.stride = stride # layer stride 143 | self.nl = len(layers) # number of output layers (3) 144 | self.na = len(anchors) # number of anchors (3) 145 | self.nc = nc # number of classes (80) 146 | self.no = nc + 5 # number of outputs (85) 147 | self.nx, self.ny, self.ng = 0, 0, 0 # initialize number of x, y gridpoints 148 | self.anchor_vec = self.anchors / self.stride 149 | self.anchor_wh = self.anchor_vec.view(1, self.na, 1, 1, 2) 150 | 151 | if ONNX_EXPORT: 152 | self.training = False 153 | self.create_grids((img_size[1] // stride, img_size[0] // stride)) # number x, y grid points 154 | 155 | def create_grids(self, ng=(13, 13), device='cpu'): 156 | self.nx, self.ny = ng # x and y grid size 157 | self.ng = torch.tensor(ng, dtype=torch.float) 158 | 159 | # build xy offsets 160 | if not self.training: 161 | yv, xv = torch.meshgrid([torch.arange(self.ny, device=device), torch.arange(self.nx, device=device)]) 162 | self.grid = torch.stack((xv, yv), 2).view((1, 1, self.ny, self.nx, 2)).float() 163 | 164 | if self.anchor_vec.device != device: 165 | self.anchor_vec = self.anchor_vec.to(device) 166 | self.anchor_wh = self.anchor_wh.to(device) 167 | 168 | def forward(self, p, out): 169 | ASFF = False # https://arxiv.org/abs/1911.09516 170 | if ASFF: 171 | i, n = self.index, self.nl # index in layers, number of layers 172 | p = out[self.layers[i]] 173 | bs, _, ny, nx = p.shape # bs, 255, 13, 13 174 | if (self.nx, self.ny) != (nx, ny): 175 | self.create_grids((nx, ny), p.device) 176 | 177 | # outputs and weights 178 | # w = F.softmax(p[:, -n:], 1) # normalized weights 179 | w = torch.sigmoid(p[:, -n:]) * (2 / n) # sigmoid weights (faster) 180 | # w = w / w.sum(1).unsqueeze(1) # normalize across layer dimension 181 | 182 | # weighted ASFF sum 183 | p = out[self.layers[i]][:, :-n] * w[:, i:i + 1] 184 | for j in range(n): 185 | if j != i: 186 | p += w[:, j:j + 1] * \ 187 | F.interpolate(out[self.layers[j]][:, :-n], size=[ny, nx], mode='bilinear', align_corners=False) 188 | 189 | elif ONNX_EXPORT: 190 | bs = 1 # batch size 191 | else: 192 | bs, _, ny, nx = p.shape # bs, 255, 13, 13 193 | if (self.nx, self.ny) != (nx, ny): 194 | self.create_grids((nx, ny), p.device) 195 | 196 | # p.view(bs, 255, 13, 
13) -- > (bs, 3, 13, 13, 85) # (bs, anchors, grid, grid, classes + xywh) 197 | p = p.view(bs, self.na, self.no, self.ny, self.nx).permute(0, 1, 3, 4, 2).contiguous() # prediction 198 | 199 | if self.training: 200 | return p 201 | 202 | elif ONNX_EXPORT: 203 | # Avoid broadcasting for ANE operations 204 | m = self.na * self.nx * self.ny 205 | ng = 1. / self.ng.repeat(m, 1) 206 | grid = self.grid.repeat(1, self.na, 1, 1, 1).view(m, 2) 207 | anchor_wh = self.anchor_wh.repeat(1, 1, self.nx, self.ny, 1).view(m, 2) * ng 208 | 209 | p = p.view(m, self.no) 210 | xy = torch.sigmoid(p[:, 0:2]) + grid # x, y 211 | wh = torch.exp(p[:, 2:4]) * anchor_wh # width, height 212 | p_cls = torch.sigmoid(p[:, 4:5]) if self.nc == 1 else \ 213 | torch.sigmoid(p[:, 5:self.no]) * torch.sigmoid(p[:, 4:5]) # conf 214 | return p_cls, xy * ng, wh 215 | 216 | else: # inference 217 | io = p.clone() # inference output 218 | io[..., :2] = torch.sigmoid(io[..., :2]) + self.grid # xy 219 | io[..., 2:4] = torch.exp(io[..., 2:4]) * self.anchor_wh # wh yolo method 220 | io[..., :4] *= self.stride 221 | torch.sigmoid_(io[..., 4:]) 222 | return io.view(bs, -1, self.no), p # view [1, 3, 13, 13, 85] as [1, 507, 85] 223 | 224 | 225 | class Darknet(nn.Module): 226 | # YOLOv3 object detection model 227 | 228 | def __init__(self, cfg, img_size=(416, 416), verbose=False): 229 | super(Darknet, self).__init__() 230 | 231 | self.module_defs = parse_model_cfg(cfg) 232 | self.module_list, self.routs = create_modules(self.module_defs, img_size, cfg) 233 | self.yolo_layers = get_yolo_layers(self) 234 | # torch_utils.initialize_weights(self) 235 | 236 | # Darknet Header https://github.com/AlexeyAB/darknet/issues/2914#issuecomment-496675346 237 | self.version = np.array([0, 2, 5], dtype=np.int32) # (int32) version info: major, minor, revision 238 | self.seen = np.array([0], dtype=np.int64) # (int64) number of images seen during training 239 | self.info(verbose) if not ONNX_EXPORT else None # print model description 240 | 241 | def forward(self, x, augment=False, verbose=False): 242 | 243 | if not augment: 244 | return self.forward_once(x) 245 | else: # Augment images (inference and test only) https://github.com/ultralytics/yolov3/issues/931 246 | img_size = x.shape[-2:] # height, width 247 | s = [0.83, 0.67] # scales 248 | y = [] 249 | for i, xi in enumerate((x, 250 | torch_utils.scale_img(x.flip(3), s[0], same_shape=False), # flip-lr and scale 251 | torch_utils.scale_img(x, s[1], same_shape=False), # scale 252 | )): 253 | # cv2.imwrite('img%g.jpg' % i, 255 * xi[0].numpy().transpose((1, 2, 0))[:, :, ::-1]) 254 | y.append(self.forward_once(xi)[0]) 255 | 256 | y[1][..., :4] /= s[0] # scale 257 | y[1][..., 0] = img_size[1] - y[1][..., 0] # flip lr 258 | y[2][..., :4] /= s[1] # scale 259 | 260 | # for i, yi in enumerate(y): # coco small, medium, large = < 32**2 < 96**2 < 261 | # area = yi[..., 2:4].prod(2)[:, :, None] 262 | # if i == 1: 263 | # yi *= (area < 96. ** 2).float() 264 | # elif i == 2: 265 | # yi *= (area > 32. 
** 2).float() 266 | # y[i] = yi 267 | 268 | y = torch.cat(y, 1) 269 | return y, None 270 | 271 | def forward_once(self, x, augment=False, verbose=False): 272 | img_size = x.shape[-2:] # height, width 273 | yolo_out, out = [], [] 274 | if verbose: 275 | print('0', x.shape) 276 | str = '' 277 | 278 | # Augment images (inference and test only) 279 | if augment: # https://github.com/ultralytics/yolov3/issues/931 280 | nb = x.shape[0] # batch size 281 | s = [0.83, 0.67] # scales 282 | x = torch.cat((x, 283 | torch_utils.scale_img(x.flip(3), s[0]), # flip-lr and scale 284 | torch_utils.scale_img(x, s[1]), # scale 285 | ), 0) 286 | 287 | for i, module in enumerate(self.module_list): 288 | name = module.__class__.__name__ 289 | if name in ['WeightedFeatureFusion', 'FeatureConcat', 'FeatureConcat_l']: # sum, concat 290 | if verbose: 291 | l = [i - 1] + module.layers # layers 292 | sh = [list(x.shape)] + [list(out[i].shape) for i in module.layers] # shapes 293 | str = ' >> ' + ' + '.join(['layer %g %s' % x for x in zip(l, sh)]) 294 | x = module(x, out) # WeightedFeatureFusion(), FeatureConcat() 295 | elif name == 'YOLOLayer': 296 | yolo_out.append(module(x, out)) 297 | else: # run module directly, i.e. mtype = 'convolutional', 'upsample', 'maxpool', 'batchnorm2d' etc. 298 | x = module(x) 299 | 300 | out.append(x if self.routs[i] else []) 301 | if verbose: 302 | print('%g/%g %s -' % (i, len(self.module_list), name), list(x.shape), str) 303 | str = '' 304 | 305 | if self.training: # train 306 | return yolo_out 307 | elif ONNX_EXPORT: # export 308 | x = [torch.cat(x, 0) for x in zip(*yolo_out)] 309 | return x[0], torch.cat(x[1:3], 1) # scores, boxes: 3780x80, 3780x4 310 | else: # inference or test 311 | x, p = zip(*yolo_out) # inference output, training output 312 | x = torch.cat(x, 1) # cat yolo outputs 313 | if augment: # de-augment results 314 | x = torch.split(x, nb, dim=0) 315 | x[1][..., :4] /= s[0] # scale 316 | x[1][..., 0] = img_size[1] - x[1][..., 0] # flip lr 317 | x[2][..., :4] /= s[1] # scale 318 | x = torch.cat(x, 1) 319 | return x, p 320 | 321 | def fuse(self): 322 | # Fuse Conv2d + BatchNorm2d layers throughout model 323 | print('Fusing layers...') 324 | fused_list = nn.ModuleList() 325 | for a in list(self.children())[0]: 326 | if isinstance(a, nn.Sequential): 327 | for i, b in enumerate(a): 328 | if isinstance(b, nn.modules.batchnorm.BatchNorm2d): 329 | # fuse this bn layer with the previous conv2d layer 330 | conv = a[i - 1] 331 | fused = torch_utils.fuse_conv_and_bn(conv, b) 332 | a = nn.Sequential(fused, *list(a.children())[i + 1:]) 333 | break 334 | fused_list.append(a) 335 | self.module_list = fused_list 336 | self.info() if not ONNX_EXPORT else None # yolov3-spp reduced from 225 to 152 layers 337 | 338 | def info(self, verbose=False): 339 | torch_utils.model_info(self, verbose) 340 | 341 | 342 | def get_yolo_layers(model): 343 | return [i for i, m in enumerate(model.module_list) if m.__class__.__name__ == 'YOLOLayer'] # [89, 101, 113] 344 | 345 | 346 | def load_darknet_weights(self, weights, cutoff=-1): 347 | # Parses and loads the weights stored in 'weights' 348 | 349 | # Establish cutoffs (load layers between 0 and cutoff. 
if cutoff = -1 all are loaded) 350 | file = Path(weights).name 351 | if file == 'darknet53.conv.74': 352 | cutoff = 75 353 | elif file == 'yolov3-tiny.conv.15': 354 | cutoff = 15 355 | 356 | # Read weights file 357 | with open(weights, 'rb') as f: 358 | # Read Header https://github.com/AlexeyAB/darknet/issues/2914#issuecomment-496675346 359 | self.version = np.fromfile(f, dtype=np.int32, count=3) # (int32) version info: major, minor, revision 360 | self.seen = np.fromfile(f, dtype=np.int64, count=1) # (int64) number of images seen during training 361 | 362 | weights = np.fromfile(f, dtype=np.float32) # the rest are weights 363 | 364 | ptr = 0 365 | for i, (mdef, module) in enumerate(zip(self.module_defs[:cutoff], self.module_list[:cutoff])): 366 | if mdef['type'] == 'convolutional': 367 | conv = module[0] 368 | if mdef['batch_normalize']: 369 | # Load BN bias, weights, running mean and running variance 370 | bn = module[1] 371 | nb = bn.bias.numel() # number of biases 372 | # Bias 373 | bn.bias.data.copy_(torch.from_numpy(weights[ptr:ptr + nb]).view_as(bn.bias)) 374 | ptr += nb 375 | # Weight 376 | bn.weight.data.copy_(torch.from_numpy(weights[ptr:ptr + nb]).view_as(bn.weight)) 377 | ptr += nb 378 | # Running Mean 379 | bn.running_mean.data.copy_(torch.from_numpy(weights[ptr:ptr + nb]).view_as(bn.running_mean)) 380 | ptr += nb 381 | # Running Var 382 | bn.running_var.data.copy_(torch.from_numpy(weights[ptr:ptr + nb]).view_as(bn.running_var)) 383 | ptr += nb 384 | else: 385 | # Load conv. bias 386 | nb = conv.bias.numel() 387 | conv_b = torch.from_numpy(weights[ptr:ptr + nb]).view_as(conv.bias) 388 | conv.bias.data.copy_(conv_b) 389 | ptr += nb 390 | # Load conv. weights 391 | nw = conv.weight.numel() # number of weights 392 | conv.weight.data.copy_(torch.from_numpy(weights[ptr:ptr + nw]).view_as(conv.weight)) 393 | ptr += nw 394 | 395 | 396 | def save_weights(self, path='model.weights', cutoff=-1): 397 | # Converts a PyTorch model to Darket format (*.pt to *.weights) 398 | # Note: Does not work if model.fuse() is applied 399 | with open(path, 'wb') as f: 400 | # Write Header https://github.com/AlexeyAB/darknet/issues/2914#issuecomment-496675346 401 | self.version.tofile(f) # (int32) version info: major, minor, revision 402 | self.seen.tofile(f) # (int64) number of images seen during training 403 | 404 | # Iterate through layers 405 | for i, (mdef, module) in enumerate(zip(self.module_defs[:cutoff], self.module_list[:cutoff])): 406 | if mdef['type'] == 'convolutional': 407 | conv_layer = module[0] 408 | # If batch norm, load bn first 409 | if mdef['batch_normalize']: 410 | bn_layer = module[1] 411 | bn_layer.bias.data.cpu().numpy().tofile(f) 412 | bn_layer.weight.data.cpu().numpy().tofile(f) 413 | bn_layer.running_mean.data.cpu().numpy().tofile(f) 414 | bn_layer.running_var.data.cpu().numpy().tofile(f) 415 | # Load conv bias 416 | else: 417 | conv_layer.bias.data.cpu().numpy().tofile(f) 418 | # Load conv weights 419 | conv_layer.weight.data.cpu().numpy().tofile(f) 420 | 421 | 422 | def convert(cfg='cfg/yolov4-pacsp.cfg', weights='weights/yolov4-pacsp.weights'): 423 | # Converts between PyTorch and Darknet format per extension (i.e. 
*.weights convert to *.pt and vice versa) 424 | # from models import *; convert('cfg/yolov3-spp.cfg', 'weights/yolov3-spp.weights') 425 | 426 | # Initialize model 427 | model = Darknet(cfg) 428 | 429 | # Load weights and save 430 | if weights.endswith('.pt'): # if PyTorch format 431 | model.load_state_dict(torch.load(weights, map_location='cpu')['model']) 432 | save_weights(model, path='converted.weights', cutoff=-1) 433 | print("Success: converted '%s' to 'converted.weights'" % weights) 434 | 435 | elif weights.endswith('.weights'): # darknet format 436 | _ = load_darknet_weights(model, weights) 437 | 438 | chkpt = {'epoch': -1, 439 | 'best_fitness': None, 440 | 'training_results': None, 441 | 'model': model.state_dict(), 442 | 'optimizer': None} 443 | 444 | torch.save(chkpt, 'converted.pt') 445 | print("Success: converted '%s' to 'converted.pt'" % weights) 446 | 447 | else: 448 | print('Error: extension not supported.') 449 | 450 | 451 | def attempt_download(weights): 452 | # Attempt to download pretrained weights if not found locally 453 | weights = weights.strip() 454 | msg = weights + ' missing, try downloading from https://drive.google.com/open?id=1LezFG5g3BCW6iYaV89B2i64cqEUZD7e0' 455 | 456 | if len(weights) > 0 and not os.path.isfile(weights): 457 | d = {'': ''} 458 | 459 | file = Path(weights).name 460 | if file in d: 461 | r = gdrive_download(id=d[file], name=weights) 462 | else: # download from pjreddie.com 463 | url = 'https://pjreddie.com/media/files/' + file 464 | print('Downloading ' + url) 465 | r = os.system('curl -f ' + url + ' -o ' + weights) 466 | 467 | # Error check 468 | if not (r == 0 and os.path.exists(weights) and os.path.getsize(weights) > 1E6): # weights exist and > 1MB 469 | os.system('rm ' + weights) # remove partial downloads 470 | raise Exception(msg) 471 | -------------------------------------------------------------------------------- /pic/p0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/pic/p0.png -------------------------------------------------------------------------------- /pic/p1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/pic/p1.png -------------------------------------------------------------------------------- /pic/p2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/pic/p2.png -------------------------------------------------------------------------------- /pic/p3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/pic/p3.png -------------------------------------------------------------------------------- /pic/p4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/pic/p4.png -------------------------------------------------------------------------------- /pic/p5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/pic/p5.png 
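The `convert()` helper defined at the end of `models.py` above switches a checkpoint between the PyTorch (`*.pt`) and Darknet (`*.weights`) formats, which is handy once training on the custom dataset is finished and the model needs to be used with the original darknet tooling. Below is a minimal sketch of how it can be called from the repository root; the cfg/weights paths are taken from the training and testing commands earlier in this README, and the `*.weights` filename in the second call is only a placeholder, not a file shipped with the repo.

```python
# Minimal sketch: convert between *.pt and *.weights with models.convert().
# Run from the repository root so that `from models import *` resolves.
from models import convert

# Trained PyTorch checkpoint -> Darknet format (written to ./converted.weights).
# The cfg must match the architecture the checkpoint was trained with.
convert(cfg='cfg/wei_score/yolov4-pacsp-x-mish.cfg',
        weights='weights/best_yolov4-pacsp-x-mish.pt')

# Darknet weights -> PyTorch checkpoint (written to ./converted.pt).
# 'weights/yolov4-pacsp-x-mish.weights' is a placeholder filename.
convert(cfg='cfg/wei_score/yolov4-pacsp-x-mish.cfg',
        weights='weights/yolov4-pacsp-x-mish.weights')
```

Note that `convert()` builds a fresh `Darknet(cfg)` internally and writes its output into the current working directory as `converted.weights` or `converted.pt`.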
-------------------------------------------------------------------------------- /pic/test1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/pic/test1.jpg -------------------------------------------------------------------------------- /pic/test2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/pic/test2.jpg -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | numpy == 1.17 2 | opencv-python >= 4.1 3 | torch >= 1.5 4 | torchvision 5 | matplotlib 6 | pycocotools 7 | tqdm 8 | pillow 9 | tensorboard >= 1.14 10 | 11 | # Nvidia Apex (optional) for mixed precision training -------------------------- 12 | # git clone https://github.com/NVIDIA/apex && cd apex && pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" . --user 13 | -------------------------------------------------------------------------------- /runs/readme: -------------------------------------------------------------------------------- 1 | tensorboard的log存放在此 -------------------------------------------------------------------------------- /test.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import json 3 | 4 | from torch.utils.data import DataLoader 5 | 6 | from models import * 7 | from utils.datasets import * 8 | from utils.utils import * 9 | 10 | 11 | def test(cfg, 12 | data, 13 | weights=None, 14 | batch_size=16, 15 | img_size=416, 16 | conf_thres=0.001, 17 | iou_thres=0.6, # for nms 18 | save_json=False, 19 | single_cls=False, 20 | augment=False, 21 | model=None, 22 | dataloader=None): 23 | # Initialize/load model and set device 24 | if model is None: 25 | device = torch_utils.select_device(opt.device, batch_size=batch_size) 26 | verbose = opt.task == 'test' 27 | 28 | # Remove previous 29 | for f in glob.glob('test_batch*.jpg'): 30 | os.remove(f) 31 | 32 | # Initialize model 33 | model = Darknet(cfg, img_size) 34 | 35 | # Load weights 36 | attempt_download(weights) 37 | if weights.endswith('.pt'): # pytorch format 38 | model.load_state_dict(torch.load(weights, map_location=device)['model']) 39 | else: # darknet format 40 | load_darknet_weights(model, weights) 41 | 42 | # Fuse 43 | model.fuse() 44 | model.to(device) 45 | 46 | if device.type != 'cpu' and torch.cuda.device_count() > 1: 47 | model = nn.DataParallel(model) 48 | else: # called by train.py 49 | device = next(model.parameters()).device # get model device 50 | verbose = False 51 | 52 | # Configure run 53 | data = parse_data_cfg(data) 54 | nc = 1 if single_cls else int(data['classes']) # number of classes 55 | path = data['valid'] # path to test images 56 | names = load_classes(data['names']) # class names 57 | iouv = torch.linspace(0.5, 0.95, 10).to(device) # iou vector for mAP@0.5:0.95 58 | iouv = iouv[0].view(1) # comment for mAP@0.5:0.95 59 | niou = iouv.numel() 60 | 61 | # Dataloader 62 | if dataloader is None: 63 | dataset = LoadImagesAndLabels(path, img_size, batch_size, rect=True, single_cls=opt.single_cls) 64 | batch_size = min(batch_size, len(dataset)) 65 | dataloader = DataLoader(dataset, 66 | batch_size=batch_size, 67 | num_workers=min([os.cpu_count(), batch_size 
if batch_size > 1 else 0, 8]), 68 | pin_memory=True, 69 | collate_fn=dataset.collate_fn) 70 | 71 | seen = 0 72 | model.eval() 73 | _ = model(torch.zeros((1, 3, img_size, img_size), device=device)) if device.type != 'cpu' else None # run once 74 | coco91class = coco80_to_coco91_class() 75 | s = ('%20s' + '%10s' * 6) % ('Class', 'Images', 'Targets', 'P', 'R', 'mAP@0.5', 'F1') 76 | p, r, f1, mp, mr, map, mf1, t0, t1 = 0., 0., 0., 0., 0., 0., 0., 0., 0. 77 | loss = torch.zeros(3, device=device) 78 | jdict, stats, ap, ap_class = [], [], [], [] 79 | for batch_i, (imgs, targets, paths, shapes) in enumerate(tqdm(dataloader, desc=s)): 80 | imgs = imgs.to(device).float() / 255.0 # uint8 to float32, 0 - 255 to 0.0 - 1.0 81 | targets = targets.to(device) 82 | nb, _, height, width = imgs.shape # batch size, channels, height, width 83 | whwh = torch.Tensor([width, height, width, height]).to(device) 84 | 85 | # Plot images with bounding boxes 86 | f = 'test_batch%g.jpg' % batch_i # filename 87 | # if batch_i < 1 and not os.path.exists(f): #<---------------不打印 88 | # plot_images(imgs=imgs, targets=targets, paths=paths, fname=f) 89 | 90 | # Disable gradients 91 | with torch.no_grad(): 92 | # Run model 93 | t = torch_utils.time_synchronized() 94 | inf_out, train_out = model(imgs, augment=augment) # inference and training outputs 95 | t0 += torch_utils.time_synchronized() - t 96 | 97 | # Compute loss 98 | if hasattr(model, 'hyp'): # if model has loss hyperparameters 99 | loss += compute_loss(train_out, targets, model)[1][:3] # GIoU, obj, cls 100 | 101 | # Run NMS 102 | t = torch_utils.time_synchronized() 103 | output = non_max_suppression(inf_out, conf_thres=conf_thres, iou_thres=iou_thres) # nms 104 | t1 += torch_utils.time_synchronized() - t 105 | 106 | # Statistics per image 107 | for si, pred in enumerate(output): 108 | labels = targets[targets[:, 0] == si, 1:] 109 | nl = len(labels) 110 | tcls = labels[:, 0].tolist() if nl else [] # target class 111 | seen += 1 112 | 113 | if pred is None: 114 | if nl: 115 | stats.append((torch.zeros(0, niou, dtype=torch.bool), torch.Tensor(), torch.Tensor(), tcls)) 116 | continue 117 | 118 | # Append to text file 119 | # with open('test.txt', 'a') as file: 120 | # [file.write('%11.5g' * 7 % tuple(x) + '\n') for x in pred] 121 | 122 | # Clip boxes to image bounds 123 | clip_coords(pred, (height, width)) 124 | 125 | # Append to pycocotools JSON dictionary 126 | if save_json: 127 | # [{"image_id": 42, "category_id": 18, "bbox": [258.15, 41.29, 348.26, 243.78], "score": 0.236}, ... 
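# The save_json block below converts each detection into a pycocotools-style
# record: boxes are rescaled from the letterboxed network input back to the
# original image shape, changed from xyxy corners to COCO's [x_min, y_min, w, h],
# and the contiguous 0-79 class index is mapped to COCO's 91 category ids via
# coco80_to_coco91_class(). image_id parsing assumes COCO-style numeric file
# names; with a custom *.data file this branch only runs when --save-json is set.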
128 | image_id = int(Path(paths[si]).stem.split('_')[-1]) 129 | box = pred[:, :4].clone() # xyxy 130 | scale_coords(imgs[si].shape[1:], box, shapes[si][0], shapes[si][1]) # to original shape 131 | box = xyxy2xywh(box) # xywh 132 | box[:, :2] -= box[:, 2:] / 2 # xy center to top-left corner 133 | for p, b in zip(pred.tolist(), box.tolist()): 134 | jdict.append({'image_id': image_id, 135 | 'category_id': coco91class[int(p[5])], 136 | 'bbox': [round(x, 3) for x in b], 137 | 'score': round(p[4], 5)}) 138 | 139 | # Assign all predictions as incorrect 140 | correct = torch.zeros(pred.shape[0], niou, dtype=torch.bool, device=device) 141 | if nl: 142 | detected = [] # target indices 143 | tcls_tensor = labels[:, 0] 144 | 145 | # target boxes 146 | tbox = xywh2xyxy(labels[:, 1:5]) * whwh 147 | 148 | # Per target class 149 | for cls in torch.unique(tcls_tensor): 150 | ti = (cls == tcls_tensor).nonzero().view(-1) # prediction indices 151 | pi = (cls == pred[:, 5]).nonzero().view(-1) # target indices 152 | 153 | # Search for detections 154 | if pi.shape[0]: 155 | # Prediction to target ious 156 | ious, i = box_iou(pred[pi, :4], tbox[ti]).max(1) # best ious, indices 157 | 158 | # Append detections 159 | for j in (ious > iouv[0]).nonzero(): 160 | d = ti[i[j]] # detected target 161 | if d not in detected: 162 | detected.append(d) 163 | correct[pi[j]] = ious[j] > iouv # iou_thres is 1xn 164 | if len(detected) == nl: # all targets already located in image 165 | break 166 | 167 | # Append statistics (correct, conf, pcls, tcls) 168 | stats.append((correct.cpu(), pred[:, 4].cpu(), pred[:, 5].cpu(), tcls)) 169 | 170 | # Compute statistics 171 | stats = [np.concatenate(x, 0) for x in zip(*stats)] # to numpy 172 | if len(stats): 173 | p, r, ap, f1, ap_class = ap_per_class(*stats) 174 | if niou > 1: 175 | p, r, ap, f1 = p[:, 0], r[:, 0], ap.mean(1), ap[:, 0] # [P, R, AP@0.5:0.95, AP@0.5] 176 | mp, mr, map, mf1 = p.mean(), r.mean(), ap.mean(), f1.mean() 177 | nt = np.bincount(stats[3].astype(np.int64), minlength=nc) # number of targets per class 178 | else: 179 | nt = torch.zeros(1) 180 | 181 | # Print results 182 | pf = '%20s' + '%10.3g' * 6 # print format 183 | print(pf % ('all', seen, nt.sum(), mp, mr, map, mf1)) 184 | 185 | # Print results per class 186 | if verbose and nc > 1 and len(stats): 187 | for i, c in enumerate(ap_class): 188 | print(pf % (names[c], seen, nt[c], p[i], r[i], ap[i], f1[i])) 189 | 190 | # Print speeds 191 | if verbose or save_json: 192 | t = tuple(x / seen * 1E3 for x in (t0, t1, t0 + t1)) + (img_size, img_size, batch_size) # tuple 193 | print('Speed: %.1f/%.1f/%.1f ms inference/NMS/total per %gx%g image at batch-size %g' % t) 194 | 195 | maps = np.zeros(nc) + map 196 | # Save JSON 197 | if save_json and map and len(jdict): 198 | print('\nCOCO mAP with pycocotools...') 199 | imgIds = [int(Path(x).stem.split('_')[-1]) for x in dataloader.dataset.img_files] 200 | with open('results.json', 'w') as file: 201 | json.dump(jdict, file) 202 | 203 | try: 204 | from pycocotools.coco import COCO 205 | from pycocotools.cocoeval import COCOeval 206 | except: 207 | print('WARNING: missing pycocotools package, can not compute official COCO mAP. 
See requirements.txt.') 208 | 209 | # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb 210 | cocoGt = COCO(glob.glob('../coco/annotations/instances_val*.json')[0]) # initialize COCO ground truth api 211 | cocoDt = cocoGt.loadRes('results.json') # initialize COCO pred api 212 | 213 | cocoEval = COCOeval(cocoGt, cocoDt, 'bbox') 214 | cocoEval.params.imgIds = imgIds # [:32] # only evaluate these images 215 | cocoEval.evaluate() 216 | cocoEval.accumulate() 217 | cocoEval.summarize() 218 | map, map50 = cocoEval.stats[:2] # update results (mAP@0.5:0.95, mAP@0.5) 219 | return (mp, mr, map50, map, *(loss.cpu() / len(dataloader)).tolist()), maps, t 220 | 221 | # Return results 222 | for i, c in enumerate(ap_class): 223 | maps[c] = ap[i] 224 | return (mp, mr, map, mf1, *(loss.cpu() / len(dataloader)).tolist()), maps 225 | 226 | 227 | if __name__ == '__main__': 228 | parser = argparse.ArgumentParser(prog='test.py') 229 | parser.add_argument('--cfg', type=str, default='cfg/yolov4-pacsp.cfg', help='*.cfg path') 230 | parser.add_argument('--data', type=str, default='data/coco2017.data', help='*.data path') 231 | parser.add_argument('--weights', type=str, default='weights/yolov4-pacsp.pt', help='weights path') 232 | parser.add_argument('--batch-size', type=int, default=16, help='size of each image batch') 233 | parser.add_argument('--img-size', type=int, default=512, help='inference size (pixels)') 234 | parser.add_argument('--conf-thres', type=float, default=0.001, help='object confidence threshold') 235 | parser.add_argument('--iou-thres', type=float, default=0.6, help='IOU threshold for NMS') 236 | parser.add_argument('--save-json', action='store_true', help='save a cocoapi-compatible JSON results file') 237 | parser.add_argument('--task', default='test', help="'test', 'study', 'benchmark'") 238 | parser.add_argument('--device', default='', help='device id (i.e. 
0 or 0,1) or cpu') 239 | parser.add_argument('--single-cls', action='store_true', help='train as single-class dataset') 240 | parser.add_argument('--augment', action='store_true', help='augmented inference') 241 | opt = parser.parse_args() 242 | opt.save_json = opt.save_json or any([x in opt.data for x in ['coco.data', 'coco2014.data', 'coco2017.data']]) 243 | print(opt) 244 | 245 | # task = 'test', 'study', 'benchmark' 246 | if opt.task == 'test': # (default) test normally 247 | test(opt.cfg, 248 | opt.data, 249 | opt.weights, 250 | opt.batch_size, 251 | opt.img_size, 252 | opt.conf_thres, 253 | opt.iou_thres, 254 | opt.save_json, 255 | opt.single_cls, 256 | opt.augment) 257 | 258 | elif opt.task == 'benchmark': # mAPs at 320-608 at conf 0.5 and 0.7 259 | y = [] 260 | x = list(range(288, 896, 64)) 261 | f = 'benchmark_%s_%s.txt' % (Path(opt.data).stem, Path(opt.weights).stem) # filename to save to 262 | for i in x: # img-size 263 | for j in [0.7]: # iou-thres 264 | r, _, t = test(opt.cfg, opt.data, opt.weights, opt.batch_size, i, opt.conf_thres, j, opt.save_json) 265 | y.append(r + t) 266 | np.savetxt(f, y, fmt='%10.6g') # save 267 | 268 | elif opt.task == 'study': # Parameter study 269 | y = [] 270 | x = np.arange(0.4, 0.9, 0.05) # iou-thres 271 | for i in x: 272 | t = time.time() 273 | r = test(opt.cfg, opt.data, opt.weights, opt.batch_size, opt.img_size, opt.conf_thres, i, opt.save_json)[0] 274 | y.append(r + (time.time() - t,)) 275 | np.savetxt('study.txt', y, fmt='%10.4g') # y = np.loadtxt('study.txt') 276 | 277 | # Plot 278 | fig, ax = plt.subplots(3, 1, figsize=(6, 6)) 279 | y = np.stack(y, 0) 280 | ax[0].plot(x, y[:, 2], marker='.', label='mAP@0.5') 281 | ax[0].set_ylabel('mAP') 282 | ax[1].plot(x, y[:, 3], marker='.', label='mAP@0.5:0.95') 283 | ax[1].set_ylabel('mAP') 284 | ax[2].plot(x, y[:, -1], marker='.', label='time') 285 | ax[2].set_ylabel('time (s)') 286 | for i in range(3): 287 | ax[i].legend() 288 | ax[i].set_xlabel('iou_thr') 289 | fig.tight_layout() 290 | plt.savefig('study.jpg', dpi=200) 291 | -------------------------------------------------------------------------------- /test_half.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import json 3 | 4 | from torch.utils.data import DataLoader 5 | 6 | from models import * 7 | from utils.datasets import * 8 | from utils.utils import * 9 | 10 | 11 | def test(cfg, 12 | data, 13 | weights=None, 14 | batch_size=16, 15 | img_size=416, 16 | conf_thres=0.001, 17 | iou_thres=0.6, # for nms 18 | save_json=False, 19 | single_cls=False, 20 | augment=False, 21 | model=None, 22 | dataloader=None): 23 | # Initialize/load model and set device 24 | if model is None: 25 | device = torch_utils.select_device(opt.device, batch_size=batch_size) 26 | verbose = opt.task == 'test' 27 | 28 | # Remove previous 29 | for f in glob.glob('test_batch*.jpg'): 30 | os.remove(f) 31 | 32 | # Initialize model 33 | model = Darknet(cfg, img_size) 34 | 35 | # Load weights 36 | attempt_download(weights) 37 | if weights.endswith('.pt'): # pytorch format 38 | model.load_state_dict(torch.load(weights, map_location=device)['model']) 39 | else: # darknet format 40 | load_darknet_weights(model, weights) 41 | 42 | # Fuse 43 | model.fuse() 44 | model.to(device) 45 | model.half() 46 | 47 | if device.type != 'cpu' and torch.cuda.device_count() > 1: 48 | model = nn.DataParallel(model) 49 | else: # called by train.py 50 | device = next(model.parameters()).device # get model device 51 | verbose = False 52 | 53 | # 
Configure run 54 | data = parse_data_cfg(data) 55 | nc = 1 if single_cls else int(data['classes']) # number of classes 56 | path = data['valid'] # path to test images 57 | names = load_classes(data['names']) # class names 58 | iouv = torch.linspace(0.5, 0.95, 10).to(device) # iou vector for mAP@0.5:0.95 59 | iouv = iouv[0].view(1) # comment for mAP@0.5:0.95 60 | niou = iouv.numel() 61 | 62 | # Dataloader 63 | if dataloader is None: 64 | dataset = LoadImagesAndLabels(path, img_size, batch_size, rect=True, single_cls=opt.single_cls) 65 | batch_size = min(batch_size, len(dataset)) 66 | dataloader = DataLoader(dataset, 67 | batch_size=batch_size, 68 | num_workers=min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8]), 69 | pin_memory=True, 70 | collate_fn=dataset.collate_fn) 71 | 72 | seen = 0 73 | model.eval() 74 | _ = model(torch.zeros((1, 3, img_size, img_size), device=device).half()) if device.type != 'cpu' else None # run once 75 | coco91class = coco80_to_coco91_class() 76 | s = ('%20s' + '%10s' * 6) % ('Class', 'Images', 'Targets', 'P', 'R', 'mAP@0.5', 'F1') 77 | p, r, f1, mp, mr, map, mf1, t0, t1 = 0., 0., 0., 0., 0., 0., 0., 0., 0. 78 | loss = torch.zeros(3, device=device) 79 | jdict, stats, ap, ap_class = [], [], [], [] 80 | for batch_i, (imgs, targets, paths, shapes) in enumerate(tqdm(dataloader, desc=s)): 81 | imgs = imgs.to(device).half() / 255.0 # uint8 to float32, 0 - 255 to 0.0 - 1.0 82 | targets = targets.to(device) 83 | nb, _, height, width = imgs.shape # batch size, channels, height, width 84 | whwh = torch.Tensor([width, height, width, height]).to(device) 85 | 86 | # Plot images with bounding boxes 87 | f = 'test_batch%g.jpg' % batch_i # filename 88 | #if batch_i < 1 and not os.path.exists(f): 89 | # plot_images(imgs=imgs, targets=targets, paths=paths, fname=f) 90 | 91 | # Disable gradients 92 | with torch.no_grad(): 93 | # Run model 94 | t = torch_utils.time_synchronized() 95 | inf_out, train_out = model(imgs, augment=augment) # inference and training outputs 96 | t0 += torch_utils.time_synchronized() - t 97 | 98 | # Compute loss 99 | if hasattr(model, 'hyp'): # if model has loss hyperparameters 100 | loss += compute_loss(train_out, targets, model)[1][:3] # GIoU, obj, cls 101 | 102 | # Run NMS 103 | t = torch_utils.time_synchronized() 104 | output = non_max_suppression(inf_out, conf_thres=conf_thres, iou_thres=iou_thres) # nms 105 | t1 += torch_utils.time_synchronized() - t 106 | 107 | # Statistics per image 108 | for si, pred in enumerate(output): 109 | labels = targets[targets[:, 0] == si, 1:] 110 | nl = len(labels) 111 | tcls = labels[:, 0].tolist() if nl else [] # target class 112 | seen += 1 113 | 114 | if pred is None: 115 | if nl: 116 | stats.append((torch.zeros(0, niou, dtype=torch.bool), torch.Tensor(), torch.Tensor(), tcls)) 117 | continue 118 | 119 | # Append to text file 120 | # with open('test.txt', 'a') as file: 121 | # [file.write('%11.5g' * 7 % tuple(x) + '\n') for x in pred] 122 | 123 | # Clip boxes to image bounds 124 | clip_coords(pred, (height, width)) 125 | 126 | # Append to pycocotools JSON dictionary 127 | if save_json: 128 | # [{"image_id": 42, "category_id": 18, "bbox": [258.15, 41.29, 348.26, 243.78], "score": 0.236}, ... 
129 | image_id = int(Path(paths[si]).stem.split('_')[-1]) 130 | box = pred[:, :4].clone() # xyxy 131 | scale_coords(imgs[si].shape[1:], box, shapes[si][0], shapes[si][1]) # to original shape 132 | box = xyxy2xywh(box) # xywh 133 | box[:, :2] -= box[:, 2:] / 2 # xy center to top-left corner 134 | for p, b in zip(pred.tolist(), box.tolist()): 135 | jdict.append({'image_id': image_id, 136 | 'category_id': coco91class[int(p[5])], 137 | 'bbox': [round(x, 3) for x in b], 138 | 'score': round(p[4], 5)}) 139 | 140 | # Assign all predictions as incorrect 141 | correct = torch.zeros(pred.shape[0], niou, dtype=torch.bool, device=device) 142 | if nl: 143 | detected = [] # target indices 144 | tcls_tensor = labels[:, 0] 145 | 146 | # target boxes 147 | tbox = xywh2xyxy(labels[:, 1:5]) * whwh 148 | 149 | # Per target class 150 | for cls in torch.unique(tcls_tensor): 151 | ti = (cls == tcls_tensor).nonzero().view(-1) # prediction indices 152 | pi = (cls == pred[:, 5]).nonzero().view(-1) # target indices 153 | 154 | # Search for detections 155 | if pi.shape[0]: 156 | # Prediction to target ious 157 | ious, i = box_iou(pred[pi, :4], tbox[ti]).max(1) # best ious, indices 158 | 159 | # Append detections 160 | for j in (ious > iouv[0]).nonzero(): 161 | d = ti[i[j]] # detected target 162 | if d not in detected: 163 | detected.append(d) 164 | correct[pi[j]] = ious[j] > iouv # iou_thres is 1xn 165 | if len(detected) == nl: # all targets already located in image 166 | break 167 | 168 | # Append statistics (correct, conf, pcls, tcls) 169 | stats.append((correct.cpu(), pred[:, 4].cpu(), pred[:, 5].cpu(), tcls)) 170 | 171 | # Compute statistics 172 | stats = [np.concatenate(x, 0) for x in zip(*stats)] # to numpy 173 | if len(stats): 174 | p, r, ap, f1, ap_class = ap_per_class(*stats) 175 | if niou > 1: 176 | p, r, ap, f1 = p[:, 0], r[:, 0], ap.mean(1), ap[:, 0] # [P, R, AP@0.5:0.95, AP@0.5] 177 | mp, mr, map, mf1 = p.mean(), r.mean(), ap.mean(), f1.mean() 178 | nt = np.bincount(stats[3].astype(np.int64), minlength=nc) # number of targets per class 179 | else: 180 | nt = torch.zeros(1) 181 | 182 | # Print results 183 | pf = '%20s' + '%10.3g' * 6 # print format 184 | print(pf % ('all', seen, nt.sum(), mp, mr, map, mf1)) 185 | 186 | # Print results per class 187 | if verbose and nc > 1 and len(stats): 188 | for i, c in enumerate(ap_class): 189 | print(pf % (names[c], seen, nt[c], p[i], r[i], ap[i], f1[i])) 190 | 191 | # Print speeds 192 | if verbose or save_json: 193 | t = tuple(x / seen * 1E3 for x in (t0, t1, t0 + t1)) + (img_size, img_size, batch_size) # tuple 194 | print('Speed: %.1f/%.1f/%.1f ms inference/NMS/total per %gx%g image at batch-size %g' % t) 195 | 196 | maps = np.zeros(nc) + map 197 | # Save JSON 198 | if save_json and map and len(jdict): 199 | print('\nCOCO mAP with pycocotools...') 200 | imgIds = [int(Path(x).stem.split('_')[-1]) for x in dataloader.dataset.img_files] 201 | with open('results.json', 'w') as file: 202 | json.dump(jdict, file) 203 | 204 | try: 205 | from pycocotools.coco import COCO 206 | from pycocotools.cocoeval import COCOeval 207 | except: 208 | print('WARNING: missing pycocotools package, can not compute official COCO mAP. 
See requirements.txt.') 209 | 210 | # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb 211 | cocoGt = COCO(glob.glob('../coco/annotations/instances_val*.json')[0]) # initialize COCO ground truth api 212 | cocoDt = cocoGt.loadRes('results.json') # initialize COCO pred api 213 | 214 | cocoEval = COCOeval(cocoGt, cocoDt, 'bbox') 215 | cocoEval.params.imgIds = imgIds # [:32] # only evaluate these images 216 | cocoEval.evaluate() 217 | cocoEval.accumulate() 218 | cocoEval.summarize() 219 | map, map50 = cocoEval.stats[:2] # update results (mAP@0.5:0.95, mAP@0.5) 220 | return (mp, mr, map50, map, *(loss.cpu() / len(dataloader)).tolist()), maps, t 221 | 222 | # Return results 223 | for i, c in enumerate(ap_class): 224 | maps[c] = ap[i] 225 | return (mp, mr, map, mf1, *(loss.cpu() / len(dataloader)).tolist()), maps 226 | 227 | 228 | if __name__ == '__main__': 229 | parser = argparse.ArgumentParser(prog='test.py') 230 | parser.add_argument('--cfg', type=str, default='cfg/yolov4-pacsp.cfg', help='*.cfg path') 231 | parser.add_argument('--data', type=str, default='data/coco2017.data', help='*.data path') 232 | parser.add_argument('--weights', type=str, default='weights/yolov4-pacsp.pt', help='weights path') 233 | parser.add_argument('--batch-size', type=int, default=16, help='size of each image batch') 234 | parser.add_argument('--img-size', type=int, default=512, help='inference size (pixels)') 235 | parser.add_argument('--conf-thres', type=float, default=0.001, help='object confidence threshold') 236 | parser.add_argument('--iou-thres', type=float, default=0.6, help='IOU threshold for NMS') 237 | parser.add_argument('--save-json', action='store_true', help='save a cocoapi-compatible JSON results file') 238 | parser.add_argument('--task', default='test', help="'test', 'study', 'benchmark'") 239 | parser.add_argument('--device', default='', help='device id (i.e. 
0 or 0,1) or cpu') 240 | parser.add_argument('--single-cls', action='store_true', help='train as single-class dataset') 241 | parser.add_argument('--augment', action='store_true', help='augmented inference') 242 | opt = parser.parse_args() 243 | opt.save_json = opt.save_json or any([x in opt.data for x in ['coco.data', 'coco2014.data', 'coco2017.data']]) 244 | print(opt) 245 | 246 | # task = 'test', 'study', 'benchmark' 247 | if opt.task == 'test': # (default) test normally 248 | test(opt.cfg, 249 | opt.data, 250 | opt.weights, 251 | opt.batch_size, 252 | opt.img_size, 253 | opt.conf_thres, 254 | opt.iou_thres, 255 | opt.save_json, 256 | opt.single_cls, 257 | opt.augment) 258 | 259 | elif opt.task == 'benchmark': # mAPs at 320-608 at conf 0.5 and 0.7 260 | y = [] 261 | x = list(range(288, 896, 64)) 262 | f = 'study_%s_%s.txt' % (Path(opt.data).stem, Path(opt.weights).stem) # filename to save to 263 | for i in x: # img-size 264 | for j in [0.7]: # iou-thres 265 | r, _, t = test(opt.cfg, opt.data, opt.weights, opt.batch_size, i, opt.conf_thres, j, opt.save_json) 266 | y.append(r + t) 267 | np.savetxt(f, y, fmt='%10.6g') # save 268 | 269 | elif opt.task == 'study': # Parameter study 270 | y = [] 271 | x = np.arange(0.4, 0.9, 0.05) # iou-thres 272 | for i in x: 273 | t = time.time() 274 | r = test(opt.cfg, opt.data, opt.weights, opt.batch_size, opt.img_size, opt.conf_thres, i, opt.save_json)[0] 275 | y.append(r + (time.time() - t,)) 276 | np.savetxt('study.txt', y, fmt='%10.4g') # y = np.loadtxt('study.txt') 277 | 278 | # Plot 279 | fig, ax = plt.subplots(3, 1, figsize=(6, 6)) 280 | y = np.stack(y, 0) 281 | ax[0].plot(x, y[:, 2], marker='.', label='mAP@0.5') 282 | ax[0].set_ylabel('mAP') 283 | ax[1].plot(x, y[:, 3], marker='.', label='mAP@0.5:0.95') 284 | ax[1].set_ylabel('mAP') 285 | ax[2].plot(x, y[:, -1], marker='.', label='time') 286 | ax[2].set_ylabel('time (s)') 287 | for i in range(3): 288 | ax[i].legend() 289 | ax[i].set_xlabel('iou_thr') 290 | fig.tight_layout() 291 | plt.savefig('study.jpg', dpi=200) 292 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /utils/__pycache__/__init__.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/utils/__pycache__/__init__.cpython-35.pyc -------------------------------------------------------------------------------- /utils/__pycache__/datasets.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/utils/__pycache__/datasets.cpython-35.pyc -------------------------------------------------------------------------------- /utils/__pycache__/google_utils.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/utils/__pycache__/google_utils.cpython-35.pyc -------------------------------------------------------------------------------- /utils/__pycache__/layers.cpython-35.pyc: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/utils/__pycache__/layers.cpython-35.pyc -------------------------------------------------------------------------------- /utils/__pycache__/parse_config.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/utils/__pycache__/parse_config.cpython-35.pyc -------------------------------------------------------------------------------- /utils/__pycache__/torch_utils.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/utils/__pycache__/torch_utils.cpython-35.pyc -------------------------------------------------------------------------------- /utils/__pycache__/utils.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/utils/__pycache__/utils.cpython-35.pyc -------------------------------------------------------------------------------- /utils/adabound.py: -------------------------------------------------------------------------------- 1 | import math 2 | 3 | import torch 4 | from torch.optim.optimizer import Optimizer 5 | 6 | 7 | class AdaBound(Optimizer): 8 | """Implements AdaBound algorithm. 9 | It has been proposed in `Adaptive Gradient Methods with Dynamic Bound of Learning Rate`_. 10 | Arguments: 11 | params (iterable): iterable of parameters to optimize or dicts defining 12 | parameter groups 13 | lr (float, optional): Adam learning rate (default: 1e-3) 14 | betas (Tuple[float, float], optional): coefficients used for computing 15 | running averages of gradient and its square (default: (0.9, 0.999)) 16 | final_lr (float, optional): final (SGD) learning rate (default: 0.1) 17 | gamma (float, optional): convergence speed of the bound functions (default: 1e-3) 18 | eps (float, optional): term added to the denominator to improve 19 | numerical stability (default: 1e-8) 20 | weight_decay (float, optional): weight decay (L2 penalty) (default: 0) 21 | amsbound (boolean, optional): whether to use the AMSBound variant of this algorithm 22 | .. 
Adaptive Gradient Methods with Dynamic Bound of Learning Rate: 23 | https://openreview.net/forum?id=Bkg3g2R9FX 24 | """ 25 | 26 | def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), final_lr=0.1, gamma=1e-3, 27 | eps=1e-8, weight_decay=0, amsbound=False): 28 | if not 0.0 <= lr: 29 | raise ValueError("Invalid learning rate: {}".format(lr)) 30 | if not 0.0 <= eps: 31 | raise ValueError("Invalid epsilon value: {}".format(eps)) 32 | if not 0.0 <= betas[0] < 1.0: 33 | raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0])) 34 | if not 0.0 <= betas[1] < 1.0: 35 | raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1])) 36 | if not 0.0 <= final_lr: 37 | raise ValueError("Invalid final learning rate: {}".format(final_lr)) 38 | if not 0.0 <= gamma < 1.0: 39 | raise ValueError("Invalid gamma parameter: {}".format(gamma)) 40 | defaults = dict(lr=lr, betas=betas, final_lr=final_lr, gamma=gamma, eps=eps, 41 | weight_decay=weight_decay, amsbound=amsbound) 42 | super(AdaBound, self).__init__(params, defaults) 43 | 44 | self.base_lrs = list(map(lambda group: group['lr'], self.param_groups)) 45 | 46 | def __setstate__(self, state): 47 | super(AdaBound, self).__setstate__(state) 48 | for group in self.param_groups: 49 | group.setdefault('amsbound', False) 50 | 51 | def step(self, closure=None): 52 | """Performs a single optimization step. 53 | Arguments: 54 | closure (callable, optional): A closure that reevaluates the model 55 | and returns the loss. 56 | """ 57 | loss = None 58 | if closure is not None: 59 | loss = closure() 60 | 61 | for group, base_lr in zip(self.param_groups, self.base_lrs): 62 | for p in group['params']: 63 | if p.grad is None: 64 | continue 65 | grad = p.grad.data 66 | if grad.is_sparse: 67 | raise RuntimeError( 68 | 'Adam does not support sparse gradients, please consider SparseAdam instead') 69 | amsbound = group['amsbound'] 70 | 71 | state = self.state[p] 72 | 73 | # State initialization 74 | if len(state) == 0: 75 | state['step'] = 0 76 | # Exponential moving average of gradient values 77 | state['exp_avg'] = torch.zeros_like(p.data) 78 | # Exponential moving average of squared gradient values 79 | state['exp_avg_sq'] = torch.zeros_like(p.data) 80 | if amsbound: 81 | # Maintains max of all exp. moving avg. of sq. grad. values 82 | state['max_exp_avg_sq'] = torch.zeros_like(p.data) 83 | 84 | exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq'] 85 | if amsbound: 86 | max_exp_avg_sq = state['max_exp_avg_sq'] 87 | beta1, beta2 = group['betas'] 88 | 89 | state['step'] += 1 90 | 91 | if group['weight_decay'] != 0: 92 | grad = grad.add(group['weight_decay'], p.data) 93 | 94 | # Decay the first and second moment running average coefficient 95 | exp_avg.mul_(beta1).add_(1 - beta1, grad) 96 | exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad) 97 | if amsbound: 98 | # Maintains the maximum of all 2nd moment running avg. till now 99 | torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq) 100 | # Use the max. for normalizing running avg. 
of gradient 101 | denom = max_exp_avg_sq.sqrt().add_(group['eps']) 102 | else: 103 | denom = exp_avg_sq.sqrt().add_(group['eps']) 104 | 105 | bias_correction1 = 1 - beta1 ** state['step'] 106 | bias_correction2 = 1 - beta2 ** state['step'] 107 | step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1 108 | 109 | # Applies bounds on actual learning rate 110 | # lr_scheduler cannot affect final_lr, this is a workaround to apply lr decay 111 | final_lr = group['final_lr'] * group['lr'] / base_lr 112 | lower_bound = final_lr * (1 - 1 / (group['gamma'] * state['step'] + 1)) 113 | upper_bound = final_lr * (1 + 1 / (group['gamma'] * state['step'])) 114 | step_size = torch.full_like(denom, step_size) 115 | step_size.div_(denom).clamp_(lower_bound, upper_bound).mul_(exp_avg) 116 | 117 | p.data.add_(-step_size) 118 | 119 | return loss 120 | 121 | 122 | class AdaBoundW(Optimizer): 123 | """Implements AdaBound algorithm with Decoupled Weight Decay (arxiv.org/abs/1711.05101) 124 | It has been proposed in `Adaptive Gradient Methods with Dynamic Bound of Learning Rate`_. 125 | Arguments: 126 | params (iterable): iterable of parameters to optimize or dicts defining 127 | parameter groups 128 | lr (float, optional): Adam learning rate (default: 1e-3) 129 | betas (Tuple[float, float], optional): coefficients used for computing 130 | running averages of gradient and its square (default: (0.9, 0.999)) 131 | final_lr (float, optional): final (SGD) learning rate (default: 0.1) 132 | gamma (float, optional): convergence speed of the bound functions (default: 1e-3) 133 | eps (float, optional): term added to the denominator to improve 134 | numerical stability (default: 1e-8) 135 | weight_decay (float, optional): weight decay (L2 penalty) (default: 0) 136 | amsbound (boolean, optional): whether to use the AMSBound variant of this algorithm 137 | .. Adaptive Gradient Methods with Dynamic Bound of Learning Rate: 138 | https://openreview.net/forum?id=Bkg3g2R9FX 139 | """ 140 | 141 | def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), final_lr=0.1, gamma=1e-3, 142 | eps=1e-8, weight_decay=0, amsbound=False): 143 | if not 0.0 <= lr: 144 | raise ValueError("Invalid learning rate: {}".format(lr)) 145 | if not 0.0 <= eps: 146 | raise ValueError("Invalid epsilon value: {}".format(eps)) 147 | if not 0.0 <= betas[0] < 1.0: 148 | raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0])) 149 | if not 0.0 <= betas[1] < 1.0: 150 | raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1])) 151 | if not 0.0 <= final_lr: 152 | raise ValueError("Invalid final learning rate: {}".format(final_lr)) 153 | if not 0.0 <= gamma < 1.0: 154 | raise ValueError("Invalid gamma parameter: {}".format(gamma)) 155 | defaults = dict(lr=lr, betas=betas, final_lr=final_lr, gamma=gamma, eps=eps, 156 | weight_decay=weight_decay, amsbound=amsbound) 157 | super(AdaBoundW, self).__init__(params, defaults) 158 | 159 | self.base_lrs = list(map(lambda group: group['lr'], self.param_groups)) 160 | 161 | def __setstate__(self, state): 162 | super(AdaBoundW, self).__setstate__(state) 163 | for group in self.param_groups: 164 | group.setdefault('amsbound', False) 165 | 166 | def step(self, closure=None): 167 | """Performs a single optimization step. 168 | Arguments: 169 | closure (callable, optional): A closure that reevaluates the model 170 | and returns the loss. 
171 | """ 172 | loss = None 173 | if closure is not None: 174 | loss = closure() 175 | 176 | for group, base_lr in zip(self.param_groups, self.base_lrs): 177 | for p in group['params']: 178 | if p.grad is None: 179 | continue 180 | grad = p.grad.data 181 | if grad.is_sparse: 182 | raise RuntimeError( 183 | 'Adam does not support sparse gradients, please consider SparseAdam instead') 184 | amsbound = group['amsbound'] 185 | 186 | state = self.state[p] 187 | 188 | # State initialization 189 | if len(state) == 0: 190 | state['step'] = 0 191 | # Exponential moving average of gradient values 192 | state['exp_avg'] = torch.zeros_like(p.data) 193 | # Exponential moving average of squared gradient values 194 | state['exp_avg_sq'] = torch.zeros_like(p.data) 195 | if amsbound: 196 | # Maintains max of all exp. moving avg. of sq. grad. values 197 | state['max_exp_avg_sq'] = torch.zeros_like(p.data) 198 | 199 | exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq'] 200 | if amsbound: 201 | max_exp_avg_sq = state['max_exp_avg_sq'] 202 | beta1, beta2 = group['betas'] 203 | 204 | state['step'] += 1 205 | 206 | # Decay the first and second moment running average coefficient 207 | exp_avg.mul_(beta1).add_(1 - beta1, grad) 208 | exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad) 209 | if amsbound: 210 | # Maintains the maximum of all 2nd moment running avg. till now 211 | torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq) 212 | # Use the max. for normalizing running avg. of gradient 213 | denom = max_exp_avg_sq.sqrt().add_(group['eps']) 214 | else: 215 | denom = exp_avg_sq.sqrt().add_(group['eps']) 216 | 217 | bias_correction1 = 1 - beta1 ** state['step'] 218 | bias_correction2 = 1 - beta2 ** state['step'] 219 | step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1 220 | 221 | # Applies bounds on actual learning rate 222 | # lr_scheduler cannot affect final_lr, this is a workaround to apply lr decay 223 | final_lr = group['final_lr'] * group['lr'] / base_lr 224 | lower_bound = final_lr * (1 - 1 / (group['gamma'] * state['step'] + 1)) 225 | upper_bound = final_lr * (1 + 1 / (group['gamma'] * state['step'])) 226 | step_size = torch.full_like(denom, step_size) 227 | step_size.div_(denom).clamp_(lower_bound, upper_bound).mul_(exp_avg) 228 | 229 | if group['weight_decay'] != 0: 230 | decayed_weights = torch.mul(p.data, group['weight_decay']) 231 | p.data.add_(-step_size) 232 | p.data.sub_(decayed_weights) 233 | else: 234 | p.data.add_(-step_size) 235 | 236 | return loss 237 | -------------------------------------------------------------------------------- /utils/evolve.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | #for i in 0 1 2 3 3 | #do 4 | # t=ultralytics/yolov3:v139 && sudo docker pull $t && sudo nvidia-docker run -d --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t utils/evolve.sh $i 5 | # sleep 30 6 | #done 7 | 8 | while true; do 9 | # python3 train.py --data ../data/sm4/out.data --img-size 320 --epochs 100 --batch 64 --accum 1 --weights yolov3-tiny.conv.15 --multi --bucket ult/wer --evolve --cache --device $1 --cfg yolov3-tiny3-1cls.cfg --single --adam 10 | # python3 train.py --data ../out/data.data --img-size 608 --epochs 10 --batch 8 --accum 8 --weights ultralytics68.pt --multi --bucket ult/athena --evolve --device $1 --cfg yolov3-spp-1cls.cfg 11 | 12 | python3 train.py --data coco2014.data --img-size 512 608 --epochs 27 --batch 8 --accum 8 --evolve --weights '' --bucket ult/coco/sppa_512 --device $1 --cfg 
yolov3-sppa.cfg --multi 13 | done 14 | 15 | 16 | # coco epoch times --img-size 416 608 --epochs 27 --batch 16 --accum 4 17 | # 36:34 2080ti 18 | # 21:58 V100 19 | # 63:00 T4 -------------------------------------------------------------------------------- /utils/gcp.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # New VM 4 | rm -rf sample_data yolov3 5 | git clone https://github.com/ultralytics/yolov3 6 | # git clone -b test --depth 1 https://github.com/ultralytics/yolov3 test # branch 7 | # sudo apt-get install zip 8 | #git clone https://github.com/NVIDIA/apex && cd apex && pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" . --user && cd .. && rm -rf apex 9 | sudo conda install -yc conda-forge scikit-image pycocotools 10 | # python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('193Zp_ye-3qXMonR1nZj3YyxMtQkMy50k','coco2014.zip')" 11 | python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('1WQT6SOktSe8Uw6r10-2JhbEhMY5DJaph','coco2017.zip')" 12 | python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('1C3HewOG9akA3y456SZLBJZfNDPkBwAto','knife.zip')" 13 | python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('13g3LqdpkNE8sPosVJT6KFXlfoMypzRP4','sm4.zip')" 14 | sudo shutdown 15 | 16 | # Mount local SSD 17 | lsblk 18 | sudo mkfs.ext4 -F /dev/nvme0n1 19 | sudo mkdir -p /mnt/disks/nvme0n1 20 | sudo mount /dev/nvme0n1 /mnt/disks/nvme0n1 21 | sudo chmod a+w /mnt/disks/nvme0n1 22 | cp -r coco /mnt/disks/nvme0n1 23 | 24 | # Kill All 25 | t=ultralytics/yolov3:v1 26 | docker kill $(docker ps -a -q --filter ancestor=$t) 27 | 28 | # Evolve coco 29 | sudo -s 30 | t=ultralytics/yolov3:evolve 31 | # docker kill $(docker ps -a -q --filter ancestor=$t) 32 | for i in 0 1 6 7 33 | do 34 | docker pull $t && docker run --gpus all -d --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t bash utils/evolve.sh $i 35 | sleep 30 36 | done 37 | 38 | #COCO training 39 | n=131 && t=ultralytics/coco:v131 && sudo docker pull $t && sudo docker run -it --gpus all --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t python3 train.py --data coco2014.data --img-size 320 640 --epochs 300 --batch 16 --weights '' --device 0 --cfg yolov3-spp.cfg --bucket ult/coco --name $n && sudo shutdown 40 | n=132 && t=ultralytics/coco:v131 && sudo docker pull $t && sudo docker run -it --gpus all --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t python3 train.py --data coco2014.data --img-size 320 640 --epochs 300 --batch 64 --weights '' --device 0 --cfg yolov3-tiny.cfg --bucket ult/coco --name $n && sudo shutdown 41 | -------------------------------------------------------------------------------- /utils/google_utils.py: -------------------------------------------------------------------------------- 1 | # This file contains google utils: https://cloud.google.com/storage/docs/reference/libraries 2 | # pip install --upgrade google-cloud-storage 3 | 4 | import os 5 | import time 6 | 7 | 8 | # from google.cloud import storage 9 | 10 | 11 | def gdrive_download(id='1HaXkef9z6y5l4vUnCYgdmEAj61c6bfWO', name='coco.zip'): 12 | # https://gist.github.com/tanaikech/f0f2d122e05bf5f971611258c22c110f 13 | # Downloads a file from Google Drive, accepting presented query 14 | # from utils.google_utils import *; gdrive_download() 15 | t = time.time() 16 | 17 | print('Downloading https://drive.google.com/uc?export=download&id=%s as %s... 
' % (id, name), end='') 18 | os.remove(name) if os.path.exists(name) else None # remove existing 19 | os.remove('cookie') if os.path.exists('cookie') else None 20 | 21 | # Attempt file download 22 | os.system("curl -c ./cookie -s -L \"https://drive.google.com/uc?export=download&id=%s\" > /dev/null" % id) 23 | if os.path.exists('cookie'): # large file 24 | s = "curl -Lb ./cookie \"https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=%s\" -o %s" % ( 25 | id, name) 26 | else: # small file 27 | s = "curl -s -L -o %s 'https://drive.google.com/uc?export=download&id=%s'" % (name, id) 28 | r = os.system(s) # execute, capture return values 29 | os.remove('cookie') if os.path.exists('cookie') else None 30 | 31 | # Error check 32 | if r != 0: 33 | os.remove(name) if os.path.exists(name) else None # remove partial 34 | print('Download error ') # raise Exception('Download error') 35 | return r 36 | 37 | # Unzip if archive 38 | if name.endswith('.zip'): 39 | print('unzipping... ', end='') 40 | os.system('unzip -q %s' % name) # unzip 41 | os.remove(name) # remove zip to free space 42 | 43 | print('Done (%.1fs)' % (time.time() - t)) 44 | return r 45 | 46 | 47 | def upload_blob(bucket_name, source_file_name, destination_blob_name): 48 | # Uploads a file to a bucket 49 | # https://cloud.google.com/storage/docs/uploading-objects#storage-upload-object-python 50 | 51 | storage_client = storage.Client() 52 | bucket = storage_client.get_bucket(bucket_name) 53 | blob = bucket.blob(destination_blob_name) 54 | 55 | blob.upload_from_filename(source_file_name) 56 | 57 | print('File {} uploaded to {}.'.format( 58 | source_file_name, 59 | destination_blob_name)) 60 | 61 | 62 | def download_blob(bucket_name, source_blob_name, destination_file_name): 63 | # Uploads a blob from a bucket 64 | storage_client = storage.Client() 65 | bucket = storage_client.get_bucket(bucket_name) 66 | blob = bucket.blob(source_blob_name) 67 | 68 | blob.download_to_filename(destination_file_name) 69 | 70 | print('Blob {} downloaded to {}.'.format( 71 | source_blob_name, 72 | destination_file_name)) 73 | -------------------------------------------------------------------------------- /utils/layers.py: -------------------------------------------------------------------------------- 1 | import torch.nn.functional as F 2 | 3 | from utils.utils import * 4 | 5 | try: 6 | from mish_cuda import MishCuda as Mish 7 | except: 8 | class Mish(nn.Module): # https://github.com/digantamisra98/Mish 9 | def forward(self, x): 10 | return x * F.softplus(x).tanh() 11 | 12 | 13 | def make_divisible(v, divisor): 14 | # Function ensures all layers have a channel number that is divisible by 8 15 | # https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py 16 | return math.ceil(v / divisor) * divisor 17 | 18 | 19 | class Flatten(nn.Module): 20 | # Use after nn.AdaptiveAvgPool2d(1) to remove last 2 dimensions 21 | def forward(self, x): 22 | return x.view(x.size(0), -1) 23 | 24 | 25 | class Concat(nn.Module): 26 | # Concatenate a list of tensors along dimension 27 | def __init__(self, dimension=1): 28 | super(Concat, self).__init__() 29 | self.d = dimension 30 | 31 | def forward(self, x): 32 | return torch.cat(x, self.d) 33 | 34 | 35 | class FeatureConcat(nn.Module): 36 | def __init__(self, layers): 37 | super(FeatureConcat, self).__init__() 38 | self.layers = layers # layer indices 39 | self.multiple = len(layers) > 1 # multiple layers flag 40 | 41 | def forward(self, x, outputs): 42 | return 
torch.cat([outputs[i] for i in self.layers], 1) if self.multiple else outputs[self.layers[0]] 43 | 44 | 45 | class FeatureConcat_l(nn.Module): 46 | def __init__(self, layers): 47 | super(FeatureConcat_l, self).__init__() 48 | self.layers = layers # layer indices 49 | self.multiple = len(layers) > 1 # multiple layers flag 50 | 51 | def forward(self, x, outputs): 52 | return torch.cat([outputs[i][:,:outputs[i].shape[1]//2,:,:] for i in self.layers], 1) if self.multiple else outputs[self.layers[0]][:,:outputs[self.layers[0]].shape[1]//2,:,:] 53 | 54 | 55 | class WeightedFeatureFusion(nn.Module): # weighted sum of 2 or more layers https://arxiv.org/abs/1911.09070 56 | def __init__(self, layers, weight=False): 57 | super(WeightedFeatureFusion, self).__init__() 58 | self.layers = layers # layer indices 59 | self.weight = weight # apply weights boolean 60 | self.n = len(layers) + 1 # number of layers 61 | if weight: 62 | self.w = nn.Parameter(torch.zeros(self.n), requires_grad=True) # layer weights 63 | 64 | def forward(self, x, outputs): 65 | # Weights 66 | if self.weight: 67 | w = torch.sigmoid(self.w) * (2 / self.n) # sigmoid weights (0-1) 68 | x = x * w[0] 69 | 70 | # Fusion 71 | nx = x.shape[1] # input channels 72 | for i in range(self.n - 1): 73 | a = outputs[self.layers[i]] * w[i + 1] if self.weight else outputs[self.layers[i]] # feature to add 74 | na = a.shape[1] # feature channels 75 | 76 | # Adjust channels 77 | if nx == na: # same shape 78 | x = x + a 79 | elif nx > na: # slice input 80 | x[:, :na] = x[:, :na] + a # or a = nn.ZeroPad2d((0, 0, 0, 0, 0, dc))(a); x = x + a 81 | else: # slice feature 82 | x = x + a[:, :nx] 83 | 84 | return x 85 | 86 | 87 | class MixConv2d(nn.Module): # MixConv: Mixed Depthwise Convolutional Kernels https://arxiv.org/abs/1907.09595 88 | def __init__(self, in_ch, out_ch, k=(3, 5, 7), stride=1, dilation=1, bias=True, method='equal_params'): 89 | super(MixConv2d, self).__init__() 90 | 91 | groups = len(k) 92 | if method == 'equal_ch': # equal channels per group 93 | i = torch.linspace(0, groups - 1E-6, out_ch).floor() # out_ch indices 94 | ch = [(i == g).sum() for g in range(groups)] 95 | else: # 'equal_params': equal parameter count per group 96 | b = [out_ch] + [0] * groups 97 | a = np.eye(groups + 1, groups, k=-1) 98 | a -= np.roll(a, 1, axis=1) 99 | a *= np.array(k) ** 2 100 | a[0] = 1 101 | ch = np.linalg.lstsq(a, b, rcond=None)[0].round().astype(int) # solve for equal weight indices, ax = b 102 | 103 | self.m = nn.ModuleList([nn.Conv2d(in_channels=in_ch, 104 | out_channels=ch[g], 105 | kernel_size=k[g], 106 | stride=stride, 107 | padding=k[g] // 2, # 'same' pad 108 | dilation=dilation, 109 | bias=bias) for g in range(groups)]) 110 | 111 | def forward(self, x): 112 | return torch.cat([m(x) for m in self.m], 1) 113 | 114 | 115 | # Activation functions below ------------------------------------------------------------------------------------------- 116 | class SwishImplementation(torch.autograd.Function): 117 | @staticmethod 118 | def forward(ctx, x): 119 | ctx.save_for_backward(x) 120 | return x * torch.sigmoid(x) 121 | 122 | @staticmethod 123 | def backward(ctx, grad_output): 124 | x = ctx.saved_tensors[0] 125 | sx = torch.sigmoid(x) # sigmoid(ctx) 126 | return grad_output * (sx * (1 + x * (1 - sx))) 127 | 128 | 129 | class MishImplementation(torch.autograd.Function): 130 | @staticmethod 131 | def forward(ctx, x): 132 | ctx.save_for_backward(x) 133 | return x.mul(torch.tanh(F.softplus(x))) # x * tanh(ln(1 + exp(x))) 134 | 135 | @staticmethod 136 | def 
backward(ctx, grad_output): 137 | x = ctx.saved_tensors[0] 138 | sx = torch.sigmoid(x) 139 | fx = F.softplus(x).tanh() 140 | return grad_output * (fx + x * sx * (1 - fx * fx)) 141 | 142 | 143 | class MemoryEfficientSwish(nn.Module): 144 | def forward(self, x): 145 | return SwishImplementation.apply(x) 146 | 147 | 148 | class MemoryEfficientMish(nn.Module): 149 | def forward(self, x): 150 | return MishImplementation.apply(x) 151 | 152 | 153 | class Swish(nn.Module): 154 | def forward(self, x): 155 | return x * torch.sigmoid(x) 156 | 157 | 158 | class HardSwish(nn.Module): # https://arxiv.org/pdf/1905.02244.pdf 159 | def forward(self, x): 160 | return x * F.hardtanh(x + 3, 0., 6., True) / 6. 161 | -------------------------------------------------------------------------------- /utils/parse_config.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | import numpy as np 4 | 5 | 6 | def parse_model_cfg(path): 7 | # Parse the yolo *.cfg file and return module definitions path may be 'cfg/yolov3.cfg', 'yolov3.cfg', or 'yolov3' 8 | if not path.endswith('.cfg'): # add .cfg suffix if omitted 9 | path += '.cfg' 10 | if not os.path.exists(path) and os.path.exists('cfg' + os.sep + path): # add cfg/ prefix if omitted 11 | path = 'cfg' + os.sep + path 12 | 13 | with open(path, 'r') as f: 14 | lines = f.read().split('\n') 15 | lines = [x for x in lines if x and not x.startswith('#')] 16 | lines = [x.rstrip().lstrip() for x in lines] # get rid of fringe whitespaces 17 | mdefs = [] # module definitions 18 | for line in lines: 19 | if line.startswith('['): # This marks the start of a new block 20 | mdefs.append({}) 21 | mdefs[-1]['type'] = line[1:-1].rstrip() 22 | if mdefs[-1]['type'] == 'convolutional': 23 | mdefs[-1]['batch_normalize'] = 0 # pre-populate with zeros (may be overwritten later) 24 | else: 25 | key, val = line.split("=") 26 | key = key.rstrip() 27 | 28 | if key == 'anchors': # return nparray 29 | mdefs[-1][key] = np.array([float(x) for x in val.split(',')]).reshape((-1, 2)) # np anchors 30 | elif (key in ['from', 'layers', 'mask']) or (key == 'size' and ',' in val): # return array 31 | mdefs[-1][key] = [int(x) for x in val.split(',')] 32 | else: 33 | val = val.strip() 34 | if val.isnumeric(): # return int or float 35 | mdefs[-1][key] = int(val) if (int(val) - float(val)) == 0 else float(val) 36 | else: 37 | mdefs[-1][key] = val # return string 38 | 39 | # Check all fields are supported 40 | supported = ['type', 'batch_normalize', 'filters', 'size', 'stride', 'pad', 'activation', 'layers', 'groups', 41 | 'from', 'mask', 'anchors', 'classes', 'num', 'jitter', 'ignore_thresh', 'truth_thresh', 'random', 42 | 'stride_x', 'stride_y', 'weights_type', 'weights_normalization', 'scale_x_y', 'beta_nms', 'nms_kind', 43 | 'iou_loss', 'iou_normalizer', 'cls_normalizer', 'iou_thresh'] 44 | 45 | f = [] # fields 46 | for x in mdefs[1:]: 47 | [f.append(k) for k in x if k not in f] 48 | u = [x for x in f if x not in supported] # unsupported fields 49 | assert not any(u), "Unsupported fields %s in %s. 
See https://github.com/ultralytics/yolov3/issues/631" % (u, path) 50 | 51 | return mdefs 52 | 53 | 54 | def parse_data_cfg(path): 55 | # Parses the data configuration file 56 | if not os.path.exists(path) and os.path.exists('data' + os.sep + path): # add data/ prefix if omitted 57 | path = 'data' + os.sep + path 58 | 59 | with open(path, 'r') as f: 60 | lines = f.readlines() 61 | 62 | options = dict() 63 | for line in lines: 64 | line = line.strip() 65 | if line == '' or line.startswith('#'): 66 | continue 67 | key, val = line.split('=') 68 | options[key.strip()] = val.strip() 69 | 70 | return options 71 | -------------------------------------------------------------------------------- /utils/torch_utils.py: -------------------------------------------------------------------------------- 1 | import math 2 | import os 3 | import time 4 | from copy import deepcopy 5 | 6 | import torch 7 | import torch.backends.cudnn as cudnn 8 | import torch.nn as nn 9 | import torch.nn.functional as F 10 | 11 | 12 | def init_seeds(seed=0): 13 | torch.manual_seed(seed) 14 | 15 | # Remove randomness (may be slower on Tesla GPUs) # https://pytorch.org/docs/stable/notes/randomness.html 16 | if seed == 0: 17 | cudnn.deterministic = True 18 | cudnn.benchmark = False 19 | 20 | 21 | def select_device(device='', apex=False, batch_size=None): 22 | # device = 'cpu' or '0' or '0,1,2,3' 23 | cpu_request = device.lower() == 'cpu' 24 | if device and not cpu_request: # if device requested other than 'cpu' 25 | os.environ['CUDA_VISIBLE_DEVICES'] = device # set environment variable 26 | assert torch.cuda.is_available(), 'CUDA unavailable, invalid device %s requested' % device # check availablity 27 | 28 | cuda = False if cpu_request else torch.cuda.is_available() 29 | if cuda: 30 | c = 1024 ** 2 # bytes to MB 31 | ng = torch.cuda.device_count() 32 | if ng > 1 and batch_size: # check that batch_size is compatible with device_count 33 | assert batch_size % ng == 0, 'batch-size %g not multiple of GPU count %g' % (batch_size, ng) 34 | x = [torch.cuda.get_device_properties(i) for i in range(ng)] 35 | s = 'Using CUDA ' + ('Apex ' if apex else '') # apex for mixed precision https://github.com/NVIDIA/apex 36 | for i in range(0, ng): 37 | if i == 1: 38 | s = ' ' * len(s) 39 | print("%sdevice%g _CudaDeviceProperties(name='%s', total_memory=%dMB)" % 40 | (s, i, x[i].name, x[i].total_memory / c)) 41 | else: 42 | print('Using CPU') 43 | 44 | print('') # skip a line 45 | return torch.device('cuda:0' if cuda else 'cpu') 46 | 47 | 48 | def time_synchronized(): 49 | torch.cuda.synchronize() if torch.cuda.is_available() else None 50 | return time.time() 51 | 52 | 53 | def initialize_weights(model): 54 | for m in model.modules(): 55 | t = type(m) 56 | if t is nn.Conv2d: 57 | pass # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 58 | elif t is nn.BatchNorm2d: 59 | m.eps = 1e-4 60 | m.momentum = 0.03 61 | elif t in [nn.LeakyReLU, nn.ReLU, nn.ReLU6]: 62 | m.inplace = True 63 | 64 | 65 | def find_modules(model, mclass=nn.Conv2d): 66 | # finds layer indices matching module class 'mclass' 67 | return [i for i, m in enumerate(model.module_list) if isinstance(m, mclass)] 68 | 69 | 70 | def fuse_conv_and_bn(conv, bn): 71 | # https://tehnokv.com/posts/fusing-batchnorm-and-conv/ 72 | with torch.no_grad(): 73 | # init 74 | fusedconv = torch.nn.Conv2d(conv.in_channels, 75 | conv.out_channels, 76 | kernel_size=conv.kernel_size, 77 | stride=conv.stride, 78 | padding=conv.padding, 79 | bias=True) 80 | 81 | # prepare filters 82 | w_conv 
= conv.weight.clone().view(conv.out_channels, -1) 83 | w_bn = torch.diag(bn.weight.div(torch.sqrt(bn.eps + bn.running_var))) 84 | fusedconv.weight.copy_(torch.mm(w_bn, w_conv).view(fusedconv.weight.size())) 85 | 86 | # prepare spatial bias 87 | if conv.bias is not None: 88 | b_conv = conv.bias 89 | else: 90 | b_conv = torch.zeros(conv.weight.size(0)) 91 | b_bn = bn.bias - bn.weight.mul(bn.running_mean).div(torch.sqrt(bn.running_var + bn.eps)) 92 | fusedconv.bias.copy_(torch.mm(w_bn, b_conv.reshape(-1, 1)).reshape(-1) + b_bn) 93 | 94 | return fusedconv 95 | 96 | 97 | def model_info(model, verbose=False): 98 | # Plots a line-by-line description of a PyTorch model 99 | n_p = sum(x.numel() for x in model.parameters()) # number parameters 100 | n_g = sum(x.numel() for x in model.parameters() if x.requires_grad) # number gradients 101 | if verbose: 102 | print('%5s %40s %9s %12s %20s %10s %10s' % ('layer', 'name', 'gradient', 'parameters', 'shape', 'mu', 'sigma')) 103 | for i, (name, p) in enumerate(model.named_parameters()): 104 | name = name.replace('module_list.', '') 105 | print('%5g %40s %9s %12g %20s %10.3g %10.3g' % 106 | (i, name, p.requires_grad, p.numel(), list(p.shape), p.mean(), p.std())) 107 | 108 | try: # FLOPS 109 | from thop import profile 110 | macs, _ = profile(model, inputs=(torch.zeros(1, 3, 480, 640),), verbose=False) 111 | fs = ', %.1f GFLOPS' % (macs / 1E9 * 2) 112 | except: 113 | fs = '' 114 | 115 | print('Model Summary: %g layers, %g parameters, %g gradients%s' % (len(list(model.parameters())), n_p, n_g, fs)) 116 | 117 | 118 | def load_classifier(name='resnet101', n=2): 119 | # Loads a pretrained model reshaped to n-class output 120 | import pretrainedmodels # https://github.com/Cadene/pretrained-models.pytorch#torchvision 121 | model = pretrainedmodels.__dict__[name](num_classes=1000, pretrained='imagenet') 122 | 123 | # Display model properties 124 | for x in ['model.input_size', 'model.input_space', 'model.input_range', 'model.mean', 'model.std']: 125 | print(x + ' =', eval(x)) 126 | 127 | # Reshape output to n classes 128 | filters = model.last_linear.weight.shape[1] 129 | model.last_linear.bias = torch.nn.Parameter(torch.zeros(n)) 130 | model.last_linear.weight = torch.nn.Parameter(torch.zeros(n, filters)) 131 | model.last_linear.out_features = n 132 | return model 133 | 134 | 135 | def scale_img(img, ratio=1.0, same_shape=True): # img(16,3,256,416), r=ratio 136 | # scales img(bs,3,y,x) by ratio 137 | h, w = img.shape[2:] 138 | s = (int(h * ratio), int(w * ratio)) # new size 139 | img = F.interpolate(img, size=s, mode='bilinear', align_corners=False) # resize 140 | if not same_shape: # pad/crop img 141 | gs = 64 # (pixels) grid size 142 | h, w = [math.ceil(x * ratio / gs) * gs for x in (h, w)] 143 | return F.pad(img, [0, w - s[1], 0, h - s[0]], value=0.447) # value = imagenet mean 144 | 145 | 146 | class ModelEMA: 147 | """ Model Exponential Moving Average from https://github.com/rwightman/pytorch-image-models 148 | Keep a moving average of everything in the model state_dict (parameters and buffers). 149 | This is intended to allow functionality like 150 | https://www.tensorflow.org/api_docs/python/tf/train/ExponentialMovingAverage 151 | A smoothed version of the weights is necessary for some training schemes to perform well. 152 | E.g. Google's hyper-params for training MNASNet, MobileNet-V3, EfficientNet, etc that use 153 | RMSprop with a short 2.4-3 epoch decay period and slow LR decay rate of .96-.99 requires EMA 154 | smoothing of weights to match results. 
Pay attention to the decay constant you are using 155 | relative to your update count per epoch. 156 | To keep EMA from using GPU resources, set device='cpu'. This will save a bit of memory but 157 | disable validation of the EMA weights. Validation will have to be done manually in a separate 158 | process, or after the training stops converging. 159 | This class is sensitive where it is initialized in the sequence of model init, 160 | GPU assignment and distributed training wrappers. 161 | I've tested with the sequence in my own train.py for torch.DataParallel, apex.DDP, and single-GPU. 162 | """ 163 | 164 | def __init__(self, model, decay=0.9999, device=''): 165 | # make a copy of the model for accumulating moving average of weights 166 | self.ema = deepcopy(model) 167 | self.ema.eval() 168 | self.updates = 0 # number of EMA updates 169 | self.decay = lambda x: decay * (1 - math.exp(-x / 2000)) # decay exponential ramp (to help early epochs) 170 | self.device = device # perform ema on different device from model if set 171 | if device: 172 | self.ema.to(device=device) 173 | for p in self.ema.parameters(): 174 | p.requires_grad_(False) 175 | 176 | def update(self, model): 177 | self.updates += 1 178 | d = self.decay(self.updates) 179 | with torch.no_grad(): 180 | if type(model) in (nn.parallel.DataParallel, nn.parallel.DistributedDataParallel): 181 | msd, esd = model.module.state_dict(), self.ema.module.state_dict() 182 | else: 183 | msd, esd = model.state_dict(), self.ema.state_dict() 184 | 185 | for k, v in esd.items(): 186 | if v.dtype.is_floating_point: 187 | v *= d 188 | v += (1. - d) * msd[k].detach() 189 | 190 | def update_attr(self, model): 191 | # Assign attributes (which may change during training) 192 | for k in model.__dict__.keys(): 193 | if not k.startswith('_'): 194 | setattr(self.ema, k, getattr(model, k)) 195 | -------------------------------------------------------------------------------- /weights/put your weights file here.txt: -------------------------------------------------------------------------------- 1 | yolov4-paspp.pt 2 | yolov4-pacsp-s.pt 3 | yolov4-pacsp.pt 4 | yolov4-pacsp-x.pt --------------------------------------------------------------------------------