├── README.md
├── README.pdf
├── README_YOLOv4.md
├── cfg
│   ├── wei_score
│   │   └── yolov4-pacsp-x-mish.cfg
│   ├── yolov4-pacsp-mish.cfg
│   ├── yolov4-pacsp-s-mish.cfg
│   ├── yolov4-pacsp-s.cfg
│   ├── yolov4-pacsp-x-mish.cfg
│   ├── yolov4-pacsp-x.cfg
│   ├── yolov4-pacsp.cfg
│   ├── yolov4-paspp.cfg
│   └── yolov4-tiny.cfg
├── data
│   ├── coco.data
│   ├── coco.names
│   ├── coco1.data
│   ├── coco1.txt
│   ├── coco16.data
│   ├── coco16.txt
│   ├── coco1cls.data
│   ├── coco1cls.txt
│   ├── coco2014.data
│   ├── coco2017.data
│   ├── coco64.data
│   ├── coco64.txt
│   ├── coco_paper.names
│   ├── get_coco2014.sh
│   ├── get_coco2017.sh
│   ├── myData.data
│   ├── myData.names
│   └── myData
│       └── score
│           ├── images
│           │   ├── train
│           │   │   └── readme
│           │   └── val
│           │       └── readme
│           └── labels
│               ├── train
│               │   └── readme
│               └── val
│                   └── readme
├── detect.py
├── experiments.md
├── images
│   └── scalingCSP.png
├── models.py
├── pic
│   ├── p0.png
│   ├── p1.png
│   ├── p2.png
│   ├── p3.png
│   ├── p4.png
│   ├── p5.png
│   ├── test1.jpg
│   └── test2.jpg
├── requirements.txt
├── results_yolov4-pacsp-x-mish.txt
├── runs
│   └── readme
├── test.py
├── test_half.py
├── train.py
├── utils
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── __init__.cpython-35.pyc
│   │   ├── datasets.cpython-35.pyc
│   │   ├── google_utils.cpython-35.pyc
│   │   ├── layers.cpython-35.pyc
│   │   ├── parse_config.cpython-35.pyc
│   │   ├── torch_utils.cpython-35.pyc
│   │   └── utils.cpython-35.pyc
│   ├── adabound.py
│   ├── datasets.py
│   ├── evolve.sh
│   ├── gcp.sh
│   ├── google_utils.py
│   ├── layers.py
│   ├── parse_config.py
│   ├── torch_utils.py
│   └── utils.py
└── weights
    └── put your weights file here.txt

/README.md:
--------------------------------------------------------------------------------

## Training your own dataset with [Pytorch-YOLO v4](https://github.com/WongKinYiu/PyTorch_YOLOv4)

This reproduction was written by **Chien-Yao Wang**, the second author of YOLOv4 and the first author of CSPNet; it is worth noting that both YOLOv4 and YOLOv5 use CSPNet. This PyTorch version of YOLOv4 is built on top of ultralytics' YOLOv3, which is arguably the strongest YOLOv3 PyTorch implementation: https://github.com/ultralytics/yolov3. We will use this version of YOLOv4 to train on our own dataset and walk through the required code changes and the complete training and testing process.

![](pic/p0.png)

### 1. Data preparation

The dataset is built as follows.

**1. Convert the data to darknet format.**

After annotating the data with LabelImg or Labelbox, convert it to darknet format. The images and labels live in two sibling folders, with one label file per image (an image without annotations simply has no label file). Each label file satisfies:

+ one line per annotated box
+ each line contains: class, x_center, y_center, width, height
+ box coordinates are normalized to (0-1)
+ class indices start from 0

Each image and its label file are stored with the following correspondence (a minimal conversion sketch is given after step 3):

```
../coco/images/train2017/000000109622.jpg  # image
../coco/labels/train2017/000000109622.txt  # label
```

Here is an example of a label file containing 5 objects of the person class (class=0):

![](pic/p2.png)

**2. Create train and test \*.txt files.**

They store the paths of the train and test images, for example:

![](pic/p3.png)

**3. Create a new \*.names file**

It stores the class names. For example, create `myData.names` (3 classes):

```
class_1
class_2
class_3
```

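Pulling steps 1–3 together, here is a minimal sketch (not part of this repository) of the conversion. It assumes your annotations are already available per image as `(class_id, xmin, ymin, xmax, ymax)` boxes in pixel coordinates, e.g. exported from LabelImg or Labelbox; the function names `box_to_darknet` and `write_labels_and_list` are illustrative only.

```python
import os
from PIL import Image  # pillow is already in requirements.txt


def box_to_darknet(size, box):
    """(xmin, ymin, xmax, ymax) in pixels -> normalized (x_center, y_center, width, height)."""
    img_w, img_h = size
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2.0 / img_w,
            (ymin + ymax) / 2.0 / img_h,
            (xmax - xmin) / float(img_w),
            (ymax - ymin) / float(img_h))


def write_labels_and_list(annotations, label_dir, list_file):
    """annotations maps an image path to its boxes: [(class_id, xmin, ymin, xmax, ymax), ...]."""
    os.makedirs(label_dir, exist_ok=True)
    with open(list_file, "w") as lf:
        for img_path, boxes in annotations.items():
            lf.write(img_path + "\n")                      # step 2: one image path per line
            with Image.open(img_path) as im:
                size = im.size                             # (width, height)
            name = os.path.splitext(os.path.basename(img_path))[0]
            label_path = os.path.join(label_dir, name + ".txt")
            with open(label_path, "w") as f:               # step 1: one label file per image
                for class_id, *corners in boxes:
                    x_c, y_c, w, h = box_to_darknet(size, corners)
                    f.write(f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}\n")
```

With the folder layout shown in the tree above, the training labels would be written to `data/myData/score/labels/train` (mirroring `images/train`) and the list file to `data/myData/myData_train.txt`, i.e. the paths referenced by the `.data` file in step 4 below.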
**4. Create a new \*.data file**

Create `myData.data`:

```
classes=3
train=data/myData/myData_train.txt
valid=data/myData/myData_val.txt
names=data/myData.names
```


### 2. Environment setup

Required packages:

```
numpy == 1.17
opencv-python >= 4.1
torch==1.3.0
torchvision==0.4.1
matplotlib
pycocotools
tqdm
pillow
tensorboard >= 1.14
```
※ Running the Mish models requires installing https://github.com/thomasbrandon/mish-cuda

```
sudo pip3 install git+https://github.com/thomasbrandon/mish-cuda.git
```




### 3. Modifying the model configuration file

The configuration file is modified in the same way as for the darknet versions of YOLOv3 and YOLOv4, so their documentation can be used as a reference; the changes mainly concern a few training hyperparameters and the network parameters. In particular, for a custom dataset set `classes` in every `[yolo]` layer to your number of classes, and set `filters` in the `[convolutional]` layer directly before each `[yolo]` layer to `(classes + 5) * 3` (24 for the 3-class example above; the stock cfgs use 255 for the 80 COCO classes).

![](pic/p4.png)

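Since a mismatched `classes`/`filters` edit is the most common cfg mistake, here is a small sketch (not part of this repository, and independent of `utils/parse_config.py`; the names `parse_cfg` and `check_cfg` are illustrative) that parses a cfg in the darknet format shown later in this repo and checks every `[yolo]` head:

```python
import sys


def parse_cfg(path):
    """Parse a darknet cfg into an ordered list of {'type': section, key: value, ...} blocks."""
    blocks = []
    with open(path) as f:
        for line in f:
            line = line.split("#")[0].strip()      # drop comments and surrounding whitespace
            if not line:
                continue
            if line.startswith("["):               # new section, e.g. [convolutional] or [yolo]
                blocks.append({"type": line[1:-1]})
            else:
                key, value = (s.strip() for s in line.split("=", 1))
                blocks[-1][key] = value
    return blocks


def check_cfg(path):
    """Return True if every [yolo] head satisfies filters == (classes + 5) * len(mask)."""
    blocks = parse_cfg(path)
    ok = True
    for i, block in enumerate(blocks):
        if block["type"] != "yolo":
            continue
        classes = int(block["classes"])
        n_anchors = len(block["mask"].split(","))  # anchors assigned to this head
        conv = blocks[i - 1]                       # the detection conv sits right before [yolo]
        expected = (classes + 5) * n_anchors
        if int(conv["filters"]) != expected:
            ok = False
            print(f"[yolo] block #{i}: filters={conv['filters']}, expected {expected}")
    return ok


if __name__ == "__main__":
    print("OK" if check_cfg(sys.argv[1]) else "cfg needs fixing")
```

Saved as, say, `check_cfg.py`, it can be run after editing with `python3 check_cfg.py cfg/wei_score/yolov4-pacsp-x-mish.cfg`.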
### 4. Downloading pretrained models

Download the pretrained models:

Baidu Netdisk link: https://pan.baidu.com/s/1nyQlH-GHrmddCEkuv-VmAg
Extraction code: 78bg


### 5. Model training

```
python3 train.py --data data/myData.data --cfg cfg/wei_score/yolov4-pacsp-x-mish.cfg --weights './weights/yolov4-pacsp-x-mish.pt' --name yolov4-pacsp-x-mish --img 640 640 640
```



### 6. Model inference

**1. Evaluation on the validation set**

`test_half.py` runs the same evaluation as `test.py`, but in half precision (FP16).

```shell
python3 test_half.py --data data/myData.data \
                     --cfg cfg/wei_score/yolov4-pacsp-x-mish.cfg \
                     --weights weights/best_yolov4-pacsp-x-mish.pt \
                     --img 640 \
                     --iou-thr 0.6 \
                     --conf-thres 0.5 \
                     --batch-size 1
```

```shell
python3 test.py --data data/myData.data \
                --cfg cfg/wei_score/yolov4-pacsp-x-mish.cfg \
                --weights weights/best_yolov4-pacsp-x-mish.pt \
                --img 640 \
                --iou-thr 0.6 \
                --conf-thres 0.5 \
                --batch-size 1
```

```shell
Model Summary: 408 layers, 9.92329e+07 parameters, 9.92329e+07 gradients
Fusing layers...
Model Summary: 274 layers, 9.91849e+07 parameters, 9.91849e+07 gradients
Caching labels (285 found, 0 missing, 0 empty, 0 duplicate, for 285 images): 100%|██████████| 285/285 [00:00<00:00, 8858.32it/s]
               Class    Images   Targets         P         R   mAP@0.5        F1: 100%|██████████| 285/285 [00:17<00:00, 16.44it/s]
                 all       285       645     0.847      0.66     0.623      0.74
                  QP       285       175     0.856     0.611     0.586     0.713
                  NY       285       289     0.894     0.671     0.647     0.767
                  QG       285       181     0.792     0.696     0.638     0.741
Speed: 23.4/1.1/24.5 ms inference/NMS/total per 640x640 image at batch-size 1

```

**2. Inference on single images or videos**

```shell
python3 detect.py --cfg cfg/wei_score/yolov4-pacsp-x-mish.cfg \
                  --names data/myData.names \
                  --weights weights/best_yolov4-pacsp-x-mish.pt \
                  --source data/myData/score/images/val \
                  --img-size 640 \
                  --conf-thres 0.3 \
                  --iou-thres 0.2 \
                  --device 0
```



Training logs can be visualized with TensorBoard:

```shell
tensorboard --logdir=runs
```

![](pic/p5.png)

### 7. Demo

![](pic/test2.jpg)

![](pic/test1.jpg)

### 8. TensorRT-accelerated inference

**TODO**

--------------------------------------------------------------------------------
/README.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/README.pdf
--------------------------------------------------------------------------------
/README_YOLOv4.md:
--------------------------------------------------------------------------------

# YOLOv4

This is a PyTorch implementation of [YOLOv4](https://github.com/AlexeyAB/darknet), based on [ultralytics/yolov3](https://github.com/ultralytics/yolov3).

* [[original Darknet implementation of YOLOv4]](https://github.com/AlexeyAB/darknet)

* [[ultralytics/yolov5 based PyTorch implementation of YOLOv4]](https://github.com/WongKinYiu/PyTorch_YOLOv4/tree/u5_preview)

### development log

* `2020-07-23` - support CUDA accelerated Mish activation function.
* `2020-07-19` - support and train tiny YOLOv4. [`yolov4-tiny`]()
* `2020-07-15` - design and train conditional YOLOv4. [`yolov4-pacsp-conditional`]()
* `2020-07-13` - support MixUp data augmentation.
* `2020-07-03` - design new stem layers.
* `2020-06-16` - support FP16 (half-precision) GPU inference.
* `2020-06-14` - convert .pt to .weights for darknet fine-tuning.
* `2020-06-13` - update multi-scale training strategy.
* `2020-06-12` - design scaled YOLOv4 following [ultralytics](https://github.com/ultralytics/yolov5). [`yolov4-pacsp-s`]() [`yolov4-pacsp-m`]() [`yolov4-pacsp-l`]() [`yolov4-pacsp-x`]()
* `2020-06-07` - design [scaling methods](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/images/scalingCSP.png) for CSP-based models. [`yolov4-pacsp-25`]() [`yolov4-pacsp-75`]()
* `2020-06-03` - update COCO2014 to COCO2017.
* `2020-05-30` - update FPN neck to CSPFPN. [`yolov4-yocsp`]() [`yolov4-yocsp-mish`]()
* `2020-05-24` - update neck of YOLOv4 to CSPPAN. [`yolov4-pacsp`]() [`yolov4-pacsp-mish`]()
* `2020-05-15` - training YOLOv4 with Mish activation function. [`yolov4-yospp-mish`]() [`yolov4-paspp-mish`]()
* `2020-05-08` - design and train YOLOv4 with FPN neck. [`yolov4-yospp`]()
* `2020-05-01` - training YOLOv4 with Leaky activation function using PyTorch. [`yolov4-paspp`]()

31 | 32 | ## Pretrained Models & Comparison 33 | 34 | | Model | Test Size | APval | AP50val | AP75val | APSval | APMval | APLval | cfg | weights | 35 | | :-- | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | 36 | | **YOLOv4**paspp | 736 | 45.7% | 64.2% | 50.3% | 27.4% | 51.3% | 58.6% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-paspp.cfg) | [weights](https://drive.google.com/file/d/1FraA4vmlBh5RoQB7ZGVc01UyCgxSlbpO/view?usp=sharing) | 37 | | **YOLOv4**pacsp-s | 736 | 36.0% | 54.2% | 39.4% | 18.7% | 41.2% | 48.0% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-s.cfg) | [weights](https://drive.google.com/file/d/1saE6CEvNDPA_Xv34RdxYT4BbCtozuTta/view?usp=sharing) | 38 | | **YOLOv4**pacsp | 736 | 46.4% | 64.8% | 51.0% | 28.5% | 51.9% | 59.5% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp.cfg) | [weights](https://drive.google.com/file/d/1SPCjPnMgA8jlfIGsAnFsMPdJU8dJeo7E/view?usp=sharing) | 39 | | **YOLOv4**pacsp-x | 736 | **47.6%** | **66.1%** | **52.2%** | **29.9%** | **53.3%** | **61.5%** | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-x.cfg) | [weights](https://drive.google.com/file/d/1MtwO5tvXvvyloc12-wZ2lMBzGKd9hsof/view?usp=sharing) | 40 | | | | | | | | | 41 | | **YOLOv4**pacsp-s-mish | 736 | 37.4% | 56.3% | 40.0% | 20.9% | 43.0% | 49.3% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-s-mish.cfg) | [weights](https://drive.google.com/file/d/1Gmy2Q6af1DQ5CAb6415cVFkIgtOIt9xs/view?usp=sharing) | 42 | | **YOLOv4**pacsp-mish | 736 | 46.5% | 65.7% | 50.2% | 30.0% | 52.0% | 59.4% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-mish.cfg) | [weights](https://drive.google.com/file/d/10pw28weUtOceEexRQQrdpOjxBb79sk3u/view?usp=sharing) | 43 | | **YOLOv4**pacsp-x-mish | 736 | **48.5%** | **67.4%** | **52.7%** | **30.9%** | **54.0%** | **62.0%** | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-x-mish.cfg) | [weights](https://drive.google.com/file/d/1GsLaQLfl54Qt2C07mya00S0_FTpcXBdy/view?usp=sharing) | 44 | | | | | | | | | 45 | | **YOLOv4**tiny | 416 | **22.5%** | **39.3%** | **22.5%** | **7.4%** | **26.3%** | **34.8%** | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-tiny.cfg) | [weights](https://drive.google.com/file/d/1aQKcCvTAl1uOWzzHVE9Z8Ixgikc3AuYQ/view?usp=sharing) | 46 | | | | | | | | | 47 | 48 | ## Requirements 49 | 50 | ``` 51 | pip install -r requirements.txt 52 | ``` 53 | ※ For running Mish models, please install https://github.com/thomasbrandon/mish-cuda 54 | 55 | ## Training 56 | 57 | ``` 58 | python train.py --data coco2017.data --cfg yolov4-pacsp.cfg --weights '' --name yolov4-pacsp --img 640 640 640 59 | ``` 60 | 61 | ## Testing 62 | 63 | ``` 64 | python test_half.py --data coco2017.data --cfg yolov4-pacsp.cfg --weights yolov4-pacsp.pt --img 736 --iou-thr 0.7 --batch-size 8 65 | ``` 66 | 67 | ## Citation 68 | 69 | ``` 70 | @article{bochkovskiy2020yolov4, 71 | title={{YOLOv4}: Optimal Speed and Accuracy of Object Detection}, 72 | author={Bochkovskiy, Alexey and Wang, Chien-Yao and Liao, Hong-Yuan Mark}, 73 | journal={arXiv preprint arXiv:2004.10934}, 74 | year={2020} 75 | } 76 | ``` 77 | 78 | ``` 79 | @inproceedings{wang2020cspnet, 80 | title={{CSPNet}: A New Backbone That Can Enhance Learning Capability of {CNN}}, 81 | author={Wang, Chien-Yao and Mark Liao, Hong-Yuan and Wu, Yueh-Hua and Chen, Ping-Yang and Hsieh, Jun-Wei 
and Yeh, I-Hau}, 82 | booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops}, 83 | pages={390--391}, 84 | year={2020} 85 | } 86 | ``` 87 | 88 | ## Acknowledgements 89 | 90 | * [https://github.com/AlexeyAB/darknet](https://github.com/AlexeyAB/darknet) 91 | * [https://github.com/ultralytics/yolov3](https://github.com/ultralytics/yolov3) 92 | * [https://github.com/ultralytics/yolov5](https://github.com/ultralytics/yolov5) 93 | -------------------------------------------------------------------------------- /cfg/yolov4-pacsp-mish.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=64 7 | subdivisions=8 8 | width=640 9 | height=640 10 | channels=3 11 | momentum=0.949 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.00261 19 | burn_in=1000 20 | max_batches = 500500 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | #cutmix=1 26 | mosaic=1 27 | 28 | [convolutional] 29 | batch_normalize=1 30 | filters=32 31 | size=3 32 | stride=1 33 | pad=1 34 | activation=mish 35 | 36 | # Downsample 37 | 38 | [convolutional] 39 | batch_normalize=1 40 | filters=64 41 | size=3 42 | stride=2 43 | pad=1 44 | activation=mish 45 | 46 | #[convolutional] 47 | #batch_normalize=1 48 | #filters=64 49 | #size=1 50 | #stride=1 51 | #pad=1 52 | #activation=mish 53 | 54 | #[route] 55 | #layers = -2 56 | 57 | #[convolutional] 58 | #batch_normalize=1 59 | #filters=64 60 | #size=1 61 | #stride=1 62 | #pad=1 63 | #activation=mish 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=32 68 | size=1 69 | stride=1 70 | pad=1 71 | activation=mish 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=3 77 | stride=1 78 | pad=1 79 | activation=mish 80 | 81 | [shortcut] 82 | from=-3 83 | activation=linear 84 | 85 | #[convolutional] 86 | #batch_normalize=1 87 | #filters=64 88 | #size=1 89 | #stride=1 90 | #pad=1 91 | #activation=mish 92 | 93 | #[route] 94 | #layers = -1,-7 95 | 96 | #[convolutional] 97 | #batch_normalize=1 98 | #filters=64 99 | #size=1 100 | #stride=1 101 | #pad=1 102 | #activation=mish 103 | 104 | # Downsample 105 | 106 | [convolutional] 107 | batch_normalize=1 108 | filters=128 109 | size=3 110 | stride=2 111 | pad=1 112 | activation=mish 113 | 114 | [convolutional] 115 | batch_normalize=1 116 | filters=64 117 | size=1 118 | stride=1 119 | pad=1 120 | activation=mish 121 | 122 | [route] 123 | layers = -2 124 | 125 | [convolutional] 126 | batch_normalize=1 127 | filters=64 128 | size=1 129 | stride=1 130 | pad=1 131 | activation=mish 132 | 133 | [convolutional] 134 | batch_normalize=1 135 | filters=64 136 | size=1 137 | stride=1 138 | pad=1 139 | activation=mish 140 | 141 | [convolutional] 142 | batch_normalize=1 143 | filters=64 144 | size=3 145 | stride=1 146 | pad=1 147 | activation=mish 148 | 149 | [shortcut] 150 | from=-3 151 | activation=linear 152 | 153 | [convolutional] 154 | batch_normalize=1 155 | filters=64 156 | size=1 157 | stride=1 158 | pad=1 159 | activation=mish 160 | 161 | [convolutional] 162 | batch_normalize=1 163 | filters=64 164 | size=3 165 | stride=1 166 | pad=1 167 | activation=mish 168 | 169 | [shortcut] 170 | from=-3 171 | activation=linear 172 | 173 | [convolutional] 174 | batch_normalize=1 175 | filters=64 176 | size=1 177 | stride=1 178 | pad=1 179 | activation=mish 180 | 181 | [route] 182 | layers = -1,-10 183 | 184 | 
[convolutional] 185 | batch_normalize=1 186 | filters=128 187 | size=1 188 | stride=1 189 | pad=1 190 | activation=mish 191 | 192 | # Downsample 193 | 194 | [convolutional] 195 | batch_normalize=1 196 | filters=256 197 | size=3 198 | stride=2 199 | pad=1 200 | activation=mish 201 | 202 | [convolutional] 203 | batch_normalize=1 204 | filters=128 205 | size=1 206 | stride=1 207 | pad=1 208 | activation=mish 209 | 210 | [route] 211 | layers = -2 212 | 213 | [convolutional] 214 | batch_normalize=1 215 | filters=128 216 | size=1 217 | stride=1 218 | pad=1 219 | activation=mish 220 | 221 | [convolutional] 222 | batch_normalize=1 223 | filters=128 224 | size=1 225 | stride=1 226 | pad=1 227 | activation=mish 228 | 229 | [convolutional] 230 | batch_normalize=1 231 | filters=128 232 | size=3 233 | stride=1 234 | pad=1 235 | activation=mish 236 | 237 | [shortcut] 238 | from=-3 239 | activation=linear 240 | 241 | [convolutional] 242 | batch_normalize=1 243 | filters=128 244 | size=1 245 | stride=1 246 | pad=1 247 | activation=mish 248 | 249 | [convolutional] 250 | batch_normalize=1 251 | filters=128 252 | size=3 253 | stride=1 254 | pad=1 255 | activation=mish 256 | 257 | [shortcut] 258 | from=-3 259 | activation=linear 260 | 261 | [convolutional] 262 | batch_normalize=1 263 | filters=128 264 | size=1 265 | stride=1 266 | pad=1 267 | activation=mish 268 | 269 | [convolutional] 270 | batch_normalize=1 271 | filters=128 272 | size=3 273 | stride=1 274 | pad=1 275 | activation=mish 276 | 277 | [shortcut] 278 | from=-3 279 | activation=linear 280 | 281 | [convolutional] 282 | batch_normalize=1 283 | filters=128 284 | size=1 285 | stride=1 286 | pad=1 287 | activation=mish 288 | 289 | [convolutional] 290 | batch_normalize=1 291 | filters=128 292 | size=3 293 | stride=1 294 | pad=1 295 | activation=mish 296 | 297 | [shortcut] 298 | from=-3 299 | activation=linear 300 | 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=128 305 | size=1 306 | stride=1 307 | pad=1 308 | activation=mish 309 | 310 | [convolutional] 311 | batch_normalize=1 312 | filters=128 313 | size=3 314 | stride=1 315 | pad=1 316 | activation=mish 317 | 318 | [shortcut] 319 | from=-3 320 | activation=linear 321 | 322 | [convolutional] 323 | batch_normalize=1 324 | filters=128 325 | size=1 326 | stride=1 327 | pad=1 328 | activation=mish 329 | 330 | [convolutional] 331 | batch_normalize=1 332 | filters=128 333 | size=3 334 | stride=1 335 | pad=1 336 | activation=mish 337 | 338 | [shortcut] 339 | from=-3 340 | activation=linear 341 | 342 | [convolutional] 343 | batch_normalize=1 344 | filters=128 345 | size=1 346 | stride=1 347 | pad=1 348 | activation=mish 349 | 350 | [convolutional] 351 | batch_normalize=1 352 | filters=128 353 | size=3 354 | stride=1 355 | pad=1 356 | activation=mish 357 | 358 | [shortcut] 359 | from=-3 360 | activation=linear 361 | 362 | [convolutional] 363 | batch_normalize=1 364 | filters=128 365 | size=1 366 | stride=1 367 | pad=1 368 | activation=mish 369 | 370 | [convolutional] 371 | batch_normalize=1 372 | filters=128 373 | size=3 374 | stride=1 375 | pad=1 376 | activation=mish 377 | 378 | [shortcut] 379 | from=-3 380 | activation=linear 381 | 382 | [convolutional] 383 | batch_normalize=1 384 | filters=128 385 | size=1 386 | stride=1 387 | pad=1 388 | activation=mish 389 | 390 | [route] 391 | layers = -1,-28 392 | 393 | [convolutional] 394 | batch_normalize=1 395 | filters=256 396 | size=1 397 | stride=1 398 | pad=1 399 | activation=mish 400 | 401 | # Downsample 402 | 403 | [convolutional] 404 | 
batch_normalize=1 405 | filters=512 406 | size=3 407 | stride=2 408 | pad=1 409 | activation=mish 410 | 411 | [convolutional] 412 | batch_normalize=1 413 | filters=256 414 | size=1 415 | stride=1 416 | pad=1 417 | activation=mish 418 | 419 | [route] 420 | layers = -2 421 | 422 | [convolutional] 423 | batch_normalize=1 424 | filters=256 425 | size=1 426 | stride=1 427 | pad=1 428 | activation=mish 429 | 430 | [convolutional] 431 | batch_normalize=1 432 | filters=256 433 | size=1 434 | stride=1 435 | pad=1 436 | activation=mish 437 | 438 | [convolutional] 439 | batch_normalize=1 440 | filters=256 441 | size=3 442 | stride=1 443 | pad=1 444 | activation=mish 445 | 446 | [shortcut] 447 | from=-3 448 | activation=linear 449 | 450 | 451 | [convolutional] 452 | batch_normalize=1 453 | filters=256 454 | size=1 455 | stride=1 456 | pad=1 457 | activation=mish 458 | 459 | [convolutional] 460 | batch_normalize=1 461 | filters=256 462 | size=3 463 | stride=1 464 | pad=1 465 | activation=mish 466 | 467 | [shortcut] 468 | from=-3 469 | activation=linear 470 | 471 | 472 | [convolutional] 473 | batch_normalize=1 474 | filters=256 475 | size=1 476 | stride=1 477 | pad=1 478 | activation=mish 479 | 480 | [convolutional] 481 | batch_normalize=1 482 | filters=256 483 | size=3 484 | stride=1 485 | pad=1 486 | activation=mish 487 | 488 | [shortcut] 489 | from=-3 490 | activation=linear 491 | 492 | 493 | [convolutional] 494 | batch_normalize=1 495 | filters=256 496 | size=1 497 | stride=1 498 | pad=1 499 | activation=mish 500 | 501 | [convolutional] 502 | batch_normalize=1 503 | filters=256 504 | size=3 505 | stride=1 506 | pad=1 507 | activation=mish 508 | 509 | [shortcut] 510 | from=-3 511 | activation=linear 512 | 513 | 514 | [convolutional] 515 | batch_normalize=1 516 | filters=256 517 | size=1 518 | stride=1 519 | pad=1 520 | activation=mish 521 | 522 | [convolutional] 523 | batch_normalize=1 524 | filters=256 525 | size=3 526 | stride=1 527 | pad=1 528 | activation=mish 529 | 530 | [shortcut] 531 | from=-3 532 | activation=linear 533 | 534 | 535 | [convolutional] 536 | batch_normalize=1 537 | filters=256 538 | size=1 539 | stride=1 540 | pad=1 541 | activation=mish 542 | 543 | [convolutional] 544 | batch_normalize=1 545 | filters=256 546 | size=3 547 | stride=1 548 | pad=1 549 | activation=mish 550 | 551 | [shortcut] 552 | from=-3 553 | activation=linear 554 | 555 | 556 | [convolutional] 557 | batch_normalize=1 558 | filters=256 559 | size=1 560 | stride=1 561 | pad=1 562 | activation=mish 563 | 564 | [convolutional] 565 | batch_normalize=1 566 | filters=256 567 | size=3 568 | stride=1 569 | pad=1 570 | activation=mish 571 | 572 | [shortcut] 573 | from=-3 574 | activation=linear 575 | 576 | [convolutional] 577 | batch_normalize=1 578 | filters=256 579 | size=1 580 | stride=1 581 | pad=1 582 | activation=mish 583 | 584 | [convolutional] 585 | batch_normalize=1 586 | filters=256 587 | size=3 588 | stride=1 589 | pad=1 590 | activation=mish 591 | 592 | [shortcut] 593 | from=-3 594 | activation=linear 595 | 596 | [convolutional] 597 | batch_normalize=1 598 | filters=256 599 | size=1 600 | stride=1 601 | pad=1 602 | activation=mish 603 | 604 | [route] 605 | layers = -1,-28 606 | 607 | [convolutional] 608 | batch_normalize=1 609 | filters=512 610 | size=1 611 | stride=1 612 | pad=1 613 | activation=mish 614 | 615 | # Downsample 616 | 617 | [convolutional] 618 | batch_normalize=1 619 | filters=1024 620 | size=3 621 | stride=2 622 | pad=1 623 | activation=mish 624 | 625 | [convolutional] 626 | batch_normalize=1 
627 | filters=512 628 | size=1 629 | stride=1 630 | pad=1 631 | activation=mish 632 | 633 | [route] 634 | layers = -2 635 | 636 | [convolutional] 637 | batch_normalize=1 638 | filters=512 639 | size=1 640 | stride=1 641 | pad=1 642 | activation=mish 643 | 644 | [convolutional] 645 | batch_normalize=1 646 | filters=512 647 | size=1 648 | stride=1 649 | pad=1 650 | activation=mish 651 | 652 | [convolutional] 653 | batch_normalize=1 654 | filters=512 655 | size=3 656 | stride=1 657 | pad=1 658 | activation=mish 659 | 660 | [shortcut] 661 | from=-3 662 | activation=linear 663 | 664 | [convolutional] 665 | batch_normalize=1 666 | filters=512 667 | size=1 668 | stride=1 669 | pad=1 670 | activation=mish 671 | 672 | [convolutional] 673 | batch_normalize=1 674 | filters=512 675 | size=3 676 | stride=1 677 | pad=1 678 | activation=mish 679 | 680 | [shortcut] 681 | from=-3 682 | activation=linear 683 | 684 | [convolutional] 685 | batch_normalize=1 686 | filters=512 687 | size=1 688 | stride=1 689 | pad=1 690 | activation=mish 691 | 692 | [convolutional] 693 | batch_normalize=1 694 | filters=512 695 | size=3 696 | stride=1 697 | pad=1 698 | activation=mish 699 | 700 | [shortcut] 701 | from=-3 702 | activation=linear 703 | 704 | [convolutional] 705 | batch_normalize=1 706 | filters=512 707 | size=1 708 | stride=1 709 | pad=1 710 | activation=mish 711 | 712 | [convolutional] 713 | batch_normalize=1 714 | filters=512 715 | size=3 716 | stride=1 717 | pad=1 718 | activation=mish 719 | 720 | [shortcut] 721 | from=-3 722 | activation=linear 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=512 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=mish 731 | 732 | [route] 733 | layers = -1,-16 734 | 735 | [convolutional] 736 | batch_normalize=1 737 | filters=1024 738 | size=1 739 | stride=1 740 | pad=1 741 | activation=mish 742 | 743 | ########################## 744 | 745 | [convolutional] 746 | batch_normalize=1 747 | filters=512 748 | size=1 749 | stride=1 750 | pad=1 751 | activation=mish 752 | 753 | [route] 754 | layers = -2 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=512 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=mish 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=512 770 | activation=mish 771 | 772 | [convolutional] 773 | batch_normalize=1 774 | filters=512 775 | size=1 776 | stride=1 777 | pad=1 778 | activation=mish 779 | 780 | ### SPP ### 781 | [maxpool] 782 | stride=1 783 | size=5 784 | 785 | [route] 786 | layers=-2 787 | 788 | [maxpool] 789 | stride=1 790 | size=9 791 | 792 | [route] 793 | layers=-4 794 | 795 | [maxpool] 796 | stride=1 797 | size=13 798 | 799 | [route] 800 | layers=-1,-3,-5,-6 801 | ### End SPP ### 802 | 803 | [convolutional] 804 | batch_normalize=1 805 | filters=512 806 | size=1 807 | stride=1 808 | pad=1 809 | activation=mish 810 | 811 | [convolutional] 812 | batch_normalize=1 813 | size=3 814 | stride=1 815 | pad=1 816 | filters=512 817 | activation=mish 818 | 819 | [route] 820 | layers = -1, -13 821 | 822 | [convolutional] 823 | batch_normalize=1 824 | filters=512 825 | size=1 826 | stride=1 827 | pad=1 828 | activation=mish 829 | 830 | [convolutional] 831 | batch_normalize=1 832 | filters=256 833 | size=1 834 | stride=1 835 | pad=1 836 | activation=mish 837 | 838 | [upsample] 839 | stride=2 840 | 841 | [route] 842 | layers = 79 843 | 844 | [convolutional] 845 | batch_normalize=1 846 | filters=256 847 | size=1 848 | stride=1 849 | pad=1 850 | activation=mish 851 | 
852 | [route] 853 | layers = -1, -3 854 | 855 | [convolutional] 856 | batch_normalize=1 857 | filters=256 858 | size=1 859 | stride=1 860 | pad=1 861 | activation=mish 862 | 863 | [convolutional] 864 | batch_normalize=1 865 | filters=256 866 | size=1 867 | stride=1 868 | pad=1 869 | activation=mish 870 | 871 | [route] 872 | layers = -2 873 | 874 | [convolutional] 875 | batch_normalize=1 876 | filters=256 877 | size=1 878 | stride=1 879 | pad=1 880 | activation=mish 881 | 882 | [convolutional] 883 | batch_normalize=1 884 | size=3 885 | stride=1 886 | pad=1 887 | filters=256 888 | activation=mish 889 | 890 | [convolutional] 891 | batch_normalize=1 892 | filters=256 893 | size=1 894 | stride=1 895 | pad=1 896 | activation=mish 897 | 898 | [convolutional] 899 | batch_normalize=1 900 | size=3 901 | stride=1 902 | pad=1 903 | filters=256 904 | activation=mish 905 | 906 | [route] 907 | layers = -1, -6 908 | 909 | [convolutional] 910 | batch_normalize=1 911 | filters=256 912 | size=1 913 | stride=1 914 | pad=1 915 | activation=mish 916 | 917 | [convolutional] 918 | batch_normalize=1 919 | filters=128 920 | size=1 921 | stride=1 922 | pad=1 923 | activation=mish 924 | 925 | [upsample] 926 | stride=2 927 | 928 | [route] 929 | layers = 48 930 | 931 | [convolutional] 932 | batch_normalize=1 933 | filters=128 934 | size=1 935 | stride=1 936 | pad=1 937 | activation=mish 938 | 939 | [route] 940 | layers = -1, -3 941 | 942 | [convolutional] 943 | batch_normalize=1 944 | filters=128 945 | size=1 946 | stride=1 947 | pad=1 948 | activation=mish 949 | 950 | [convolutional] 951 | batch_normalize=1 952 | filters=128 953 | size=1 954 | stride=1 955 | pad=1 956 | activation=mish 957 | 958 | [route] 959 | layers = -2 960 | 961 | [convolutional] 962 | batch_normalize=1 963 | filters=128 964 | size=1 965 | stride=1 966 | pad=1 967 | activation=mish 968 | 969 | [convolutional] 970 | batch_normalize=1 971 | size=3 972 | stride=1 973 | pad=1 974 | filters=128 975 | activation=mish 976 | 977 | [convolutional] 978 | batch_normalize=1 979 | filters=128 980 | size=1 981 | stride=1 982 | pad=1 983 | activation=mish 984 | 985 | [convolutional] 986 | batch_normalize=1 987 | size=3 988 | stride=1 989 | pad=1 990 | filters=128 991 | activation=mish 992 | 993 | [route] 994 | layers = -1, -6 995 | 996 | [convolutional] 997 | batch_normalize=1 998 | filters=128 999 | size=1 1000 | stride=1 1001 | pad=1 1002 | activation=mish 1003 | 1004 | ########################## 1005 | 1006 | [convolutional] 1007 | batch_normalize=1 1008 | size=3 1009 | stride=1 1010 | pad=1 1011 | filters=256 1012 | activation=mish 1013 | 1014 | [convolutional] 1015 | size=1 1016 | stride=1 1017 | pad=1 1018 | filters=255 1019 | activation=linear 1020 | 1021 | 1022 | [yolo] 1023 | mask = 0,1,2 1024 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 1025 | classes=80 1026 | num=9 1027 | jitter=.3 1028 | ignore_thresh = .7 1029 | truth_thresh = 1 1030 | random=1 1031 | scale_x_y = 1.05 1032 | iou_thresh=0.213 1033 | cls_normalizer=1.0 1034 | iou_normalizer=0.07 1035 | iou_loss=ciou 1036 | nms_kind=greedynms 1037 | beta_nms=0.6 1038 | 1039 | [route] 1040 | layers = -4 1041 | 1042 | [convolutional] 1043 | batch_normalize=1 1044 | size=3 1045 | stride=2 1046 | pad=1 1047 | filters=256 1048 | activation=mish 1049 | 1050 | [route] 1051 | layers = -1, -20 1052 | 1053 | [convolutional] 1054 | batch_normalize=1 1055 | filters=256 1056 | size=1 1057 | stride=1 1058 | pad=1 1059 | activation=mish 1060 | 1061 | [convolutional] 1062 
| batch_normalize=1 1063 | filters=256 1064 | size=1 1065 | stride=1 1066 | pad=1 1067 | activation=mish 1068 | 1069 | [route] 1070 | layers = -2 1071 | 1072 | [convolutional] 1073 | batch_normalize=1 1074 | filters=256 1075 | size=1 1076 | stride=1 1077 | pad=1 1078 | activation=mish 1079 | 1080 | [convolutional] 1081 | batch_normalize=1 1082 | size=3 1083 | stride=1 1084 | pad=1 1085 | filters=256 1086 | activation=mish 1087 | 1088 | [convolutional] 1089 | batch_normalize=1 1090 | filters=256 1091 | size=1 1092 | stride=1 1093 | pad=1 1094 | activation=mish 1095 | 1096 | [convolutional] 1097 | batch_normalize=1 1098 | size=3 1099 | stride=1 1100 | pad=1 1101 | filters=256 1102 | activation=mish 1103 | 1104 | [route] 1105 | layers = -1,-6 1106 | 1107 | [convolutional] 1108 | batch_normalize=1 1109 | filters=256 1110 | size=1 1111 | stride=1 1112 | pad=1 1113 | activation=mish 1114 | 1115 | [convolutional] 1116 | batch_normalize=1 1117 | size=3 1118 | stride=1 1119 | pad=1 1120 | filters=512 1121 | activation=mish 1122 | 1123 | [convolutional] 1124 | size=1 1125 | stride=1 1126 | pad=1 1127 | filters=255 1128 | activation=linear 1129 | 1130 | 1131 | [yolo] 1132 | mask = 3,4,5 1133 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 1134 | classes=80 1135 | num=9 1136 | jitter=.3 1137 | ignore_thresh = .7 1138 | truth_thresh = 1 1139 | random=1 1140 | scale_x_y = 1.05 1141 | iou_thresh=0.213 1142 | cls_normalizer=1.0 1143 | iou_normalizer=0.07 1144 | iou_loss=ciou 1145 | nms_kind=greedynms 1146 | beta_nms=0.6 1147 | 1148 | [route] 1149 | layers = -4 1150 | 1151 | [convolutional] 1152 | batch_normalize=1 1153 | size=3 1154 | stride=2 1155 | pad=1 1156 | filters=512 1157 | activation=mish 1158 | 1159 | [route] 1160 | layers = -1, -49 1161 | 1162 | [convolutional] 1163 | batch_normalize=1 1164 | filters=512 1165 | size=1 1166 | stride=1 1167 | pad=1 1168 | activation=mish 1169 | 1170 | [convolutional] 1171 | batch_normalize=1 1172 | filters=512 1173 | size=1 1174 | stride=1 1175 | pad=1 1176 | activation=mish 1177 | 1178 | [route] 1179 | layers = -2 1180 | 1181 | [convolutional] 1182 | batch_normalize=1 1183 | filters=512 1184 | size=1 1185 | stride=1 1186 | pad=1 1187 | activation=mish 1188 | 1189 | [convolutional] 1190 | batch_normalize=1 1191 | size=3 1192 | stride=1 1193 | pad=1 1194 | filters=512 1195 | activation=mish 1196 | 1197 | [convolutional] 1198 | batch_normalize=1 1199 | filters=512 1200 | size=1 1201 | stride=1 1202 | pad=1 1203 | activation=mish 1204 | 1205 | [convolutional] 1206 | batch_normalize=1 1207 | size=3 1208 | stride=1 1209 | pad=1 1210 | filters=512 1211 | activation=mish 1212 | 1213 | [route] 1214 | layers = -1,-6 1215 | 1216 | [convolutional] 1217 | batch_normalize=1 1218 | filters=512 1219 | size=1 1220 | stride=1 1221 | pad=1 1222 | activation=mish 1223 | 1224 | [convolutional] 1225 | batch_normalize=1 1226 | size=3 1227 | stride=1 1228 | pad=1 1229 | filters=1024 1230 | activation=mish 1231 | 1232 | [convolutional] 1233 | size=1 1234 | stride=1 1235 | pad=1 1236 | filters=255 1237 | activation=linear 1238 | 1239 | 1240 | [yolo] 1241 | mask = 6,7,8 1242 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 1243 | classes=80 1244 | num=9 1245 | jitter=.3 1246 | ignore_thresh = .7 1247 | truth_thresh = 1 1248 | random=1 1249 | scale_x_y = 1.05 1250 | iou_thresh=0.213 1251 | cls_normalizer=1.0 1252 | iou_normalizer=0.07 1253 | iou_loss=ciou 1254 | nms_kind=greedynms 1255 | beta_nms=0.6 1256 | 
-------------------------------------------------------------------------------- /cfg/yolov4-pacsp-s-mish.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=64 7 | subdivisions=8 8 | width=512 9 | height=512 10 | channels=3 11 | momentum=0.949 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.00261 19 | burn_in=1000 20 | max_batches = 500500 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | mosaic=1 26 | 27 | [convolutional] 28 | batch_normalize=1 29 | filters=32 30 | size=3 31 | stride=1 32 | pad=1 33 | activation=mish 34 | 35 | # Downsample 36 | 37 | [convolutional] 38 | batch_normalize=1 39 | filters=32 40 | size=3 41 | stride=2 42 | pad=1 43 | activation=mish 44 | 45 | [convolutional] 46 | batch_normalize=1 47 | filters=32 48 | size=1 49 | stride=1 50 | pad=1 51 | activation=mish 52 | 53 | [convolutional] 54 | batch_normalize=1 55 | filters=32 56 | size=3 57 | stride=1 58 | pad=1 59 | activation=mish 60 | 61 | [shortcut] 62 | from=-3 63 | activation=linear 64 | 65 | # Downsample 66 | 67 | [convolutional] 68 | batch_normalize=1 69 | filters=64 70 | size=3 71 | stride=2 72 | pad=1 73 | activation=mish 74 | 75 | [convolutional] 76 | batch_normalize=1 77 | filters=32 78 | size=1 79 | stride=1 80 | pad=1 81 | activation=mish 82 | 83 | [route] 84 | layers = -2 85 | 86 | [convolutional] 87 | batch_normalize=1 88 | filters=32 89 | size=1 90 | stride=1 91 | pad=1 92 | activation=mish 93 | 94 | [convolutional] 95 | batch_normalize=1 96 | filters=32 97 | size=1 98 | stride=1 99 | pad=1 100 | activation=mish 101 | 102 | [convolutional] 103 | batch_normalize=1 104 | filters=32 105 | size=3 106 | stride=1 107 | pad=1 108 | activation=mish 109 | 110 | [shortcut] 111 | from=-3 112 | activation=linear 113 | 114 | [convolutional] 115 | batch_normalize=1 116 | filters=32 117 | size=1 118 | stride=1 119 | pad=1 120 | activation=mish 121 | 122 | [route] 123 | layers = -1,-7 124 | 125 | [convolutional] 126 | batch_normalize=1 127 | filters=64 128 | size=1 129 | stride=1 130 | pad=1 131 | activation=mish 132 | 133 | # Downsample 134 | 135 | [convolutional] 136 | batch_normalize=1 137 | filters=128 138 | size=3 139 | stride=2 140 | pad=1 141 | activation=mish 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=64 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=mish 150 | 151 | [route] 152 | layers = -2 153 | 154 | [convolutional] 155 | batch_normalize=1 156 | filters=64 157 | size=1 158 | stride=1 159 | pad=1 160 | activation=mish 161 | 162 | [convolutional] 163 | batch_normalize=1 164 | filters=64 165 | size=1 166 | stride=1 167 | pad=1 168 | activation=mish 169 | 170 | [convolutional] 171 | batch_normalize=1 172 | filters=64 173 | size=3 174 | stride=1 175 | pad=1 176 | activation=mish 177 | 178 | [shortcut] 179 | from=-3 180 | activation=linear 181 | 182 | [convolutional] 183 | batch_normalize=1 184 | filters=64 185 | size=1 186 | stride=1 187 | pad=1 188 | activation=mish 189 | 190 | [route] 191 | layers = -1,-7 192 | 193 | [convolutional] 194 | batch_normalize=1 195 | filters=128 196 | size=1 197 | stride=1 198 | pad=1 199 | activation=mish 200 | 201 | # Downsample 202 | 203 | [convolutional] 204 | batch_normalize=1 205 | filters=256 206 | size=3 207 | stride=2 208 | pad=1 209 | activation=mish 210 | 211 | [convolutional] 212 | batch_normalize=1 213 | filters=128 214 | size=1 215 | 
stride=1 216 | pad=1 217 | activation=mish 218 | 219 | [route] 220 | layers = -2 221 | 222 | [convolutional] 223 | batch_normalize=1 224 | filters=128 225 | size=1 226 | stride=1 227 | pad=1 228 | activation=mish 229 | 230 | [convolutional] 231 | batch_normalize=1 232 | filters=128 233 | size=1 234 | stride=1 235 | pad=1 236 | activation=mish 237 | 238 | [convolutional] 239 | batch_normalize=1 240 | filters=128 241 | size=3 242 | stride=1 243 | pad=1 244 | activation=mish 245 | 246 | [shortcut] 247 | from=-3 248 | activation=linear 249 | 250 | [convolutional] 251 | batch_normalize=1 252 | filters=128 253 | size=1 254 | stride=1 255 | pad=1 256 | activation=mish 257 | 258 | [route] 259 | layers = -1,-7 260 | 261 | [convolutional] 262 | batch_normalize=1 263 | filters=256 264 | size=1 265 | stride=1 266 | pad=1 267 | activation=mish 268 | 269 | # Downsample 270 | 271 | [convolutional] 272 | batch_normalize=1 273 | filters=512 274 | size=3 275 | stride=2 276 | pad=1 277 | activation=mish 278 | 279 | [convolutional] 280 | batch_normalize=1 281 | filters=256 282 | size=1 283 | stride=1 284 | pad=1 285 | activation=mish 286 | 287 | [route] 288 | layers = -2 289 | 290 | [convolutional] 291 | batch_normalize=1 292 | filters=256 293 | size=1 294 | stride=1 295 | pad=1 296 | activation=mish 297 | 298 | [convolutional] 299 | batch_normalize=1 300 | filters=256 301 | size=1 302 | stride=1 303 | pad=1 304 | activation=mish 305 | 306 | [convolutional] 307 | batch_normalize=1 308 | filters=256 309 | size=3 310 | stride=1 311 | pad=1 312 | activation=mish 313 | 314 | [shortcut] 315 | from=-3 316 | activation=linear 317 | 318 | [convolutional] 319 | batch_normalize=1 320 | filters=256 321 | size=1 322 | stride=1 323 | pad=1 324 | activation=mish 325 | 326 | [route] 327 | layers = -1,-7 328 | 329 | [convolutional] 330 | batch_normalize=1 331 | filters=512 332 | size=1 333 | stride=1 334 | pad=1 335 | activation=mish 336 | 337 | ########################## 338 | 339 | [convolutional] 340 | batch_normalize=1 341 | filters=256 342 | size=1 343 | stride=1 344 | pad=1 345 | activation=mish 346 | 347 | [route] 348 | layers = -2 349 | 350 | [convolutional] 351 | batch_normalize=1 352 | filters=256 353 | size=1 354 | stride=1 355 | pad=1 356 | activation=mish 357 | 358 | ### SPP ### 359 | [maxpool] 360 | stride=1 361 | size=5 362 | 363 | [route] 364 | layers=-2 365 | 366 | [maxpool] 367 | stride=1 368 | size=9 369 | 370 | [route] 371 | layers=-4 372 | 373 | [maxpool] 374 | stride=1 375 | size=13 376 | 377 | [route] 378 | layers=-1,-3,-5,-6 379 | ### End SPP ### 380 | 381 | [convolutional] 382 | batch_normalize=1 383 | filters=256 384 | size=1 385 | stride=1 386 | pad=1 387 | activation=mish 388 | 389 | [convolutional] 390 | batch_normalize=1 391 | size=3 392 | stride=1 393 | pad=1 394 | filters=256 395 | activation=mish 396 | 397 | [route] 398 | layers = -1, -11 399 | 400 | [convolutional] 401 | batch_normalize=1 402 | filters=256 403 | size=1 404 | stride=1 405 | pad=1 406 | activation=mish 407 | 408 | [convolutional] 409 | batch_normalize=1 410 | filters=128 411 | size=1 412 | stride=1 413 | pad=1 414 | activation=mish 415 | 416 | [upsample] 417 | stride=2 418 | 419 | [route] 420 | layers = 34 421 | 422 | [convolutional] 423 | batch_normalize=1 424 | filters=128 425 | size=1 426 | stride=1 427 | pad=1 428 | activation=mish 429 | 430 | [route] 431 | layers = -1, -3 432 | 433 | [convolutional] 434 | batch_normalize=1 435 | filters=128 436 | size=1 437 | stride=1 438 | pad=1 439 | activation=mish 440 | 441 | 
[convolutional] 442 | batch_normalize=1 443 | filters=128 444 | size=1 445 | stride=1 446 | pad=1 447 | activation=mish 448 | 449 | [route] 450 | layers = -2 451 | 452 | [convolutional] 453 | batch_normalize=1 454 | filters=128 455 | size=1 456 | stride=1 457 | pad=1 458 | activation=mish 459 | 460 | [convolutional] 461 | batch_normalize=1 462 | size=3 463 | stride=1 464 | pad=1 465 | filters=128 466 | activation=mish 467 | 468 | [route] 469 | layers = -1, -4 470 | 471 | [convolutional] 472 | batch_normalize=1 473 | filters=128 474 | size=1 475 | stride=1 476 | pad=1 477 | activation=mish 478 | 479 | [convolutional] 480 | batch_normalize=1 481 | filters=64 482 | size=1 483 | stride=1 484 | pad=1 485 | activation=mish 486 | 487 | [upsample] 488 | stride=2 489 | 490 | [route] 491 | layers = 24 492 | 493 | [convolutional] 494 | batch_normalize=1 495 | filters=64 496 | size=1 497 | stride=1 498 | pad=1 499 | activation=mish 500 | 501 | [route] 502 | layers = -1, -3 503 | 504 | [convolutional] 505 | batch_normalize=1 506 | filters=64 507 | size=1 508 | stride=1 509 | pad=1 510 | activation=mish 511 | 512 | [convolutional] 513 | batch_normalize=1 514 | filters=64 515 | size=1 516 | stride=1 517 | pad=1 518 | activation=mish 519 | 520 | [route] 521 | layers = -2 522 | 523 | [convolutional] 524 | batch_normalize=1 525 | filters=64 526 | size=1 527 | stride=1 528 | pad=1 529 | activation=mish 530 | 531 | [convolutional] 532 | batch_normalize=1 533 | size=3 534 | stride=1 535 | pad=1 536 | filters=64 537 | activation=mish 538 | 539 | [route] 540 | layers = -1, -4 541 | 542 | [convolutional] 543 | batch_normalize=1 544 | filters=64 545 | size=1 546 | stride=1 547 | pad=1 548 | activation=mish 549 | 550 | ########################## 551 | 552 | [convolutional] 553 | batch_normalize=1 554 | size=3 555 | stride=1 556 | pad=1 557 | filters=128 558 | activation=mish 559 | 560 | [convolutional] 561 | size=1 562 | stride=1 563 | pad=1 564 | filters=255 565 | activation=linear 566 | 567 | 568 | [yolo] 569 | mask = 0,1,2 570 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 571 | classes=80 572 | num=9 573 | jitter=.3 574 | ignore_thresh = .7 575 | truth_thresh = 1 576 | random=1 577 | scale_x_y = 1.05 578 | iou_thresh=0.213 579 | cls_normalizer=1.0 580 | iou_normalizer=0.07 581 | iou_loss=ciou 582 | nms_kind=greedynms 583 | beta_nms=0.6 584 | 585 | [route] 586 | layers = -4 587 | 588 | [convolutional] 589 | batch_normalize=1 590 | size=3 591 | stride=2 592 | pad=1 593 | filters=128 594 | activation=mish 595 | 596 | [route] 597 | layers = -1, -18 598 | 599 | [convolutional] 600 | batch_normalize=1 601 | filters=128 602 | size=1 603 | stride=1 604 | pad=1 605 | activation=mish 606 | 607 | [convolutional] 608 | batch_normalize=1 609 | filters=128 610 | size=1 611 | stride=1 612 | pad=1 613 | activation=mish 614 | 615 | [route] 616 | layers = -2 617 | 618 | [convolutional] 619 | batch_normalize=1 620 | filters=128 621 | size=1 622 | stride=1 623 | pad=1 624 | activation=mish 625 | 626 | [convolutional] 627 | batch_normalize=1 628 | size=3 629 | stride=1 630 | pad=1 631 | filters=128 632 | activation=mish 633 | 634 | [route] 635 | layers = -1,-4 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=128 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=mish 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=256 651 | activation=mish 652 | 653 | [convolutional] 654 | size=1 655 | stride=1 656 | pad=1 657 
| filters=255 658 | activation=linear 659 | 660 | 661 | [yolo] 662 | mask = 3,4,5 663 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 664 | classes=80 665 | num=9 666 | jitter=.3 667 | ignore_thresh = .7 668 | truth_thresh = 1 669 | random=1 670 | scale_x_y = 1.05 671 | iou_thresh=0.213 672 | cls_normalizer=1.0 673 | iou_normalizer=0.07 674 | iou_loss=ciou 675 | nms_kind=greedynms 676 | beta_nms=0.6 677 | 678 | [route] 679 | layers = -4 680 | 681 | [convolutional] 682 | batch_normalize=1 683 | size=3 684 | stride=2 685 | pad=1 686 | filters=256 687 | activation=mish 688 | 689 | [route] 690 | layers = -1, -43 691 | 692 | [convolutional] 693 | batch_normalize=1 694 | filters=256 695 | size=1 696 | stride=1 697 | pad=1 698 | activation=mish 699 | 700 | [convolutional] 701 | batch_normalize=1 702 | filters=256 703 | size=1 704 | stride=1 705 | pad=1 706 | activation=mish 707 | 708 | [route] 709 | layers = -2 710 | 711 | [convolutional] 712 | batch_normalize=1 713 | filters=256 714 | size=1 715 | stride=1 716 | pad=1 717 | activation=mish 718 | 719 | [convolutional] 720 | batch_normalize=1 721 | size=3 722 | stride=1 723 | pad=1 724 | filters=256 725 | activation=mish 726 | 727 | [route] 728 | layers = -1,-4 729 | 730 | [convolutional] 731 | batch_normalize=1 732 | filters=256 733 | size=1 734 | stride=1 735 | pad=1 736 | activation=mish 737 | 738 | [convolutional] 739 | batch_normalize=1 740 | size=3 741 | stride=1 742 | pad=1 743 | filters=512 744 | activation=mish 745 | 746 | [convolutional] 747 | size=1 748 | stride=1 749 | pad=1 750 | filters=255 751 | activation=linear 752 | 753 | 754 | [yolo] 755 | mask = 6,7,8 756 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 757 | classes=80 758 | num=9 759 | jitter=.3 760 | ignore_thresh = .7 761 | truth_thresh = 1 762 | random=1 763 | scale_x_y = 1.05 764 | iou_thresh=0.213 765 | cls_normalizer=1.0 766 | iou_normalizer=0.07 767 | iou_loss=ciou 768 | nms_kind=greedynms 769 | beta_nms=0.6 770 | -------------------------------------------------------------------------------- /cfg/yolov4-pacsp-s.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=64 7 | subdivisions=8 8 | width=512 9 | height=512 10 | channels=3 11 | momentum=0.949 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.00261 19 | burn_in=1000 20 | max_batches = 500500 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | mosaic=1 26 | 27 | [convolutional] 28 | batch_normalize=1 29 | filters=32 30 | size=3 31 | stride=1 32 | pad=1 33 | activation=leaky 34 | 35 | # Downsample 36 | 37 | [convolutional] 38 | batch_normalize=1 39 | filters=32 40 | size=3 41 | stride=2 42 | pad=1 43 | activation=leaky 44 | 45 | [convolutional] 46 | batch_normalize=1 47 | filters=32 48 | size=1 49 | stride=1 50 | pad=1 51 | activation=leaky 52 | 53 | [convolutional] 54 | batch_normalize=1 55 | filters=32 56 | size=3 57 | stride=1 58 | pad=1 59 | activation=leaky 60 | 61 | [shortcut] 62 | from=-3 63 | activation=linear 64 | 65 | # Downsample 66 | 67 | [convolutional] 68 | batch_normalize=1 69 | filters=64 70 | size=3 71 | stride=2 72 | pad=1 73 | activation=leaky 74 | 75 | [convolutional] 76 | batch_normalize=1 77 | filters=32 78 | size=1 79 | stride=1 80 | pad=1 81 | activation=leaky 82 | 83 | [route] 84 | layers = -2 85 | 86 | 
[convolutional] 87 | batch_normalize=1 88 | filters=32 89 | size=1 90 | stride=1 91 | pad=1 92 | activation=leaky 93 | 94 | [convolutional] 95 | batch_normalize=1 96 | filters=32 97 | size=1 98 | stride=1 99 | pad=1 100 | activation=leaky 101 | 102 | [convolutional] 103 | batch_normalize=1 104 | filters=32 105 | size=3 106 | stride=1 107 | pad=1 108 | activation=leaky 109 | 110 | [shortcut] 111 | from=-3 112 | activation=linear 113 | 114 | [convolutional] 115 | batch_normalize=1 116 | filters=32 117 | size=1 118 | stride=1 119 | pad=1 120 | activation=leaky 121 | 122 | [route] 123 | layers = -1,-7 124 | 125 | [convolutional] 126 | batch_normalize=1 127 | filters=64 128 | size=1 129 | stride=1 130 | pad=1 131 | activation=leaky 132 | 133 | # Downsample 134 | 135 | [convolutional] 136 | batch_normalize=1 137 | filters=128 138 | size=3 139 | stride=2 140 | pad=1 141 | activation=leaky 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=64 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [route] 152 | layers = -2 153 | 154 | [convolutional] 155 | batch_normalize=1 156 | filters=64 157 | size=1 158 | stride=1 159 | pad=1 160 | activation=leaky 161 | 162 | [convolutional] 163 | batch_normalize=1 164 | filters=64 165 | size=1 166 | stride=1 167 | pad=1 168 | activation=leaky 169 | 170 | [convolutional] 171 | batch_normalize=1 172 | filters=64 173 | size=3 174 | stride=1 175 | pad=1 176 | activation=leaky 177 | 178 | [shortcut] 179 | from=-3 180 | activation=linear 181 | 182 | [convolutional] 183 | batch_normalize=1 184 | filters=64 185 | size=1 186 | stride=1 187 | pad=1 188 | activation=leaky 189 | 190 | [route] 191 | layers = -1,-7 192 | 193 | [convolutional] 194 | batch_normalize=1 195 | filters=128 196 | size=1 197 | stride=1 198 | pad=1 199 | activation=leaky 200 | 201 | # Downsample 202 | 203 | [convolutional] 204 | batch_normalize=1 205 | filters=256 206 | size=3 207 | stride=2 208 | pad=1 209 | activation=leaky 210 | 211 | [convolutional] 212 | batch_normalize=1 213 | filters=128 214 | size=1 215 | stride=1 216 | pad=1 217 | activation=leaky 218 | 219 | [route] 220 | layers = -2 221 | 222 | [convolutional] 223 | batch_normalize=1 224 | filters=128 225 | size=1 226 | stride=1 227 | pad=1 228 | activation=leaky 229 | 230 | [convolutional] 231 | batch_normalize=1 232 | filters=128 233 | size=1 234 | stride=1 235 | pad=1 236 | activation=leaky 237 | 238 | [convolutional] 239 | batch_normalize=1 240 | filters=128 241 | size=3 242 | stride=1 243 | pad=1 244 | activation=leaky 245 | 246 | [shortcut] 247 | from=-3 248 | activation=linear 249 | 250 | [convolutional] 251 | batch_normalize=1 252 | filters=128 253 | size=1 254 | stride=1 255 | pad=1 256 | activation=leaky 257 | 258 | [route] 259 | layers = -1,-7 260 | 261 | [convolutional] 262 | batch_normalize=1 263 | filters=256 264 | size=1 265 | stride=1 266 | pad=1 267 | activation=leaky 268 | 269 | # Downsample 270 | 271 | [convolutional] 272 | batch_normalize=1 273 | filters=512 274 | size=3 275 | stride=2 276 | pad=1 277 | activation=leaky 278 | 279 | [convolutional] 280 | batch_normalize=1 281 | filters=256 282 | size=1 283 | stride=1 284 | pad=1 285 | activation=leaky 286 | 287 | [route] 288 | layers = -2 289 | 290 | [convolutional] 291 | batch_normalize=1 292 | filters=256 293 | size=1 294 | stride=1 295 | pad=1 296 | activation=leaky 297 | 298 | [convolutional] 299 | batch_normalize=1 300 | filters=256 301 | size=1 302 | stride=1 303 | pad=1 304 | activation=leaky 305 | 306 | [convolutional] 307 | 
batch_normalize=1 308 | filters=256 309 | size=3 310 | stride=1 311 | pad=1 312 | activation=leaky 313 | 314 | [shortcut] 315 | from=-3 316 | activation=linear 317 | 318 | [convolutional] 319 | batch_normalize=1 320 | filters=256 321 | size=1 322 | stride=1 323 | pad=1 324 | activation=leaky 325 | 326 | [route] 327 | layers = -1,-7 328 | 329 | [convolutional] 330 | batch_normalize=1 331 | filters=512 332 | size=1 333 | stride=1 334 | pad=1 335 | activation=leaky 336 | 337 | ########################## 338 | 339 | [convolutional] 340 | batch_normalize=1 341 | filters=256 342 | size=1 343 | stride=1 344 | pad=1 345 | activation=leaky 346 | 347 | [route] 348 | layers = -2 349 | 350 | [convolutional] 351 | batch_normalize=1 352 | filters=256 353 | size=1 354 | stride=1 355 | pad=1 356 | activation=leaky 357 | 358 | ### SPP ### 359 | [maxpool] 360 | stride=1 361 | size=5 362 | 363 | [route] 364 | layers=-2 365 | 366 | [maxpool] 367 | stride=1 368 | size=9 369 | 370 | [route] 371 | layers=-4 372 | 373 | [maxpool] 374 | stride=1 375 | size=13 376 | 377 | [route] 378 | layers=-1,-3,-5,-6 379 | ### End SPP ### 380 | 381 | [convolutional] 382 | batch_normalize=1 383 | filters=256 384 | size=1 385 | stride=1 386 | pad=1 387 | activation=leaky 388 | 389 | [convolutional] 390 | batch_normalize=1 391 | size=3 392 | stride=1 393 | pad=1 394 | filters=256 395 | activation=leaky 396 | 397 | [route] 398 | layers = -1, -11 399 | 400 | [convolutional] 401 | batch_normalize=1 402 | filters=256 403 | size=1 404 | stride=1 405 | pad=1 406 | activation=leaky 407 | 408 | [convolutional] 409 | batch_normalize=1 410 | filters=128 411 | size=1 412 | stride=1 413 | pad=1 414 | activation=leaky 415 | 416 | [upsample] 417 | stride=2 418 | 419 | [route] 420 | layers = 34 421 | 422 | [convolutional] 423 | batch_normalize=1 424 | filters=128 425 | size=1 426 | stride=1 427 | pad=1 428 | activation=leaky 429 | 430 | [route] 431 | layers = -1, -3 432 | 433 | [convolutional] 434 | batch_normalize=1 435 | filters=128 436 | size=1 437 | stride=1 438 | pad=1 439 | activation=leaky 440 | 441 | [convolutional] 442 | batch_normalize=1 443 | filters=128 444 | size=1 445 | stride=1 446 | pad=1 447 | activation=leaky 448 | 449 | [route] 450 | layers = -2 451 | 452 | [convolutional] 453 | batch_normalize=1 454 | filters=128 455 | size=1 456 | stride=1 457 | pad=1 458 | activation=leaky 459 | 460 | [convolutional] 461 | batch_normalize=1 462 | size=3 463 | stride=1 464 | pad=1 465 | filters=128 466 | activation=leaky 467 | 468 | [route] 469 | layers = -1, -4 470 | 471 | [convolutional] 472 | batch_normalize=1 473 | filters=128 474 | size=1 475 | stride=1 476 | pad=1 477 | activation=leaky 478 | 479 | [convolutional] 480 | batch_normalize=1 481 | filters=64 482 | size=1 483 | stride=1 484 | pad=1 485 | activation=leaky 486 | 487 | [upsample] 488 | stride=2 489 | 490 | [route] 491 | layers = 24 492 | 493 | [convolutional] 494 | batch_normalize=1 495 | filters=64 496 | size=1 497 | stride=1 498 | pad=1 499 | activation=leaky 500 | 501 | [route] 502 | layers = -1, -3 503 | 504 | [convolutional] 505 | batch_normalize=1 506 | filters=64 507 | size=1 508 | stride=1 509 | pad=1 510 | activation=leaky 511 | 512 | [convolutional] 513 | batch_normalize=1 514 | filters=64 515 | size=1 516 | stride=1 517 | pad=1 518 | activation=leaky 519 | 520 | [route] 521 | layers = -2 522 | 523 | [convolutional] 524 | batch_normalize=1 525 | filters=64 526 | size=1 527 | stride=1 528 | pad=1 529 | activation=leaky 530 | 531 | [convolutional] 532 | 
batch_normalize=1 533 | size=3 534 | stride=1 535 | pad=1 536 | filters=64 537 | activation=leaky 538 | 539 | [route] 540 | layers = -1, -4 541 | 542 | [convolutional] 543 | batch_normalize=1 544 | filters=64 545 | size=1 546 | stride=1 547 | pad=1 548 | activation=leaky 549 | 550 | ########################## 551 | 552 | [convolutional] 553 | batch_normalize=1 554 | size=3 555 | stride=1 556 | pad=1 557 | filters=128 558 | activation=leaky 559 | 560 | [convolutional] 561 | size=1 562 | stride=1 563 | pad=1 564 | filters=255 565 | activation=linear 566 | 567 | 568 | [yolo] 569 | mask = 0,1,2 570 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 571 | classes=80 572 | num=9 573 | jitter=.3 574 | ignore_thresh = .7 575 | truth_thresh = 1 576 | random=1 577 | scale_x_y = 1.05 578 | iou_thresh=0.213 579 | cls_normalizer=1.0 580 | iou_normalizer=0.07 581 | iou_loss=ciou 582 | nms_kind=greedynms 583 | beta_nms=0.6 584 | 585 | [route] 586 | layers = -4 587 | 588 | [convolutional] 589 | batch_normalize=1 590 | size=3 591 | stride=2 592 | pad=1 593 | filters=128 594 | activation=leaky 595 | 596 | [route] 597 | layers = -1, -18 598 | 599 | [convolutional] 600 | batch_normalize=1 601 | filters=128 602 | size=1 603 | stride=1 604 | pad=1 605 | activation=leaky 606 | 607 | [convolutional] 608 | batch_normalize=1 609 | filters=128 610 | size=1 611 | stride=1 612 | pad=1 613 | activation=leaky 614 | 615 | [route] 616 | layers = -2 617 | 618 | [convolutional] 619 | batch_normalize=1 620 | filters=128 621 | size=1 622 | stride=1 623 | pad=1 624 | activation=leaky 625 | 626 | [convolutional] 627 | batch_normalize=1 628 | size=3 629 | stride=1 630 | pad=1 631 | filters=128 632 | activation=leaky 633 | 634 | [route] 635 | layers = -1,-4 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=128 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=256 651 | activation=leaky 652 | 653 | [convolutional] 654 | size=1 655 | stride=1 656 | pad=1 657 | filters=255 658 | activation=linear 659 | 660 | 661 | [yolo] 662 | mask = 3,4,5 663 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 664 | classes=80 665 | num=9 666 | jitter=.3 667 | ignore_thresh = .7 668 | truth_thresh = 1 669 | random=1 670 | scale_x_y = 1.05 671 | iou_thresh=0.213 672 | cls_normalizer=1.0 673 | iou_normalizer=0.07 674 | iou_loss=ciou 675 | nms_kind=greedynms 676 | beta_nms=0.6 677 | 678 | [route] 679 | layers = -4 680 | 681 | [convolutional] 682 | batch_normalize=1 683 | size=3 684 | stride=2 685 | pad=1 686 | filters=256 687 | activation=leaky 688 | 689 | [route] 690 | layers = -1, -43 691 | 692 | [convolutional] 693 | batch_normalize=1 694 | filters=256 695 | size=1 696 | stride=1 697 | pad=1 698 | activation=leaky 699 | 700 | [convolutional] 701 | batch_normalize=1 702 | filters=256 703 | size=1 704 | stride=1 705 | pad=1 706 | activation=leaky 707 | 708 | [route] 709 | layers = -2 710 | 711 | [convolutional] 712 | batch_normalize=1 713 | filters=256 714 | size=1 715 | stride=1 716 | pad=1 717 | activation=leaky 718 | 719 | [convolutional] 720 | batch_normalize=1 721 | size=3 722 | stride=1 723 | pad=1 724 | filters=256 725 | activation=leaky 726 | 727 | [route] 728 | layers = -1,-4 729 | 730 | [convolutional] 731 | batch_normalize=1 732 | filters=256 733 | size=1 734 | stride=1 735 | pad=1 736 | activation=leaky 737 | 738 | [convolutional] 
739 | batch_normalize=1 740 | size=3 741 | stride=1 742 | pad=1 743 | filters=512 744 | activation=leaky 745 | 746 | [convolutional] 747 | size=1 748 | stride=1 749 | pad=1 750 | filters=255 751 | activation=linear 752 | 753 | 754 | [yolo] 755 | mask = 6,7,8 756 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 757 | classes=80 758 | num=9 759 | jitter=.3 760 | ignore_thresh = .7 761 | truth_thresh = 1 762 | random=1 763 | scale_x_y = 1.05 764 | iou_thresh=0.213 765 | cls_normalizer=1.0 766 | iou_normalizer=0.07 767 | iou_loss=ciou 768 | nms_kind=greedynms 769 | beta_nms=0.6 770 | -------------------------------------------------------------------------------- /cfg/yolov4-pacsp.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=64 7 | subdivisions=8 8 | width=512 9 | height=512 10 | channels=3 11 | momentum=0.949 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.00261 19 | burn_in=1000 20 | max_batches = 500500 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | #cutmix=1 26 | mosaic=1 27 | 28 | #23:104x104 54:52x52 85:26x26 104:13x13 for 416 29 | 30 | 31 | 32 | [convolutional] 33 | batch_normalize=1 34 | filters=32 35 | size=3 36 | stride=1 37 | pad=1 38 | activation=leaky 39 | 40 | # Downsample 41 | 42 | [convolutional] 43 | batch_normalize=1 44 | filters=64 45 | size=3 46 | stride=2 47 | pad=1 48 | activation=leaky 49 | 50 | #[convolutional] 51 | #batch_normalize=1 52 | #filters=64 53 | #size=1 54 | #stride=1 55 | #pad=1 56 | #activation=leaky 57 | 58 | #[route] 59 | #layers = -2 60 | 61 | #[convolutional] 62 | #batch_normalize=1 63 | #filters=64 64 | #size=1 65 | #stride=1 66 | #pad=1 67 | #activation=leaky 68 | 69 | [convolutional] 70 | batch_normalize=1 71 | filters=32 72 | size=1 73 | stride=1 74 | pad=1 75 | activation=leaky 76 | 77 | [convolutional] 78 | batch_normalize=1 79 | filters=64 80 | size=3 81 | stride=1 82 | pad=1 83 | activation=leaky 84 | 85 | [shortcut] 86 | from=-3 87 | activation=linear 88 | 89 | #[convolutional] 90 | #batch_normalize=1 91 | #filters=64 92 | #size=1 93 | #stride=1 94 | #pad=1 95 | #activation=leaky 96 | 97 | #[route] 98 | #layers = -1,-7 99 | 100 | #[convolutional] 101 | #batch_normalize=1 102 | #filters=64 103 | #size=1 104 | #stride=1 105 | #pad=1 106 | #activation=leaky 107 | 108 | # Downsample 109 | 110 | [convolutional] 111 | batch_normalize=1 112 | filters=128 113 | size=3 114 | stride=2 115 | pad=1 116 | activation=leaky 117 | 118 | [convolutional] 119 | batch_normalize=1 120 | filters=64 121 | size=1 122 | stride=1 123 | pad=1 124 | activation=leaky 125 | 126 | [route] 127 | layers = -2 128 | 129 | [convolutional] 130 | batch_normalize=1 131 | filters=64 132 | size=1 133 | stride=1 134 | pad=1 135 | activation=leaky 136 | 137 | [convolutional] 138 | batch_normalize=1 139 | filters=64 140 | size=1 141 | stride=1 142 | pad=1 143 | activation=leaky 144 | 145 | [convolutional] 146 | batch_normalize=1 147 | filters=64 148 | size=3 149 | stride=1 150 | pad=1 151 | activation=leaky 152 | 153 | [shortcut] 154 | from=-3 155 | activation=linear 156 | 157 | [convolutional] 158 | batch_normalize=1 159 | filters=64 160 | size=1 161 | stride=1 162 | pad=1 163 | activation=leaky 164 | 165 | [convolutional] 166 | batch_normalize=1 167 | filters=64 168 | size=3 169 | stride=1 170 | pad=1 171 | activation=leaky 172 | 173 | 
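# (annotation added for this write-up; not part of the upstream cfg)
# The 1x1 and 3x3 convolutions above plus the [shortcut] from=-3 below form one residual
# unit. The earlier [route] layers=-2 splits the feature map and a later [route] merges it
# back in, which is the cross-stage-partial (CSP) pattern used throughout this backbone.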
[shortcut] 174 | from=-3 175 | activation=linear 176 | 177 | [convolutional] 178 | batch_normalize=1 179 | filters=64 180 | size=1 181 | stride=1 182 | pad=1 183 | activation=leaky 184 | 185 | [route] 186 | layers = -1,-10 187 | 188 | [convolutional] 189 | batch_normalize=1 190 | filters=128 191 | size=1 192 | stride=1 193 | pad=1 194 | activation=leaky 195 | 196 | # Downsample 197 | 198 | [convolutional] 199 | batch_normalize=1 200 | filters=256 201 | size=3 202 | stride=2 203 | pad=1 204 | activation=leaky 205 | 206 | [convolutional] 207 | batch_normalize=1 208 | filters=128 209 | size=1 210 | stride=1 211 | pad=1 212 | activation=leaky 213 | 214 | [route] 215 | layers = -2 216 | 217 | [convolutional] 218 | batch_normalize=1 219 | filters=128 220 | size=1 221 | stride=1 222 | pad=1 223 | activation=leaky 224 | 225 | [convolutional] 226 | batch_normalize=1 227 | filters=128 228 | size=1 229 | stride=1 230 | pad=1 231 | activation=leaky 232 | 233 | [convolutional] 234 | batch_normalize=1 235 | filters=128 236 | size=3 237 | stride=1 238 | pad=1 239 | activation=leaky 240 | 241 | [shortcut] 242 | from=-3 243 | activation=linear 244 | 245 | [convolutional] 246 | batch_normalize=1 247 | filters=128 248 | size=1 249 | stride=1 250 | pad=1 251 | activation=leaky 252 | 253 | [convolutional] 254 | batch_normalize=1 255 | filters=128 256 | size=3 257 | stride=1 258 | pad=1 259 | activation=leaky 260 | 261 | [shortcut] 262 | from=-3 263 | activation=linear 264 | 265 | [convolutional] 266 | batch_normalize=1 267 | filters=128 268 | size=1 269 | stride=1 270 | pad=1 271 | activation=leaky 272 | 273 | [convolutional] 274 | batch_normalize=1 275 | filters=128 276 | size=3 277 | stride=1 278 | pad=1 279 | activation=leaky 280 | 281 | [shortcut] 282 | from=-3 283 | activation=linear 284 | 285 | [convolutional] 286 | batch_normalize=1 287 | filters=128 288 | size=1 289 | stride=1 290 | pad=1 291 | activation=leaky 292 | 293 | [convolutional] 294 | batch_normalize=1 295 | filters=128 296 | size=3 297 | stride=1 298 | pad=1 299 | activation=leaky 300 | 301 | [shortcut] 302 | from=-3 303 | activation=linear 304 | 305 | 306 | [convolutional] 307 | batch_normalize=1 308 | filters=128 309 | size=1 310 | stride=1 311 | pad=1 312 | activation=leaky 313 | 314 | [convolutional] 315 | batch_normalize=1 316 | filters=128 317 | size=3 318 | stride=1 319 | pad=1 320 | activation=leaky 321 | 322 | [shortcut] 323 | from=-3 324 | activation=linear 325 | 326 | [convolutional] 327 | batch_normalize=1 328 | filters=128 329 | size=1 330 | stride=1 331 | pad=1 332 | activation=leaky 333 | 334 | [convolutional] 335 | batch_normalize=1 336 | filters=128 337 | size=3 338 | stride=1 339 | pad=1 340 | activation=leaky 341 | 342 | [shortcut] 343 | from=-3 344 | activation=linear 345 | 346 | [convolutional] 347 | batch_normalize=1 348 | filters=128 349 | size=1 350 | stride=1 351 | pad=1 352 | activation=leaky 353 | 354 | [convolutional] 355 | batch_normalize=1 356 | filters=128 357 | size=3 358 | stride=1 359 | pad=1 360 | activation=leaky 361 | 362 | [shortcut] 363 | from=-3 364 | activation=linear 365 | 366 | [convolutional] 367 | batch_normalize=1 368 | filters=128 369 | size=1 370 | stride=1 371 | pad=1 372 | activation=leaky 373 | 374 | [convolutional] 375 | batch_normalize=1 376 | filters=128 377 | size=3 378 | stride=1 379 | pad=1 380 | activation=leaky 381 | 382 | [shortcut] 383 | from=-3 384 | activation=linear 385 | 386 | [convolutional] 387 | batch_normalize=1 388 | filters=128 389 | size=1 390 | stride=1 391 | pad=1 392 
| activation=leaky 393 | 394 | [route] 395 | layers = -1,-28 396 | 397 | [convolutional] 398 | batch_normalize=1 399 | filters=256 400 | size=1 401 | stride=1 402 | pad=1 403 | activation=leaky 404 | 405 | # Downsample 406 | 407 | [convolutional] 408 | batch_normalize=1 409 | filters=512 410 | size=3 411 | stride=2 412 | pad=1 413 | activation=leaky 414 | 415 | [convolutional] 416 | batch_normalize=1 417 | filters=256 418 | size=1 419 | stride=1 420 | pad=1 421 | activation=leaky 422 | 423 | [route] 424 | layers = -2 425 | 426 | [convolutional] 427 | batch_normalize=1 428 | filters=256 429 | size=1 430 | stride=1 431 | pad=1 432 | activation=leaky 433 | 434 | [convolutional] 435 | batch_normalize=1 436 | filters=256 437 | size=1 438 | stride=1 439 | pad=1 440 | activation=leaky 441 | 442 | [convolutional] 443 | batch_normalize=1 444 | filters=256 445 | size=3 446 | stride=1 447 | pad=1 448 | activation=leaky 449 | 450 | [shortcut] 451 | from=-3 452 | activation=linear 453 | 454 | 455 | [convolutional] 456 | batch_normalize=1 457 | filters=256 458 | size=1 459 | stride=1 460 | pad=1 461 | activation=leaky 462 | 463 | [convolutional] 464 | batch_normalize=1 465 | filters=256 466 | size=3 467 | stride=1 468 | pad=1 469 | activation=leaky 470 | 471 | [shortcut] 472 | from=-3 473 | activation=linear 474 | 475 | 476 | [convolutional] 477 | batch_normalize=1 478 | filters=256 479 | size=1 480 | stride=1 481 | pad=1 482 | activation=leaky 483 | 484 | [convolutional] 485 | batch_normalize=1 486 | filters=256 487 | size=3 488 | stride=1 489 | pad=1 490 | activation=leaky 491 | 492 | [shortcut] 493 | from=-3 494 | activation=linear 495 | 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=256 500 | size=1 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [convolutional] 506 | batch_normalize=1 507 | filters=256 508 | size=3 509 | stride=1 510 | pad=1 511 | activation=leaky 512 | 513 | [shortcut] 514 | from=-3 515 | activation=linear 516 | 517 | 518 | [convolutional] 519 | batch_normalize=1 520 | filters=256 521 | size=1 522 | stride=1 523 | pad=1 524 | activation=leaky 525 | 526 | [convolutional] 527 | batch_normalize=1 528 | filters=256 529 | size=3 530 | stride=1 531 | pad=1 532 | activation=leaky 533 | 534 | [shortcut] 535 | from=-3 536 | activation=linear 537 | 538 | 539 | [convolutional] 540 | batch_normalize=1 541 | filters=256 542 | size=1 543 | stride=1 544 | pad=1 545 | activation=leaky 546 | 547 | [convolutional] 548 | batch_normalize=1 549 | filters=256 550 | size=3 551 | stride=1 552 | pad=1 553 | activation=leaky 554 | 555 | [shortcut] 556 | from=-3 557 | activation=linear 558 | 559 | 560 | [convolutional] 561 | batch_normalize=1 562 | filters=256 563 | size=1 564 | stride=1 565 | pad=1 566 | activation=leaky 567 | 568 | [convolutional] 569 | batch_normalize=1 570 | filters=256 571 | size=3 572 | stride=1 573 | pad=1 574 | activation=leaky 575 | 576 | [shortcut] 577 | from=-3 578 | activation=linear 579 | 580 | [convolutional] 581 | batch_normalize=1 582 | filters=256 583 | size=1 584 | stride=1 585 | pad=1 586 | activation=leaky 587 | 588 | [convolutional] 589 | batch_normalize=1 590 | filters=256 591 | size=3 592 | stride=1 593 | pad=1 594 | activation=leaky 595 | 596 | [shortcut] 597 | from=-3 598 | activation=linear 599 | 600 | [convolutional] 601 | batch_normalize=1 602 | filters=256 603 | size=1 604 | stride=1 605 | pad=1 606 | activation=leaky 607 | 608 | [route] 609 | layers = -1,-28 610 | 611 | [convolutional] 612 | batch_normalize=1 613 | filters=512 
614 | size=1 615 | stride=1 616 | pad=1 617 | activation=leaky 618 | 619 | # Downsample 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=1024 624 | size=3 625 | stride=2 626 | pad=1 627 | activation=leaky 628 | 629 | [convolutional] 630 | batch_normalize=1 631 | filters=512 632 | size=1 633 | stride=1 634 | pad=1 635 | activation=leaky 636 | 637 | [route] 638 | layers = -2 639 | 640 | [convolutional] 641 | batch_normalize=1 642 | filters=512 643 | size=1 644 | stride=1 645 | pad=1 646 | activation=leaky 647 | 648 | [convolutional] 649 | batch_normalize=1 650 | filters=512 651 | size=1 652 | stride=1 653 | pad=1 654 | activation=leaky 655 | 656 | [convolutional] 657 | batch_normalize=1 658 | filters=512 659 | size=3 660 | stride=1 661 | pad=1 662 | activation=leaky 663 | 664 | [shortcut] 665 | from=-3 666 | activation=linear 667 | 668 | [convolutional] 669 | batch_normalize=1 670 | filters=512 671 | size=1 672 | stride=1 673 | pad=1 674 | activation=leaky 675 | 676 | [convolutional] 677 | batch_normalize=1 678 | filters=512 679 | size=3 680 | stride=1 681 | pad=1 682 | activation=leaky 683 | 684 | [shortcut] 685 | from=-3 686 | activation=linear 687 | 688 | [convolutional] 689 | batch_normalize=1 690 | filters=512 691 | size=1 692 | stride=1 693 | pad=1 694 | activation=leaky 695 | 696 | [convolutional] 697 | batch_normalize=1 698 | filters=512 699 | size=3 700 | stride=1 701 | pad=1 702 | activation=leaky 703 | 704 | [shortcut] 705 | from=-3 706 | activation=linear 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=512 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [convolutional] 717 | batch_normalize=1 718 | filters=512 719 | size=3 720 | stride=1 721 | pad=1 722 | activation=leaky 723 | 724 | [shortcut] 725 | from=-3 726 | activation=linear 727 | 728 | [convolutional] 729 | batch_normalize=1 730 | filters=512 731 | size=1 732 | stride=1 733 | pad=1 734 | activation=leaky 735 | 736 | [route] 737 | layers = -1,-16 738 | 739 | [convolutional] 740 | batch_normalize=1 741 | filters=1024 742 | size=1 743 | stride=1 744 | pad=1 745 | activation=leaky 746 | 747 | ########################## 748 | 749 | [convolutional] 750 | batch_normalize=1 751 | filters=512 752 | size=1 753 | stride=1 754 | pad=1 755 | activation=leaky 756 | 757 | [route] 758 | layers = -2 759 | 760 | [convolutional] 761 | batch_normalize=1 762 | filters=512 763 | size=1 764 | stride=1 765 | pad=1 766 | activation=leaky 767 | 768 | [convolutional] 769 | batch_normalize=1 770 | size=3 771 | stride=1 772 | pad=1 773 | filters=512 774 | activation=leaky 775 | 776 | [convolutional] 777 | batch_normalize=1 778 | filters=512 779 | size=1 780 | stride=1 781 | pad=1 782 | activation=leaky 783 | 784 | ### SPP ### 785 | [maxpool] 786 | stride=1 787 | size=5 788 | 789 | [route] 790 | layers=-2 791 | 792 | [maxpool] 793 | stride=1 794 | size=9 795 | 796 | [route] 797 | layers=-4 798 | 799 | [maxpool] 800 | stride=1 801 | size=13 802 | 803 | [route] 804 | layers=-1,-3,-5,-6 805 | ### End SPP ### 806 | 807 | [convolutional] 808 | batch_normalize=1 809 | filters=512 810 | size=1 811 | stride=1 812 | pad=1 813 | activation=leaky 814 | 815 | [convolutional] 816 | batch_normalize=1 817 | size=3 818 | stride=1 819 | pad=1 820 | filters=512 821 | activation=leaky 822 | 823 | [route] 824 | layers = -1, -13 825 | 826 | [convolutional] 827 | batch_normalize=1 828 | filters=512 829 | size=1 830 | stride=1 831 | pad=1 832 | activation=leaky 833 | 834 | [convolutional] 835 | batch_normalize=1 
836 | filters=256 837 | size=1 838 | stride=1 839 | pad=1 840 | activation=leaky 841 | 842 | [upsample] 843 | stride=2 844 | 845 | [route] 846 | layers = 79 847 | 848 | [convolutional] 849 | batch_normalize=1 850 | filters=256 851 | size=1 852 | stride=1 853 | pad=1 854 | activation=leaky 855 | 856 | [route] 857 | layers = -1, -3 858 | 859 | [convolutional] 860 | batch_normalize=1 861 | filters=256 862 | size=1 863 | stride=1 864 | pad=1 865 | activation=leaky 866 | 867 | [convolutional] 868 | batch_normalize=1 869 | filters=256 870 | size=1 871 | stride=1 872 | pad=1 873 | activation=leaky 874 | 875 | [route] 876 | layers = -2 877 | 878 | [convolutional] 879 | batch_normalize=1 880 | filters=256 881 | size=1 882 | stride=1 883 | pad=1 884 | activation=leaky 885 | 886 | [convolutional] 887 | batch_normalize=1 888 | size=3 889 | stride=1 890 | pad=1 891 | filters=256 892 | activation=leaky 893 | 894 | [convolutional] 895 | batch_normalize=1 896 | filters=256 897 | size=1 898 | stride=1 899 | pad=1 900 | activation=leaky 901 | 902 | [convolutional] 903 | batch_normalize=1 904 | size=3 905 | stride=1 906 | pad=1 907 | filters=256 908 | activation=leaky 909 | 910 | [route] 911 | layers = -1, -6 912 | 913 | [convolutional] 914 | batch_normalize=1 915 | filters=256 916 | size=1 917 | stride=1 918 | pad=1 919 | activation=leaky 920 | 921 | [convolutional] 922 | batch_normalize=1 923 | filters=128 924 | size=1 925 | stride=1 926 | pad=1 927 | activation=leaky 928 | 929 | [upsample] 930 | stride=2 931 | 932 | [route] 933 | layers = 48 934 | 935 | [convolutional] 936 | batch_normalize=1 937 | filters=128 938 | size=1 939 | stride=1 940 | pad=1 941 | activation=leaky 942 | 943 | [route] 944 | layers = -1, -3 945 | 946 | [convolutional] 947 | batch_normalize=1 948 | filters=128 949 | size=1 950 | stride=1 951 | pad=1 952 | activation=leaky 953 | 954 | [convolutional] 955 | batch_normalize=1 956 | filters=128 957 | size=1 958 | stride=1 959 | pad=1 960 | activation=leaky 961 | 962 | [route] 963 | layers = -2 964 | 965 | [convolutional] 966 | batch_normalize=1 967 | filters=128 968 | size=1 969 | stride=1 970 | pad=1 971 | activation=leaky 972 | 973 | [convolutional] 974 | batch_normalize=1 975 | size=3 976 | stride=1 977 | pad=1 978 | filters=128 979 | activation=leaky 980 | 981 | [convolutional] 982 | batch_normalize=1 983 | filters=128 984 | size=1 985 | stride=1 986 | pad=1 987 | activation=leaky 988 | 989 | [convolutional] 990 | batch_normalize=1 991 | size=3 992 | stride=1 993 | pad=1 994 | filters=128 995 | activation=leaky 996 | 997 | [route] 998 | layers = -1, -6 999 | 1000 | [convolutional] 1001 | batch_normalize=1 1002 | filters=128 1003 | size=1 1004 | stride=1 1005 | pad=1 1006 | activation=leaky 1007 | 1008 | ########################## 1009 | 1010 | [convolutional] 1011 | batch_normalize=1 1012 | size=3 1013 | stride=1 1014 | pad=1 1015 | filters=256 1016 | activation=leaky 1017 | 1018 | [convolutional] 1019 | size=1 1020 | stride=1 1021 | pad=1 1022 | filters=255 1023 | activation=linear 1024 | 1025 | 1026 | [yolo] 1027 | mask = 0,1,2 1028 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 1029 | classes=80 1030 | num=9 1031 | jitter=.3 1032 | ignore_thresh = .7 1033 | truth_thresh = 1 1034 | random=1 1035 | scale_x_y = 1.05 1036 | iou_thresh=0.213 1037 | cls_normalizer=1.0 1038 | iou_normalizer=0.07 1039 | iou_loss=ciou 1040 | nms_kind=greedynms 1041 | beta_nms=0.6 1042 | 1043 | [route] 1044 | layers = -4 1045 | 1046 | [convolutional] 1047 | 
batch_normalize=1 1048 | size=3 1049 | stride=2 1050 | pad=1 1051 | filters=256 1052 | activation=leaky 1053 | 1054 | [route] 1055 | layers = -1, -20 1056 | 1057 | [convolutional] 1058 | batch_normalize=1 1059 | filters=256 1060 | size=1 1061 | stride=1 1062 | pad=1 1063 | activation=leaky 1064 | 1065 | [convolutional] 1066 | batch_normalize=1 1067 | filters=256 1068 | size=1 1069 | stride=1 1070 | pad=1 1071 | activation=leaky 1072 | 1073 | [route] 1074 | layers = -2 1075 | 1076 | [convolutional] 1077 | batch_normalize=1 1078 | filters=256 1079 | size=1 1080 | stride=1 1081 | pad=1 1082 | activation=leaky 1083 | 1084 | [convolutional] 1085 | batch_normalize=1 1086 | size=3 1087 | stride=1 1088 | pad=1 1089 | filters=256 1090 | activation=leaky 1091 | 1092 | [convolutional] 1093 | batch_normalize=1 1094 | filters=256 1095 | size=1 1096 | stride=1 1097 | pad=1 1098 | activation=leaky 1099 | 1100 | [convolutional] 1101 | batch_normalize=1 1102 | size=3 1103 | stride=1 1104 | pad=1 1105 | filters=256 1106 | activation=leaky 1107 | 1108 | [route] 1109 | layers = -1,-6 1110 | 1111 | [convolutional] 1112 | batch_normalize=1 1113 | filters=256 1114 | size=1 1115 | stride=1 1116 | pad=1 1117 | activation=leaky 1118 | 1119 | [convolutional] 1120 | batch_normalize=1 1121 | size=3 1122 | stride=1 1123 | pad=1 1124 | filters=512 1125 | activation=leaky 1126 | 1127 | [convolutional] 1128 | size=1 1129 | stride=1 1130 | pad=1 1131 | filters=255 1132 | activation=linear 1133 | 1134 | 1135 | [yolo] 1136 | mask = 3,4,5 1137 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 1138 | classes=80 1139 | num=9 1140 | jitter=.3 1141 | ignore_thresh = .7 1142 | truth_thresh = 1 1143 | random=1 1144 | scale_x_y = 1.05 1145 | iou_thresh=0.213 1146 | cls_normalizer=1.0 1147 | iou_normalizer=0.07 1148 | iou_loss=ciou 1149 | nms_kind=greedynms 1150 | beta_nms=0.6 1151 | 1152 | [route] 1153 | layers = -4 1154 | 1155 | [convolutional] 1156 | batch_normalize=1 1157 | size=3 1158 | stride=2 1159 | pad=1 1160 | filters=512 1161 | activation=leaky 1162 | 1163 | [route] 1164 | layers = -1, -49 1165 | 1166 | [convolutional] 1167 | batch_normalize=1 1168 | filters=512 1169 | size=1 1170 | stride=1 1171 | pad=1 1172 | activation=leaky 1173 | 1174 | [convolutional] 1175 | batch_normalize=1 1176 | filters=512 1177 | size=1 1178 | stride=1 1179 | pad=1 1180 | activation=leaky 1181 | 1182 | [route] 1183 | layers = -2 1184 | 1185 | [convolutional] 1186 | batch_normalize=1 1187 | filters=512 1188 | size=1 1189 | stride=1 1190 | pad=1 1191 | activation=leaky 1192 | 1193 | [convolutional] 1194 | batch_normalize=1 1195 | size=3 1196 | stride=1 1197 | pad=1 1198 | filters=512 1199 | activation=leaky 1200 | 1201 | [convolutional] 1202 | batch_normalize=1 1203 | filters=512 1204 | size=1 1205 | stride=1 1206 | pad=1 1207 | activation=leaky 1208 | 1209 | [convolutional] 1210 | batch_normalize=1 1211 | size=3 1212 | stride=1 1213 | pad=1 1214 | filters=512 1215 | activation=leaky 1216 | 1217 | [route] 1218 | layers = -1,-6 1219 | 1220 | [convolutional] 1221 | batch_normalize=1 1222 | filters=512 1223 | size=1 1224 | stride=1 1225 | pad=1 1226 | activation=leaky 1227 | 1228 | [convolutional] 1229 | batch_normalize=1 1230 | size=3 1231 | stride=1 1232 | pad=1 1233 | filters=1024 1234 | activation=leaky 1235 | 1236 | [convolutional] 1237 | size=1 1238 | stride=1 1239 | pad=1 1240 | filters=255 1241 | activation=linear 1242 | 1243 | 1244 | [yolo] 1245 | mask = 6,7,8 1246 | anchors = 12, 16, 19, 36, 40, 28, 
36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 1247 | classes=80 1248 | num=9 1249 | jitter=.3 1250 | ignore_thresh = .7 1251 | truth_thresh = 1 1252 | random=1 1253 | scale_x_y = 1.05 1254 | iou_thresh=0.213 1255 | cls_normalizer=1.0 1256 | iou_normalizer=0.07 1257 | iou_loss=ciou 1258 | nms_kind=greedynms 1259 | beta_nms=0.6 1260 | -------------------------------------------------------------------------------- /cfg/yolov4-paspp.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=64 7 | subdivisions=16 8 | width=512 9 | height=512 10 | channels=3 11 | momentum=0.949 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.0013 19 | burn_in=1000 20 | max_batches = 500500 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | #cutmix=1 26 | mosaic=1 27 | 28 | #:104x104 54:52x52 85:26x26 104:13x13 for 416 29 | 30 | [convolutional] 31 | batch_normalize=1 32 | filters=32 33 | size=3 34 | stride=1 35 | pad=1 36 | activation=leaky 37 | 38 | # Downsample 39 | 40 | [convolutional] 41 | batch_normalize=1 42 | filters=64 43 | size=3 44 | stride=2 45 | pad=1 46 | activation=leaky 47 | 48 | [convolutional] 49 | batch_normalize=1 50 | filters=64 51 | size=1 52 | stride=1 53 | pad=1 54 | activation=leaky 55 | 56 | [route] 57 | layers = -2 58 | 59 | [convolutional] 60 | batch_normalize=1 61 | filters=64 62 | size=1 63 | stride=1 64 | pad=1 65 | activation=leaky 66 | 67 | [convolutional] 68 | batch_normalize=1 69 | filters=32 70 | size=1 71 | stride=1 72 | pad=1 73 | activation=leaky 74 | 75 | [convolutional] 76 | batch_normalize=1 77 | filters=64 78 | size=3 79 | stride=1 80 | pad=1 81 | activation=leaky 82 | 83 | [shortcut] 84 | from=-3 85 | activation=linear 86 | 87 | [convolutional] 88 | batch_normalize=1 89 | filters=64 90 | size=1 91 | stride=1 92 | pad=1 93 | activation=leaky 94 | 95 | [route] 96 | layers = -1,-7 97 | 98 | [convolutional] 99 | batch_normalize=1 100 | filters=64 101 | size=1 102 | stride=1 103 | pad=1 104 | activation=leaky 105 | 106 | # Downsample 107 | 108 | [convolutional] 109 | batch_normalize=1 110 | filters=128 111 | size=3 112 | stride=2 113 | pad=1 114 | activation=leaky 115 | 116 | [convolutional] 117 | batch_normalize=1 118 | filters=64 119 | size=1 120 | stride=1 121 | pad=1 122 | activation=leaky 123 | 124 | [route] 125 | layers = -2 126 | 127 | [convolutional] 128 | batch_normalize=1 129 | filters=64 130 | size=1 131 | stride=1 132 | pad=1 133 | activation=leaky 134 | 135 | [convolutional] 136 | batch_normalize=1 137 | filters=64 138 | size=1 139 | stride=1 140 | pad=1 141 | activation=leaky 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=64 146 | size=3 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [shortcut] 152 | from=-3 153 | activation=linear 154 | 155 | [convolutional] 156 | batch_normalize=1 157 | filters=64 158 | size=1 159 | stride=1 160 | pad=1 161 | activation=leaky 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=64 166 | size=3 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [shortcut] 172 | from=-3 173 | activation=linear 174 | 175 | [convolutional] 176 | batch_normalize=1 177 | filters=64 178 | size=1 179 | stride=1 180 | pad=1 181 | activation=leaky 182 | 183 | [route] 184 | layers = -1,-10 185 | 186 | [convolutional] 187 | batch_normalize=1 188 | filters=128 189 | size=1 190 | stride=1 191 | pad=1 
192 | activation=leaky 193 | 194 | # Downsample 195 | 196 | [convolutional] 197 | batch_normalize=1 198 | filters=256 199 | size=3 200 | stride=2 201 | pad=1 202 | activation=leaky 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [route] 213 | layers = -2 214 | 215 | [convolutional] 216 | batch_normalize=1 217 | filters=128 218 | size=1 219 | stride=1 220 | pad=1 221 | activation=leaky 222 | 223 | [convolutional] 224 | batch_normalize=1 225 | filters=128 226 | size=1 227 | stride=1 228 | pad=1 229 | activation=leaky 230 | 231 | [convolutional] 232 | batch_normalize=1 233 | filters=128 234 | size=3 235 | stride=1 236 | pad=1 237 | activation=leaky 238 | 239 | [shortcut] 240 | from=-3 241 | activation=linear 242 | 243 | [convolutional] 244 | batch_normalize=1 245 | filters=128 246 | size=1 247 | stride=1 248 | pad=1 249 | activation=leaky 250 | 251 | [convolutional] 252 | batch_normalize=1 253 | filters=128 254 | size=3 255 | stride=1 256 | pad=1 257 | activation=leaky 258 | 259 | [shortcut] 260 | from=-3 261 | activation=linear 262 | 263 | [convolutional] 264 | batch_normalize=1 265 | filters=128 266 | size=1 267 | stride=1 268 | pad=1 269 | activation=leaky 270 | 271 | [convolutional] 272 | batch_normalize=1 273 | filters=128 274 | size=3 275 | stride=1 276 | pad=1 277 | activation=leaky 278 | 279 | [shortcut] 280 | from=-3 281 | activation=linear 282 | 283 | [convolutional] 284 | batch_normalize=1 285 | filters=128 286 | size=1 287 | stride=1 288 | pad=1 289 | activation=leaky 290 | 291 | [convolutional] 292 | batch_normalize=1 293 | filters=128 294 | size=3 295 | stride=1 296 | pad=1 297 | activation=leaky 298 | 299 | [shortcut] 300 | from=-3 301 | activation=linear 302 | 303 | 304 | [convolutional] 305 | batch_normalize=1 306 | filters=128 307 | size=1 308 | stride=1 309 | pad=1 310 | activation=leaky 311 | 312 | [convolutional] 313 | batch_normalize=1 314 | filters=128 315 | size=3 316 | stride=1 317 | pad=1 318 | activation=leaky 319 | 320 | [shortcut] 321 | from=-3 322 | activation=linear 323 | 324 | [convolutional] 325 | batch_normalize=1 326 | filters=128 327 | size=1 328 | stride=1 329 | pad=1 330 | activation=leaky 331 | 332 | [convolutional] 333 | batch_normalize=1 334 | filters=128 335 | size=3 336 | stride=1 337 | pad=1 338 | activation=leaky 339 | 340 | [shortcut] 341 | from=-3 342 | activation=linear 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=128 347 | size=1 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [convolutional] 353 | batch_normalize=1 354 | filters=128 355 | size=3 356 | stride=1 357 | pad=1 358 | activation=leaky 359 | 360 | [shortcut] 361 | from=-3 362 | activation=linear 363 | 364 | [convolutional] 365 | batch_normalize=1 366 | filters=128 367 | size=1 368 | stride=1 369 | pad=1 370 | activation=leaky 371 | 372 | [convolutional] 373 | batch_normalize=1 374 | filters=128 375 | size=3 376 | stride=1 377 | pad=1 378 | activation=leaky 379 | 380 | [shortcut] 381 | from=-3 382 | activation=linear 383 | 384 | [convolutional] 385 | batch_normalize=1 386 | filters=128 387 | size=1 388 | stride=1 389 | pad=1 390 | activation=leaky 391 | 392 | [route] 393 | layers = -1,-28 394 | 395 | [convolutional] 396 | batch_normalize=1 397 | filters=256 398 | size=1 399 | stride=1 400 | pad=1 401 | activation=leaky 402 | 403 | # Downsample 404 | 405 | [convolutional] 406 | batch_normalize=1 407 | filters=512 408 | size=3 409 | stride=2 410 | pad=1 411 
| activation=leaky 412 | 413 | [convolutional] 414 | batch_normalize=1 415 | filters=256 416 | size=1 417 | stride=1 418 | pad=1 419 | activation=leaky 420 | 421 | [route] 422 | layers = -2 423 | 424 | [convolutional] 425 | batch_normalize=1 426 | filters=256 427 | size=1 428 | stride=1 429 | pad=1 430 | activation=leaky 431 | 432 | [convolutional] 433 | batch_normalize=1 434 | filters=256 435 | size=1 436 | stride=1 437 | pad=1 438 | activation=leaky 439 | 440 | [convolutional] 441 | batch_normalize=1 442 | filters=256 443 | size=3 444 | stride=1 445 | pad=1 446 | activation=leaky 447 | 448 | [shortcut] 449 | from=-3 450 | activation=linear 451 | 452 | 453 | [convolutional] 454 | batch_normalize=1 455 | filters=256 456 | size=1 457 | stride=1 458 | pad=1 459 | activation=leaky 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=256 464 | size=3 465 | stride=1 466 | pad=1 467 | activation=leaky 468 | 469 | [shortcut] 470 | from=-3 471 | activation=linear 472 | 473 | 474 | [convolutional] 475 | batch_normalize=1 476 | filters=256 477 | size=1 478 | stride=1 479 | pad=1 480 | activation=leaky 481 | 482 | [convolutional] 483 | batch_normalize=1 484 | filters=256 485 | size=3 486 | stride=1 487 | pad=1 488 | activation=leaky 489 | 490 | [shortcut] 491 | from=-3 492 | activation=linear 493 | 494 | 495 | [convolutional] 496 | batch_normalize=1 497 | filters=256 498 | size=1 499 | stride=1 500 | pad=1 501 | activation=leaky 502 | 503 | [convolutional] 504 | batch_normalize=1 505 | filters=256 506 | size=3 507 | stride=1 508 | pad=1 509 | activation=leaky 510 | 511 | [shortcut] 512 | from=-3 513 | activation=linear 514 | 515 | 516 | [convolutional] 517 | batch_normalize=1 518 | filters=256 519 | size=1 520 | stride=1 521 | pad=1 522 | activation=leaky 523 | 524 | [convolutional] 525 | batch_normalize=1 526 | filters=256 527 | size=3 528 | stride=1 529 | pad=1 530 | activation=leaky 531 | 532 | [shortcut] 533 | from=-3 534 | activation=linear 535 | 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=256 540 | size=1 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [convolutional] 546 | batch_normalize=1 547 | filters=256 548 | size=3 549 | stride=1 550 | pad=1 551 | activation=leaky 552 | 553 | [shortcut] 554 | from=-3 555 | activation=linear 556 | 557 | 558 | [convolutional] 559 | batch_normalize=1 560 | filters=256 561 | size=1 562 | stride=1 563 | pad=1 564 | activation=leaky 565 | 566 | [convolutional] 567 | batch_normalize=1 568 | filters=256 569 | size=3 570 | stride=1 571 | pad=1 572 | activation=leaky 573 | 574 | [shortcut] 575 | from=-3 576 | activation=linear 577 | 578 | [convolutional] 579 | batch_normalize=1 580 | filters=256 581 | size=1 582 | stride=1 583 | pad=1 584 | activation=leaky 585 | 586 | [convolutional] 587 | batch_normalize=1 588 | filters=256 589 | size=3 590 | stride=1 591 | pad=1 592 | activation=leaky 593 | 594 | [shortcut] 595 | from=-3 596 | activation=linear 597 | 598 | [convolutional] 599 | batch_normalize=1 600 | filters=256 601 | size=1 602 | stride=1 603 | pad=1 604 | activation=leaky 605 | 606 | [route] 607 | layers = -1,-28 608 | 609 | [convolutional] 610 | batch_normalize=1 611 | filters=512 612 | size=1 613 | stride=1 614 | pad=1 615 | activation=leaky 616 | 617 | # Downsample 618 | 619 | [convolutional] 620 | batch_normalize=1 621 | filters=1024 622 | size=3 623 | stride=2 624 | pad=1 625 | activation=leaky 626 | 627 | [convolutional] 628 | batch_normalize=1 629 | filters=512 630 | size=1 631 | stride=1 632 | pad=1 
633 | activation=leaky 634 | 635 | [route] 636 | layers = -2 637 | 638 | [convolutional] 639 | batch_normalize=1 640 | filters=512 641 | size=1 642 | stride=1 643 | pad=1 644 | activation=leaky 645 | 646 | [convolutional] 647 | batch_normalize=1 648 | filters=512 649 | size=1 650 | stride=1 651 | pad=1 652 | activation=leaky 653 | 654 | [convolutional] 655 | batch_normalize=1 656 | filters=512 657 | size=3 658 | stride=1 659 | pad=1 660 | activation=leaky 661 | 662 | [shortcut] 663 | from=-3 664 | activation=linear 665 | 666 | [convolutional] 667 | batch_normalize=1 668 | filters=512 669 | size=1 670 | stride=1 671 | pad=1 672 | activation=leaky 673 | 674 | [convolutional] 675 | batch_normalize=1 676 | filters=512 677 | size=3 678 | stride=1 679 | pad=1 680 | activation=leaky 681 | 682 | [shortcut] 683 | from=-3 684 | activation=linear 685 | 686 | [convolutional] 687 | batch_normalize=1 688 | filters=512 689 | size=1 690 | stride=1 691 | pad=1 692 | activation=leaky 693 | 694 | [convolutional] 695 | batch_normalize=1 696 | filters=512 697 | size=3 698 | stride=1 699 | pad=1 700 | activation=leaky 701 | 702 | [shortcut] 703 | from=-3 704 | activation=linear 705 | 706 | [convolutional] 707 | batch_normalize=1 708 | filters=512 709 | size=1 710 | stride=1 711 | pad=1 712 | activation=leaky 713 | 714 | [convolutional] 715 | batch_normalize=1 716 | filters=512 717 | size=3 718 | stride=1 719 | pad=1 720 | activation=leaky 721 | 722 | [shortcut] 723 | from=-3 724 | activation=linear 725 | 726 | [convolutional] 727 | batch_normalize=1 728 | filters=512 729 | size=1 730 | stride=1 731 | pad=1 732 | activation=leaky 733 | 734 | [route] 735 | layers = -1,-16 736 | 737 | [convolutional] 738 | batch_normalize=1 739 | filters=1024 740 | size=1 741 | stride=1 742 | pad=1 743 | activation=leaky 744 | 745 | ########################## 746 | 747 | [convolutional] 748 | batch_normalize=1 749 | filters=512 750 | size=1 751 | stride=1 752 | pad=1 753 | activation=leaky 754 | 755 | [convolutional] 756 | batch_normalize=1 757 | size=3 758 | stride=1 759 | pad=1 760 | filters=1024 761 | activation=leaky 762 | 763 | [convolutional] 764 | batch_normalize=1 765 | filters=512 766 | size=1 767 | stride=1 768 | pad=1 769 | activation=leaky 770 | 771 | ### SPP ### 772 | [maxpool] 773 | stride=1 774 | size=5 775 | 776 | [route] 777 | layers=-2 778 | 779 | [maxpool] 780 | stride=1 781 | size=9 782 | 783 | [route] 784 | layers=-4 785 | 786 | [maxpool] 787 | stride=1 788 | size=13 789 | 790 | [route] 791 | layers=-1,-3,-5,-6 792 | ### End SPP ### 793 | 794 | [convolutional] 795 | batch_normalize=1 796 | filters=512 797 | size=1 798 | stride=1 799 | pad=1 800 | activation=leaky 801 | 802 | [convolutional] 803 | batch_normalize=1 804 | size=3 805 | stride=1 806 | pad=1 807 | filters=1024 808 | activation=leaky 809 | 810 | [convolutional] 811 | batch_normalize=1 812 | filters=512 813 | size=1 814 | stride=1 815 | pad=1 816 | activation=leaky 817 | 818 | [convolutional] 819 | batch_normalize=1 820 | filters=256 821 | size=1 822 | stride=1 823 | pad=1 824 | activation=leaky 825 | 826 | [upsample] 827 | stride=2 828 | 829 | [route] 830 | layers = 85 831 | 832 | [convolutional] 833 | batch_normalize=1 834 | filters=256 835 | size=1 836 | stride=1 837 | pad=1 838 | activation=leaky 839 | 840 | [route] 841 | layers = -1, -3 842 | 843 | [convolutional] 844 | batch_normalize=1 845 | filters=256 846 | size=1 847 | stride=1 848 | pad=1 849 | activation=leaky 850 | 851 | [convolutional] 852 | batch_normalize=1 853 | size=3 854 | stride=1 
855 | pad=1 856 | filters=512 857 | activation=leaky 858 | 859 | [convolutional] 860 | batch_normalize=1 861 | filters=256 862 | size=1 863 | stride=1 864 | pad=1 865 | activation=leaky 866 | 867 | [convolutional] 868 | batch_normalize=1 869 | size=3 870 | stride=1 871 | pad=1 872 | filters=512 873 | activation=leaky 874 | 875 | [convolutional] 876 | batch_normalize=1 877 | filters=256 878 | size=1 879 | stride=1 880 | pad=1 881 | activation=leaky 882 | 883 | [convolutional] 884 | batch_normalize=1 885 | filters=128 886 | size=1 887 | stride=1 888 | pad=1 889 | activation=leaky 890 | 891 | [upsample] 892 | stride=2 893 | 894 | [route] 895 | layers = 54 896 | 897 | [convolutional] 898 | batch_normalize=1 899 | filters=128 900 | size=1 901 | stride=1 902 | pad=1 903 | activation=leaky 904 | 905 | [route] 906 | layers = -1, -3 907 | 908 | [convolutional] 909 | batch_normalize=1 910 | filters=128 911 | size=1 912 | stride=1 913 | pad=1 914 | activation=leaky 915 | 916 | [convolutional] 917 | batch_normalize=1 918 | size=3 919 | stride=1 920 | pad=1 921 | filters=256 922 | activation=leaky 923 | 924 | [convolutional] 925 | batch_normalize=1 926 | filters=128 927 | size=1 928 | stride=1 929 | pad=1 930 | activation=leaky 931 | 932 | [convolutional] 933 | batch_normalize=1 934 | size=3 935 | stride=1 936 | pad=1 937 | filters=256 938 | activation=leaky 939 | 940 | [convolutional] 941 | batch_normalize=1 942 | filters=128 943 | size=1 944 | stride=1 945 | pad=1 946 | activation=leaky 947 | 948 | ########################## 949 | 950 | [convolutional] 951 | batch_normalize=1 952 | size=3 953 | stride=1 954 | pad=1 955 | filters=256 956 | activation=leaky 957 | 958 | [convolutional] 959 | size=1 960 | stride=1 961 | pad=1 962 | filters=255 963 | activation=linear 964 | 965 | 966 | [yolo] 967 | mask = 0,1,2 968 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 969 | classes=80 970 | num=9 971 | jitter=.3 972 | ignore_thresh = .7 973 | truth_thresh = 1 974 | scale_x_y = 1.2 975 | iou_thresh=0.213 976 | cls_normalizer=1.0 977 | iou_normalizer=0.07 978 | iou_loss=ciou 979 | nms_kind=greedynms 980 | beta_nms=0.6 981 | 982 | 983 | [route] 984 | layers = -4 985 | 986 | [convolutional] 987 | batch_normalize=1 988 | size=3 989 | stride=2 990 | pad=1 991 | filters=256 992 | activation=leaky 993 | 994 | [route] 995 | layers = -1, -16 996 | 997 | [convolutional] 998 | batch_normalize=1 999 | filters=256 1000 | size=1 1001 | stride=1 1002 | pad=1 1003 | activation=leaky 1004 | 1005 | [convolutional] 1006 | batch_normalize=1 1007 | size=3 1008 | stride=1 1009 | pad=1 1010 | filters=512 1011 | activation=leaky 1012 | 1013 | [convolutional] 1014 | batch_normalize=1 1015 | filters=256 1016 | size=1 1017 | stride=1 1018 | pad=1 1019 | activation=leaky 1020 | 1021 | [convolutional] 1022 | batch_normalize=1 1023 | size=3 1024 | stride=1 1025 | pad=1 1026 | filters=512 1027 | activation=leaky 1028 | 1029 | [convolutional] 1030 | batch_normalize=1 1031 | filters=256 1032 | size=1 1033 | stride=1 1034 | pad=1 1035 | activation=leaky 1036 | 1037 | [convolutional] 1038 | batch_normalize=1 1039 | size=3 1040 | stride=1 1041 | pad=1 1042 | filters=512 1043 | activation=leaky 1044 | 1045 | [convolutional] 1046 | size=1 1047 | stride=1 1048 | pad=1 1049 | filters=255 1050 | activation=linear 1051 | 1052 | 1053 | [yolo] 1054 | mask = 3,4,5 1055 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 1056 | classes=80 1057 | num=9 1058 | jitter=.3 1059 | 
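# (annotation added for this write-up; not part of the upstream cfg)
# When adapting any of these cfg files to a custom dataset: classes= in every [yolo] block
# must equal the class count declared in the *.data file, and filters= of the
# [convolutional] layer immediately before each [yolo] must be (classes + 5) * number of
# mask entries, i.e. (80 + 5) * 3 = 255 here, or (3 + 5) * 3 = 24 for a 3-class dataset.
# num= is the total number of anchors listed; mask= picks the 3 used by this detection head.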
ignore_thresh = .7 1060 | truth_thresh = 1 1061 | scale_x_y = 1.1 1062 | iou_thresh=0.213 1063 | cls_normalizer=1.0 1064 | iou_normalizer=0.07 1065 | iou_loss=ciou 1066 | nms_kind=greedynms 1067 | beta_nms=0.6 1068 | 1069 | 1070 | [route] 1071 | layers = -4 1072 | 1073 | [convolutional] 1074 | batch_normalize=1 1075 | size=3 1076 | stride=2 1077 | pad=1 1078 | filters=512 1079 | activation=leaky 1080 | 1081 | [route] 1082 | layers = -1, -37 1083 | 1084 | [convolutional] 1085 | batch_normalize=1 1086 | filters=512 1087 | size=1 1088 | stride=1 1089 | pad=1 1090 | activation=leaky 1091 | 1092 | [convolutional] 1093 | batch_normalize=1 1094 | size=3 1095 | stride=1 1096 | pad=1 1097 | filters=1024 1098 | activation=leaky 1099 | 1100 | [convolutional] 1101 | batch_normalize=1 1102 | filters=512 1103 | size=1 1104 | stride=1 1105 | pad=1 1106 | activation=leaky 1107 | 1108 | [convolutional] 1109 | batch_normalize=1 1110 | size=3 1111 | stride=1 1112 | pad=1 1113 | filters=1024 1114 | activation=leaky 1115 | 1116 | [convolutional] 1117 | batch_normalize=1 1118 | filters=512 1119 | size=1 1120 | stride=1 1121 | pad=1 1122 | activation=leaky 1123 | 1124 | [convolutional] 1125 | batch_normalize=1 1126 | size=3 1127 | stride=1 1128 | pad=1 1129 | filters=1024 1130 | activation=leaky 1131 | 1132 | [convolutional] 1133 | size=1 1134 | stride=1 1135 | pad=1 1136 | filters=255 1137 | activation=linear 1138 | 1139 | 1140 | [yolo] 1141 | mask = 6,7,8 1142 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 1143 | classes=80 1144 | num=9 1145 | jitter=.3 1146 | ignore_thresh = .7 1147 | truth_thresh = 1 1148 | random=1 1149 | scale_x_y = 1.05 1150 | iou_thresh=0.213 1151 | cls_normalizer=1.0 1152 | iou_normalizer=0.07 1153 | iou_loss=ciou 1154 | nms_kind=greedynms 1155 | beta_nms=0.6 1156 | -------------------------------------------------------------------------------- /cfg/yolov4-tiny.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=64 7 | subdivisions=1 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.00261 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=2 30 | pad=1 31 | activation=leaky 32 | 33 | [convolutional] 34 | batch_normalize=1 35 | filters=64 36 | size=3 37 | stride=2 38 | pad=1 39 | activation=leaky 40 | 41 | [convolutional] 42 | batch_normalize=1 43 | filters=64 44 | size=3 45 | stride=1 46 | pad=1 47 | activation=leaky 48 | 49 | [route_lhalf] 50 | layers=-1 51 | 52 | [convolutional] 53 | batch_normalize=1 54 | filters=32 55 | size=3 56 | stride=1 57 | pad=1 58 | activation=leaky 59 | 60 | [convolutional] 61 | batch_normalize=1 62 | filters=32 63 | size=3 64 | stride=1 65 | pad=1 66 | activation=leaky 67 | 68 | [route] 69 | layers = -1,-2 70 | 71 | [convolutional] 72 | batch_normalize=1 73 | filters=64 74 | size=1 75 | stride=1 76 | pad=1 77 | activation=leaky 78 | 79 | [route] 80 | layers = -6,-1 81 | 82 | [maxpool] 83 | size=2 84 | stride=2 85 | 86 | [convolutional] 87 | batch_normalize=1 88 | filters=128 89 | size=3 90 | stride=1 91 | pad=1 92 | activation=leaky 93 | 94 | [route_lhalf] 95 | layers=-1 96 | 97 | [convolutional] 98 | batch_normalize=1 99 | filters=64 100 
| size=3 101 | stride=1 102 | pad=1 103 | activation=leaky 104 | 105 | [convolutional] 106 | batch_normalize=1 107 | filters=64 108 | size=3 109 | stride=1 110 | pad=1 111 | activation=leaky 112 | 113 | [route] 114 | layers = -1,-2 115 | 116 | [convolutional] 117 | batch_normalize=1 118 | filters=128 119 | size=1 120 | stride=1 121 | pad=1 122 | activation=leaky 123 | 124 | [route] 125 | layers = -6,-1 126 | 127 | [maxpool] 128 | size=2 129 | stride=2 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [route_lhalf] 140 | layers=-1 141 | 142 | [convolutional] 143 | batch_normalize=1 144 | filters=128 145 | size=3 146 | stride=1 147 | pad=1 148 | activation=leaky 149 | 150 | [convolutional] 151 | batch_normalize=1 152 | filters=128 153 | size=3 154 | stride=1 155 | pad=1 156 | activation=leaky 157 | 158 | [route] 159 | layers = -1,-2 160 | 161 | [convolutional] 162 | batch_normalize=1 163 | filters=256 164 | size=1 165 | stride=1 166 | pad=1 167 | activation=leaky 168 | 169 | [route] 170 | layers = -6,-1 171 | 172 | [maxpool] 173 | size=2 174 | stride=2 175 | 176 | [convolutional] 177 | batch_normalize=1 178 | filters=512 179 | size=3 180 | stride=1 181 | pad=1 182 | activation=leaky 183 | 184 | ################################## 185 | 186 | [convolutional] 187 | batch_normalize=1 188 | filters=256 189 | size=1 190 | stride=1 191 | pad=1 192 | activation=leaky 193 | 194 | [convolutional] 195 | batch_normalize=1 196 | filters=512 197 | size=3 198 | stride=1 199 | pad=1 200 | activation=leaky 201 | 202 | [convolutional] 203 | size=1 204 | stride=1 205 | pad=1 206 | filters=255 207 | activation=linear 208 | 209 | 210 | 211 | [yolo] 212 | mask = 3,4,5 213 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 214 | classes=80 215 | num=6 216 | jitter=.3 217 | scale_x_y = 1.05 218 | cls_normalizer=1.0 219 | iou_normalizer=0.07 220 | iou_loss=ciou 221 | ignore_thresh = .7 222 | truth_thresh = 1 223 | random=0 224 | nms_kind=greedynms 225 | beta_nms=0.6 226 | 227 | [route] 228 | layers = -4 229 | 230 | [convolutional] 231 | batch_normalize=1 232 | filters=128 233 | size=1 234 | stride=1 235 | pad=1 236 | activation=leaky 237 | 238 | [upsample] 239 | stride=2 240 | 241 | [route] 242 | layers = -1, 23 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=256 247 | size=3 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | size=1 254 | stride=1 255 | pad=1 256 | filters=255 257 | activation=linear 258 | 259 | [yolo] 260 | mask = 1,2,3 261 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 262 | classes=80 263 | num=6 264 | jitter=.3 265 | scale_x_y = 1.05 266 | cls_normalizer=1.0 267 | iou_normalizer=0.07 268 | iou_loss=ciou 269 | ignore_thresh = .7 270 | truth_thresh = 1 271 | random=0 272 | nms_kind=greedynms 273 | beta_nms=0.6 274 | -------------------------------------------------------------------------------- /data/coco.data: -------------------------------------------------------------------------------- 1 | classes=80 2 | train=../coco/train2017.txt 3 | valid=../coco/testdev2017.txt 4 | names=data/coco.names 5 | -------------------------------------------------------------------------------- /data/coco.names: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorcycle 5 | airplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | stop sign 13 | parking 
meter 14 | bench 15 | bird 16 | cat 17 | dog 18 | horse 19 | sheep 20 | cow 21 | elephant 22 | bear 23 | zebra 24 | giraffe 25 | backpack 26 | umbrella 27 | handbag 28 | tie 29 | suitcase 30 | frisbee 31 | skis 32 | snowboard 33 | sports ball 34 | kite 35 | baseball bat 36 | baseball glove 37 | skateboard 38 | surfboard 39 | tennis racket 40 | bottle 41 | wine glass 42 | cup 43 | fork 44 | knife 45 | spoon 46 | bowl 47 | banana 48 | apple 49 | sandwich 50 | orange 51 | broccoli 52 | carrot 53 | hot dog 54 | pizza 55 | donut 56 | cake 57 | chair 58 | couch 59 | potted plant 60 | bed 61 | dining table 62 | toilet 63 | tv 64 | laptop 65 | mouse 66 | remote 67 | keyboard 68 | cell phone 69 | microwave 70 | oven 71 | toaster 72 | sink 73 | refrigerator 74 | book 75 | clock 76 | vase 77 | scissors 78 | teddy bear 79 | hair drier 80 | toothbrush 81 | -------------------------------------------------------------------------------- /data/coco1.data: -------------------------------------------------------------------------------- 1 | classes=80 2 | train=data/coco1.txt 3 | valid=data/coco1.txt 4 | names=data/coco.names 5 | -------------------------------------------------------------------------------- /data/coco1.txt: -------------------------------------------------------------------------------- 1 | ../coco/images/train2017/000000109622.jpg 2 | -------------------------------------------------------------------------------- /data/coco16.data: -------------------------------------------------------------------------------- 1 | classes=80 2 | train=data/coco16.txt 3 | valid=data/coco16.txt 4 | names=data/coco.names 5 | -------------------------------------------------------------------------------- /data/coco16.txt: -------------------------------------------------------------------------------- 1 | ../coco/images/train2017/000000109622.jpg 2 | ../coco/images/train2017/000000160694.jpg 3 | ../coco/images/train2017/000000308590.jpg 4 | ../coco/images/train2017/000000327573.jpg 5 | ../coco/images/train2017/000000062929.jpg 6 | ../coco/images/train2017/000000512793.jpg 7 | ../coco/images/train2017/000000371735.jpg 8 | ../coco/images/train2017/000000148118.jpg 9 | ../coco/images/train2017/000000309856.jpg 10 | ../coco/images/train2017/000000141882.jpg 11 | ../coco/images/train2017/000000318783.jpg 12 | ../coco/images/train2017/000000337760.jpg 13 | ../coco/images/train2017/000000298197.jpg 14 | ../coco/images/train2017/000000042421.jpg 15 | ../coco/images/train2017/000000328898.jpg 16 | ../coco/images/train2017/000000458856.jpg 17 | -------------------------------------------------------------------------------- /data/coco1cls.data: -------------------------------------------------------------------------------- 1 | classes=1 2 | train=data/coco1cls.txt 3 | valid=data/coco1cls.txt 4 | names=data/coco.names 5 | -------------------------------------------------------------------------------- /data/coco1cls.txt: -------------------------------------------------------------------------------- 1 | ../coco/images/train2017/000000000901.jpg 2 | ../coco/images/train2017/000000001464.jpg 3 | ../coco/images/train2017/000000003220.jpg 4 | ../coco/images/train2017/000000003365.jpg 5 | ../coco/images/train2017/000000004772.jpg 6 | ../coco/images/train2017/000000009987.jpg 7 | ../coco/images/train2017/000000010498.jpg 8 | ../coco/images/train2017/000000012455.jpg 9 | ../coco/images/train2017/000000013992.jpg 10 | ../coco/images/train2017/000000014125.jpg 11 | ../coco/images/train2017/000000016314.jpg 12 | 
../coco/images/train2017/000000016670.jpg 13 | ../coco/images/train2017/000000018412.jpg 14 | ../coco/images/train2017/000000021212.jpg 15 | ../coco/images/train2017/000000021826.jpg 16 | ../coco/images/train2017/000000030566.jpg 17 | -------------------------------------------------------------------------------- /data/coco2014.data: -------------------------------------------------------------------------------- 1 | classes=80 2 | train=../coco/trainvalno5k.txt 3 | valid=../coco/5k.txt 4 | names=data/coco.names 5 | -------------------------------------------------------------------------------- /data/coco2017.data: -------------------------------------------------------------------------------- 1 | classes=80 2 | train=../coco/train2017.txt 3 | valid=../coco/val2017.txt 4 | names=data/coco.names 5 | -------------------------------------------------------------------------------- /data/coco64.data: -------------------------------------------------------------------------------- 1 | classes=80 2 | train=data/coco64.txt 3 | valid=data/coco64.txt 4 | names=data/coco.names 5 | -------------------------------------------------------------------------------- /data/coco64.txt: -------------------------------------------------------------------------------- 1 | ../coco/images/train2017/000000109622.jpg 2 | ../coco/images/train2017/000000160694.jpg 3 | ../coco/images/train2017/000000308590.jpg 4 | ../coco/images/train2017/000000327573.jpg 5 | ../coco/images/train2017/000000062929.jpg 6 | ../coco/images/train2017/000000512793.jpg 7 | ../coco/images/train2017/000000371735.jpg 8 | ../coco/images/train2017/000000148118.jpg 9 | ../coco/images/train2017/000000309856.jpg 10 | ../coco/images/train2017/000000141882.jpg 11 | ../coco/images/train2017/000000318783.jpg 12 | ../coco/images/train2017/000000337760.jpg 13 | ../coco/images/train2017/000000298197.jpg 14 | ../coco/images/train2017/000000042421.jpg 15 | ../coco/images/train2017/000000328898.jpg 16 | ../coco/images/train2017/000000458856.jpg 17 | ../coco/images/train2017/000000073824.jpg 18 | ../coco/images/train2017/000000252846.jpg 19 | ../coco/images/train2017/000000459590.jpg 20 | ../coco/images/train2017/000000273650.jpg 21 | ../coco/images/train2017/000000331311.jpg 22 | ../coco/images/train2017/000000156326.jpg 23 | ../coco/images/train2017/000000262985.jpg 24 | ../coco/images/train2017/000000253580.jpg 25 | ../coco/images/train2017/000000447976.jpg 26 | ../coco/images/train2017/000000378077.jpg 27 | ../coco/images/train2017/000000259913.jpg 28 | ../coco/images/train2017/000000424553.jpg 29 | ../coco/images/train2017/000000000612.jpg 30 | ../coco/images/train2017/000000267625.jpg 31 | ../coco/images/train2017/000000566012.jpg 32 | ../coco/images/train2017/000000196664.jpg 33 | ../coco/images/train2017/000000363331.jpg 34 | ../coco/images/train2017/000000057992.jpg 35 | ../coco/images/train2017/000000520047.jpg 36 | ../coco/images/train2017/000000453903.jpg 37 | ../coco/images/train2017/000000162083.jpg 38 | ../coco/images/train2017/000000268516.jpg 39 | ../coco/images/train2017/000000277436.jpg 40 | ../coco/images/train2017/000000189744.jpg 41 | ../coco/images/train2017/000000041128.jpg 42 | ../coco/images/train2017/000000527728.jpg 43 | ../coco/images/train2017/000000465269.jpg 44 | ../coco/images/train2017/000000246833.jpg 45 | ../coco/images/train2017/000000076784.jpg 46 | ../coco/images/train2017/000000323715.jpg 47 | ../coco/images/train2017/000000560463.jpg 48 | ../coco/images/train2017/000000006263.jpg 49 | 
../coco/images/train2017/000000094701.jpg 50 | ../coco/images/train2017/000000521359.jpg 51 | ../coco/images/train2017/000000302903.jpg 52 | ../coco/images/train2017/000000047559.jpg 53 | ../coco/images/train2017/000000480583.jpg 54 | ../coco/images/train2017/000000050025.jpg 55 | ../coco/images/train2017/000000084512.jpg 56 | ../coco/images/train2017/000000508913.jpg 57 | ../coco/images/train2017/000000093708.jpg 58 | ../coco/images/train2017/000000070493.jpg 59 | ../coco/images/train2017/000000539270.jpg 60 | ../coco/images/train2017/000000474402.jpg 61 | ../coco/images/train2017/000000209842.jpg 62 | ../coco/images/train2017/000000028820.jpg 63 | ../coco/images/train2017/000000154257.jpg 64 | ../coco/images/train2017/000000342499.jpg 65 | -------------------------------------------------------------------------------- /data/coco_paper.names: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorcycle 5 | airplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | street sign 13 | stop sign 14 | parking meter 15 | bench 16 | bird 17 | cat 18 | dog 19 | horse 20 | sheep 21 | cow 22 | elephant 23 | bear 24 | zebra 25 | giraffe 26 | hat 27 | backpack 28 | umbrella 29 | shoe 30 | eye glasses 31 | handbag 32 | tie 33 | suitcase 34 | frisbee 35 | skis 36 | snowboard 37 | sports ball 38 | kite 39 | baseball bat 40 | baseball glove 41 | skateboard 42 | surfboard 43 | tennis racket 44 | bottle 45 | plate 46 | wine glass 47 | cup 48 | fork 49 | knife 50 | spoon 51 | bowl 52 | banana 53 | apple 54 | sandwich 55 | orange 56 | broccoli 57 | carrot 58 | hot dog 59 | pizza 60 | donut 61 | cake 62 | chair 63 | couch 64 | potted plant 65 | bed 66 | mirror 67 | dining table 68 | window 69 | desk 70 | toilet 71 | door 72 | tv 73 | laptop 74 | mouse 75 | remote 76 | keyboard 77 | cell phone 78 | microwave 79 | oven 80 | toaster 81 | sink 82 | refrigerator 83 | blender 84 | book 85 | clock 86 | vase 87 | scissors 88 | teddy bear 89 | hair drier 90 | toothbrush 91 | hair brush -------------------------------------------------------------------------------- /data/get_coco2014.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Zip coco folder 3 | # zip -r coco.zip coco 4 | # tar -czvf coco.tar.gz coco 5 | 6 | # Download labels from Google Drive, accepting presented query 7 | filename="coco2014labels.zip" 8 | fileid="1s6-CmF5_SElM28r52P1OUrCcuXZN-SFo" 9 | curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id=${fileid}" > /dev/null 10 | curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=${fileid}" -o ${filename} 11 | rm ./cookie 12 | 13 | # Unzip labels 14 | unzip -q ${filename} # for coco.zip 15 | # tar -xzf ${filename} # for coco.tar.gz 16 | rm ${filename} 17 | 18 | # Download and unzip images 19 | cd coco/images 20 | f="train2014.zip" && curl http://images.cocodataset.org/zips/$f -o $f && unzip -q $f && rm $f 21 | f="val2014.zip" && curl http://images.cocodataset.org/zips/$f -o $f && unzip -q $f && rm $f 22 | 23 | # cd out 24 | cd ../.. 
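# Note (added for this write-up; not part of the upstream script): the repo's *.data files
# reference the dataset as ../coco/... (e.g. train=../coco/train2017.txt), so the coco/
# folder created here is expected to sit next to the repository directory. Running this
# script from the repo's parent directory, or moving coco/ there afterwards, keeps those
# relative paths valid.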
25 | -------------------------------------------------------------------------------- /data/get_coco2017.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Zip coco folder 3 | # zip -r coco.zip coco 4 | # tar -czvf coco.tar.gz coco 5 | 6 | # Download labels from Google Drive, accepting presented query 7 | filename="coco2017labels.zip" 8 | fileid="1cXZR_ckHki6nddOmcysCuuJFM--T-Q6L" 9 | curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id=${fileid}" > /dev/null 10 | curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=${fileid}" -o ${filename} 11 | rm ./cookie 12 | 13 | # Unzip labels 14 | unzip -q ${filename} # for coco.zip 15 | # tar -xzf ${filename} # for coco.tar.gz 16 | rm ${filename} 17 | 18 | # Download and unzip images 19 | cd coco/images 20 | f="train2017.zip" && curl http://images.cocodataset.org/zips/$f -o $f && unzip -q $f && rm $f 21 | f="val2017.zip" && curl http://images.cocodataset.org/zips/$f -o $f && unzip -q $f && rm $f 22 | 23 | # cd out 24 | cd ../.. 25 | -------------------------------------------------------------------------------- /data/myData.data: -------------------------------------------------------------------------------- 1 | classes=3 2 | train=data/myData/myData_train.txt 3 | valid=data/myData/myData_val.txt 4 | names=data/myData.names 5 | -------------------------------------------------------------------------------- /data/myData.names: -------------------------------------------------------------------------------- 1 | QP 2 | NY 3 | QG 4 | -------------------------------------------------------------------------------- /data/myData/score/images/train/readme: -------------------------------------------------------------------------------- 1 | 此处存放train的image -------------------------------------------------------------------------------- /data/myData/score/images/val/readme: -------------------------------------------------------------------------------- 1 | 此处存放val的image -------------------------------------------------------------------------------- /data/myData/score/labels/train/readme: -------------------------------------------------------------------------------- 1 | 此处存放train的标注 2 | xxx.txt 3 | xxxx.txt 4 | xxxxx.txt -------------------------------------------------------------------------------- /data/myData/score/labels/val/readme: -------------------------------------------------------------------------------- 1 | 此处存放val的标注 2 | 3 | xxx.txt 4 | xxxx.txt 5 | xxxxx.txt -------------------------------------------------------------------------------- /detect.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | from sys import platform 3 | 4 | from models import * # set ONNX_EXPORT in models.py 5 | from utils.datasets import * 6 | from utils.utils import * 7 | 8 | 9 | def detect(save_img=False): 10 | img_size = (320, 192) if ONNX_EXPORT else opt.img_size # (320, 192) or (416, 256) or (608, 352) for (height, width) 11 | out, source, weights, half, view_img, save_txt = opt.output, opt.source, opt.weights, opt.half, opt.view_img, opt.save_txt 12 | webcam = source == '0' or source.startswith('rtsp') or source.startswith('http') or source.endswith('.txt') 13 | 14 | # Initialize 15 | device = torch_utils.select_device(device='cpu' if ONNX_EXPORT else opt.device) 16 | if os.path.exists(out): 17 | shutil.rmtree(out) # delete output folder 18 | os.makedirs(out) # make new 
output folder 19 | 20 | # Initialize model 21 | model = Darknet(opt.cfg, img_size) 22 | 23 | # Load weights 24 | attempt_download(weights) 25 | if weights.endswith('.pt'): # pytorch format 26 | model.load_state_dict(torch.load(weights, map_location=device)['model']) 27 | else: # darknet format 28 | load_darknet_weights(model, weights) 29 | 30 | # Second-stage classifier 31 | classify = False 32 | if classify: 33 | modelc = torch_utils.load_classifier(name='resnet101', n=2) # initialize 34 | modelc.load_state_dict(torch.load('weights/resnet101.pt', map_location=device)['model']) # load weights 35 | modelc.to(device).eval() 36 | 37 | # Eval mode 38 | model.to(device).eval() 39 | 40 | # Fuse Conv2d + BatchNorm2d layers 41 | # model.fuse() 42 | 43 | # Export mode 44 | if ONNX_EXPORT: 45 | model.fuse() 46 | img = torch.zeros((1, 3) + img_size) # (1, 3, 320, 192) 47 | f = opt.weights.replace(opt.weights.split('.')[-1], 'onnx') # *.onnx filename 48 | torch.onnx.export(model, img, f, verbose=False, opset_version=11, 49 | input_names=['images'], output_names=['classes', 'boxes']) 50 | 51 | # Validate exported model 52 | import onnx 53 | model = onnx.load(f) # Load the ONNX model 54 | onnx.checker.check_model(model) # Check that the IR is well formed 55 | print(onnx.helper.printable_graph(model.graph)) # Print a human readable representation of the graph 56 | return 57 | 58 | # Half precision 59 | half = half and device.type != 'cpu' # half precision only supported on CUDA 60 | if half: 61 | model.half() 62 | 63 | # Set Dataloader 64 | vid_path, vid_writer = None, None 65 | if webcam: 66 | view_img = True 67 | torch.backends.cudnn.benchmark = True # set True to speed up constant image size inference 68 | dataset = LoadStreams(source, img_size=img_size) 69 | else: 70 | save_img = True 71 | dataset = LoadImages(source, img_size=img_size) 72 | 73 | # Get names and colors 74 | names = load_classes(opt.names) 75 | colors = [[random.randint(0, 255) for _ in range(3)] for _ in range(len(names))] 76 | 77 | # Run inference 78 | t0 = time.time() 79 | img = torch.zeros((1, 3, img_size, img_size), device=device) # init img 80 | _ = model(img.half() if half else img.float()) if device.type != 'cpu' else None # run once 81 | for path, img, im0s, vid_cap in dataset: 82 | img = torch.from_numpy(img).to(device) 83 | img = img.half() if half else img.float() # uint8 to fp16/32 84 | img /= 255.0 # 0 - 255 to 0.0 - 1.0 85 | if img.ndimension() == 3: 86 | img = img.unsqueeze(0) 87 | 88 | # Inference 89 | t1 = torch_utils.time_synchronized() 90 | pred = model(img, augment=opt.augment)[0] 91 | t2 = torch_utils.time_synchronized() 92 | 93 | # to float 94 | if half: 95 | pred = pred.float() 96 | 97 | # Apply NMS 98 | pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres, 99 | merge=False, classes=opt.classes, agnostic=opt.agnostic_nms) 100 | 101 | # Apply Classifier 102 | if classify: 103 | pred = apply_classifier(pred, modelc, img, im0s) 104 | 105 | # Process detections 106 | for i, det in enumerate(pred): # detections per image 107 | if webcam: # batch_size >= 1 108 | p, s, im0 = path[i], '%g: ' % i, im0s[i] 109 | else: 110 | p, s, im0 = path, '', im0s 111 | 112 | save_path = str(Path(out) / Path(p).name) 113 | s += '%gx%g ' % img.shape[2:] # print string 114 | if det is not None and len(det): 115 | # Rescale boxes from img_size to im0 size 116 | det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round() 117 | 118 | # Print results 119 | for c in det[:, -1].unique(): 120 | n = (det[:, -1] == c).sum() 
# detections per class 121 | s += '%g %ss, ' % (n, names[int(c)]) # add to string 122 | 123 | # Write results 124 | for *xyxy, conf, cls in det: 125 | if save_txt: # Write to file 126 | with open(save_path + '.txt', 'a') as file: 127 | file.write(('%g ' * 6 + '\n') % (*xyxy, cls, conf)) 128 | 129 | if save_img or view_img: # Add bbox to image 130 | label = '%s %.2f' % (names[int(cls)], conf) 131 | plot_one_box(xyxy, im0, label=label, color=colors[int(cls)]) 132 | 133 | # Print time (inference + NMS) 134 | print('%sDone. (%.3fs)' % (s, t2 - t1)) 135 | 136 | # Stream results 137 | if view_img: 138 | cv2.imshow(p, im0) 139 | if cv2.waitKey(1) == ord('q'): # q to quit 140 | raise StopIteration 141 | 142 | # Save results (image with detections) 143 | if save_img: 144 | if dataset.mode == 'images': 145 | cv2.imwrite(save_path, im0) 146 | else: 147 | if vid_path != save_path: # new video 148 | vid_path = save_path 149 | if isinstance(vid_writer, cv2.VideoWriter): 150 | vid_writer.release() # release previous video writer 151 | 152 | fps = vid_cap.get(cv2.CAP_PROP_FPS) 153 | w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH)) 154 | h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) 155 | vid_writer = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*opt.fourcc), fps, (w, h)) 156 | vid_writer.write(im0) 157 | 158 | if save_txt or save_img: 159 | print('Results saved to %s' % os.getcwd() + os.sep + out) 160 | if platform == 'darwin': # MacOS 161 | os.system('open ' + save_path) 162 | 163 | print('Done. (%.3fs)' % (time.time() - t0)) 164 | 165 | 166 | if __name__ == '__main__': 167 | parser = argparse.ArgumentParser() 168 | parser.add_argument('--cfg', type=str, default='cfg/yolov4-pacsp.cfg', help='*.cfg path') 169 | parser.add_argument('--names', type=str, default='data/coco.names', help='*.names path') 170 | parser.add_argument('--weights', type=str, default='weights/yolov4-pacsp.pt', help='weights path') 171 | parser.add_argument('--source', type=str, default='data/samples', help='source') # input file/folder, 0 for webcam 172 | parser.add_argument('--output', type=str, default='output', help='output folder') # output folder 173 | parser.add_argument('--img-size', type=int, default=512, help='inference size (pixels)') 174 | parser.add_argument('--conf-thres', type=float, default=0.3, help='object confidence threshold') 175 | parser.add_argument('--iou-thres', type=float, default=0.6, help='IOU threshold for NMS') 176 | parser.add_argument('--fourcc', type=str, default='mp4v', help='output video codec (verify ffmpeg support)') 177 | parser.add_argument('--half', action='store_true', help='half precision FP16 inference') 178 | parser.add_argument('--device', default='', help='device id (i.e. 
0 or 0,1) or cpu') 179 | parser.add_argument('--view-img', action='store_true', help='display results') 180 | parser.add_argument('--save-txt', action='store_true', help='save results to *.txt') 181 | parser.add_argument('--classes', nargs='+', type=int, help='filter by class') 182 | parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS') 183 | parser.add_argument('--augment', action='store_true', help='augmented inference') 184 | opt = parser.parse_args() 185 | print(opt) 186 | 187 | with torch.no_grad(): 188 | detect() 189 | -------------------------------------------------------------------------------- /experiments.md: -------------------------------------------------------------------------------- 1 | ## Experimental results on MSCOCO 2017 test-dev set 2 | 3 | | Model | Test Size | APtest | AP50test | AP75test | APStest | APMtest | APLtest | 4 | | :-- | :-: | :-: | :-: | :-: | :-: | :-: | :-: | 5 | | **YOLOv4**paspp | 608 | | | | | | | 6 | | **YOLOv4**pacsp-s | 608 | | | | | | | 7 | | **YOLOv4**pacsp | 608 | 45.9% | 64.3% | 50.4% | 25.4% | 50.6% | 59.0% | 8 | | **YOLOv4**pacsp-x | 608 | 47.7% | 66.0% | 52.2% | 27.4% | 52.3% | 61.0% | 9 | | | | | | | | | 10 | | **YOLOv4**pacsp-s-mish | 608 | | | | | | | 11 | | **YOLOv4**pacsp-mish | 608 | 46.6% | 64.9% | 51.0% | 26.1% | 51.3% | 59.6% | 12 | | **YOLOv4**pacsp-x-mish | 608 | **48.5%** | **66.6%** | **53.2%** | **28.4%** | **53.2%** | **61.7%** | 13 | | | | | | | | | 14 | | **YOLOv4**tiny | 416 | 22.6% | 38.7% | 23.2% | 6.6% | 25.9% | 33.3% | 15 | | | | | | | | | 16 | -------------------------------------------------------------------------------- /images/scalingCSP.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/images/scalingCSP.png -------------------------------------------------------------------------------- /models.py: -------------------------------------------------------------------------------- 1 | from utils.google_utils import * 2 | from utils.layers import * 3 | from utils.parse_config import * 4 | 5 | ONNX_EXPORT = False 6 | 7 | 8 | def create_modules(module_defs, img_size, cfg): 9 | # Constructs module list of layer blocks from module configuration in module_defs 10 | 11 | img_size = [img_size] * 2 if isinstance(img_size, int) else img_size # expand if necessary 12 | _ = module_defs.pop(0) # cfg training hyperparams (unused) 13 | output_filters = [3] # input channels 14 | module_list = nn.ModuleList() 15 | routs = [] # list of layers which rout to deeper layers 16 | yolo_index = -1 17 | 18 | for i, mdef in enumerate(module_defs): 19 | modules = nn.Sequential() 20 | 21 | if mdef['type'] == 'convolutional': 22 | bn = mdef['batch_normalize'] 23 | filters = mdef['filters'] 24 | k = mdef['size'] # kernel size 25 | stride = mdef['stride'] if 'stride' in mdef else (mdef['stride_y'], mdef['stride_x']) 26 | if isinstance(k, int): # single-size conv 27 | modules.add_module('Conv2d', nn.Conv2d(in_channels=output_filters[-1], 28 | out_channels=filters, 29 | kernel_size=k, 30 | stride=stride, 31 | padding=k // 2 if mdef['pad'] else 0, 32 | groups=mdef['groups'] if 'groups' in mdef else 1, 33 | bias=not bn)) 34 | else: # multiple-size conv 35 | modules.add_module('MixConv2d', MixConv2d(in_ch=output_filters[-1], 36 | out_ch=filters, 37 | k=k, 38 | stride=stride, 39 | bias=not bn)) 40 | 41 | if bn: 42 | modules.add_module('BatchNorm2d', nn.BatchNorm2d(filters, 
momentum=0.03, eps=1E-4)) 43 | else: 44 | routs.append(i) # detection output (goes into yolo layer) 45 | 46 | if mdef['activation'] == 'leaky': # activation study https://github.com/ultralytics/yolov3/issues/441 47 | modules.add_module('activation', nn.LeakyReLU(0.1, inplace=True)) 48 | elif mdef['activation'] == 'swish': 49 | modules.add_module('activation', Swish()) 50 | elif mdef['activation'] == 'mish': 51 | modules.add_module('activation', Mish()) 52 | 53 | elif mdef['type'] == 'BatchNorm2d': 54 | filters = output_filters[-1] 55 | modules = nn.BatchNorm2d(filters, momentum=0.03, eps=1E-4) 56 | if i == 0 and filters == 3: # normalize RGB image 57 | # imagenet mean and var https://pytorch.org/docs/stable/torchvision/models.html#classification 58 | modules.running_mean = torch.tensor([0.485, 0.456, 0.406]) 59 | modules.running_var = torch.tensor([0.0524, 0.0502, 0.0506]) 60 | 61 | elif mdef['type'] == 'maxpool': 62 | k = mdef['size'] # kernel size 63 | stride = mdef['stride'] 64 | maxpool = nn.MaxPool2d(kernel_size=k, stride=stride, padding=(k - 1) // 2) 65 | if k == 2 and stride == 1: # yolov3-tiny 66 | modules.add_module('ZeroPad2d', nn.ZeroPad2d((0, 1, 0, 1))) 67 | modules.add_module('MaxPool2d', maxpool) 68 | else: 69 | modules = maxpool 70 | 71 | elif mdef['type'] == 'upsample': 72 | if ONNX_EXPORT: # explicitly state size, avoid scale_factor 73 | g = (yolo_index + 1) * 2 / 32 # gain 74 | modules = nn.Upsample(size=tuple(int(x * g) for x in img_size)) # img_size = (320, 192) 75 | else: 76 | modules = nn.Upsample(scale_factor=mdef['stride']) 77 | 78 | elif mdef['type'] == 'route': # nn.Sequential() placeholder for 'route' layer 79 | layers = mdef['layers'] 80 | filters = sum([output_filters[l + 1 if l > 0 else l] for l in layers]) 81 | routs.extend([i + l if l < 0 else l for l in layers]) 82 | modules = FeatureConcat(layers=layers) 83 | 84 | elif mdef['type'] == 'route_lhalf': # nn.Sequential() placeholder for 'route' layer 85 | layers = mdef['layers'] 86 | filters = sum([output_filters[l + 1 if l > 0 else l] for l in layers])//2 87 | routs.extend([i + l if l < 0 else l for l in layers]) 88 | modules = FeatureConcat_l(layers=layers) 89 | 90 | elif mdef['type'] == 'shortcut': # nn.Sequential() placeholder for 'shortcut' layer 91 | layers = mdef['from'] 92 | filters = output_filters[-1] 93 | routs.extend([i + l if l < 0 else l for l in layers]) 94 | modules = WeightedFeatureFusion(layers=layers, weight='weights_type' in mdef) 95 | 96 | elif mdef['type'] == 'reorg3d': # yolov3-spp-pan-scale 97 | pass 98 | 99 | elif mdef['type'] == 'yolo': 100 | yolo_index += 1 101 | stride = [8, 16, 32] # P5, P4, P3 strides 102 | if any(x in cfg for x in ['yolov4-tiny']): # stride order reversed 103 | stride = [32, 16, 8] 104 | layers = mdef['from'] if 'from' in mdef else [] 105 | modules = YOLOLayer(anchors=mdef['anchors'][mdef['mask']], # anchor list 106 | nc=mdef['classes'], # number of classes 107 | img_size=img_size, # (416, 416) 108 | yolo_index=yolo_index, # 0, 1, 2... 
109 | layers=layers, # output layers 110 | stride=stride[yolo_index]) 111 | 112 | # Initialize preceding Conv2d() bias (https://arxiv.org/pdf/1708.02002.pdf section 3.3) 113 | try: 114 | j = layers[yolo_index] if 'from' in mdef else -1 115 | bias_ = module_list[j][0].bias # shape(255,) 116 | bias = bias_[:modules.no * modules.na].view(modules.na, -1) # shape(3,85) 117 | bias[:, 4] += -4.5 # obj 118 | bias[:, 5:] += math.log(0.6 / (modules.nc - 0.99)) # cls (sigmoid(p) = 1/nc) 119 | module_list[j][0].bias = torch.nn.Parameter(bias_, requires_grad=bias_.requires_grad) 120 | except: 121 | print('WARNING: smart bias initialization failure.') 122 | 123 | else: 124 | print('Warning: Unrecognized Layer Type: ' + mdef['type']) 125 | 126 | # Register module list and number of output filters 127 | module_list.append(modules) 128 | output_filters.append(filters) 129 | 130 | routs_binary = [False] * (i + 1) 131 | for i in routs: 132 | routs_binary[i] = True 133 | return module_list, routs_binary 134 | 135 | 136 | class YOLOLayer(nn.Module): 137 | def __init__(self, anchors, nc, img_size, yolo_index, layers, stride): 138 | super(YOLOLayer, self).__init__() 139 | self.anchors = torch.Tensor(anchors) 140 | self.index = yolo_index # index of this layer in layers 141 | self.layers = layers # model output layer indices 142 | self.stride = stride # layer stride 143 | self.nl = len(layers) # number of output layers (3) 144 | self.na = len(anchors) # number of anchors (3) 145 | self.nc = nc # number of classes (80) 146 | self.no = nc + 5 # number of outputs (85) 147 | self.nx, self.ny, self.ng = 0, 0, 0 # initialize number of x, y gridpoints 148 | self.anchor_vec = self.anchors / self.stride 149 | self.anchor_wh = self.anchor_vec.view(1, self.na, 1, 1, 2) 150 | 151 | if ONNX_EXPORT: 152 | self.training = False 153 | self.create_grids((img_size[1] // stride, img_size[0] // stride)) # number x, y grid points 154 | 155 | def create_grids(self, ng=(13, 13), device='cpu'): 156 | self.nx, self.ny = ng # x and y grid size 157 | self.ng = torch.tensor(ng, dtype=torch.float) 158 | 159 | # build xy offsets 160 | if not self.training: 161 | yv, xv = torch.meshgrid([torch.arange(self.ny, device=device), torch.arange(self.nx, device=device)]) 162 | self.grid = torch.stack((xv, yv), 2).view((1, 1, self.ny, self.nx, 2)).float() 163 | 164 | if self.anchor_vec.device != device: 165 | self.anchor_vec = self.anchor_vec.to(device) 166 | self.anchor_wh = self.anchor_wh.to(device) 167 | 168 | def forward(self, p, out): 169 | ASFF = False # https://arxiv.org/abs/1911.09516 170 | if ASFF: 171 | i, n = self.index, self.nl # index in layers, number of layers 172 | p = out[self.layers[i]] 173 | bs, _, ny, nx = p.shape # bs, 255, 13, 13 174 | if (self.nx, self.ny) != (nx, ny): 175 | self.create_grids((nx, ny), p.device) 176 | 177 | # outputs and weights 178 | # w = F.softmax(p[:, -n:], 1) # normalized weights 179 | w = torch.sigmoid(p[:, -n:]) * (2 / n) # sigmoid weights (faster) 180 | # w = w / w.sum(1).unsqueeze(1) # normalize across layer dimension 181 | 182 | # weighted ASFF sum 183 | p = out[self.layers[i]][:, :-n] * w[:, i:i + 1] 184 | for j in range(n): 185 | if j != i: 186 | p += w[:, j:j + 1] * \ 187 | F.interpolate(out[self.layers[j]][:, :-n], size=[ny, nx], mode='bilinear', align_corners=False) 188 | 189 | elif ONNX_EXPORT: 190 | bs = 1 # batch size 191 | else: 192 | bs, _, ny, nx = p.shape # bs, 255, 13, 13 193 | if (self.nx, self.ny) != (nx, ny): 194 | self.create_grids((nx, ny), p.device) 195 | 196 | # p.view(bs, 255, 13, 
13) -- > (bs, 3, 13, 13, 85) # (bs, anchors, grid, grid, classes + xywh) 197 | p = p.view(bs, self.na, self.no, self.ny, self.nx).permute(0, 1, 3, 4, 2).contiguous() # prediction 198 | 199 | if self.training: 200 | return p 201 | 202 | elif ONNX_EXPORT: 203 | # Avoid broadcasting for ANE operations 204 | m = self.na * self.nx * self.ny 205 | ng = 1. / self.ng.repeat(m, 1) 206 | grid = self.grid.repeat(1, self.na, 1, 1, 1).view(m, 2) 207 | anchor_wh = self.anchor_wh.repeat(1, 1, self.nx, self.ny, 1).view(m, 2) * ng 208 | 209 | p = p.view(m, self.no) 210 | xy = torch.sigmoid(p[:, 0:2]) + grid # x, y 211 | wh = torch.exp(p[:, 2:4]) * anchor_wh # width, height 212 | p_cls = torch.sigmoid(p[:, 4:5]) if self.nc == 1 else \ 213 | torch.sigmoid(p[:, 5:self.no]) * torch.sigmoid(p[:, 4:5]) # conf 214 | return p_cls, xy * ng, wh 215 | 216 | else: # inference 217 | io = p.clone() # inference output 218 | io[..., :2] = torch.sigmoid(io[..., :2]) + self.grid # xy 219 | io[..., 2:4] = torch.exp(io[..., 2:4]) * self.anchor_wh # wh yolo method 220 | io[..., :4] *= self.stride 221 | torch.sigmoid_(io[..., 4:]) 222 | return io.view(bs, -1, self.no), p # view [1, 3, 13, 13, 85] as [1, 507, 85] 223 | 224 | 225 | class Darknet(nn.Module): 226 | # YOLOv3 object detection model 227 | 228 | def __init__(self, cfg, img_size=(416, 416), verbose=False): 229 | super(Darknet, self).__init__() 230 | 231 | self.module_defs = parse_model_cfg(cfg) 232 | self.module_list, self.routs = create_modules(self.module_defs, img_size, cfg) 233 | self.yolo_layers = get_yolo_layers(self) 234 | # torch_utils.initialize_weights(self) 235 | 236 | # Darknet Header https://github.com/AlexeyAB/darknet/issues/2914#issuecomment-496675346 237 | self.version = np.array([0, 2, 5], dtype=np.int32) # (int32) version info: major, minor, revision 238 | self.seen = np.array([0], dtype=np.int64) # (int64) number of images seen during training 239 | self.info(verbose) if not ONNX_EXPORT else None # print model description 240 | 241 | def forward(self, x, augment=False, verbose=False): 242 | 243 | if not augment: 244 | return self.forward_once(x) 245 | else: # Augment images (inference and test only) https://github.com/ultralytics/yolov3/issues/931 246 | img_size = x.shape[-2:] # height, width 247 | s = [0.83, 0.67] # scales 248 | y = [] 249 | for i, xi in enumerate((x, 250 | torch_utils.scale_img(x.flip(3), s[0], same_shape=False), # flip-lr and scale 251 | torch_utils.scale_img(x, s[1], same_shape=False), # scale 252 | )): 253 | # cv2.imwrite('img%g.jpg' % i, 255 * xi[0].numpy().transpose((1, 2, 0))[:, :, ::-1]) 254 | y.append(self.forward_once(xi)[0]) 255 | 256 | y[1][..., :4] /= s[0] # scale 257 | y[1][..., 0] = img_size[1] - y[1][..., 0] # flip lr 258 | y[2][..., :4] /= s[1] # scale 259 | 260 | # for i, yi in enumerate(y): # coco small, medium, large = < 32**2 < 96**2 < 261 | # area = yi[..., 2:4].prod(2)[:, :, None] 262 | # if i == 1: 263 | # yi *= (area < 96. ** 2).float() 264 | # elif i == 2: 265 | # yi *= (area > 32. 
** 2).float() 266 | # y[i] = yi 267 | 268 | y = torch.cat(y, 1) 269 | return y, None 270 | 271 | def forward_once(self, x, augment=False, verbose=False): 272 | img_size = x.shape[-2:] # height, width 273 | yolo_out, out = [], [] 274 | if verbose: 275 | print('0', x.shape) 276 | str = '' 277 | 278 | # Augment images (inference and test only) 279 | if augment: # https://github.com/ultralytics/yolov3/issues/931 280 | nb = x.shape[0] # batch size 281 | s = [0.83, 0.67] # scales 282 | x = torch.cat((x, 283 | torch_utils.scale_img(x.flip(3), s[0]), # flip-lr and scale 284 | torch_utils.scale_img(x, s[1]), # scale 285 | ), 0) 286 | 287 | for i, module in enumerate(self.module_list): 288 | name = module.__class__.__name__ 289 | if name in ['WeightedFeatureFusion', 'FeatureConcat', 'FeatureConcat_l']: # sum, concat 290 | if verbose: 291 | l = [i - 1] + module.layers # layers 292 | sh = [list(x.shape)] + [list(out[i].shape) for i in module.layers] # shapes 293 | str = ' >> ' + ' + '.join(['layer %g %s' % x for x in zip(l, sh)]) 294 | x = module(x, out) # WeightedFeatureFusion(), FeatureConcat() 295 | elif name == 'YOLOLayer': 296 | yolo_out.append(module(x, out)) 297 | else: # run module directly, i.e. mtype = 'convolutional', 'upsample', 'maxpool', 'batchnorm2d' etc. 298 | x = module(x) 299 | 300 | out.append(x if self.routs[i] else []) 301 | if verbose: 302 | print('%g/%g %s -' % (i, len(self.module_list), name), list(x.shape), str) 303 | str = '' 304 | 305 | if self.training: # train 306 | return yolo_out 307 | elif ONNX_EXPORT: # export 308 | x = [torch.cat(x, 0) for x in zip(*yolo_out)] 309 | return x[0], torch.cat(x[1:3], 1) # scores, boxes: 3780x80, 3780x4 310 | else: # inference or test 311 | x, p = zip(*yolo_out) # inference output, training output 312 | x = torch.cat(x, 1) # cat yolo outputs 313 | if augment: # de-augment results 314 | x = torch.split(x, nb, dim=0) 315 | x[1][..., :4] /= s[0] # scale 316 | x[1][..., 0] = img_size[1] - x[1][..., 0] # flip lr 317 | x[2][..., :4] /= s[1] # scale 318 | x = torch.cat(x, 1) 319 | return x, p 320 | 321 | def fuse(self): 322 | # Fuse Conv2d + BatchNorm2d layers throughout model 323 | print('Fusing layers...') 324 | fused_list = nn.ModuleList() 325 | for a in list(self.children())[0]: 326 | if isinstance(a, nn.Sequential): 327 | for i, b in enumerate(a): 328 | if isinstance(b, nn.modules.batchnorm.BatchNorm2d): 329 | # fuse this bn layer with the previous conv2d layer 330 | conv = a[i - 1] 331 | fused = torch_utils.fuse_conv_and_bn(conv, b) 332 | a = nn.Sequential(fused, *list(a.children())[i + 1:]) 333 | break 334 | fused_list.append(a) 335 | self.module_list = fused_list 336 | self.info() if not ONNX_EXPORT else None # yolov3-spp reduced from 225 to 152 layers 337 | 338 | def info(self, verbose=False): 339 | torch_utils.model_info(self, verbose) 340 | 341 | 342 | def get_yolo_layers(model): 343 | return [i for i, m in enumerate(model.module_list) if m.__class__.__name__ == 'YOLOLayer'] # [89, 101, 113] 344 | 345 | 346 | def load_darknet_weights(self, weights, cutoff=-1): 347 | # Parses and loads the weights stored in 'weights' 348 | 349 | # Establish cutoffs (load layers between 0 and cutoff. 
if cutoff = -1 all are loaded) 350 | file = Path(weights).name 351 | if file == 'darknet53.conv.74': 352 | cutoff = 75 353 | elif file == 'yolov3-tiny.conv.15': 354 | cutoff = 15 355 | 356 | # Read weights file 357 | with open(weights, 'rb') as f: 358 | # Read Header https://github.com/AlexeyAB/darknet/issues/2914#issuecomment-496675346 359 | self.version = np.fromfile(f, dtype=np.int32, count=3) # (int32) version info: major, minor, revision 360 | self.seen = np.fromfile(f, dtype=np.int64, count=1) # (int64) number of images seen during training 361 | 362 | weights = np.fromfile(f, dtype=np.float32) # the rest are weights 363 | 364 | ptr = 0 365 | for i, (mdef, module) in enumerate(zip(self.module_defs[:cutoff], self.module_list[:cutoff])): 366 | if mdef['type'] == 'convolutional': 367 | conv = module[0] 368 | if mdef['batch_normalize']: 369 | # Load BN bias, weights, running mean and running variance 370 | bn = module[1] 371 | nb = bn.bias.numel() # number of biases 372 | # Bias 373 | bn.bias.data.copy_(torch.from_numpy(weights[ptr:ptr + nb]).view_as(bn.bias)) 374 | ptr += nb 375 | # Weight 376 | bn.weight.data.copy_(torch.from_numpy(weights[ptr:ptr + nb]).view_as(bn.weight)) 377 | ptr += nb 378 | # Running Mean 379 | bn.running_mean.data.copy_(torch.from_numpy(weights[ptr:ptr + nb]).view_as(bn.running_mean)) 380 | ptr += nb 381 | # Running Var 382 | bn.running_var.data.copy_(torch.from_numpy(weights[ptr:ptr + nb]).view_as(bn.running_var)) 383 | ptr += nb 384 | else: 385 | # Load conv. bias 386 | nb = conv.bias.numel() 387 | conv_b = torch.from_numpy(weights[ptr:ptr + nb]).view_as(conv.bias) 388 | conv.bias.data.copy_(conv_b) 389 | ptr += nb 390 | # Load conv. weights 391 | nw = conv.weight.numel() # number of weights 392 | conv.weight.data.copy_(torch.from_numpy(weights[ptr:ptr + nw]).view_as(conv.weight)) 393 | ptr += nw 394 | 395 | 396 | def save_weights(self, path='model.weights', cutoff=-1): 397 | # Converts a PyTorch model to Darket format (*.pt to *.weights) 398 | # Note: Does not work if model.fuse() is applied 399 | with open(path, 'wb') as f: 400 | # Write Header https://github.com/AlexeyAB/darknet/issues/2914#issuecomment-496675346 401 | self.version.tofile(f) # (int32) version info: major, minor, revision 402 | self.seen.tofile(f) # (int64) number of images seen during training 403 | 404 | # Iterate through layers 405 | for i, (mdef, module) in enumerate(zip(self.module_defs[:cutoff], self.module_list[:cutoff])): 406 | if mdef['type'] == 'convolutional': 407 | conv_layer = module[0] 408 | # If batch norm, load bn first 409 | if mdef['batch_normalize']: 410 | bn_layer = module[1] 411 | bn_layer.bias.data.cpu().numpy().tofile(f) 412 | bn_layer.weight.data.cpu().numpy().tofile(f) 413 | bn_layer.running_mean.data.cpu().numpy().tofile(f) 414 | bn_layer.running_var.data.cpu().numpy().tofile(f) 415 | # Load conv bias 416 | else: 417 | conv_layer.bias.data.cpu().numpy().tofile(f) 418 | # Load conv weights 419 | conv_layer.weight.data.cpu().numpy().tofile(f) 420 | 421 | 422 | def convert(cfg='cfg/yolov4-pacsp.cfg', weights='weights/yolov4-pacsp.weights'): 423 | # Converts between PyTorch and Darknet format per extension (i.e. 
*.weights convert to *.pt and vice versa) 424 | # from models import *; convert('cfg/yolov3-spp.cfg', 'weights/yolov3-spp.weights') 425 | 426 | # Initialize model 427 | model = Darknet(cfg) 428 | 429 | # Load weights and save 430 | if weights.endswith('.pt'): # if PyTorch format 431 | model.load_state_dict(torch.load(weights, map_location='cpu')['model']) 432 | save_weights(model, path='converted.weights', cutoff=-1) 433 | print("Success: converted '%s' to 'converted.weights'" % weights) 434 | 435 | elif weights.endswith('.weights'): # darknet format 436 | _ = load_darknet_weights(model, weights) 437 | 438 | chkpt = {'epoch': -1, 439 | 'best_fitness': None, 440 | 'training_results': None, 441 | 'model': model.state_dict(), 442 | 'optimizer': None} 443 | 444 | torch.save(chkpt, 'converted.pt') 445 | print("Success: converted '%s' to 'converted.pt'" % weights) 446 | 447 | else: 448 | print('Error: extension not supported.') 449 | 450 | 451 | def attempt_download(weights): 452 | # Attempt to download pretrained weights if not found locally 453 | weights = weights.strip() 454 | msg = weights + ' missing, try downloading from https://drive.google.com/open?id=1LezFG5g3BCW6iYaV89B2i64cqEUZD7e0' 455 | 456 | if len(weights) > 0 and not os.path.isfile(weights): 457 | d = {'': ''} 458 | 459 | file = Path(weights).name 460 | if file in d: 461 | r = gdrive_download(id=d[file], name=weights) 462 | else: # download from pjreddie.com 463 | url = 'https://pjreddie.com/media/files/' + file 464 | print('Downloading ' + url) 465 | r = os.system('curl -f ' + url + ' -o ' + weights) 466 | 467 | # Error check 468 | if not (r == 0 and os.path.exists(weights) and os.path.getsize(weights) > 1E6): # weights exist and > 1MB 469 | os.system('rm ' + weights) # remove partial downloads 470 | raise Exception(msg) 471 | -------------------------------------------------------------------------------- /pic/p0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/pic/p0.png -------------------------------------------------------------------------------- /pic/p1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/pic/p1.png -------------------------------------------------------------------------------- /pic/p2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/pic/p2.png -------------------------------------------------------------------------------- /pic/p3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/pic/p3.png -------------------------------------------------------------------------------- /pic/p4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/pic/p4.png -------------------------------------------------------------------------------- /pic/p5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/pic/p5.png 
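The `convert()` helper defined at the end of `models.py` above switches a checkpoint between the PyTorch (`*.pt`) and Darknet (`*.weights`) formats, which is handy once training on the custom dataset is finished and the model needs to be used with the original darknet tooling. Below is a minimal sketch of how it can be called from the repository root; the cfg/weights paths are taken from the training and testing commands earlier in this README, and the `*.weights` filename in the second call is only a placeholder, not a file shipped with the repo.

```python
# Minimal sketch: convert between *.pt and *.weights with models.convert().
# Run from the repository root so that `from models import *` resolves.
from models import convert

# Trained PyTorch checkpoint -> Darknet format (written to ./converted.weights).
# The cfg must match the architecture the checkpoint was trained with.
convert(cfg='cfg/wei_score/yolov4-pacsp-x-mish.cfg',
        weights='weights/best_yolov4-pacsp-x-mish.pt')

# Darknet weights -> PyTorch checkpoint (written to ./converted.pt).
# 'weights/yolov4-pacsp-x-mish.weights' is a placeholder filename.
convert(cfg='cfg/wei_score/yolov4-pacsp-x-mish.cfg',
        weights='weights/yolov4-pacsp-x-mish.weights')
```

Note that `convert()` builds a fresh `Darknet(cfg)` internally and writes its output into the current working directory as `converted.weights` or `converted.pt`.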
-------------------------------------------------------------------------------- /pic/test1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/pic/test1.jpg -------------------------------------------------------------------------------- /pic/test2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/pic/test2.jpg -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | numpy == 1.17 2 | opencv-python >= 4.1 3 | torch >= 1.5 4 | torchvision 5 | matplotlib 6 | pycocotools 7 | tqdm 8 | pillow 9 | tensorboard >= 1.14 10 | 11 | # Nvidia Apex (optional) for mixed precision training -------------------------- 12 | # git clone https://github.com/NVIDIA/apex && cd apex && pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" . --user 13 | -------------------------------------------------------------------------------- /runs/readme: -------------------------------------------------------------------------------- 1 | tensorboard的log存放在此 -------------------------------------------------------------------------------- /test.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import json 3 | 4 | from torch.utils.data import DataLoader 5 | 6 | from models import * 7 | from utils.datasets import * 8 | from utils.utils import * 9 | 10 | 11 | def test(cfg, 12 | data, 13 | weights=None, 14 | batch_size=16, 15 | img_size=416, 16 | conf_thres=0.001, 17 | iou_thres=0.6, # for nms 18 | save_json=False, 19 | single_cls=False, 20 | augment=False, 21 | model=None, 22 | dataloader=None): 23 | # Initialize/load model and set device 24 | if model is None: 25 | device = torch_utils.select_device(opt.device, batch_size=batch_size) 26 | verbose = opt.task == 'test' 27 | 28 | # Remove previous 29 | for f in glob.glob('test_batch*.jpg'): 30 | os.remove(f) 31 | 32 | # Initialize model 33 | model = Darknet(cfg, img_size) 34 | 35 | # Load weights 36 | attempt_download(weights) 37 | if weights.endswith('.pt'): # pytorch format 38 | model.load_state_dict(torch.load(weights, map_location=device)['model']) 39 | else: # darknet format 40 | load_darknet_weights(model, weights) 41 | 42 | # Fuse 43 | model.fuse() 44 | model.to(device) 45 | 46 | if device.type != 'cpu' and torch.cuda.device_count() > 1: 47 | model = nn.DataParallel(model) 48 | else: # called by train.py 49 | device = next(model.parameters()).device # get model device 50 | verbose = False 51 | 52 | # Configure run 53 | data = parse_data_cfg(data) 54 | nc = 1 if single_cls else int(data['classes']) # number of classes 55 | path = data['valid'] # path to test images 56 | names = load_classes(data['names']) # class names 57 | iouv = torch.linspace(0.5, 0.95, 10).to(device) # iou vector for mAP@0.5:0.95 58 | iouv = iouv[0].view(1) # comment for mAP@0.5:0.95 59 | niou = iouv.numel() 60 | 61 | # Dataloader 62 | if dataloader is None: 63 | dataset = LoadImagesAndLabels(path, img_size, batch_size, rect=True, single_cls=opt.single_cls) 64 | batch_size = min(batch_size, len(dataset)) 65 | dataloader = DataLoader(dataset, 66 | batch_size=batch_size, 67 | num_workers=min([os.cpu_count(), batch_size 
if batch_size > 1 else 0, 8]), 68 | pin_memory=True, 69 | collate_fn=dataset.collate_fn) 70 | 71 | seen = 0 72 | model.eval() 73 | _ = model(torch.zeros((1, 3, img_size, img_size), device=device)) if device.type != 'cpu' else None # run once 74 | coco91class = coco80_to_coco91_class() 75 | s = ('%20s' + '%10s' * 6) % ('Class', 'Images', 'Targets', 'P', 'R', 'mAP@0.5', 'F1') 76 | p, r, f1, mp, mr, map, mf1, t0, t1 = 0., 0., 0., 0., 0., 0., 0., 0., 0. 77 | loss = torch.zeros(3, device=device) 78 | jdict, stats, ap, ap_class = [], [], [], [] 79 | for batch_i, (imgs, targets, paths, shapes) in enumerate(tqdm(dataloader, desc=s)): 80 | imgs = imgs.to(device).float() / 255.0 # uint8 to float32, 0 - 255 to 0.0 - 1.0 81 | targets = targets.to(device) 82 | nb, _, height, width = imgs.shape # batch size, channels, height, width 83 | whwh = torch.Tensor([width, height, width, height]).to(device) 84 | 85 | # Plot images with bounding boxes 86 | f = 'test_batch%g.jpg' % batch_i # filename 87 | # if batch_i < 1 and not os.path.exists(f): #<---------------不打印 88 | # plot_images(imgs=imgs, targets=targets, paths=paths, fname=f) 89 | 90 | # Disable gradients 91 | with torch.no_grad(): 92 | # Run model 93 | t = torch_utils.time_synchronized() 94 | inf_out, train_out = model(imgs, augment=augment) # inference and training outputs 95 | t0 += torch_utils.time_synchronized() - t 96 | 97 | # Compute loss 98 | if hasattr(model, 'hyp'): # if model has loss hyperparameters 99 | loss += compute_loss(train_out, targets, model)[1][:3] # GIoU, obj, cls 100 | 101 | # Run NMS 102 | t = torch_utils.time_synchronized() 103 | output = non_max_suppression(inf_out, conf_thres=conf_thres, iou_thres=iou_thres) # nms 104 | t1 += torch_utils.time_synchronized() - t 105 | 106 | # Statistics per image 107 | for si, pred in enumerate(output): 108 | labels = targets[targets[:, 0] == si, 1:] 109 | nl = len(labels) 110 | tcls = labels[:, 0].tolist() if nl else [] # target class 111 | seen += 1 112 | 113 | if pred is None: 114 | if nl: 115 | stats.append((torch.zeros(0, niou, dtype=torch.bool), torch.Tensor(), torch.Tensor(), tcls)) 116 | continue 117 | 118 | # Append to text file 119 | # with open('test.txt', 'a') as file: 120 | # [file.write('%11.5g' * 7 % tuple(x) + '\n') for x in pred] 121 | 122 | # Clip boxes to image bounds 123 | clip_coords(pred, (height, width)) 124 | 125 | # Append to pycocotools JSON dictionary 126 | if save_json: 127 | # [{"image_id": 42, "category_id": 18, "bbox": [258.15, 41.29, 348.26, 243.78], "score": 0.236}, ... 
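# The save_json block below converts each detection into a pycocotools-style
# record: boxes are rescaled from the letterboxed network input back to the
# original image shape, changed from xyxy corners to COCO's [x_min, y_min, w, h],
# and the contiguous 0-79 class index is mapped to COCO's 91 category ids via
# coco80_to_coco91_class(). image_id parsing assumes COCO-style numeric file
# names; with a custom *.data file this branch only runs when --save-json is set.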
128 | image_id = int(Path(paths[si]).stem.split('_')[-1]) 129 | box = pred[:, :4].clone() # xyxy 130 | scale_coords(imgs[si].shape[1:], box, shapes[si][0], shapes[si][1]) # to original shape 131 | box = xyxy2xywh(box) # xywh 132 | box[:, :2] -= box[:, 2:] / 2 # xy center to top-left corner 133 | for p, b in zip(pred.tolist(), box.tolist()): 134 | jdict.append({'image_id': image_id, 135 | 'category_id': coco91class[int(p[5])], 136 | 'bbox': [round(x, 3) for x in b], 137 | 'score': round(p[4], 5)}) 138 | 139 | # Assign all predictions as incorrect 140 | correct = torch.zeros(pred.shape[0], niou, dtype=torch.bool, device=device) 141 | if nl: 142 | detected = [] # target indices 143 | tcls_tensor = labels[:, 0] 144 | 145 | # target boxes 146 | tbox = xywh2xyxy(labels[:, 1:5]) * whwh 147 | 148 | # Per target class 149 | for cls in torch.unique(tcls_tensor): 150 | ti = (cls == tcls_tensor).nonzero().view(-1) # prediction indices 151 | pi = (cls == pred[:, 5]).nonzero().view(-1) # target indices 152 | 153 | # Search for detections 154 | if pi.shape[0]: 155 | # Prediction to target ious 156 | ious, i = box_iou(pred[pi, :4], tbox[ti]).max(1) # best ious, indices 157 | 158 | # Append detections 159 | for j in (ious > iouv[0]).nonzero(): 160 | d = ti[i[j]] # detected target 161 | if d not in detected: 162 | detected.append(d) 163 | correct[pi[j]] = ious[j] > iouv # iou_thres is 1xn 164 | if len(detected) == nl: # all targets already located in image 165 | break 166 | 167 | # Append statistics (correct, conf, pcls, tcls) 168 | stats.append((correct.cpu(), pred[:, 4].cpu(), pred[:, 5].cpu(), tcls)) 169 | 170 | # Compute statistics 171 | stats = [np.concatenate(x, 0) for x in zip(*stats)] # to numpy 172 | if len(stats): 173 | p, r, ap, f1, ap_class = ap_per_class(*stats) 174 | if niou > 1: 175 | p, r, ap, f1 = p[:, 0], r[:, 0], ap.mean(1), ap[:, 0] # [P, R, AP@0.5:0.95, AP@0.5] 176 | mp, mr, map, mf1 = p.mean(), r.mean(), ap.mean(), f1.mean() 177 | nt = np.bincount(stats[3].astype(np.int64), minlength=nc) # number of targets per class 178 | else: 179 | nt = torch.zeros(1) 180 | 181 | # Print results 182 | pf = '%20s' + '%10.3g' * 6 # print format 183 | print(pf % ('all', seen, nt.sum(), mp, mr, map, mf1)) 184 | 185 | # Print results per class 186 | if verbose and nc > 1 and len(stats): 187 | for i, c in enumerate(ap_class): 188 | print(pf % (names[c], seen, nt[c], p[i], r[i], ap[i], f1[i])) 189 | 190 | # Print speeds 191 | if verbose or save_json: 192 | t = tuple(x / seen * 1E3 for x in (t0, t1, t0 + t1)) + (img_size, img_size, batch_size) # tuple 193 | print('Speed: %.1f/%.1f/%.1f ms inference/NMS/total per %gx%g image at batch-size %g' % t) 194 | 195 | maps = np.zeros(nc) + map 196 | # Save JSON 197 | if save_json and map and len(jdict): 198 | print('\nCOCO mAP with pycocotools...') 199 | imgIds = [int(Path(x).stem.split('_')[-1]) for x in dataloader.dataset.img_files] 200 | with open('results.json', 'w') as file: 201 | json.dump(jdict, file) 202 | 203 | try: 204 | from pycocotools.coco import COCO 205 | from pycocotools.cocoeval import COCOeval 206 | except: 207 | print('WARNING: missing pycocotools package, can not compute official COCO mAP. 
See requirements.txt.') 208 | 209 | # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb 210 | cocoGt = COCO(glob.glob('../coco/annotations/instances_val*.json')[0]) # initialize COCO ground truth api 211 | cocoDt = cocoGt.loadRes('results.json') # initialize COCO pred api 212 | 213 | cocoEval = COCOeval(cocoGt, cocoDt, 'bbox') 214 | cocoEval.params.imgIds = imgIds # [:32] # only evaluate these images 215 | cocoEval.evaluate() 216 | cocoEval.accumulate() 217 | cocoEval.summarize() 218 | map, map50 = cocoEval.stats[:2] # update results (mAP@0.5:0.95, mAP@0.5) 219 | return (mp, mr, map50, map, *(loss.cpu() / len(dataloader)).tolist()), maps, t 220 | 221 | # Return results 222 | for i, c in enumerate(ap_class): 223 | maps[c] = ap[i] 224 | return (mp, mr, map, mf1, *(loss.cpu() / len(dataloader)).tolist()), maps 225 | 226 | 227 | if __name__ == '__main__': 228 | parser = argparse.ArgumentParser(prog='test.py') 229 | parser.add_argument('--cfg', type=str, default='cfg/yolov4-pacsp.cfg', help='*.cfg path') 230 | parser.add_argument('--data', type=str, default='data/coco2017.data', help='*.data path') 231 | parser.add_argument('--weights', type=str, default='weights/yolov4-pacsp.pt', help='weights path') 232 | parser.add_argument('--batch-size', type=int, default=16, help='size of each image batch') 233 | parser.add_argument('--img-size', type=int, default=512, help='inference size (pixels)') 234 | parser.add_argument('--conf-thres', type=float, default=0.001, help='object confidence threshold') 235 | parser.add_argument('--iou-thres', type=float, default=0.6, help='IOU threshold for NMS') 236 | parser.add_argument('--save-json', action='store_true', help='save a cocoapi-compatible JSON results file') 237 | parser.add_argument('--task', default='test', help="'test', 'study', 'benchmark'") 238 | parser.add_argument('--device', default='', help='device id (i.e. 
0 or 0,1) or cpu') 239 | parser.add_argument('--single-cls', action='store_true', help='train as single-class dataset') 240 | parser.add_argument('--augment', action='store_true', help='augmented inference') 241 | opt = parser.parse_args() 242 | opt.save_json = opt.save_json or any([x in opt.data for x in ['coco.data', 'coco2014.data', 'coco2017.data']]) 243 | print(opt) 244 | 245 | # task = 'test', 'study', 'benchmark' 246 | if opt.task == 'test': # (default) test normally 247 | test(opt.cfg, 248 | opt.data, 249 | opt.weights, 250 | opt.batch_size, 251 | opt.img_size, 252 | opt.conf_thres, 253 | opt.iou_thres, 254 | opt.save_json, 255 | opt.single_cls, 256 | opt.augment) 257 | 258 | elif opt.task == 'benchmark': # mAPs at 320-608 at conf 0.5 and 0.7 259 | y = [] 260 | x = list(range(288, 896, 64)) 261 | f = 'benchmark_%s_%s.txt' % (Path(opt.data).stem, Path(opt.weights).stem) # filename to save to 262 | for i in x: # img-size 263 | for j in [0.7]: # iou-thres 264 | r, _, t = test(opt.cfg, opt.data, opt.weights, opt.batch_size, i, opt.conf_thres, j, opt.save_json) 265 | y.append(r + t) 266 | np.savetxt(f, y, fmt='%10.6g') # save 267 | 268 | elif opt.task == 'study': # Parameter study 269 | y = [] 270 | x = np.arange(0.4, 0.9, 0.05) # iou-thres 271 | for i in x: 272 | t = time.time() 273 | r = test(opt.cfg, opt.data, opt.weights, opt.batch_size, opt.img_size, opt.conf_thres, i, opt.save_json)[0] 274 | y.append(r + (time.time() - t,)) 275 | np.savetxt('study.txt', y, fmt='%10.4g') # y = np.loadtxt('study.txt') 276 | 277 | # Plot 278 | fig, ax = plt.subplots(3, 1, figsize=(6, 6)) 279 | y = np.stack(y, 0) 280 | ax[0].plot(x, y[:, 2], marker='.', label='mAP@0.5') 281 | ax[0].set_ylabel('mAP') 282 | ax[1].plot(x, y[:, 3], marker='.', label='mAP@0.5:0.95') 283 | ax[1].set_ylabel('mAP') 284 | ax[2].plot(x, y[:, -1], marker='.', label='time') 285 | ax[2].set_ylabel('time (s)') 286 | for i in range(3): 287 | ax[i].legend() 288 | ax[i].set_xlabel('iou_thr') 289 | fig.tight_layout() 290 | plt.savefig('study.jpg', dpi=200) 291 | -------------------------------------------------------------------------------- /test_half.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import json 3 | 4 | from torch.utils.data import DataLoader 5 | 6 | from models import * 7 | from utils.datasets import * 8 | from utils.utils import * 9 | 10 | 11 | def test(cfg, 12 | data, 13 | weights=None, 14 | batch_size=16, 15 | img_size=416, 16 | conf_thres=0.001, 17 | iou_thres=0.6, # for nms 18 | save_json=False, 19 | single_cls=False, 20 | augment=False, 21 | model=None, 22 | dataloader=None): 23 | # Initialize/load model and set device 24 | if model is None: 25 | device = torch_utils.select_device(opt.device, batch_size=batch_size) 26 | verbose = opt.task == 'test' 27 | 28 | # Remove previous 29 | for f in glob.glob('test_batch*.jpg'): 30 | os.remove(f) 31 | 32 | # Initialize model 33 | model = Darknet(cfg, img_size) 34 | 35 | # Load weights 36 | attempt_download(weights) 37 | if weights.endswith('.pt'): # pytorch format 38 | model.load_state_dict(torch.load(weights, map_location=device)['model']) 39 | else: # darknet format 40 | load_darknet_weights(model, weights) 41 | 42 | # Fuse 43 | model.fuse() 44 | model.to(device) 45 | model.half() 46 | 47 | if device.type != 'cpu' and torch.cuda.device_count() > 1: 48 | model = nn.DataParallel(model) 49 | else: # called by train.py 50 | device = next(model.parameters()).device # get model device 51 | verbose = False 52 | 53 | # 
Configure run 54 | data = parse_data_cfg(data) 55 | nc = 1 if single_cls else int(data['classes']) # number of classes 56 | path = data['valid'] # path to test images 57 | names = load_classes(data['names']) # class names 58 | iouv = torch.linspace(0.5, 0.95, 10).to(device) # iou vector for mAP@0.5:0.95 59 | iouv = iouv[0].view(1) # comment for mAP@0.5:0.95 60 | niou = iouv.numel() 61 | 62 | # Dataloader 63 | if dataloader is None: 64 | dataset = LoadImagesAndLabels(path, img_size, batch_size, rect=True, single_cls=opt.single_cls) 65 | batch_size = min(batch_size, len(dataset)) 66 | dataloader = DataLoader(dataset, 67 | batch_size=batch_size, 68 | num_workers=min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8]), 69 | pin_memory=True, 70 | collate_fn=dataset.collate_fn) 71 | 72 | seen = 0 73 | model.eval() 74 | _ = model(torch.zeros((1, 3, img_size, img_size), device=device).half()) if device.type != 'cpu' else None # run once 75 | coco91class = coco80_to_coco91_class() 76 | s = ('%20s' + '%10s' * 6) % ('Class', 'Images', 'Targets', 'P', 'R', 'mAP@0.5', 'F1') 77 | p, r, f1, mp, mr, map, mf1, t0, t1 = 0., 0., 0., 0., 0., 0., 0., 0., 0. 78 | loss = torch.zeros(3, device=device) 79 | jdict, stats, ap, ap_class = [], [], [], [] 80 | for batch_i, (imgs, targets, paths, shapes) in enumerate(tqdm(dataloader, desc=s)): 81 | imgs = imgs.to(device).half() / 255.0 # uint8 to float32, 0 - 255 to 0.0 - 1.0 82 | targets = targets.to(device) 83 | nb, _, height, width = imgs.shape # batch size, channels, height, width 84 | whwh = torch.Tensor([width, height, width, height]).to(device) 85 | 86 | # Plot images with bounding boxes 87 | f = 'test_batch%g.jpg' % batch_i # filename 88 | #if batch_i < 1 and not os.path.exists(f): 89 | # plot_images(imgs=imgs, targets=targets, paths=paths, fname=f) 90 | 91 | # Disable gradients 92 | with torch.no_grad(): 93 | # Run model 94 | t = torch_utils.time_synchronized() 95 | inf_out, train_out = model(imgs, augment=augment) # inference and training outputs 96 | t0 += torch_utils.time_synchronized() - t 97 | 98 | # Compute loss 99 | if hasattr(model, 'hyp'): # if model has loss hyperparameters 100 | loss += compute_loss(train_out, targets, model)[1][:3] # GIoU, obj, cls 101 | 102 | # Run NMS 103 | t = torch_utils.time_synchronized() 104 | output = non_max_suppression(inf_out, conf_thres=conf_thres, iou_thres=iou_thres) # nms 105 | t1 += torch_utils.time_synchronized() - t 106 | 107 | # Statistics per image 108 | for si, pred in enumerate(output): 109 | labels = targets[targets[:, 0] == si, 1:] 110 | nl = len(labels) 111 | tcls = labels[:, 0].tolist() if nl else [] # target class 112 | seen += 1 113 | 114 | if pred is None: 115 | if nl: 116 | stats.append((torch.zeros(0, niou, dtype=torch.bool), torch.Tensor(), torch.Tensor(), tcls)) 117 | continue 118 | 119 | # Append to text file 120 | # with open('test.txt', 'a') as file: 121 | # [file.write('%11.5g' * 7 % tuple(x) + '\n') for x in pred] 122 | 123 | # Clip boxes to image bounds 124 | clip_coords(pred, (height, width)) 125 | 126 | # Append to pycocotools JSON dictionary 127 | if save_json: 128 | # [{"image_id": 42, "category_id": 18, "bbox": [258.15, 41.29, 348.26, 243.78], "score": 0.236}, ... 
129 | image_id = int(Path(paths[si]).stem.split('_')[-1]) 130 | box = pred[:, :4].clone() # xyxy 131 | scale_coords(imgs[si].shape[1:], box, shapes[si][0], shapes[si][1]) # to original shape 132 | box = xyxy2xywh(box) # xywh 133 | box[:, :2] -= box[:, 2:] / 2 # xy center to top-left corner 134 | for p, b in zip(pred.tolist(), box.tolist()): 135 | jdict.append({'image_id': image_id, 136 | 'category_id': coco91class[int(p[5])], 137 | 'bbox': [round(x, 3) for x in b], 138 | 'score': round(p[4], 5)}) 139 | 140 | # Assign all predictions as incorrect 141 | correct = torch.zeros(pred.shape[0], niou, dtype=torch.bool, device=device) 142 | if nl: 143 | detected = [] # target indices 144 | tcls_tensor = labels[:, 0] 145 | 146 | # target boxes 147 | tbox = xywh2xyxy(labels[:, 1:5]) * whwh 148 | 149 | # Per target class 150 | for cls in torch.unique(tcls_tensor): 151 | ti = (cls == tcls_tensor).nonzero().view(-1) # prediction indices 152 | pi = (cls == pred[:, 5]).nonzero().view(-1) # target indices 153 | 154 | # Search for detections 155 | if pi.shape[0]: 156 | # Prediction to target ious 157 | ious, i = box_iou(pred[pi, :4], tbox[ti]).max(1) # best ious, indices 158 | 159 | # Append detections 160 | for j in (ious > iouv[0]).nonzero(): 161 | d = ti[i[j]] # detected target 162 | if d not in detected: 163 | detected.append(d) 164 | correct[pi[j]] = ious[j] > iouv # iou_thres is 1xn 165 | if len(detected) == nl: # all targets already located in image 166 | break 167 | 168 | # Append statistics (correct, conf, pcls, tcls) 169 | stats.append((correct.cpu(), pred[:, 4].cpu(), pred[:, 5].cpu(), tcls)) 170 | 171 | # Compute statistics 172 | stats = [np.concatenate(x, 0) for x in zip(*stats)] # to numpy 173 | if len(stats): 174 | p, r, ap, f1, ap_class = ap_per_class(*stats) 175 | if niou > 1: 176 | p, r, ap, f1 = p[:, 0], r[:, 0], ap.mean(1), ap[:, 0] # [P, R, AP@0.5:0.95, AP@0.5] 177 | mp, mr, map, mf1 = p.mean(), r.mean(), ap.mean(), f1.mean() 178 | nt = np.bincount(stats[3].astype(np.int64), minlength=nc) # number of targets per class 179 | else: 180 | nt = torch.zeros(1) 181 | 182 | # Print results 183 | pf = '%20s' + '%10.3g' * 6 # print format 184 | print(pf % ('all', seen, nt.sum(), mp, mr, map, mf1)) 185 | 186 | # Print results per class 187 | if verbose and nc > 1 and len(stats): 188 | for i, c in enumerate(ap_class): 189 | print(pf % (names[c], seen, nt[c], p[i], r[i], ap[i], f1[i])) 190 | 191 | # Print speeds 192 | if verbose or save_json: 193 | t = tuple(x / seen * 1E3 for x in (t0, t1, t0 + t1)) + (img_size, img_size, batch_size) # tuple 194 | print('Speed: %.1f/%.1f/%.1f ms inference/NMS/total per %gx%g image at batch-size %g' % t) 195 | 196 | maps = np.zeros(nc) + map 197 | # Save JSON 198 | if save_json and map and len(jdict): 199 | print('\nCOCO mAP with pycocotools...') 200 | imgIds = [int(Path(x).stem.split('_')[-1]) for x in dataloader.dataset.img_files] 201 | with open('results.json', 'w') as file: 202 | json.dump(jdict, file) 203 | 204 | try: 205 | from pycocotools.coco import COCO 206 | from pycocotools.cocoeval import COCOeval 207 | except: 208 | print('WARNING: missing pycocotools package, can not compute official COCO mAP. 
See requirements.txt.') 209 | 210 | # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb 211 | cocoGt = COCO(glob.glob('../coco/annotations/instances_val*.json')[0]) # initialize COCO ground truth api 212 | cocoDt = cocoGt.loadRes('results.json') # initialize COCO pred api 213 | 214 | cocoEval = COCOeval(cocoGt, cocoDt, 'bbox') 215 | cocoEval.params.imgIds = imgIds # [:32] # only evaluate these images 216 | cocoEval.evaluate() 217 | cocoEval.accumulate() 218 | cocoEval.summarize() 219 | map, map50 = cocoEval.stats[:2] # update results (mAP@0.5:0.95, mAP@0.5) 220 | return (mp, mr, map50, map, *(loss.cpu() / len(dataloader)).tolist()), maps, t 221 | 222 | # Return results 223 | for i, c in enumerate(ap_class): 224 | maps[c] = ap[i] 225 | return (mp, mr, map, mf1, *(loss.cpu() / len(dataloader)).tolist()), maps 226 | 227 | 228 | if __name__ == '__main__': 229 | parser = argparse.ArgumentParser(prog='test.py') 230 | parser.add_argument('--cfg', type=str, default='cfg/yolov4-pacsp.cfg', help='*.cfg path') 231 | parser.add_argument('--data', type=str, default='data/coco2017.data', help='*.data path') 232 | parser.add_argument('--weights', type=str, default='weights/yolov4-pacsp.pt', help='weights path') 233 | parser.add_argument('--batch-size', type=int, default=16, help='size of each image batch') 234 | parser.add_argument('--img-size', type=int, default=512, help='inference size (pixels)') 235 | parser.add_argument('--conf-thres', type=float, default=0.001, help='object confidence threshold') 236 | parser.add_argument('--iou-thres', type=float, default=0.6, help='IOU threshold for NMS') 237 | parser.add_argument('--save-json', action='store_true', help='save a cocoapi-compatible JSON results file') 238 | parser.add_argument('--task', default='test', help="'test', 'study', 'benchmark'") 239 | parser.add_argument('--device', default='', help='device id (i.e. 
0 or 0,1) or cpu') 240 | parser.add_argument('--single-cls', action='store_true', help='train as single-class dataset') 241 | parser.add_argument('--augment', action='store_true', help='augmented inference') 242 | opt = parser.parse_args() 243 | opt.save_json = opt.save_json or any([x in opt.data for x in ['coco.data', 'coco2014.data', 'coco2017.data']]) 244 | print(opt) 245 | 246 | # task = 'test', 'study', 'benchmark' 247 | if opt.task == 'test': # (default) test normally 248 | test(opt.cfg, 249 | opt.data, 250 | opt.weights, 251 | opt.batch_size, 252 | opt.img_size, 253 | opt.conf_thres, 254 | opt.iou_thres, 255 | opt.save_json, 256 | opt.single_cls, 257 | opt.augment) 258 | 259 | elif opt.task == 'benchmark': # mAPs at 320-608 at conf 0.5 and 0.7 260 | y = [] 261 | x = list(range(288, 896, 64)) 262 | f = 'study_%s_%s.txt' % (Path(opt.data).stem, Path(opt.weights).stem) # filename to save to 263 | for i in x: # img-size 264 | for j in [0.7]: # iou-thres 265 | r, _, t = test(opt.cfg, opt.data, opt.weights, opt.batch_size, i, opt.conf_thres, j, opt.save_json) 266 | y.append(r + t) 267 | np.savetxt(f, y, fmt='%10.6g') # save 268 | 269 | elif opt.task == 'study': # Parameter study 270 | y = [] 271 | x = np.arange(0.4, 0.9, 0.05) # iou-thres 272 | for i in x: 273 | t = time.time() 274 | r = test(opt.cfg, opt.data, opt.weights, opt.batch_size, opt.img_size, opt.conf_thres, i, opt.save_json)[0] 275 | y.append(r + (time.time() - t,)) 276 | np.savetxt('study.txt', y, fmt='%10.4g') # y = np.loadtxt('study.txt') 277 | 278 | # Plot 279 | fig, ax = plt.subplots(3, 1, figsize=(6, 6)) 280 | y = np.stack(y, 0) 281 | ax[0].plot(x, y[:, 2], marker='.', label='mAP@0.5') 282 | ax[0].set_ylabel('mAP') 283 | ax[1].plot(x, y[:, 3], marker='.', label='mAP@0.5:0.95') 284 | ax[1].set_ylabel('mAP') 285 | ax[2].plot(x, y[:, -1], marker='.', label='time') 286 | ax[2].set_ylabel('time (s)') 287 | for i in range(3): 288 | ax[i].legend() 289 | ax[i].set_xlabel('iou_thr') 290 | fig.tight_layout() 291 | plt.savefig('study.jpg', dpi=200) 292 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /utils/__pycache__/__init__.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/utils/__pycache__/__init__.cpython-35.pyc -------------------------------------------------------------------------------- /utils/__pycache__/datasets.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/utils/__pycache__/datasets.cpython-35.pyc -------------------------------------------------------------------------------- /utils/__pycache__/google_utils.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/utils/__pycache__/google_utils.cpython-35.pyc -------------------------------------------------------------------------------- /utils/__pycache__/layers.cpython-35.pyc: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/utils/__pycache__/layers.cpython-35.pyc -------------------------------------------------------------------------------- /utils/__pycache__/parse_config.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/utils/__pycache__/parse_config.cpython-35.pyc -------------------------------------------------------------------------------- /utils/__pycache__/torch_utils.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/utils/__pycache__/torch_utils.cpython-35.pyc -------------------------------------------------------------------------------- /utils/__pycache__/utils.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataXujing/Pytorch_YOLO-v4/0b5d8c4c6de528fc79be71e1c0a13b1580e9d923/utils/__pycache__/utils.cpython-35.pyc -------------------------------------------------------------------------------- /utils/adabound.py: -------------------------------------------------------------------------------- 1 | import math 2 | 3 | import torch 4 | from torch.optim.optimizer import Optimizer 5 | 6 | 7 | class AdaBound(Optimizer): 8 | """Implements AdaBound algorithm. 9 | It has been proposed in `Adaptive Gradient Methods with Dynamic Bound of Learning Rate`_. 10 | Arguments: 11 | params (iterable): iterable of parameters to optimize or dicts defining 12 | parameter groups 13 | lr (float, optional): Adam learning rate (default: 1e-3) 14 | betas (Tuple[float, float], optional): coefficients used for computing 15 | running averages of gradient and its square (default: (0.9, 0.999)) 16 | final_lr (float, optional): final (SGD) learning rate (default: 0.1) 17 | gamma (float, optional): convergence speed of the bound functions (default: 1e-3) 18 | eps (float, optional): term added to the denominator to improve 19 | numerical stability (default: 1e-8) 20 | weight_decay (float, optional): weight decay (L2 penalty) (default: 0) 21 | amsbound (boolean, optional): whether to use the AMSBound variant of this algorithm 22 | .. 
Adaptive Gradient Methods with Dynamic Bound of Learning Rate: 23 | https://openreview.net/forum?id=Bkg3g2R9FX 24 | """ 25 | 26 | def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), final_lr=0.1, gamma=1e-3, 27 | eps=1e-8, weight_decay=0, amsbound=False): 28 | if not 0.0 <= lr: 29 | raise ValueError("Invalid learning rate: {}".format(lr)) 30 | if not 0.0 <= eps: 31 | raise ValueError("Invalid epsilon value: {}".format(eps)) 32 | if not 0.0 <= betas[0] < 1.0: 33 | raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0])) 34 | if not 0.0 <= betas[1] < 1.0: 35 | raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1])) 36 | if not 0.0 <= final_lr: 37 | raise ValueError("Invalid final learning rate: {}".format(final_lr)) 38 | if not 0.0 <= gamma < 1.0: 39 | raise ValueError("Invalid gamma parameter: {}".format(gamma)) 40 | defaults = dict(lr=lr, betas=betas, final_lr=final_lr, gamma=gamma, eps=eps, 41 | weight_decay=weight_decay, amsbound=amsbound) 42 | super(AdaBound, self).__init__(params, defaults) 43 | 44 | self.base_lrs = list(map(lambda group: group['lr'], self.param_groups)) 45 | 46 | def __setstate__(self, state): 47 | super(AdaBound, self).__setstate__(state) 48 | for group in self.param_groups: 49 | group.setdefault('amsbound', False) 50 | 51 | def step(self, closure=None): 52 | """Performs a single optimization step. 53 | Arguments: 54 | closure (callable, optional): A closure that reevaluates the model 55 | and returns the loss. 56 | """ 57 | loss = None 58 | if closure is not None: 59 | loss = closure() 60 | 61 | for group, base_lr in zip(self.param_groups, self.base_lrs): 62 | for p in group['params']: 63 | if p.grad is None: 64 | continue 65 | grad = p.grad.data 66 | if grad.is_sparse: 67 | raise RuntimeError( 68 | 'Adam does not support sparse gradients, please consider SparseAdam instead') 69 | amsbound = group['amsbound'] 70 | 71 | state = self.state[p] 72 | 73 | # State initialization 74 | if len(state) == 0: 75 | state['step'] = 0 76 | # Exponential moving average of gradient values 77 | state['exp_avg'] = torch.zeros_like(p.data) 78 | # Exponential moving average of squared gradient values 79 | state['exp_avg_sq'] = torch.zeros_like(p.data) 80 | if amsbound: 81 | # Maintains max of all exp. moving avg. of sq. grad. values 82 | state['max_exp_avg_sq'] = torch.zeros_like(p.data) 83 | 84 | exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq'] 85 | if amsbound: 86 | max_exp_avg_sq = state['max_exp_avg_sq'] 87 | beta1, beta2 = group['betas'] 88 | 89 | state['step'] += 1 90 | 91 | if group['weight_decay'] != 0: 92 | grad = grad.add(group['weight_decay'], p.data) 93 | 94 | # Decay the first and second moment running average coefficient 95 | exp_avg.mul_(beta1).add_(1 - beta1, grad) 96 | exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad) 97 | if amsbound: 98 | # Maintains the maximum of all 2nd moment running avg. till now 99 | torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq) 100 | # Use the max. for normalizing running avg. 
of gradient 101 | denom = max_exp_avg_sq.sqrt().add_(group['eps']) 102 | else: 103 | denom = exp_avg_sq.sqrt().add_(group['eps']) 104 | 105 | bias_correction1 = 1 - beta1 ** state['step'] 106 | bias_correction2 = 1 - beta2 ** state['step'] 107 | step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1 108 | 109 | # Applies bounds on actual learning rate 110 | # lr_scheduler cannot affect final_lr, this is a workaround to apply lr decay 111 | final_lr = group['final_lr'] * group['lr'] / base_lr 112 | lower_bound = final_lr * (1 - 1 / (group['gamma'] * state['step'] + 1)) 113 | upper_bound = final_lr * (1 + 1 / (group['gamma'] * state['step'])) 114 | step_size = torch.full_like(denom, step_size) 115 | step_size.div_(denom).clamp_(lower_bound, upper_bound).mul_(exp_avg) 116 | 117 | p.data.add_(-step_size) 118 | 119 | return loss 120 | 121 | 122 | class AdaBoundW(Optimizer): 123 | """Implements AdaBound algorithm with Decoupled Weight Decay (arxiv.org/abs/1711.05101) 124 | It has been proposed in `Adaptive Gradient Methods with Dynamic Bound of Learning Rate`_. 125 | Arguments: 126 | params (iterable): iterable of parameters to optimize or dicts defining 127 | parameter groups 128 | lr (float, optional): Adam learning rate (default: 1e-3) 129 | betas (Tuple[float, float], optional): coefficients used for computing 130 | running averages of gradient and its square (default: (0.9, 0.999)) 131 | final_lr (float, optional): final (SGD) learning rate (default: 0.1) 132 | gamma (float, optional): convergence speed of the bound functions (default: 1e-3) 133 | eps (float, optional): term added to the denominator to improve 134 | numerical stability (default: 1e-8) 135 | weight_decay (float, optional): weight decay (L2 penalty) (default: 0) 136 | amsbound (boolean, optional): whether to use the AMSBound variant of this algorithm 137 | .. Adaptive Gradient Methods with Dynamic Bound of Learning Rate: 138 | https://openreview.net/forum?id=Bkg3g2R9FX 139 | """ 140 | 141 | def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), final_lr=0.1, gamma=1e-3, 142 | eps=1e-8, weight_decay=0, amsbound=False): 143 | if not 0.0 <= lr: 144 | raise ValueError("Invalid learning rate: {}".format(lr)) 145 | if not 0.0 <= eps: 146 | raise ValueError("Invalid epsilon value: {}".format(eps)) 147 | if not 0.0 <= betas[0] < 1.0: 148 | raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0])) 149 | if not 0.0 <= betas[1] < 1.0: 150 | raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1])) 151 | if not 0.0 <= final_lr: 152 | raise ValueError("Invalid final learning rate: {}".format(final_lr)) 153 | if not 0.0 <= gamma < 1.0: 154 | raise ValueError("Invalid gamma parameter: {}".format(gamma)) 155 | defaults = dict(lr=lr, betas=betas, final_lr=final_lr, gamma=gamma, eps=eps, 156 | weight_decay=weight_decay, amsbound=amsbound) 157 | super(AdaBoundW, self).__init__(params, defaults) 158 | 159 | self.base_lrs = list(map(lambda group: group['lr'], self.param_groups)) 160 | 161 | def __setstate__(self, state): 162 | super(AdaBoundW, self).__setstate__(state) 163 | for group in self.param_groups: 164 | group.setdefault('amsbound', False) 165 | 166 | def step(self, closure=None): 167 | """Performs a single optimization step. 168 | Arguments: 169 | closure (callable, optional): A closure that reevaluates the model 170 | and returns the loss. 
171 | """ 172 | loss = None 173 | if closure is not None: 174 | loss = closure() 175 | 176 | for group, base_lr in zip(self.param_groups, self.base_lrs): 177 | for p in group['params']: 178 | if p.grad is None: 179 | continue 180 | grad = p.grad.data 181 | if grad.is_sparse: 182 | raise RuntimeError( 183 | 'Adam does not support sparse gradients, please consider SparseAdam instead') 184 | amsbound = group['amsbound'] 185 | 186 | state = self.state[p] 187 | 188 | # State initialization 189 | if len(state) == 0: 190 | state['step'] = 0 191 | # Exponential moving average of gradient values 192 | state['exp_avg'] = torch.zeros_like(p.data) 193 | # Exponential moving average of squared gradient values 194 | state['exp_avg_sq'] = torch.zeros_like(p.data) 195 | if amsbound: 196 | # Maintains max of all exp. moving avg. of sq. grad. values 197 | state['max_exp_avg_sq'] = torch.zeros_like(p.data) 198 | 199 | exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq'] 200 | if amsbound: 201 | max_exp_avg_sq = state['max_exp_avg_sq'] 202 | beta1, beta2 = group['betas'] 203 | 204 | state['step'] += 1 205 | 206 | # Decay the first and second moment running average coefficient 207 | exp_avg.mul_(beta1).add_(1 - beta1, grad) 208 | exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad) 209 | if amsbound: 210 | # Maintains the maximum of all 2nd moment running avg. till now 211 | torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq) 212 | # Use the max. for normalizing running avg. of gradient 213 | denom = max_exp_avg_sq.sqrt().add_(group['eps']) 214 | else: 215 | denom = exp_avg_sq.sqrt().add_(group['eps']) 216 | 217 | bias_correction1 = 1 - beta1 ** state['step'] 218 | bias_correction2 = 1 - beta2 ** state['step'] 219 | step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1 220 | 221 | # Applies bounds on actual learning rate 222 | # lr_scheduler cannot affect final_lr, this is a workaround to apply lr decay 223 | final_lr = group['final_lr'] * group['lr'] / base_lr 224 | lower_bound = final_lr * (1 - 1 / (group['gamma'] * state['step'] + 1)) 225 | upper_bound = final_lr * (1 + 1 / (group['gamma'] * state['step'])) 226 | step_size = torch.full_like(denom, step_size) 227 | step_size.div_(denom).clamp_(lower_bound, upper_bound).mul_(exp_avg) 228 | 229 | if group['weight_decay'] != 0: 230 | decayed_weights = torch.mul(p.data, group['weight_decay']) 231 | p.data.add_(-step_size) 232 | p.data.sub_(decayed_weights) 233 | else: 234 | p.data.add_(-step_size) 235 | 236 | return loss 237 | -------------------------------------------------------------------------------- /utils/evolve.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | #for i in 0 1 2 3 3 | #do 4 | # t=ultralytics/yolov3:v139 && sudo docker pull $t && sudo nvidia-docker run -d --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t utils/evolve.sh $i 5 | # sleep 30 6 | #done 7 | 8 | while true; do 9 | # python3 train.py --data ../data/sm4/out.data --img-size 320 --epochs 100 --batch 64 --accum 1 --weights yolov3-tiny.conv.15 --multi --bucket ult/wer --evolve --cache --device $1 --cfg yolov3-tiny3-1cls.cfg --single --adam 10 | # python3 train.py --data ../out/data.data --img-size 608 --epochs 10 --batch 8 --accum 8 --weights ultralytics68.pt --multi --bucket ult/athena --evolve --device $1 --cfg yolov3-spp-1cls.cfg 11 | 12 | python3 train.py --data coco2014.data --img-size 512 608 --epochs 27 --batch 8 --accum 8 --evolve --weights '' --bucket ult/coco/sppa_512 --device $1 --cfg 
yolov3-sppa.cfg --multi 13 | done 14 | 15 | 16 | # coco epoch times --img-size 416 608 --epochs 27 --batch 16 --accum 4 17 | # 36:34 2080ti 18 | # 21:58 V100 19 | # 63:00 T4 -------------------------------------------------------------------------------- /utils/gcp.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # New VM 4 | rm -rf sample_data yolov3 5 | git clone https://github.com/ultralytics/yolov3 6 | # git clone -b test --depth 1 https://github.com/ultralytics/yolov3 test # branch 7 | # sudo apt-get install zip 8 | #git clone https://github.com/NVIDIA/apex && cd apex && pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" . --user && cd .. && rm -rf apex 9 | sudo conda install -yc conda-forge scikit-image pycocotools 10 | # python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('193Zp_ye-3qXMonR1nZj3YyxMtQkMy50k','coco2014.zip')" 11 | python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('1WQT6SOktSe8Uw6r10-2JhbEhMY5DJaph','coco2017.zip')" 12 | python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('1C3HewOG9akA3y456SZLBJZfNDPkBwAto','knife.zip')" 13 | python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('13g3LqdpkNE8sPosVJT6KFXlfoMypzRP4','sm4.zip')" 14 | sudo shutdown 15 | 16 | # Mount local SSD 17 | lsblk 18 | sudo mkfs.ext4 -F /dev/nvme0n1 19 | sudo mkdir -p /mnt/disks/nvme0n1 20 | sudo mount /dev/nvme0n1 /mnt/disks/nvme0n1 21 | sudo chmod a+w /mnt/disks/nvme0n1 22 | cp -r coco /mnt/disks/nvme0n1 23 | 24 | # Kill All 25 | t=ultralytics/yolov3:v1 26 | docker kill $(docker ps -a -q --filter ancestor=$t) 27 | 28 | # Evolve coco 29 | sudo -s 30 | t=ultralytics/yolov3:evolve 31 | # docker kill $(docker ps -a -q --filter ancestor=$t) 32 | for i in 0 1 6 7 33 | do 34 | docker pull $t && docker run --gpus all -d --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t bash utils/evolve.sh $i 35 | sleep 30 36 | done 37 | 38 | #COCO training 39 | n=131 && t=ultralytics/coco:v131 && sudo docker pull $t && sudo docker run -it --gpus all --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t python3 train.py --data coco2014.data --img-size 320 640 --epochs 300 --batch 16 --weights '' --device 0 --cfg yolov3-spp.cfg --bucket ult/coco --name $n && sudo shutdown 40 | n=132 && t=ultralytics/coco:v131 && sudo docker pull $t && sudo docker run -it --gpus all --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t python3 train.py --data coco2014.data --img-size 320 640 --epochs 300 --batch 64 --weights '' --device 0 --cfg yolov3-tiny.cfg --bucket ult/coco --name $n && sudo shutdown 41 | -------------------------------------------------------------------------------- /utils/google_utils.py: -------------------------------------------------------------------------------- 1 | # This file contains google utils: https://cloud.google.com/storage/docs/reference/libraries 2 | # pip install --upgrade google-cloud-storage 3 | 4 | import os 5 | import time 6 | 7 | 8 | # from google.cloud import storage 9 | 10 | 11 | def gdrive_download(id='1HaXkef9z6y5l4vUnCYgdmEAj61c6bfWO', name='coco.zip'): 12 | # https://gist.github.com/tanaikech/f0f2d122e05bf5f971611258c22c110f 13 | # Downloads a file from Google Drive, accepting presented query 14 | # from utils.google_utils import *; gdrive_download() 15 | t = time.time() 16 | 17 | print('Downloading https://drive.google.com/uc?export=download&id=%s as %s... 
' % (id, name), end='') 18 | os.remove(name) if os.path.exists(name) else None # remove existing 19 | os.remove('cookie') if os.path.exists('cookie') else None 20 | 21 | # Attempt file download 22 | os.system("curl -c ./cookie -s -L \"https://drive.google.com/uc?export=download&id=%s\" > /dev/null" % id) 23 | if os.path.exists('cookie'): # large file 24 | s = "curl -Lb ./cookie \"https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=%s\" -o %s" % ( 25 | id, name) 26 | else: # small file 27 | s = "curl -s -L -o %s 'https://drive.google.com/uc?export=download&id=%s'" % (name, id) 28 | r = os.system(s) # execute, capture return values 29 | os.remove('cookie') if os.path.exists('cookie') else None 30 | 31 | # Error check 32 | if r != 0: 33 | os.remove(name) if os.path.exists(name) else None # remove partial 34 | print('Download error ') # raise Exception('Download error') 35 | return r 36 | 37 | # Unzip if archive 38 | if name.endswith('.zip'): 39 | print('unzipping... ', end='') 40 | os.system('unzip -q %s' % name) # unzip 41 | os.remove(name) # remove zip to free space 42 | 43 | print('Done (%.1fs)' % (time.time() - t)) 44 | return r 45 | 46 | 47 | def upload_blob(bucket_name, source_file_name, destination_blob_name): 48 | # Uploads a file to a bucket 49 | # https://cloud.google.com/storage/docs/uploading-objects#storage-upload-object-python 50 | 51 | storage_client = storage.Client() 52 | bucket = storage_client.get_bucket(bucket_name) 53 | blob = bucket.blob(destination_blob_name) 54 | 55 | blob.upload_from_filename(source_file_name) 56 | 57 | print('File {} uploaded to {}.'.format( 58 | source_file_name, 59 | destination_blob_name)) 60 | 61 | 62 | def download_blob(bucket_name, source_blob_name, destination_file_name): 63 | # Uploads a blob from a bucket 64 | storage_client = storage.Client() 65 | bucket = storage_client.get_bucket(bucket_name) 66 | blob = bucket.blob(source_blob_name) 67 | 68 | blob.download_to_filename(destination_file_name) 69 | 70 | print('Blob {} downloaded to {}.'.format( 71 | source_blob_name, 72 | destination_file_name)) 73 | -------------------------------------------------------------------------------- /utils/layers.py: -------------------------------------------------------------------------------- 1 | import torch.nn.functional as F 2 | 3 | from utils.utils import * 4 | 5 | try: 6 | from mish_cuda import MishCuda as Mish 7 | except: 8 | class Mish(nn.Module): # https://github.com/digantamisra98/Mish 9 | def forward(self, x): 10 | return x * F.softplus(x).tanh() 11 | 12 | 13 | def make_divisible(v, divisor): 14 | # Function ensures all layers have a channel number that is divisible by 8 15 | # https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py 16 | return math.ceil(v / divisor) * divisor 17 | 18 | 19 | class Flatten(nn.Module): 20 | # Use after nn.AdaptiveAvgPool2d(1) to remove last 2 dimensions 21 | def forward(self, x): 22 | return x.view(x.size(0), -1) 23 | 24 | 25 | class Concat(nn.Module): 26 | # Concatenate a list of tensors along dimension 27 | def __init__(self, dimension=1): 28 | super(Concat, self).__init__() 29 | self.d = dimension 30 | 31 | def forward(self, x): 32 | return torch.cat(x, self.d) 33 | 34 | 35 | class FeatureConcat(nn.Module): 36 | def __init__(self, layers): 37 | super(FeatureConcat, self).__init__() 38 | self.layers = layers # layer indices 39 | self.multiple = len(layers) > 1 # multiple layers flag 40 | 41 | def forward(self, x, outputs): 42 | return 
torch.cat([outputs[i] for i in self.layers], 1) if self.multiple else outputs[self.layers[0]] 43 | 44 | 45 | class FeatureConcat_l(nn.Module): 46 | def __init__(self, layers): 47 | super(FeatureConcat_l, self).__init__() 48 | self.layers = layers # layer indices 49 | self.multiple = len(layers) > 1 # multiple layers flag 50 | 51 | def forward(self, x, outputs): 52 | return torch.cat([outputs[i][:,:outputs[i].shape[1]//2,:,:] for i in self.layers], 1) if self.multiple else outputs[self.layers[0]][:,:outputs[self.layers[0]].shape[1]//2,:,:] 53 | 54 | 55 | class WeightedFeatureFusion(nn.Module): # weighted sum of 2 or more layers https://arxiv.org/abs/1911.09070 56 | def __init__(self, layers, weight=False): 57 | super(WeightedFeatureFusion, self).__init__() 58 | self.layers = layers # layer indices 59 | self.weight = weight # apply weights boolean 60 | self.n = len(layers) + 1 # number of layers 61 | if weight: 62 | self.w = nn.Parameter(torch.zeros(self.n), requires_grad=True) # layer weights 63 | 64 | def forward(self, x, outputs): 65 | # Weights 66 | if self.weight: 67 | w = torch.sigmoid(self.w) * (2 / self.n) # sigmoid weights (0-1) 68 | x = x * w[0] 69 | 70 | # Fusion 71 | nx = x.shape[1] # input channels 72 | for i in range(self.n - 1): 73 | a = outputs[self.layers[i]] * w[i + 1] if self.weight else outputs[self.layers[i]] # feature to add 74 | na = a.shape[1] # feature channels 75 | 76 | # Adjust channels 77 | if nx == na: # same shape 78 | x = x + a 79 | elif nx > na: # slice input 80 | x[:, :na] = x[:, :na] + a # or a = nn.ZeroPad2d((0, 0, 0, 0, 0, dc))(a); x = x + a 81 | else: # slice feature 82 | x = x + a[:, :nx] 83 | 84 | return x 85 | 86 | 87 | class MixConv2d(nn.Module): # MixConv: Mixed Depthwise Convolutional Kernels https://arxiv.org/abs/1907.09595 88 | def __init__(self, in_ch, out_ch, k=(3, 5, 7), stride=1, dilation=1, bias=True, method='equal_params'): 89 | super(MixConv2d, self).__init__() 90 | 91 | groups = len(k) 92 | if method == 'equal_ch': # equal channels per group 93 | i = torch.linspace(0, groups - 1E-6, out_ch).floor() # out_ch indices 94 | ch = [(i == g).sum() for g in range(groups)] 95 | else: # 'equal_params': equal parameter count per group 96 | b = [out_ch] + [0] * groups 97 | a = np.eye(groups + 1, groups, k=-1) 98 | a -= np.roll(a, 1, axis=1) 99 | a *= np.array(k) ** 2 100 | a[0] = 1 101 | ch = np.linalg.lstsq(a, b, rcond=None)[0].round().astype(int) # solve for equal weight indices, ax = b 102 | 103 | self.m = nn.ModuleList([nn.Conv2d(in_channels=in_ch, 104 | out_channels=ch[g], 105 | kernel_size=k[g], 106 | stride=stride, 107 | padding=k[g] // 2, # 'same' pad 108 | dilation=dilation, 109 | bias=bias) for g in range(groups)]) 110 | 111 | def forward(self, x): 112 | return torch.cat([m(x) for m in self.m], 1) 113 | 114 | 115 | # Activation functions below ------------------------------------------------------------------------------------------- 116 | class SwishImplementation(torch.autograd.Function): 117 | @staticmethod 118 | def forward(ctx, x): 119 | ctx.save_for_backward(x) 120 | return x * torch.sigmoid(x) 121 | 122 | @staticmethod 123 | def backward(ctx, grad_output): 124 | x = ctx.saved_tensors[0] 125 | sx = torch.sigmoid(x) # sigmoid(ctx) 126 | return grad_output * (sx * (1 + x * (1 - sx))) 127 | 128 | 129 | class MishImplementation(torch.autograd.Function): 130 | @staticmethod 131 | def forward(ctx, x): 132 | ctx.save_for_backward(x) 133 | return x.mul(torch.tanh(F.softplus(x))) # x * tanh(ln(1 + exp(x))) 134 | 135 | @staticmethod 136 | def 
backward(ctx, grad_output): 137 | x = ctx.saved_tensors[0] 138 | sx = torch.sigmoid(x) 139 | fx = F.softplus(x).tanh() 140 | return grad_output * (fx + x * sx * (1 - fx * fx)) 141 | 142 | 143 | class MemoryEfficientSwish(nn.Module): 144 | def forward(self, x): 145 | return SwishImplementation.apply(x) 146 | 147 | 148 | class MemoryEfficientMish(nn.Module): 149 | def forward(self, x): 150 | return MishImplementation.apply(x) 151 | 152 | 153 | class Swish(nn.Module): 154 | def forward(self, x): 155 | return x * torch.sigmoid(x) 156 | 157 | 158 | class HardSwish(nn.Module): # https://arxiv.org/pdf/1905.02244.pdf 159 | def forward(self, x): 160 | return x * F.hardtanh(x + 3, 0., 6., True) / 6. 161 | -------------------------------------------------------------------------------- /utils/parse_config.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | import numpy as np 4 | 5 | 6 | def parse_model_cfg(path): 7 | # Parse the yolo *.cfg file and return module definitions path may be 'cfg/yolov3.cfg', 'yolov3.cfg', or 'yolov3' 8 | if not path.endswith('.cfg'): # add .cfg suffix if omitted 9 | path += '.cfg' 10 | if not os.path.exists(path) and os.path.exists('cfg' + os.sep + path): # add cfg/ prefix if omitted 11 | path = 'cfg' + os.sep + path 12 | 13 | with open(path, 'r') as f: 14 | lines = f.read().split('\n') 15 | lines = [x for x in lines if x and not x.startswith('#')] 16 | lines = [x.rstrip().lstrip() for x in lines] # get rid of fringe whitespaces 17 | mdefs = [] # module definitions 18 | for line in lines: 19 | if line.startswith('['): # This marks the start of a new block 20 | mdefs.append({}) 21 | mdefs[-1]['type'] = line[1:-1].rstrip() 22 | if mdefs[-1]['type'] == 'convolutional': 23 | mdefs[-1]['batch_normalize'] = 0 # pre-populate with zeros (may be overwritten later) 24 | else: 25 | key, val = line.split("=") 26 | key = key.rstrip() 27 | 28 | if key == 'anchors': # return nparray 29 | mdefs[-1][key] = np.array([float(x) for x in val.split(',')]).reshape((-1, 2)) # np anchors 30 | elif (key in ['from', 'layers', 'mask']) or (key == 'size' and ',' in val): # return array 31 | mdefs[-1][key] = [int(x) for x in val.split(',')] 32 | else: 33 | val = val.strip() 34 | if val.isnumeric(): # return int or float 35 | mdefs[-1][key] = int(val) if (int(val) - float(val)) == 0 else float(val) 36 | else: 37 | mdefs[-1][key] = val # return string 38 | 39 | # Check all fields are supported 40 | supported = ['type', 'batch_normalize', 'filters', 'size', 'stride', 'pad', 'activation', 'layers', 'groups', 41 | 'from', 'mask', 'anchors', 'classes', 'num', 'jitter', 'ignore_thresh', 'truth_thresh', 'random', 42 | 'stride_x', 'stride_y', 'weights_type', 'weights_normalization', 'scale_x_y', 'beta_nms', 'nms_kind', 43 | 'iou_loss', 'iou_normalizer', 'cls_normalizer', 'iou_thresh'] 44 | 45 | f = [] # fields 46 | for x in mdefs[1:]: 47 | [f.append(k) for k in x if k not in f] 48 | u = [x for x in f if x not in supported] # unsupported fields 49 | assert not any(u), "Unsupported fields %s in %s. 
See https://github.com/ultralytics/yolov3/issues/631" % (u, path) 50 | 51 | return mdefs 52 | 53 | 54 | def parse_data_cfg(path): 55 | # Parses the data configuration file 56 | if not os.path.exists(path) and os.path.exists('data' + os.sep + path): # add data/ prefix if omitted 57 | path = 'data' + os.sep + path 58 | 59 | with open(path, 'r') as f: 60 | lines = f.readlines() 61 | 62 | options = dict() 63 | for line in lines: 64 | line = line.strip() 65 | if line == '' or line.startswith('#'): 66 | continue 67 | key, val = line.split('=') 68 | options[key.strip()] = val.strip() 69 | 70 | return options 71 | -------------------------------------------------------------------------------- /utils/torch_utils.py: -------------------------------------------------------------------------------- 1 | import math 2 | import os 3 | import time 4 | from copy import deepcopy 5 | 6 | import torch 7 | import torch.backends.cudnn as cudnn 8 | import torch.nn as nn 9 | import torch.nn.functional as F 10 | 11 | 12 | def init_seeds(seed=0): 13 | torch.manual_seed(seed) 14 | 15 | # Remove randomness (may be slower on Tesla GPUs) # https://pytorch.org/docs/stable/notes/randomness.html 16 | if seed == 0: 17 | cudnn.deterministic = True 18 | cudnn.benchmark = False 19 | 20 | 21 | def select_device(device='', apex=False, batch_size=None): 22 | # device = 'cpu' or '0' or '0,1,2,3' 23 | cpu_request = device.lower() == 'cpu' 24 | if device and not cpu_request: # if device requested other than 'cpu' 25 | os.environ['CUDA_VISIBLE_DEVICES'] = device # set environment variable 26 | assert torch.cuda.is_available(), 'CUDA unavailable, invalid device %s requested' % device # check availablity 27 | 28 | cuda = False if cpu_request else torch.cuda.is_available() 29 | if cuda: 30 | c = 1024 ** 2 # bytes to MB 31 | ng = torch.cuda.device_count() 32 | if ng > 1 and batch_size: # check that batch_size is compatible with device_count 33 | assert batch_size % ng == 0, 'batch-size %g not multiple of GPU count %g' % (batch_size, ng) 34 | x = [torch.cuda.get_device_properties(i) for i in range(ng)] 35 | s = 'Using CUDA ' + ('Apex ' if apex else '') # apex for mixed precision https://github.com/NVIDIA/apex 36 | for i in range(0, ng): 37 | if i == 1: 38 | s = ' ' * len(s) 39 | print("%sdevice%g _CudaDeviceProperties(name='%s', total_memory=%dMB)" % 40 | (s, i, x[i].name, x[i].total_memory / c)) 41 | else: 42 | print('Using CPU') 43 | 44 | print('') # skip a line 45 | return torch.device('cuda:0' if cuda else 'cpu') 46 | 47 | 48 | def time_synchronized(): 49 | torch.cuda.synchronize() if torch.cuda.is_available() else None 50 | return time.time() 51 | 52 | 53 | def initialize_weights(model): 54 | for m in model.modules(): 55 | t = type(m) 56 | if t is nn.Conv2d: 57 | pass # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 58 | elif t is nn.BatchNorm2d: 59 | m.eps = 1e-4 60 | m.momentum = 0.03 61 | elif t in [nn.LeakyReLU, nn.ReLU, nn.ReLU6]: 62 | m.inplace = True 63 | 64 | 65 | def find_modules(model, mclass=nn.Conv2d): 66 | # finds layer indices matching module class 'mclass' 67 | return [i for i, m in enumerate(model.module_list) if isinstance(m, mclass)] 68 | 69 | 70 | def fuse_conv_and_bn(conv, bn): 71 | # https://tehnokv.com/posts/fusing-batchnorm-and-conv/ 72 | with torch.no_grad(): 73 | # init 74 | fusedconv = torch.nn.Conv2d(conv.in_channels, 75 | conv.out_channels, 76 | kernel_size=conv.kernel_size, 77 | stride=conv.stride, 78 | padding=conv.padding, 79 | bias=True) 80 | 81 | # prepare filters 82 | w_conv 
= conv.weight.clone().view(conv.out_channels, -1) 83 | w_bn = torch.diag(bn.weight.div(torch.sqrt(bn.eps + bn.running_var))) 84 | fusedconv.weight.copy_(torch.mm(w_bn, w_conv).view(fusedconv.weight.size())) 85 | 86 | # prepare spatial bias 87 | if conv.bias is not None: 88 | b_conv = conv.bias 89 | else: 90 | b_conv = torch.zeros(conv.weight.size(0)) 91 | b_bn = bn.bias - bn.weight.mul(bn.running_mean).div(torch.sqrt(bn.running_var + bn.eps)) 92 | fusedconv.bias.copy_(torch.mm(w_bn, b_conv.reshape(-1, 1)).reshape(-1) + b_bn) 93 | 94 | return fusedconv 95 | 96 | 97 | def model_info(model, verbose=False): 98 | # Plots a line-by-line description of a PyTorch model 99 | n_p = sum(x.numel() for x in model.parameters()) # number parameters 100 | n_g = sum(x.numel() for x in model.parameters() if x.requires_grad) # number gradients 101 | if verbose: 102 | print('%5s %40s %9s %12s %20s %10s %10s' % ('layer', 'name', 'gradient', 'parameters', 'shape', 'mu', 'sigma')) 103 | for i, (name, p) in enumerate(model.named_parameters()): 104 | name = name.replace('module_list.', '') 105 | print('%5g %40s %9s %12g %20s %10.3g %10.3g' % 106 | (i, name, p.requires_grad, p.numel(), list(p.shape), p.mean(), p.std())) 107 | 108 | try: # FLOPS 109 | from thop import profile 110 | macs, _ = profile(model, inputs=(torch.zeros(1, 3, 480, 640),), verbose=False) 111 | fs = ', %.1f GFLOPS' % (macs / 1E9 * 2) 112 | except: 113 | fs = '' 114 | 115 | print('Model Summary: %g layers, %g parameters, %g gradients%s' % (len(list(model.parameters())), n_p, n_g, fs)) 116 | 117 | 118 | def load_classifier(name='resnet101', n=2): 119 | # Loads a pretrained model reshaped to n-class output 120 | import pretrainedmodels # https://github.com/Cadene/pretrained-models.pytorch#torchvision 121 | model = pretrainedmodels.__dict__[name](num_classes=1000, pretrained='imagenet') 122 | 123 | # Display model properties 124 | for x in ['model.input_size', 'model.input_space', 'model.input_range', 'model.mean', 'model.std']: 125 | print(x + ' =', eval(x)) 126 | 127 | # Reshape output to n classes 128 | filters = model.last_linear.weight.shape[1] 129 | model.last_linear.bias = torch.nn.Parameter(torch.zeros(n)) 130 | model.last_linear.weight = torch.nn.Parameter(torch.zeros(n, filters)) 131 | model.last_linear.out_features = n 132 | return model 133 | 134 | 135 | def scale_img(img, ratio=1.0, same_shape=True): # img(16,3,256,416), r=ratio 136 | # scales img(bs,3,y,x) by ratio 137 | h, w = img.shape[2:] 138 | s = (int(h * ratio), int(w * ratio)) # new size 139 | img = F.interpolate(img, size=s, mode='bilinear', align_corners=False) # resize 140 | if not same_shape: # pad/crop img 141 | gs = 64 # (pixels) grid size 142 | h, w = [math.ceil(x * ratio / gs) * gs for x in (h, w)] 143 | return F.pad(img, [0, w - s[1], 0, h - s[0]], value=0.447) # value = imagenet mean 144 | 145 | 146 | class ModelEMA: 147 | """ Model Exponential Moving Average from https://github.com/rwightman/pytorch-image-models 148 | Keep a moving average of everything in the model state_dict (parameters and buffers). 149 | This is intended to allow functionality like 150 | https://www.tensorflow.org/api_docs/python/tf/train/ExponentialMovingAverage 151 | A smoothed version of the weights is necessary for some training schemes to perform well. 152 | E.g. Google's hyper-params for training MNASNet, MobileNet-V3, EfficientNet, etc that use 153 | RMSprop with a short 2.4-3 epoch decay period and slow LR decay rate of .96-.99 requires EMA 154 | smoothing of weights to match results. 
Pay attention to the decay constant you are using 155 | relative to your update count per epoch. 156 | To keep EMA from using GPU resources, set device='cpu'. This will save a bit of memory but 157 | disable validation of the EMA weights. Validation will have to be done manually in a separate 158 | process, or after the training stops converging. 159 | This class is sensitive where it is initialized in the sequence of model init, 160 | GPU assignment and distributed training wrappers. 161 | I've tested with the sequence in my own train.py for torch.DataParallel, apex.DDP, and single-GPU. 162 | """ 163 | 164 | def __init__(self, model, decay=0.9999, device=''): 165 | # make a copy of the model for accumulating moving average of weights 166 | self.ema = deepcopy(model) 167 | self.ema.eval() 168 | self.updates = 0 # number of EMA updates 169 | self.decay = lambda x: decay * (1 - math.exp(-x / 2000)) # decay exponential ramp (to help early epochs) 170 | self.device = device # perform ema on different device from model if set 171 | if device: 172 | self.ema.to(device=device) 173 | for p in self.ema.parameters(): 174 | p.requires_grad_(False) 175 | 176 | def update(self, model): 177 | self.updates += 1 178 | d = self.decay(self.updates) 179 | with torch.no_grad(): 180 | if type(model) in (nn.parallel.DataParallel, nn.parallel.DistributedDataParallel): 181 | msd, esd = model.module.state_dict(), self.ema.module.state_dict() 182 | else: 183 | msd, esd = model.state_dict(), self.ema.state_dict() 184 | 185 | for k, v in esd.items(): 186 | if v.dtype.is_floating_point: 187 | v *= d 188 | v += (1. - d) * msd[k].detach() 189 | 190 | def update_attr(self, model): 191 | # Assign attributes (which may change during training) 192 | for k in model.__dict__.keys(): 193 | if not k.startswith('_'): 194 | setattr(self.ema, k, getattr(model, k)) 195 | -------------------------------------------------------------------------------- /weights/put your weights file here.txt: -------------------------------------------------------------------------------- 1 | yolov4-paspp.pt 2 | yolov4-pacsp-s.pt 3 | yolov4-pacsp.pt 4 | yolov4-pacsp-x.pt --------------------------------------------------------------------------------