├── README.md
├── cfg
│   ├── yolov4-csp-leaky.cfg
│   ├── yolov4-csp-mish.cfg
│   ├── yolov4-csp-s-leaky.cfg
│   ├── yolov4-csp-s-mish.cfg
│   ├── yolov4-csp-x-leaky.cfg
│   ├── yolov4-csp-x-mish.cfg
│   ├── yolov4-pacsp-mish.cfg
│   ├── yolov4-pacsp-s-mish.cfg
│   ├── yolov4-pacsp-s.cfg
│   ├── yolov4-pacsp-x-mish.cfg
│   ├── yolov4-pacsp-x.cfg
│   ├── yolov4-pacsp.cfg
│   ├── yolov4-paspp.cfg
│   ├── yolov4-tiny.cfg
│   └── yolov4.cfg
├── data
│   ├── coco.data
│   ├── coco.names
│   ├── coco.yaml
│   ├── coco1.data
│   ├── coco1.txt
│   ├── coco16.data
│   ├── coco16.txt
│   ├── coco1cls.data
│   ├── coco1cls.txt
│   ├── coco2014.data
│   ├── coco2017.data
│   ├── coco64.data
│   ├── coco64.txt
│   ├── coco_paper.names
│   ├── get_coco2014.sh
│   ├── get_coco2017.sh
│   ├── hyp.scratch.s.yaml
│   ├── hyp.scratch.yaml
│   └── samples
│       ├── bus.jpg
│       └── zidane.jpg
├── detect.py
├── images
│   └── scalingCSP.png
├── models
│   ├── export.py
│   └── models.py
├── requirements.txt
├── test.py
├── train.py
├── utils
│   ├── __init__.py
│   ├── activations.py
│   ├── adabound.py
│   ├── autoanchor.py
│   ├── datasets.py
│   ├── evolve.sh
│   ├── gcp.sh
│   ├── general.py
│   ├── google_utils.py
│   ├── layers.py
│   ├── loss.py
│   ├── metrics.py
│   ├── parse_config.py
│   ├── plots.py
│   ├── torch_utils.py
│   └── utils.py
└── weights
    └── put your weights file here.txt
/README.md:
--------------------------------------------------------------------------------
1 | # YOLOv4
2 |
3 | This is a PyTorch implementation of [YOLOv4](https://github.com/AlexeyAB/darknet), based on [ultralytics/yolov3](https://github.com/ultralytics/yolov3).
4 |
5 | * [[original Darknet implementation of YOLOv4]](https://github.com/AlexeyAB/darknet)
6 |
7 | * [[ultralytics/yolov5-based PyTorch implementation of YOLOv4]](https://github.com/WongKinYiu/PyTorch_YOLOv4/tree/u5)
8 |
9 | ### Development Log
10 |
13 | * `2021-10-31` - support [RS loss](https://arxiv.org/abs/2107.11669), [aLRP loss](https://arxiv.org/abs/2009.13592), [AP loss](https://arxiv.org/abs/2008.07294).
14 | * `2021-10-30` - support [alpha IoU](https://arxiv.org/abs/2110.13675).
15 | * `2021-10-20` - design resolution calibration methods.
16 | * `2021-10-15` - support joint detection, instance segmentation, and semantic segmentation. [`seg-yolo`]()
17 | * `2021-10-13` - design ratio yolo.
18 | * `2021-09-22` - pytorch 1.9 compatibility.
19 | * `2021-09-21` - support [DIM](https://arxiv.org/abs/1808.06670).
20 | * `2021-09-16` - support [Dynamic Head](https://arxiv.org/abs/2106.08322).
21 | * `2021-08-28` - design domain adaptive training.
22 | * `2021-08-22` - design re-balance models.
23 | * `2021-08-21` - support [simOTA](https://arxiv.org/abs/2107.08430).
24 | * `2021-08-14` - design approximation-based methods.
25 | * `2021-07-27` - design new decoders.
26 | * `2021-07-22` - support 1) decoupled head, 2) anchor-free, and 3) multi positives in [yolox](https://arxiv.org/abs/2107.08430).
27 | * `2021-07-10` - design distribution-based implicit modeling.
28 | * `2021-07-06` - support outlooker attention. [`volo`](https://arxiv.org/abs/2106.13112)
29 | * `2021-07-06` - design self-ensemble training method.
30 | * `2021-06-23` - design cross multi-stage correlation module.
31 | * `2021-06-18` - design cross stage cross correlation module.
32 | * `2021-06-17` - support cross correlation module. [`ccn`](https://arxiv.org/abs/2010.12138)
33 | * `2021-06-17` - support attention modules. [`cbam`](https://arxiv.org/abs/1807.06521) [`saan`](https://arxiv.org/abs/2010.12138)
34 | * `2021-04-20` - support swin transformer. [`swin`](https://arxiv.org/abs/2103.14030)
35 | * `2021-03-16` - design new stem layers.
36 | * `2021-03-13` - design implicit modeling. [`nn`]() [`mf`]() [`lc`]()
37 | * `2021-01-26` - support vision transformer. [`tr`](https://arxiv.org/abs/2010.11929)
38 | * `2021-01-26` - design mask objectness.
39 | * `2021-01-25` - design rotate augmentation.
40 | * `2021-01-23` - design collage augmentation.
41 | * `2021-01-22` - support [VoVNet](https://arxiv.org/abs/1904.09730), [VoVNetv2](https://arxiv.org/abs/1911.06667).
42 | * `2021-01-22` - support [EIoU](https://arxiv.org/abs/2101.08158).
43 | * `2021-01-19` - support instance segmentation. [`mask-yolo`]()
44 | * `2021-01-17` - support anchor-free-based methods. [`center-yolo`]()
45 | * `2021-01-14` - support joint detection and classification. [`classify-yolo`]()
46 | * `2021-01-02` - design new [PRN](https://github.com/WongKinYiu/PartialResidualNetworks) and [CSP](https://github.com/WongKinYiu/CrossStagePartialNetworks)-based models.
47 | * `2020-12-22` - support transfer learning.
48 | * `2020-12-18` - support non-local series self-attention blocks. [`gc`](https://arxiv.org/abs/1904.11492) [`dnl`](https://arxiv.org/abs/2006.06668)
49 | * `2020-12-16` - support down-sampling blocks in cspnet paper. [`down-c`]() [`down-d`](https://arxiv.org/abs/1812.01187)
50 | * `2020-12-03` - support imitation learning.
51 | * `2020-12-02` - support [squeeze and excitation](https://arxiv.org/abs/1709.01507).
52 | * `2020-11-26` - support multi-class multi-anchor joint detection and embedding.
53 | * `2020-11-25` - support [joint detection and embedding](https://arxiv.org/abs/1909.12605). [`track-yolo`]()
54 | * `2020-11-23` - support teacher-student learning.
55 | * `2020-11-17` - pytorch 1.7 compatibility.
56 | * `2020-11-06` - support inference with initial weights.
57 | * `2020-10-21` - fully supported by darknet.
58 | * `2020-09-18` - design fine-tune methods.
59 | * `2020-08-29` - support [deformable kernel](https://arxiv.org/abs/1910.02940).
60 | * `2020-08-25` - pytorch 1.6 compatibility.
61 | * `2020-08-24` - support channels-last training/testing.
62 | * `2020-08-16` - design CSPPRN.
63 | * `2020-08-15` - design deeper model. [`csp-p6-mish`]()
64 | * `2020-08-11` - support [HarDNet](https://arxiv.org/abs/1909.00948). [`hard39-pacsp`]() [`hard68-pacsp`]() [`hard85-pacsp`]()
65 | * `2020-08-10` - add DDP training.
66 | * `2020-08-06` - support [DCN](https://arxiv.org/abs/1703.06211), [DCNv2](https://arxiv.org/abs/1811.11168). [`yolov4-dcn`]()
67 | * `2020-08-01` - add pytorch hub.
68 | * `2020-07-31` - support [ResNet](https://arxiv.org/abs/1512.03385), [ResNeXt](https://arxiv.org/abs/1611.05431), [CSPResNet](https://github.com/WongKinYiu/CrossStagePartialNetworks), [CSPResNeXt](https://github.com/WongKinYiu/CrossStagePartialNetworks). [`r50-pacsp`]() [`x50-pacsp`]() [`cspr50-pacsp`]() [`cspx50-pacsp`]()
69 | * `2020-07-28` - support [SAM](https://arxiv.org/abs/2004.10934). [`yolov4-pacsp-sam`]()
70 | * `2020-07-24` - update api.
71 | * `2020-07-23` - support CUDA accelerated Mish activation function.
72 | * `2020-07-19` - support and train tiny YOLOv4. [`yolov4-tiny`]()
73 | * `2020-07-15` - design and train conditional YOLOv4. [`yolov4-pacsp-conditional`]()
74 | * `2020-07-13` - support [MixUp](https://arxiv.org/abs/1710.09412) data augmentation.
75 | * `2020-07-03` - design new stem layers.
76 | * `2020-06-16` - support FP16 (half-precision) GPU inference.
77 | * `2020-06-14` - convert .pt to .weights for darknet fine-tuning.
78 | * `2020-06-13` - update multi-scale training strategy.
79 | * `2020-06-12` - design scaled YOLOv4 following [ultralytics](https://github.com/ultralytics/yolov5). [`yolov4-pacsp-s`]() [`yolov4-pacsp-m`]() [`yolov4-pacsp-l`]() [`yolov4-pacsp-x`]()
80 | * `2020-06-07` - design [scaling methods](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/images/scalingCSP.png) for CSP-based models. [`yolov4-pacsp-25`]() [`yolov4-pacsp-75`]()
81 | * `2020-06-03` - update COCO2014 to COCO2017.
82 | * `2020-05-30` - update FPN neck to CSPFPN. [`yolov4-yocsp`]() [`yolov4-yocsp-mish`]()
83 | * `2020-05-24` - update neck of YOLOv4 to CSPPAN. [`yolov4-pacsp`]() [`yolov4-pacsp-mish`]()
84 | * `2020-05-15` - train YOLOv4 with the Mish activation function. [`yolov4-yospp-mish`]() [`yolov4-paspp-mish`]()
85 | * `2020-05-08` - design and train YOLOv4 with an [FPN](https://arxiv.org/abs/1612.03144) neck. [`yolov4-yospp`]()
86 | * `2020-05-01` - train YOLOv4 with the Leaky activation function using PyTorch. [`yolov4-paspp`]() [`PAN`](https://arxiv.org/abs/1803.01534)
87 |
88 |
89 |
90 | ## Pretrained Models & Comparison
91 |
92 |
93 | | Model | Test Size | AP<sup>test</sup> | AP<sub>50</sub><sup>test</sup> | AP<sub>75</sub><sup>test</sup> | AP<sub>S</sub><sup>test</sup> | AP<sub>M</sub><sup>test</sup> | AP<sub>L</sub><sup>test</sup> | cfg | weights |
94 | | :-- | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
95 | | **YOLOv4** | 640 | 50.0% | 68.4% | 54.7% | 30.5% | 54.3% | 63.3% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4.weights) |
96 | | | | | | | | |
97 | | **YOLOv4**pacsp-s | 640 | 39.0% | 57.8% | 42.4% | 20.6% | 42.6% | 50.0% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-csp-s-leaky.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4-pacsp-s.weights) |
98 | | **YOLOv4**pacsp | 640 | 49.8% | 68.4% | 54.3% | 30.1% | 54.0% | 63.4% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-csp-leaky.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4-pacsp.weights) |
99 | | **YOLOv4**pacsp-x | 640 | **52.2%** | **70.5%** | **56.8%** | **32.7%** | **56.3%** | **65.9%** | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-csp-x-leaky.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4-pacsp-x.weights) |
100 | | | | | | | | |
101 | | **YOLOv4**pacsp-s-mish | 640 | 40.8% | 59.5% | 44.3% | 22.4% | 44.6% | 51.8% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-csp-s-mish.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4-pacsp-s-mish.weights) |
102 | | **YOLOv4**pacsp-mish | 640 | 50.9% | 69.4% | 55.5% | 31.2% | 55.0% | 64.7% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-csp-mish.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4-pacsp-mish.weights) |
103 | | **YOLOv4**pacsp-x-mish | 640 | 52.8% | 71.1% | 57.5% | 33.6% | 56.9% | 66.6% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-csp-x-mish.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4-pacsp-x-mish.weights) |
104 |
105 | | Model | Test Size | AP<sup>val</sup> | AP<sub>50</sub><sup>val</sup> | AP<sub>75</sub><sup>val</sup> | AP<sub>S</sub><sup>val</sup> | AP<sub>M</sub><sup>val</sup> | AP<sub>L</sub><sup>val</sup> | cfg | weights |
106 | | :-- | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
107 | | **YOLOv4** | 640 | 49.7% | 68.2% | 54.3% | 32.9% | 54.8% | 63.7% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4.weights) |
108 | | | | | | | | |
109 | | **YOLOv4**pacsp-s | 640 | 38.9% | 57.7% | 42.2% | 21.9% | 43.3% | 51.9% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-csp-s-leaky.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4-pacsp-s.weights) |
110 | | **YOLOv4**pacsp | 640 | 49.4% | 68.1% | 53.8% | 32.7% | 54.2% | 64.0% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-csp-leaky.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4-pacsp.weights) |
111 | | **YOLOv4**pacsp-x | 640 | **51.6%** | **70.1%** | **56.2%** | **35.3%** | **56.4%** | **66.9%** | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-csp-x-leaky.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4-pacsp-x.weights) |
112 | | | | | | | | |
113 | | **YOLOv4**pacsp-s-mish | 640 | 40.7% | 59.5% | 44.2% | 25.3% | 45.1% | 53.4% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-csp-s-mish.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4-pacsp-s-mish.weights) |
114 | | **YOLOv4**pacsp-mish | 640 | 50.8% | 69.4% | 55.4% | 34.3% | 55.5% | 65.7% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-csp-mish.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4-pacsp-mish.weights) |
115 | | **YOLOv4**pacsp-x-mish | 640 | 52.6% | 71.0% | 57.2% | 36.4% | 57.3% | 67.6% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-csp-x-mish.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4-pacsp-x-mish.weights) |
116 |
117 | **archive** (previous results)
118 |
119 | | Model | Test Size | AP<sup>val</sup> | AP<sub>50</sub><sup>val</sup> | AP<sub>75</sub><sup>val</sup> | AP<sub>S</sub><sup>val</sup> | AP<sub>M</sub><sup>val</sup> | AP<sub>L</sub><sup>val</sup> | cfg | weights |
120 | | :-- | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
121 | | **YOLOv4** | 640 | 48.4% | 67.1% | 52.9% | 31.7% | 53.8% | 62.0% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4.cfg) | [weights](https://drive.google.com/file/d/14zPRaYxMOe7hXi6N-Vs_QbWs6ue_CZPd/view?usp=sharing) |
122 | | | | | | | | |
123 | | **YOLOv4**pacsp-s | 640 | 37.0% | 55.7% | 40.0% | 20.2% | 41.6% | 48.4% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-s.cfg) | [weights](https://drive.google.com/file/d/1PiS9pF4tsydPN4-vMjiJPHjIOJMeRwWS/view?usp=sharing) |
124 | | **YOLOv4**pacsp | 640 | 47.7% | 66.4% | 52.0% | 32.3% | 53.0% | 61.7% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp.cfg) | [weights](https://drive.google.com/file/d/1C7xwfYzPF4dKFAmDNCetdTCB_cPvsuwf/view?usp=sharing) |
125 | | **YOLOv4**pacsp-x | 640 | **50.0%** | **68.3%** | **54.5%** | **33.9%** | **55.4%** | **63.7%** | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-x.cfg) | [weights](https://drive.google.com/file/d/1kWzJk5DJNlW9Xf2xR89OfmrEoeY9Szzj/view?usp=sharing) |
126 | | | | | | | | |
127 | | **YOLOv4**pacsp-s-mish | 640 | 38.8% | 57.8% | 42.0% | 21.6% | 43.7% | 51.1% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-s-mish.cfg) | [weights](https://drive.google.com/file/d/1OiDhQqYH23GrP6f5vU2j_DvA8PqL0pcF/view?usp=sharing) |
128 | | **YOLOv4**pacsp-mish | 640 | 48.8% | 67.2% | 53.4% | 31.5% | 54.4% | 62.2% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-mish.cfg) | [weights](https://drive.google.com/file/d/1mk9mkM0_B9e_QgPxF6pBIB6uXDxZENsk/view?usp=sharing) |
129 | | **YOLOv4**pacsp-x-mish | 640 | 51.2% | 69.4% | 55.9% | 35.0% | 56.5% | 65.0% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-x-mish.cfg) | [weights](https://drive.google.com/file/d/1kZee29alFFnm1rlJieAyHzB3Niywew_0/view?usp=sharing) |
130 |
131 | | Model | Test Size | AP<sup>val</sup> | AP<sub>50</sub><sup>val</sup> | AP<sub>75</sub><sup>val</sup> | AP<sub>S</sub><sup>val</sup> | AP<sub>M</sub><sup>val</sup> | AP<sub>L</sub><sup>val</sup> | cfg | weights |
132 | | :-- | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
133 | | **YOLOv4** | 672 | 47.7% | 66.7% | 52.1% | 30.5% | 52.6% | 61.4% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4.cfg) | [weights](https://drive.google.com/file/d/137U-oLekAu-J-fe0E_seTblVxnU3tlNC/view?usp=sharing) |
134 | | | | | | | | |
135 | | **YOLOv4**pacsp-s | 672 | 36.6% | 55.5% | 39.6% | 21.2% | 41.1% | 47.0% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-s.cfg) | [weights](https://drive.google.com/file/d/1-QZc043NMNa_O0oLaB3r0XYKFRSktfsd/view?usp=sharing) |
136 | | **YOLOv4**pacsp | 672 | 47.2% | 66.2% | 51.6% | 30.4% | 52.3% | 60.8% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp.cfg) | [weights](https://drive.google.com/file/d/1sIpu29jEBZ3VI_1uy2Q1f3iEzvIpBZbP/view?usp=sharing) |
137 | | **YOLOv4**pacsp-x | 672 | **49.3%** | **68.1%** | **53.6%** | **31.8%** | **54.5%** | **63.6%** | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-x.cfg) | [weights](https://drive.google.com/file/d/1aZRfA2CD9SdIwmscbyp6rXZjGysDvaYv/view?usp=sharing) |
138 | | | | | | | | |
139 | | **YOLOv4**pacsp-s-mish | 672 | 38.6% | 57.7% | 41.8% | 22.3% | 43.5% | 49.3% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-s-mish.cfg) | [weights](https://drive.google.com/file/d/1q0zbQKcSNSf_AxWQv6DAUPXeaTywPqVB/view?usp=sharing) |
140 | | (+BoF) | 640 | 39.9% | 59.1% | 43.1% | 24.4% | 45.2% | 51.4% | | [weights](https://drive.google.com/file/d/1-8PqBaI8oYb7TB9L-KMzvjZcK_VaGXCF/view?usp=sharing) |
141 | | **YOLOv4**pacsp-mish | 672 | 48.1% | 66.9% | 52.3% | 30.8% | 53.4% | 61.7% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-mish.cfg) | [weights](https://drive.google.com/file/d/116yreAUTK_dTJErDuDVX2WTIBcd5YPSI/view?usp=sharing) |
142 | | (+BoF) | 640 | 49.3% | 68.2% | 53.8% | 31.9% | 54.9% | 62.8% | | [weights](https://drive.google.com/file/d/12qRrqDRlUElsR_TI97j4qkrttrNKKG3k/view?usp=sharing) |
143 | | **YOLOv4**pacsp-x-mish | 672 | 50.0% | 68.5% | 54.4% | 32.9% | 54.9% | 64.0% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-x-mish.cfg) | [weights](https://drive.google.com/file/d/1GGCrokkRZ06CZ5MUCVokbX1FF2e1DbPF/view?usp=sharing) |
144 | | (+BoF) | 640 | **51.0%** | **69.7%** | **55.5%** | **33.3%** | **56.2%** | **65.5%** | | [weights](https://drive.google.com/file/d/1lVmSqItSKywg6yk1qiCvgOYw55O03Qgj/view?usp=sharing) |
145 | | | | | | | | |
146 |
147 |
148 |
149 | ## Requirements
150 |
151 | docker (recommended):
152 | ```
153 | # create the docker container; adjust the shared memory size (--shm-size) if you have more memory available
154 | nvidia-docker run --name yolov4 -it -v your_coco_path/:/coco/ -v your_code_path/:/yolo --shm-size=64g nvcr.io/nvidia/pytorch:20.11-py3
155 |
156 | # apt install required packages
157 | apt update
158 | apt install -y zip htop screen libgl1-mesa-glx
159 |
160 | # pip install required packages
161 | pip install seaborn thop
162 |
163 | # install mish-cuda if you want to use mish activation
164 | # https://github.com/thomasbrandon/mish-cuda
165 | # https://github.com/JunnYu/mish-cuda
166 | cd /
167 | git clone https://github.com/JunnYu/mish-cuda
168 | cd mish-cuda
169 | python setup.py build install
170 |
171 | # go to code folder
172 | cd /yolo
173 | ```
174 |
175 | local:
176 | ```
177 | pip install -r requirements.txt
178 | ```
179 | ※ For running Mish models, please install https://github.com/thomasbrandon/mish-cuda
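
A quick way to confirm the environment before training (a minimal sketch; the `mish_cuda` check is only relevant for Mish models and assumes the package installs under that module name):
```
# check the PyTorch version and that CUDA is visible
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

# optional: verify that mish-cuda is importable (Mish models only)
python -c "from mish_cuda import MishCuda; print('mish-cuda OK')"
```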
180 |
181 | ## Training
182 |
183 | ```
184 | python train.py --device 0 --batch-size 16 --img 640 640 --data coco.yaml --cfg cfg/yolov4-pacsp.cfg --weights '' --name yolov4-pacsp
185 | ```
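
For multi-GPU runs (the development log notes DDP training was added on `2020-08-10`), a hedged sketch assuming `train.py` keeps the ultralytics-style `--device` flag and launcher convention of the codebase it is based on:
```
# multiple GPUs on one node (assumed flag behaviour)
python train.py --device 0,1,2,3 --batch-size 64 --img 640 640 --data coco.yaml --cfg cfg/yolov4-pacsp.cfg --weights '' --name yolov4-pacsp

# DDP via the standard PyTorch launcher (assumed to match the DDP support noted above)
python -m torch.distributed.launch --nproc_per_node 4 train.py --device 0,1,2,3 --batch-size 64 --img 640 640 --data coco.yaml --cfg cfg/yolov4-pacsp.cfg --weights '' --name yolov4-pacsp
```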
186 |
187 | ## Testing
188 |
189 | ```
190 | python test.py --img 640 --conf 0.001 --batch 8 --device 0 --data coco.yaml --cfg cfg/yolov4-pacsp.cfg --weights weights/yolov4-pacsp.pt
191 | ```
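
For inference on your own images there is also `detect.py` in the repository root; a hedged example assuming it follows the ultralytics/yolov3-style arguments (`--source`, `--names`, `--conf-thres`) of the codebase this implementation is based on:
```
python detect.py --source data/samples --cfg cfg/yolov4-pacsp.cfg --weights weights/yolov4-pacsp.pt --names data/coco.names --img-size 640 --conf-thres 0.4
```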
192 |
193 | ## Citation
194 |
195 | ```
196 | @article{bochkovskiy2020yolov4,
197 | title={{YOLOv4}: Optimal Speed and Accuracy of Object Detection},
198 | author={Bochkovskiy, Alexey and Wang, Chien-Yao and Liao, Hong-Yuan Mark},
199 | journal={arXiv preprint arXiv:2004.10934},
200 | year={2020}
201 | }
202 | ```
203 |
204 | ```
205 | @inproceedings{wang2020cspnet,
206 | title={{CSPNet}: A New Backbone That Can Enhance Learning Capability of {CNN}},
207 | author={Wang, Chien-Yao and Mark Liao, Hong-Yuan and Wu, Yueh-Hua and Chen, Ping-Yang and Hsieh, Jun-Wei and Yeh, I-Hau},
208 | booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
209 | pages={390--391},
210 | year={2020}
211 | }
212 | ```
213 |
214 | ## Acknowledgements
215 |
216 | * [https://github.com/AlexeyAB/darknet](https://github.com/AlexeyAB/darknet)
217 | * [https://github.com/ultralytics/yolov3](https://github.com/ultralytics/yolov3)
218 | * [https://github.com/ultralytics/yolov5](https://github.com/ultralytics/yolov5)
219 |
--------------------------------------------------------------------------------
/cfg/yolov4-csp-s-leaky.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | #batch=1
4 | #subdivisions=1
5 | # Training
6 | batch=64
7 | subdivisions=8
8 | width=512
9 | height=512
10 | channels=3
11 | momentum=0.949
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.00261
19 | burn_in=1000
20 | max_batches = 500500
21 | policy=steps
22 | steps=400000,450000
23 | scales=.1,.1
24 |
25 | #cutmix=1
26 | mosaic=1
27 |
28 |
29 | # ============ Backbone ============ #
30 |
31 | # Stem
32 |
33 | # 0
34 | [convolutional]
35 | batch_normalize=1
36 | filters=32
37 | size=3
38 | stride=1
39 | pad=1
40 | activation=leaky
41 |
42 | # P1
43 |
44 | # Downsample
45 |
46 | [convolutional]
47 | batch_normalize=1
48 | filters=32
49 | size=3
50 | stride=2
51 | pad=1
52 | activation=leaky
53 |
54 | # Residual Block
55 |
56 | [convolutional]
57 | batch_normalize=1
58 | filters=32
59 | size=1
60 | stride=1
61 | pad=1
62 | activation=leaky
63 |
64 | [convolutional]
65 | batch_normalize=1
66 | filters=32
67 | size=3
68 | stride=1
69 | pad=1
70 | activation=leaky
71 |
72 | # 4 (previous+1+3k)
73 | [shortcut]
74 | from=-3
75 | activation=linear
76 |
77 | # P2
78 |
79 | # Downsample
80 |
81 | [convolutional]
82 | batch_normalize=1
83 | filters=64
84 | size=3
85 | stride=2
86 | pad=1
87 | activation=leaky
88 |
89 | # Split
90 |
91 | [convolutional]
92 | batch_normalize=1
93 | filters=32
94 | size=1
95 | stride=1
96 | pad=1
97 | activation=leaky
98 |
99 | [route]
100 | layers = -2
101 |
102 | [convolutional]
103 | batch_normalize=1
104 | filters=32
105 | size=1
106 | stride=1
107 | pad=1
108 | activation=leaky
109 |
110 | # Residual Block
111 |
112 | [convolutional]
113 | batch_normalize=1
114 | filters=32
115 | size=1
116 | stride=1
117 | pad=1
118 | activation=leaky
119 |
120 | [convolutional]
121 | batch_normalize=1
122 | filters=32
123 | size=3
124 | stride=1
125 | pad=1
126 | activation=leaky
127 |
128 | [shortcut]
129 | from=-3
130 | activation=linear
131 |
132 | # Transition first
133 |
134 | [convolutional]
135 | batch_normalize=1
136 | filters=32
137 | size=1
138 | stride=1
139 | pad=1
140 | activation=leaky
141 |
142 | # Merge [-1, -(3k+4)]
143 |
144 | [route]
145 | layers = -1,-7
146 |
147 | # Transition last
148 |
149 | # 14 (previous+7+3k)
150 | [convolutional]
151 | batch_normalize=1
152 | filters=64
153 | size=1
154 | stride=1
155 | pad=1
156 | activation=leaky
157 |
158 | # P3
159 |
160 | # Downsample
161 |
162 | [convolutional]
163 | batch_normalize=1
164 | filters=128
165 | size=3
166 | stride=2
167 | pad=1
168 | activation=leaky
169 |
170 | # Split
171 |
172 | [convolutional]
173 | batch_normalize=1
174 | filters=64
175 | size=1
176 | stride=1
177 | pad=1
178 | activation=leaky
179 |
180 | [route]
181 | layers = -2
182 |
183 | [convolutional]
184 | batch_normalize=1
185 | filters=64
186 | size=1
187 | stride=1
188 | pad=1
189 | activation=leaky
190 |
191 | # Residual Block
192 |
193 | [convolutional]
194 | batch_normalize=1
195 | filters=64
196 | size=1
197 | stride=1
198 | pad=1
199 | activation=leaky
200 |
201 | [convolutional]
202 | batch_normalize=1
203 | filters=64
204 | size=3
205 | stride=1
206 | pad=1
207 | activation=leaky
208 |
209 | [shortcut]
210 | from=-3
211 | activation=linear
212 |
213 | # Transition first
214 |
215 | [convolutional]
216 | batch_normalize=1
217 | filters=64
218 | size=1
219 | stride=1
220 | pad=1
221 | activation=leaky
222 |
223 | # Merge [-1 -(4+3k)]
224 |
225 | [route]
226 | layers = -1,-7
227 |
228 | # Transition last
229 |
230 | # 24 (previous+7+3k)
231 | [convolutional]
232 | batch_normalize=1
233 | filters=128
234 | size=1
235 | stride=1
236 | pad=1
237 | activation=leaky
238 |
239 | # P4
240 |
241 | # Downsample
242 |
243 | [convolutional]
244 | batch_normalize=1
245 | filters=256
246 | size=3
247 | stride=2
248 | pad=1
249 | activation=leaky
250 |
251 | # Split
252 |
253 | [convolutional]
254 | batch_normalize=1
255 | filters=128
256 | size=1
257 | stride=1
258 | pad=1
259 | activation=leaky
260 |
261 | [route]
262 | layers = -2
263 |
264 | [convolutional]
265 | batch_normalize=1
266 | filters=128
267 | size=1
268 | stride=1
269 | pad=1
270 | activation=leaky
271 |
272 | # Residual Block
273 |
274 | [convolutional]
275 | batch_normalize=1
276 | filters=128
277 | size=1
278 | stride=1
279 | pad=1
280 | activation=leaky
281 |
282 | [convolutional]
283 | batch_normalize=1
284 | filters=128
285 | size=3
286 | stride=1
287 | pad=1
288 | activation=leaky
289 |
290 | [shortcut]
291 | from=-3
292 | activation=linear
293 |
294 | # Transition first
295 |
296 | [convolutional]
297 | batch_normalize=1
298 | filters=128
299 | size=1
300 | stride=1
301 | pad=1
302 | activation=leaky
303 |
304 | # Merge [-1 -(3k+4)]
305 |
306 | [route]
307 | layers = -1,-7
308 |
309 | # Transition last
310 |
311 | # 34 (previous+7+3k)
312 | [convolutional]
313 | batch_normalize=1
314 | filters=256
315 | size=1
316 | stride=1
317 | pad=1
318 | activation=leaky
319 |
320 | # P5
321 |
322 | # Downsample
323 |
324 | [convolutional]
325 | batch_normalize=1
326 | filters=512
327 | size=3
328 | stride=2
329 | pad=1
330 | activation=leaky
331 |
332 | # Split
333 |
334 | [convolutional]
335 | batch_normalize=1
336 | filters=256
337 | size=1
338 | stride=1
339 | pad=1
340 | activation=leaky
341 |
342 | [route]
343 | layers = -2
344 |
345 | [convolutional]
346 | batch_normalize=1
347 | filters=256
348 | size=1
349 | stride=1
350 | pad=1
351 | activation=leaky
352 |
353 | # Residual Block
354 |
355 | [convolutional]
356 | batch_normalize=1
357 | filters=256
358 | size=1
359 | stride=1
360 | pad=1
361 | activation=leaky
362 |
363 | [convolutional]
364 | batch_normalize=1
365 | filters=256
366 | size=3
367 | stride=1
368 | pad=1
369 | activation=leaky
370 |
371 | [shortcut]
372 | from=-3
373 | activation=linear
374 |
375 | # Transition first
376 |
377 | [convolutional]
378 | batch_normalize=1
379 | filters=256
380 | size=1
381 | stride=1
382 | pad=1
383 | activation=leaky
384 |
385 | # Merge [-1 -(3k+4)]
386 |
387 | [route]
388 | layers = -1,-7
389 |
390 | # Transition last
391 |
392 | # 44 (previous+7+3k)
393 | [convolutional]
394 | batch_normalize=1
395 | filters=512
396 | size=1
397 | stride=1
398 | pad=1
399 | activation=leaky
400 |
401 | # ============ End of Backbone ============ #
402 |
403 | # ============ Neck ============ #
404 |
405 | # CSPSPP
406 |
407 | [convolutional]
408 | batch_normalize=1
409 | filters=256
410 | size=1
411 | stride=1
412 | pad=1
413 | activation=leaky
414 |
415 | [route]
416 | layers = -2
417 |
418 | [convolutional]
419 | batch_normalize=1
420 | filters=256
421 | size=1
422 | stride=1
423 | pad=1
424 | activation=leaky
425 |
426 | ### SPP ###
427 | [maxpool]
428 | stride=1
429 | size=5
430 |
431 | [route]
432 | layers=-2
433 |
434 | [maxpool]
435 | stride=1
436 | size=9
437 |
438 | [route]
439 | layers=-4
440 |
441 | [maxpool]
442 | stride=1
443 | size=13
444 |
445 | [route]
446 | layers=-1,-3,-5,-6
447 | ### End SPP ###
448 |
449 | [convolutional]
450 | batch_normalize=1
451 | filters=256
452 | size=1
453 | stride=1
454 | pad=1
455 | activation=leaky
456 |
457 | [convolutional]
458 | batch_normalize=1
459 | size=3
460 | stride=1
461 | pad=1
462 | filters=256
463 | activation=leaky
464 |
465 | [route]
466 | layers = -1, -11
467 |
468 | # 57 (previous+6+5+2k)
469 | [convolutional]
470 | batch_normalize=1
471 | filters=256
472 | size=1
473 | stride=1
474 | pad=1
475 | activation=leaky
476 |
477 | # End of CSPSPP
478 |
479 |
480 | # FPN-4
481 |
482 | [convolutional]
483 | batch_normalize=1
484 | filters=128
485 | size=1
486 | stride=1
487 | pad=1
488 | activation=leaky
489 |
490 | [upsample]
491 | stride=2
492 |
493 | [route]
494 | layers = 34
495 |
496 | [convolutional]
497 | batch_normalize=1
498 | filters=128
499 | size=1
500 | stride=1
501 | pad=1
502 | activation=leaky
503 |
504 | [route]
505 | layers = -1, -3
506 |
507 | [convolutional]
508 | batch_normalize=1
509 | filters=128
510 | size=1
511 | stride=1
512 | pad=1
513 | activation=leaky
514 |
515 | # Split
516 |
517 | [convolutional]
518 | batch_normalize=1
519 | filters=128
520 | size=1
521 | stride=1
522 | pad=1
523 | activation=leaky
524 |
525 | [route]
526 | layers = -2
527 |
528 | # Plain Block
529 |
530 | [convolutional]
531 | batch_normalize=1
532 | filters=128
533 | size=1
534 | stride=1
535 | pad=1
536 | activation=leaky
537 |
538 | [convolutional]
539 | batch_normalize=1
540 | size=3
541 | stride=1
542 | pad=1
543 | filters=128
544 | activation=leaky
545 |
546 | # Merge [-1, -(2k+2)]
547 |
548 | [route]
549 | layers = -1, -4
550 |
551 | # Transition last
552 |
553 | # 69 (previous+6+4+2k)
554 | [convolutional]
555 | batch_normalize=1
556 | filters=128
557 | size=1
558 | stride=1
559 | pad=1
560 | activation=leaky
561 |
562 |
563 | # FPN-3
564 |
565 | [convolutional]
566 | batch_normalize=1
567 | filters=64
568 | size=1
569 | stride=1
570 | pad=1
571 | activation=leaky
572 |
573 | [upsample]
574 | stride=2
575 |
576 | [route]
577 | layers = 24
578 |
579 | [convolutional]
580 | batch_normalize=1
581 | filters=64
582 | size=1
583 | stride=1
584 | pad=1
585 | activation=leaky
586 |
587 | [route]
588 | layers = -1, -3
589 |
590 | [convolutional]
591 | batch_normalize=1
592 | filters=64
593 | size=1
594 | stride=1
595 | pad=1
596 | activation=leaky
597 |
598 | # Split
599 |
600 | [convolutional]
601 | batch_normalize=1
602 | filters=64
603 | size=1
604 | stride=1
605 | pad=1
606 | activation=leaky
607 |
608 | [route]
609 | layers = -2
610 |
611 | # Plain Block
612 |
613 | [convolutional]
614 | batch_normalize=1
615 | filters=64
616 | size=1
617 | stride=1
618 | pad=1
619 | activation=leaky
620 |
621 | [convolutional]
622 | batch_normalize=1
623 | size=3
624 | stride=1
625 | pad=1
626 | filters=64
627 | activation=leaky
628 |
629 | # Merge [-1, -(2k+2)]
630 |
631 | [route]
632 | layers = -1, -4
633 |
634 | # Transition last
635 |
636 | # 81 (previous+6+4+2k)
637 | [convolutional]
638 | batch_normalize=1
639 | filters=64
640 | size=1
641 | stride=1
642 | pad=1
643 | activation=leaky
644 |
645 |
646 | # PAN-4
647 |
648 | [convolutional]
649 | batch_normalize=1
650 | size=3
651 | stride=2
652 | pad=1
653 | filters=128
654 | activation=leaky
655 |
656 | [route]
657 | layers = -1, 69
658 |
659 | [convolutional]
660 | batch_normalize=1
661 | filters=128
662 | size=1
663 | stride=1
664 | pad=1
665 | activation=leaky
666 |
667 | # Split
668 |
669 | [convolutional]
670 | batch_normalize=1
671 | filters=128
672 | size=1
673 | stride=1
674 | pad=1
675 | activation=leaky
676 |
677 | [route]
678 | layers = -2
679 |
680 | # Plain Block
681 |
682 | [convolutional]
683 | batch_normalize=1
684 | filters=128
685 | size=1
686 | stride=1
687 | pad=1
688 | activation=leaky
689 |
690 | [convolutional]
691 | batch_normalize=1
692 | size=3
693 | stride=1
694 | pad=1
695 | filters=128
696 | activation=leaky
697 |
698 | [route]
699 | layers = -1,-4
700 |
701 | # Transition last
702 |
703 | # 90 (previous+3+4+2k)
704 | [convolutional]
705 | batch_normalize=1
706 | filters=128
707 | size=1
708 | stride=1
709 | pad=1
710 | activation=leaky
711 |
712 |
713 | # PAN-5
714 |
715 | [convolutional]
716 | batch_normalize=1
717 | size=3
718 | stride=2
719 | pad=1
720 | filters=256
721 | activation=leaky
722 |
723 | [route]
724 | layers = -1, 57
725 |
726 | [convolutional]
727 | batch_normalize=1
728 | filters=256
729 | size=1
730 | stride=1
731 | pad=1
732 | activation=leaky
733 |
734 | # Split
735 |
736 | [convolutional]
737 | batch_normalize=1
738 | filters=256
739 | size=1
740 | stride=1
741 | pad=1
742 | activation=leaky
743 |
744 | [route]
745 | layers = -2
746 |
747 | # Plain Block
748 |
749 | [convolutional]
750 | batch_normalize=1
751 | filters=256
752 | size=1
753 | stride=1
754 | pad=1
755 | activation=leaky
756 |
757 | [convolutional]
758 | batch_normalize=1
759 | size=3
760 | stride=1
761 | pad=1
762 | filters=256
763 | activation=leaky
764 |
765 | [route]
766 | layers = -1,-4
767 |
768 | # Transition last
769 |
770 | # 99 (previous+3+4+2k)
771 | [convolutional]
772 | batch_normalize=1
773 | filters=256
774 | size=1
775 | stride=1
776 | pad=1
777 | activation=leaky
778 |
779 | # ============ End of Neck ============ #
780 |
781 | # ============ Head ============ #
782 |
783 | # YOLO-3
784 |
785 | [route]
786 | layers = 81
787 |
788 | [convolutional]
789 | batch_normalize=1
790 | size=3
791 | stride=1
792 | pad=1
793 | filters=128
794 | activation=leaky
795 |
796 | [convolutional]
797 | size=1
798 | stride=1
799 | pad=1
800 | filters=255
801 | activation=linear
802 |
803 | [yolo]
804 | mask = 0,1,2
805 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
806 | classes=80
807 | num=9
808 | jitter=.3
809 | ignore_thresh = .7
810 | truth_thresh = 1
811 | random=1
812 | scale_x_y = 1.05
813 | iou_thresh=0.213
814 | cls_normalizer=1.0
815 | iou_normalizer=0.07
816 | iou_loss=ciou
817 | nms_kind=greedynms
818 | beta_nms=0.6
819 |
820 |
821 | # YOLO-4
822 |
823 | [route]
824 | layers = 90
825 |
826 | [convolutional]
827 | batch_normalize=1
828 | size=3
829 | stride=1
830 | pad=1
831 | filters=256
832 | activation=leaky
833 |
834 | [convolutional]
835 | size=1
836 | stride=1
837 | pad=1
838 | filters=255
839 | activation=linear
840 |
841 | [yolo]
842 | mask = 3,4,5
843 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
844 | classes=80
845 | num=9
846 | jitter=.3
847 | ignore_thresh = .7
848 | truth_thresh = 1
849 | random=1
850 | scale_x_y = 1.05
851 | iou_thresh=0.213
852 | cls_normalizer=1.0
853 | iou_normalizer=0.07
854 | iou_loss=ciou
855 | nms_kind=greedynms
856 | beta_nms=0.6
857 |
858 |
859 | # YOLO-5
860 |
861 | [route]
862 | layers = 99
863 |
864 | [convolutional]
865 | batch_normalize=1
866 | size=3
867 | stride=1
868 | pad=1
869 | filters=512
870 | activation=leaky
871 |
872 | [convolutional]
873 | size=1
874 | stride=1
875 | pad=1
876 | filters=255
877 | activation=linear
878 |
879 | [yolo]
880 | mask = 6,7,8
881 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
882 | classes=80
883 | num=9
884 | jitter=.3
885 | ignore_thresh = .7
886 | truth_thresh = 1
887 | random=1
888 | scale_x_y = 1.05
889 | iou_thresh=0.213
890 | cls_normalizer=1.0
891 | iou_normalizer=0.07
892 | iou_loss=ciou
893 | nms_kind=greedynms
894 | beta_nms=0.6
--------------------------------------------------------------------------------
/cfg/yolov4-csp-s-mish.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | #batch=1
4 | #subdivisions=1
5 | # Training
6 | batch=64
7 | subdivisions=8
8 | width=512
9 | height=512
10 | channels=3
11 | momentum=0.949
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.00261
19 | burn_in=1000
20 | max_batches = 500500
21 | policy=steps
22 | steps=400000,450000
23 | scales=.1,.1
24 |
25 | #cutmix=1
26 | mosaic=1
27 |
28 |
29 | # ============ Backbone ============ #
30 |
31 | # Stem
32 |
33 | # 0
34 | [convolutional]
35 | batch_normalize=1
36 | filters=32
37 | size=3
38 | stride=1
39 | pad=1
40 | activation=mish
41 |
42 | # P1
43 |
44 | # Downsample
45 |
46 | [convolutional]
47 | batch_normalize=1
48 | filters=32
49 | size=3
50 | stride=2
51 | pad=1
52 | activation=mish
53 |
54 | # Residual Block
55 |
56 | [convolutional]
57 | batch_normalize=1
58 | filters=32
59 | size=1
60 | stride=1
61 | pad=1
62 | activation=mish
63 |
64 | [convolutional]
65 | batch_normalize=1
66 | filters=32
67 | size=3
68 | stride=1
69 | pad=1
70 | activation=mish
71 |
72 | # 4 (previous+1+3k)
73 | [shortcut]
74 | from=-3
75 | activation=linear
76 |
77 | # P2
78 |
79 | # Downsample
80 |
81 | [convolutional]
82 | batch_normalize=1
83 | filters=64
84 | size=3
85 | stride=2
86 | pad=1
87 | activation=mish
88 |
89 | # Split
90 |
91 | [convolutional]
92 | batch_normalize=1
93 | filters=32
94 | size=1
95 | stride=1
96 | pad=1
97 | activation=mish
98 |
99 | [route]
100 | layers = -2
101 |
102 | [convolutional]
103 | batch_normalize=1
104 | filters=32
105 | size=1
106 | stride=1
107 | pad=1
108 | activation=mish
109 |
110 | # Residual Block
111 |
112 | [convolutional]
113 | batch_normalize=1
114 | filters=32
115 | size=1
116 | stride=1
117 | pad=1
118 | activation=mish
119 |
120 | [convolutional]
121 | batch_normalize=1
122 | filters=32
123 | size=3
124 | stride=1
125 | pad=1
126 | activation=mish
127 |
128 | [shortcut]
129 | from=-3
130 | activation=linear
131 |
132 | # Transition first
133 |
134 | [convolutional]
135 | batch_normalize=1
136 | filters=32
137 | size=1
138 | stride=1
139 | pad=1
140 | activation=mish
141 |
142 | # Merge [-1, -(3k+4)]
143 |
144 | [route]
145 | layers = -1,-7
146 |
147 | # Transition last
148 |
149 | # 14 (previous+7+3k)
150 | [convolutional]
151 | batch_normalize=1
152 | filters=64
153 | size=1
154 | stride=1
155 | pad=1
156 | activation=mish
157 |
158 | # P3
159 |
160 | # Downsample
161 |
162 | [convolutional]
163 | batch_normalize=1
164 | filters=128
165 | size=3
166 | stride=2
167 | pad=1
168 | activation=mish
169 |
170 | # Split
171 |
172 | [convolutional]
173 | batch_normalize=1
174 | filters=64
175 | size=1
176 | stride=1
177 | pad=1
178 | activation=mish
179 |
180 | [route]
181 | layers = -2
182 |
183 | [convolutional]
184 | batch_normalize=1
185 | filters=64
186 | size=1
187 | stride=1
188 | pad=1
189 | activation=mish
190 |
191 | # Residual Block
192 |
193 | [convolutional]
194 | batch_normalize=1
195 | filters=64
196 | size=1
197 | stride=1
198 | pad=1
199 | activation=mish
200 |
201 | [convolutional]
202 | batch_normalize=1
203 | filters=64
204 | size=3
205 | stride=1
206 | pad=1
207 | activation=mish
208 |
209 | [shortcut]
210 | from=-3
211 | activation=linear
212 |
213 | # Transition first
214 |
215 | [convolutional]
216 | batch_normalize=1
217 | filters=64
218 | size=1
219 | stride=1
220 | pad=1
221 | activation=mish
222 |
223 | # Merge [-1 -(4+3k)]
224 |
225 | [route]
226 | layers = -1,-7
227 |
228 | # Transition last
229 |
230 | # 24 (previous+7+3k)
231 | [convolutional]
232 | batch_normalize=1
233 | filters=128
234 | size=1
235 | stride=1
236 | pad=1
237 | activation=mish
238 |
239 | # P4
240 |
241 | # Downsample
242 |
243 | [convolutional]
244 | batch_normalize=1
245 | filters=256
246 | size=3
247 | stride=2
248 | pad=1
249 | activation=mish
250 |
251 | # Split
252 |
253 | [convolutional]
254 | batch_normalize=1
255 | filters=128
256 | size=1
257 | stride=1
258 | pad=1
259 | activation=mish
260 |
261 | [route]
262 | layers = -2
263 |
264 | [convolutional]
265 | batch_normalize=1
266 | filters=128
267 | size=1
268 | stride=1
269 | pad=1
270 | activation=mish
271 |
272 | # Residual Block
273 |
274 | [convolutional]
275 | batch_normalize=1
276 | filters=128
277 | size=1
278 | stride=1
279 | pad=1
280 | activation=mish
281 |
282 | [convolutional]
283 | batch_normalize=1
284 | filters=128
285 | size=3
286 | stride=1
287 | pad=1
288 | activation=mish
289 |
290 | [shortcut]
291 | from=-3
292 | activation=linear
293 |
294 | # Transition first
295 |
296 | [convolutional]
297 | batch_normalize=1
298 | filters=128
299 | size=1
300 | stride=1
301 | pad=1
302 | activation=mish
303 |
304 | # Merge [-1 -(3k+4)]
305 |
306 | [route]
307 | layers = -1,-7
308 |
309 | # Transition last
310 |
311 | # 34 (previous+7+3k)
312 | [convolutional]
313 | batch_normalize=1
314 | filters=256
315 | size=1
316 | stride=1
317 | pad=1
318 | activation=mish
319 |
320 | # P5
321 |
322 | # Downsample
323 |
324 | [convolutional]
325 | batch_normalize=1
326 | filters=512
327 | size=3
328 | stride=2
329 | pad=1
330 | activation=mish
331 |
332 | # Split
333 |
334 | [convolutional]
335 | batch_normalize=1
336 | filters=256
337 | size=1
338 | stride=1
339 | pad=1
340 | activation=mish
341 |
342 | [route]
343 | layers = -2
344 |
345 | [convolutional]
346 | batch_normalize=1
347 | filters=256
348 | size=1
349 | stride=1
350 | pad=1
351 | activation=mish
352 |
353 | # Residual Block
354 |
355 | [convolutional]
356 | batch_normalize=1
357 | filters=256
358 | size=1
359 | stride=1
360 | pad=1
361 | activation=mish
362 |
363 | [convolutional]
364 | batch_normalize=1
365 | filters=256
366 | size=3
367 | stride=1
368 | pad=1
369 | activation=mish
370 |
371 | [shortcut]
372 | from=-3
373 | activation=linear
374 |
375 | # Transition first
376 |
377 | [convolutional]
378 | batch_normalize=1
379 | filters=256
380 | size=1
381 | stride=1
382 | pad=1
383 | activation=mish
384 |
385 | # Merge [-1 -(3k+4)]
386 |
387 | [route]
388 | layers = -1,-7
389 |
390 | # Transition last
391 |
392 | # 44 (previous+7+3k)
393 | [convolutional]
394 | batch_normalize=1
395 | filters=512
396 | size=1
397 | stride=1
398 | pad=1
399 | activation=mish
400 |
401 | # ============ End of Backbone ============ #
402 |
403 | # ============ Neck ============ #
404 |
405 | # CSPSPP
406 |
407 | [convolutional]
408 | batch_normalize=1
409 | filters=256
410 | size=1
411 | stride=1
412 | pad=1
413 | activation=mish
414 |
415 | [route]
416 | layers = -2
417 |
418 | [convolutional]
419 | batch_normalize=1
420 | filters=256
421 | size=1
422 | stride=1
423 | pad=1
424 | activation=mish
425 |
426 | ### SPP ###
427 | [maxpool]
428 | stride=1
429 | size=5
430 |
431 | [route]
432 | layers=-2
433 |
434 | [maxpool]
435 | stride=1
436 | size=9
437 |
438 | [route]
439 | layers=-4
440 |
441 | [maxpool]
442 | stride=1
443 | size=13
444 |
445 | [route]
446 | layers=-1,-3,-5,-6
447 | ### End SPP ###
448 |
449 | [convolutional]
450 | batch_normalize=1
451 | filters=256
452 | size=1
453 | stride=1
454 | pad=1
455 | activation=mish
456 |
457 | [convolutional]
458 | batch_normalize=1
459 | size=3
460 | stride=1
461 | pad=1
462 | filters=256
463 | activation=mish
464 |
465 | [route]
466 | layers = -1, -11
467 |
468 | # 57 (previous+6+5+2k)
469 | [convolutional]
470 | batch_normalize=1
471 | filters=256
472 | size=1
473 | stride=1
474 | pad=1
475 | activation=mish
476 |
477 | # End of CSPSPP
478 |
479 |
480 | # FPN-4
481 |
482 | [convolutional]
483 | batch_normalize=1
484 | filters=128
485 | size=1
486 | stride=1
487 | pad=1
488 | activation=mish
489 |
490 | [upsample]
491 | stride=2
492 |
493 | [route]
494 | layers = 34
495 |
496 | [convolutional]
497 | batch_normalize=1
498 | filters=128
499 | size=1
500 | stride=1
501 | pad=1
502 | activation=mish
503 |
504 | [route]
505 | layers = -1, -3
506 |
507 | [convolutional]
508 | batch_normalize=1
509 | filters=128
510 | size=1
511 | stride=1
512 | pad=1
513 | activation=mish
514 |
515 | # Split
516 |
517 | [convolutional]
518 | batch_normalize=1
519 | filters=128
520 | size=1
521 | stride=1
522 | pad=1
523 | activation=mish
524 |
525 | [route]
526 | layers = -2
527 |
528 | # Plain Block
529 |
530 | [convolutional]
531 | batch_normalize=1
532 | filters=128
533 | size=1
534 | stride=1
535 | pad=1
536 | activation=mish
537 |
538 | [convolutional]
539 | batch_normalize=1
540 | size=3
541 | stride=1
542 | pad=1
543 | filters=128
544 | activation=mish
545 |
546 | # Merge [-1, -(2k+2)]
547 |
548 | [route]
549 | layers = -1, -4
550 |
551 | # Transition last
552 |
553 | # 69 (previous+6+4+2k)
554 | [convolutional]
555 | batch_normalize=1
556 | filters=128
557 | size=1
558 | stride=1
559 | pad=1
560 | activation=mish
561 |
562 |
563 | # FPN-3
564 |
565 | [convolutional]
566 | batch_normalize=1
567 | filters=64
568 | size=1
569 | stride=1
570 | pad=1
571 | activation=mish
572 |
573 | [upsample]
574 | stride=2
575 |
576 | [route]
577 | layers = 24
578 |
579 | [convolutional]
580 | batch_normalize=1
581 | filters=64
582 | size=1
583 | stride=1
584 | pad=1
585 | activation=mish
586 |
587 | [route]
588 | layers = -1, -3
589 |
590 | [convolutional]
591 | batch_normalize=1
592 | filters=64
593 | size=1
594 | stride=1
595 | pad=1
596 | activation=mish
597 |
598 | # Split
599 |
600 | [convolutional]
601 | batch_normalize=1
602 | filters=64
603 | size=1
604 | stride=1
605 | pad=1
606 | activation=mish
607 |
608 | [route]
609 | layers = -2
610 |
611 | # Plain Block
612 |
613 | [convolutional]
614 | batch_normalize=1
615 | filters=64
616 | size=1
617 | stride=1
618 | pad=1
619 | activation=mish
620 |
621 | [convolutional]
622 | batch_normalize=1
623 | size=3
624 | stride=1
625 | pad=1
626 | filters=64
627 | activation=mish
628 |
629 | # Merge [-1, -(2k+2)]
630 |
631 | [route]
632 | layers = -1, -4
633 |
634 | # Transition last
635 |
636 | # 81 (previous+6+4+2k)
637 | [convolutional]
638 | batch_normalize=1
639 | filters=64
640 | size=1
641 | stride=1
642 | pad=1
643 | activation=mish
644 |
645 |
646 | # PAN-4
647 |
648 | [convolutional]
649 | batch_normalize=1
650 | size=3
651 | stride=2
652 | pad=1
653 | filters=128
654 | activation=mish
655 |
656 | [route]
657 | layers = -1, 69
658 |
659 | [convolutional]
660 | batch_normalize=1
661 | filters=128
662 | size=1
663 | stride=1
664 | pad=1
665 | activation=mish
666 |
667 | # Split
668 |
669 | [convolutional]
670 | batch_normalize=1
671 | filters=128
672 | size=1
673 | stride=1
674 | pad=1
675 | activation=mish
676 |
677 | [route]
678 | layers = -2
679 |
680 | # Plain Block
681 |
682 | [convolutional]
683 | batch_normalize=1
684 | filters=128
685 | size=1
686 | stride=1
687 | pad=1
688 | activation=mish
689 |
690 | [convolutional]
691 | batch_normalize=1
692 | size=3
693 | stride=1
694 | pad=1
695 | filters=128
696 | activation=mish
697 |
698 | [route]
699 | layers = -1,-4
700 |
701 | # Transition last
702 |
703 | # 90 (previous+3+4+2k)
704 | [convolutional]
705 | batch_normalize=1
706 | filters=128
707 | size=1
708 | stride=1
709 | pad=1
710 | activation=mish
711 |
712 |
713 | # PAN-5
714 |
715 | [convolutional]
716 | batch_normalize=1
717 | size=3
718 | stride=2
719 | pad=1
720 | filters=256
721 | activation=mish
722 |
723 | [route]
724 | layers = -1, 57
725 |
726 | [convolutional]
727 | batch_normalize=1
728 | filters=256
729 | size=1
730 | stride=1
731 | pad=1
732 | activation=mish
733 |
734 | # Split
735 |
736 | [convolutional]
737 | batch_normalize=1
738 | filters=256
739 | size=1
740 | stride=1
741 | pad=1
742 | activation=mish
743 |
744 | [route]
745 | layers = -2
746 |
747 | # Plain Block
748 |
749 | [convolutional]
750 | batch_normalize=1
751 | filters=256
752 | size=1
753 | stride=1
754 | pad=1
755 | activation=mish
756 |
757 | [convolutional]
758 | batch_normalize=1
759 | size=3
760 | stride=1
761 | pad=1
762 | filters=256
763 | activation=mish
764 |
765 | [route]
766 | layers = -1,-4
767 |
768 | # Transition last
769 |
770 | # 99 (previous+3+4+2k)
771 | [convolutional]
772 | batch_normalize=1
773 | filters=256
774 | size=1
775 | stride=1
776 | pad=1
777 | activation=mish
778 |
779 | # ============ End of Neck ============ #
780 |
781 | # ============ Head ============ #
782 |
783 | # YOLO-3
784 |
785 | [route]
786 | layers = 81
787 |
788 | [convolutional]
789 | batch_normalize=1
790 | size=3
791 | stride=1
792 | pad=1
793 | filters=128
794 | activation=mish
795 |
796 | [convolutional]
797 | size=1
798 | stride=1
799 | pad=1
800 | filters=255
801 | activation=linear
802 |
803 | [yolo]
804 | mask = 0,1,2
805 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
806 | classes=80
807 | num=9
808 | jitter=.3
809 | ignore_thresh = .7
810 | truth_thresh = 1
811 | random=1
812 | scale_x_y = 1.05
813 | iou_thresh=0.213
814 | cls_normalizer=1.0
815 | iou_normalizer=0.07
816 | iou_loss=ciou
817 | nms_kind=greedynms
818 | beta_nms=0.6
819 |
820 |
821 | # YOLO-4
822 |
823 | [route]
824 | layers = 90
825 |
826 | [convolutional]
827 | batch_normalize=1
828 | size=3
829 | stride=1
830 | pad=1
831 | filters=256
832 | activation=mish
833 |
834 | [convolutional]
835 | size=1
836 | stride=1
837 | pad=1
838 | filters=255
839 | activation=linear
840 |
841 | [yolo]
842 | mask = 3,4,5
843 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
844 | classes=80
845 | num=9
846 | jitter=.3
847 | ignore_thresh = .7
848 | truth_thresh = 1
849 | random=1
850 | scale_x_y = 1.05
851 | iou_thresh=0.213
852 | cls_normalizer=1.0
853 | iou_normalizer=0.07
854 | iou_loss=ciou
855 | nms_kind=greedynms
856 | beta_nms=0.6
857 |
858 |
859 | # YOLO-5
860 |
861 | [route]
862 | layers = 99
863 |
864 | [convolutional]
865 | batch_normalize=1
866 | size=3
867 | stride=1
868 | pad=1
869 | filters=512
870 | activation=mish
871 |
872 | [convolutional]
873 | size=1
874 | stride=1
875 | pad=1
876 | filters=255
877 | activation=linear
878 |
879 | [yolo]
880 | mask = 6,7,8
881 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
882 | classes=80
883 | num=9
884 | jitter=.3
885 | ignore_thresh = .7
886 | truth_thresh = 1
887 | random=1
888 | scale_x_y = 1.05
889 | iou_thresh=0.213
890 | cls_normalizer=1.0
891 | iou_normalizer=0.07
892 | iou_loss=ciou
893 | nms_kind=greedynms
894 | beta_nms=0.6
--------------------------------------------------------------------------------
/cfg/yolov4-pacsp-s-mish.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | #batch=1
4 | #subdivisions=1
5 | # Training
6 | batch=64
7 | subdivisions=8
8 | width=512
9 | height=512
10 | channels=3
11 | momentum=0.949
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.00261
19 | burn_in=1000
20 | max_batches = 500500
21 | policy=steps
22 | steps=400000,450000
23 | scales=.1,.1
24 |
25 | mosaic=1
26 |
27 | [convolutional]
28 | batch_normalize=1
29 | filters=32
30 | size=3
31 | stride=1
32 | pad=1
33 | activation=mish
34 |
35 | # Downsample
36 |
37 | [convolutional]
38 | batch_normalize=1
39 | filters=32
40 | size=3
41 | stride=2
42 | pad=1
43 | activation=mish
44 |
45 | [convolutional]
46 | batch_normalize=1
47 | filters=32
48 | size=1
49 | stride=1
50 | pad=1
51 | activation=mish
52 |
53 | [convolutional]
54 | batch_normalize=1
55 | filters=32
56 | size=3
57 | stride=1
58 | pad=1
59 | activation=mish
60 |
61 | [shortcut]
62 | from=-3
63 | activation=linear
64 |
65 | # Downsample
66 |
67 | [convolutional]
68 | batch_normalize=1
69 | filters=64
70 | size=3
71 | stride=2
72 | pad=1
73 | activation=mish
74 |
75 | [convolutional]
76 | batch_normalize=1
77 | filters=32
78 | size=1
79 | stride=1
80 | pad=1
81 | activation=mish
82 |
83 | [route]
84 | layers = -2
85 |
86 | [convolutional]
87 | batch_normalize=1
88 | filters=32
89 | size=1
90 | stride=1
91 | pad=1
92 | activation=mish
93 |
94 | [convolutional]
95 | batch_normalize=1
96 | filters=32
97 | size=1
98 | stride=1
99 | pad=1
100 | activation=mish
101 |
102 | [convolutional]
103 | batch_normalize=1
104 | filters=32
105 | size=3
106 | stride=1
107 | pad=1
108 | activation=mish
109 |
110 | [shortcut]
111 | from=-3
112 | activation=linear
113 |
114 | [convolutional]
115 | batch_normalize=1
116 | filters=32
117 | size=1
118 | stride=1
119 | pad=1
120 | activation=mish
121 |
122 | [route]
123 | layers = -1,-7
124 |
125 | [convolutional]
126 | batch_normalize=1
127 | filters=64
128 | size=1
129 | stride=1
130 | pad=1
131 | activation=mish
132 |
133 | # Downsample
134 |
135 | [convolutional]
136 | batch_normalize=1
137 | filters=128
138 | size=3
139 | stride=2
140 | pad=1
141 | activation=mish
142 |
143 | [convolutional]
144 | batch_normalize=1
145 | filters=64
146 | size=1
147 | stride=1
148 | pad=1
149 | activation=mish
150 |
151 | [route]
152 | layers = -2
153 |
154 | [convolutional]
155 | batch_normalize=1
156 | filters=64
157 | size=1
158 | stride=1
159 | pad=1
160 | activation=mish
161 |
162 | [convolutional]
163 | batch_normalize=1
164 | filters=64
165 | size=1
166 | stride=1
167 | pad=1
168 | activation=mish
169 |
170 | [convolutional]
171 | batch_normalize=1
172 | filters=64
173 | size=3
174 | stride=1
175 | pad=1
176 | activation=mish
177 |
178 | [shortcut]
179 | from=-3
180 | activation=linear
181 |
182 | [convolutional]
183 | batch_normalize=1
184 | filters=64
185 | size=1
186 | stride=1
187 | pad=1
188 | activation=mish
189 |
190 | [route]
191 | layers = -1,-7
192 |
193 | [convolutional]
194 | batch_normalize=1
195 | filters=128
196 | size=1
197 | stride=1
198 | pad=1
199 | activation=mish
200 |
201 | # Downsample
202 |
203 | [convolutional]
204 | batch_normalize=1
205 | filters=256
206 | size=3
207 | stride=2
208 | pad=1
209 | activation=mish
210 |
211 | [convolutional]
212 | batch_normalize=1
213 | filters=128
214 | size=1
215 | stride=1
216 | pad=1
217 | activation=mish
218 |
219 | [route]
220 | layers = -2
221 |
222 | [convolutional]
223 | batch_normalize=1
224 | filters=128
225 | size=1
226 | stride=1
227 | pad=1
228 | activation=mish
229 |
230 | [convolutional]
231 | batch_normalize=1
232 | filters=128
233 | size=1
234 | stride=1
235 | pad=1
236 | activation=mish
237 |
238 | [convolutional]
239 | batch_normalize=1
240 | filters=128
241 | size=3
242 | stride=1
243 | pad=1
244 | activation=mish
245 |
246 | [shortcut]
247 | from=-3
248 | activation=linear
249 |
250 | [convolutional]
251 | batch_normalize=1
252 | filters=128
253 | size=1
254 | stride=1
255 | pad=1
256 | activation=mish
257 |
258 | [route]
259 | layers = -1,-7
260 |
261 | [convolutional]
262 | batch_normalize=1
263 | filters=256
264 | size=1
265 | stride=1
266 | pad=1
267 | activation=mish
268 |
269 | # Downsample
270 |
271 | [convolutional]
272 | batch_normalize=1
273 | filters=512
274 | size=3
275 | stride=2
276 | pad=1
277 | activation=mish
278 |
279 | [convolutional]
280 | batch_normalize=1
281 | filters=256
282 | size=1
283 | stride=1
284 | pad=1
285 | activation=mish
286 |
287 | [route]
288 | layers = -2
289 |
290 | [convolutional]
291 | batch_normalize=1
292 | filters=256
293 | size=1
294 | stride=1
295 | pad=1
296 | activation=mish
297 |
298 | [convolutional]
299 | batch_normalize=1
300 | filters=256
301 | size=1
302 | stride=1
303 | pad=1
304 | activation=mish
305 |
306 | [convolutional]
307 | batch_normalize=1
308 | filters=256
309 | size=3
310 | stride=1
311 | pad=1
312 | activation=mish
313 |
314 | [shortcut]
315 | from=-3
316 | activation=linear
317 |
318 | [convolutional]
319 | batch_normalize=1
320 | filters=256
321 | size=1
322 | stride=1
323 | pad=1
324 | activation=mish
325 |
326 | [route]
327 | layers = -1,-7
328 |
329 | [convolutional]
330 | batch_normalize=1
331 | filters=512
332 | size=1
333 | stride=1
334 | pad=1
335 | activation=mish
336 |
337 | ##########################
338 |
339 | [convolutional]
340 | batch_normalize=1
341 | filters=256
342 | size=1
343 | stride=1
344 | pad=1
345 | activation=mish
346 |
347 | [route]
348 | layers = -2
349 |
350 | [convolutional]
351 | batch_normalize=1
352 | filters=256
353 | size=1
354 | stride=1
355 | pad=1
356 | activation=mish
357 |
358 | ### SPP ###
359 | [maxpool]
360 | stride=1
361 | size=5
362 |
363 | [route]
364 | layers=-2
365 |
366 | [maxpool]
367 | stride=1
368 | size=9
369 |
370 | [route]
371 | layers=-4
372 |
373 | [maxpool]
374 | stride=1
375 | size=13
376 |
377 | [route]
378 | layers=-1,-3,-5,-6
379 | ### End SPP ###
380 |
381 | [convolutional]
382 | batch_normalize=1
383 | filters=256
384 | size=1
385 | stride=1
386 | pad=1
387 | activation=mish
388 |
389 | [convolutional]
390 | batch_normalize=1
391 | size=3
392 | stride=1
393 | pad=1
394 | filters=256
395 | activation=mish
396 |
397 | [route]
398 | layers = -1, -11
399 |
400 | [convolutional]
401 | batch_normalize=1
402 | filters=256
403 | size=1
404 | stride=1
405 | pad=1
406 | activation=mish
407 |
408 | [convolutional]
409 | batch_normalize=1
410 | filters=128
411 | size=1
412 | stride=1
413 | pad=1
414 | activation=mish
415 |
416 | [upsample]
417 | stride=2
418 |
419 | [route]
420 | layers = 34
421 |
422 | [convolutional]
423 | batch_normalize=1
424 | filters=128
425 | size=1
426 | stride=1
427 | pad=1
428 | activation=mish
429 |
430 | [route]
431 | layers = -1, -3
432 |
433 | [convolutional]
434 | batch_normalize=1
435 | filters=128
436 | size=1
437 | stride=1
438 | pad=1
439 | activation=mish
440 |
441 | [convolutional]
442 | batch_normalize=1
443 | filters=128
444 | size=1
445 | stride=1
446 | pad=1
447 | activation=mish
448 |
449 | [route]
450 | layers = -2
451 |
452 | [convolutional]
453 | batch_normalize=1
454 | filters=128
455 | size=1
456 | stride=1
457 | pad=1
458 | activation=mish
459 |
460 | [convolutional]
461 | batch_normalize=1
462 | size=3
463 | stride=1
464 | pad=1
465 | filters=128
466 | activation=mish
467 |
468 | [route]
469 | layers = -1, -4
470 |
471 | [convolutional]
472 | batch_normalize=1
473 | filters=128
474 | size=1
475 | stride=1
476 | pad=1
477 | activation=mish
478 |
479 | [convolutional]
480 | batch_normalize=1
481 | filters=64
482 | size=1
483 | stride=1
484 | pad=1
485 | activation=mish
486 |
487 | [upsample]
488 | stride=2
489 |
490 | [route]
491 | layers = 24
492 |
493 | [convolutional]
494 | batch_normalize=1
495 | filters=64
496 | size=1
497 | stride=1
498 | pad=1
499 | activation=mish
500 |
501 | [route]
502 | layers = -1, -3
503 |
504 | [convolutional]
505 | batch_normalize=1
506 | filters=64
507 | size=1
508 | stride=1
509 | pad=1
510 | activation=mish
511 |
512 | [convolutional]
513 | batch_normalize=1
514 | filters=64
515 | size=1
516 | stride=1
517 | pad=1
518 | activation=mish
519 |
520 | [route]
521 | layers = -2
522 |
523 | [convolutional]
524 | batch_normalize=1
525 | filters=64
526 | size=1
527 | stride=1
528 | pad=1
529 | activation=mish
530 |
531 | [convolutional]
532 | batch_normalize=1
533 | size=3
534 | stride=1
535 | pad=1
536 | filters=64
537 | activation=mish
538 |
539 | [route]
540 | layers = -1, -4
541 |
542 | [convolutional]
543 | batch_normalize=1
544 | filters=64
545 | size=1
546 | stride=1
547 | pad=1
548 | activation=mish
549 |
550 | ##########################
551 |
552 | [convolutional]
553 | batch_normalize=1
554 | size=3
555 | stride=1
556 | pad=1
557 | filters=128
558 | activation=mish
559 |
560 | [convolutional]
561 | size=1
562 | stride=1
563 | pad=1
564 | filters=255
565 | activation=linear
566 |
567 |
568 | [yolo]
569 | mask = 0,1,2
570 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
571 | classes=80
572 | num=9
573 | jitter=.3
574 | ignore_thresh = .7
575 | truth_thresh = 1
576 | random=1
577 | scale_x_y = 1.05
578 | iou_thresh=0.213
579 | cls_normalizer=1.0
580 | iou_normalizer=0.07
581 | iou_loss=ciou
582 | nms_kind=greedynms
583 | beta_nms=0.6
584 |
585 | [route]
586 | layers = -4
587 |
588 | [convolutional]
589 | batch_normalize=1
590 | size=3
591 | stride=2
592 | pad=1
593 | filters=128
594 | activation=mish
595 |
596 | [route]
597 | layers = -1, -18
598 |
599 | [convolutional]
600 | batch_normalize=1
601 | filters=128
602 | size=1
603 | stride=1
604 | pad=1
605 | activation=mish
606 |
607 | [convolutional]
608 | batch_normalize=1
609 | filters=128
610 | size=1
611 | stride=1
612 | pad=1
613 | activation=mish
614 |
615 | [route]
616 | layers = -2
617 |
618 | [convolutional]
619 | batch_normalize=1
620 | filters=128
621 | size=1
622 | stride=1
623 | pad=1
624 | activation=mish
625 |
626 | [convolutional]
627 | batch_normalize=1
628 | size=3
629 | stride=1
630 | pad=1
631 | filters=128
632 | activation=mish
633 |
634 | [route]
635 | layers = -1,-4
636 |
637 | [convolutional]
638 | batch_normalize=1
639 | filters=128
640 | size=1
641 | stride=1
642 | pad=1
643 | activation=mish
644 |
645 | [convolutional]
646 | batch_normalize=1
647 | size=3
648 | stride=1
649 | pad=1
650 | filters=256
651 | activation=mish
652 |
653 | [convolutional]
654 | size=1
655 | stride=1
656 | pad=1
657 | filters=255
658 | activation=linear
659 |
660 |
661 | [yolo]
662 | mask = 3,4,5
663 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
664 | classes=80
665 | num=9
666 | jitter=.3
667 | ignore_thresh = .7
668 | truth_thresh = 1
669 | random=1
670 | scale_x_y = 1.05
671 | iou_thresh=0.213
672 | cls_normalizer=1.0
673 | iou_normalizer=0.07
674 | iou_loss=ciou
675 | nms_kind=greedynms
676 | beta_nms=0.6
677 |
678 | [route]
679 | layers = -4
680 |
681 | [convolutional]
682 | batch_normalize=1
683 | size=3
684 | stride=2
685 | pad=1
686 | filters=256
687 | activation=mish
688 |
689 | [route]
690 | layers = -1, -43
691 |
692 | [convolutional]
693 | batch_normalize=1
694 | filters=256
695 | size=1
696 | stride=1
697 | pad=1
698 | activation=mish
699 |
700 | [convolutional]
701 | batch_normalize=1
702 | filters=256
703 | size=1
704 | stride=1
705 | pad=1
706 | activation=mish
707 |
708 | [route]
709 | layers = -2
710 |
711 | [convolutional]
712 | batch_normalize=1
713 | filters=256
714 | size=1
715 | stride=1
716 | pad=1
717 | activation=mish
718 |
719 | [convolutional]
720 | batch_normalize=1
721 | size=3
722 | stride=1
723 | pad=1
724 | filters=256
725 | activation=mish
726 |
727 | [route]
728 | layers = -1,-4
729 |
730 | [convolutional]
731 | batch_normalize=1
732 | filters=256
733 | size=1
734 | stride=1
735 | pad=1
736 | activation=mish
737 |
738 | [convolutional]
739 | batch_normalize=1
740 | size=3
741 | stride=1
742 | pad=1
743 | filters=512
744 | activation=mish
745 |
746 | [convolutional]
747 | size=1
748 | stride=1
749 | pad=1
750 | filters=255
751 | activation=linear
752 |
753 |
754 | [yolo]
755 | mask = 6,7,8
756 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
757 | classes=80
758 | num=9
759 | jitter=.3
760 | ignore_thresh = .7
761 | truth_thresh = 1
762 | random=1
763 | scale_x_y = 1.05
764 | iou_thresh=0.213
765 | cls_normalizer=1.0
766 | iou_normalizer=0.07
767 | iou_loss=ciou
768 | nms_kind=greedynms
769 | beta_nms=0.6
770 |
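The `### SPP ###` block above is spatial pyramid pooling: three stride-1 max-pools (kernel sizes 5, 9, 13) over the same feature map, routed back together with the un-pooled input by the `[route] layers=-1,-3,-5,-6` line. A minimal PyTorch sketch of that pattern, for reference only (the repo itself builds its layers from the cfg at runtime in models/models.py; the class below is a hypothetical stand-in):

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    # Parallel max-pools with kernel sizes 5/9/13, stride 1 and "same" padding,
    # concatenated with the original feature map along the channel dimension.
    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernel_sizes]
        )

    def forward(self, x):
        # Channel count grows 4x: three pooled maps plus the un-pooled input.
        return torch.cat([pool(x) for pool in self.pools] + [x], dim=1)

x = torch.randn(1, 256, 16, 16)
print(SPP()(x).shape)  # torch.Size([1, 1024, 16, 16])
```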
--------------------------------------------------------------------------------
/cfg/yolov4-pacsp-s.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | #batch=1
4 | #subdivisions=1
5 | # Training
6 | batch=64
7 | subdivisions=8
8 | width=512
9 | height=512
10 | channels=3
11 | momentum=0.949
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.00261
19 | burn_in=1000
20 | max_batches = 500500
21 | policy=steps
22 | steps=400000,450000
23 | scales=.1,.1
24 |
25 | mosaic=1
26 |
27 | [convolutional]
28 | batch_normalize=1
29 | filters=32
30 | size=3
31 | stride=1
32 | pad=1
33 | activation=leaky
34 |
35 | # Downsample
36 |
37 | [convolutional]
38 | batch_normalize=1
39 | filters=32
40 | size=3
41 | stride=2
42 | pad=1
43 | activation=leaky
44 |
45 | [convolutional]
46 | batch_normalize=1
47 | filters=32
48 | size=1
49 | stride=1
50 | pad=1
51 | activation=leaky
52 |
53 | [convolutional]
54 | batch_normalize=1
55 | filters=32
56 | size=3
57 | stride=1
58 | pad=1
59 | activation=leaky
60 |
61 | [shortcut]
62 | from=-3
63 | activation=linear
64 |
65 | # Downsample
66 |
67 | [convolutional]
68 | batch_normalize=1
69 | filters=64
70 | size=3
71 | stride=2
72 | pad=1
73 | activation=leaky
74 |
75 | [convolutional]
76 | batch_normalize=1
77 | filters=32
78 | size=1
79 | stride=1
80 | pad=1
81 | activation=leaky
82 |
83 | [route]
84 | layers = -2
85 |
86 | [convolutional]
87 | batch_normalize=1
88 | filters=32
89 | size=1
90 | stride=1
91 | pad=1
92 | activation=leaky
93 |
94 | [convolutional]
95 | batch_normalize=1
96 | filters=32
97 | size=1
98 | stride=1
99 | pad=1
100 | activation=leaky
101 |
102 | [convolutional]
103 | batch_normalize=1
104 | filters=32
105 | size=3
106 | stride=1
107 | pad=1
108 | activation=leaky
109 |
110 | [shortcut]
111 | from=-3
112 | activation=linear
113 |
114 | [convolutional]
115 | batch_normalize=1
116 | filters=32
117 | size=1
118 | stride=1
119 | pad=1
120 | activation=leaky
121 |
122 | [route]
123 | layers = -1,-7
124 |
125 | [convolutional]
126 | batch_normalize=1
127 | filters=64
128 | size=1
129 | stride=1
130 | pad=1
131 | activation=leaky
132 |
133 | # Downsample
134 |
135 | [convolutional]
136 | batch_normalize=1
137 | filters=128
138 | size=3
139 | stride=2
140 | pad=1
141 | activation=leaky
142 |
143 | [convolutional]
144 | batch_normalize=1
145 | filters=64
146 | size=1
147 | stride=1
148 | pad=1
149 | activation=leaky
150 |
151 | [route]
152 | layers = -2
153 |
154 | [convolutional]
155 | batch_normalize=1
156 | filters=64
157 | size=1
158 | stride=1
159 | pad=1
160 | activation=leaky
161 |
162 | [convolutional]
163 | batch_normalize=1
164 | filters=64
165 | size=1
166 | stride=1
167 | pad=1
168 | activation=leaky
169 |
170 | [convolutional]
171 | batch_normalize=1
172 | filters=64
173 | size=3
174 | stride=1
175 | pad=1
176 | activation=leaky
177 |
178 | [shortcut]
179 | from=-3
180 | activation=linear
181 |
182 | [convolutional]
183 | batch_normalize=1
184 | filters=64
185 | size=1
186 | stride=1
187 | pad=1
188 | activation=leaky
189 |
190 | [route]
191 | layers = -1,-7
192 |
193 | [convolutional]
194 | batch_normalize=1
195 | filters=128
196 | size=1
197 | stride=1
198 | pad=1
199 | activation=leaky
200 |
201 | # Downsample
202 |
203 | [convolutional]
204 | batch_normalize=1
205 | filters=256
206 | size=3
207 | stride=2
208 | pad=1
209 | activation=leaky
210 |
211 | [convolutional]
212 | batch_normalize=1
213 | filters=128
214 | size=1
215 | stride=1
216 | pad=1
217 | activation=leaky
218 |
219 | [route]
220 | layers = -2
221 |
222 | [convolutional]
223 | batch_normalize=1
224 | filters=128
225 | size=1
226 | stride=1
227 | pad=1
228 | activation=leaky
229 |
230 | [convolutional]
231 | batch_normalize=1
232 | filters=128
233 | size=1
234 | stride=1
235 | pad=1
236 | activation=leaky
237 |
238 | [convolutional]
239 | batch_normalize=1
240 | filters=128
241 | size=3
242 | stride=1
243 | pad=1
244 | activation=leaky
245 |
246 | [shortcut]
247 | from=-3
248 | activation=linear
249 |
250 | [convolutional]
251 | batch_normalize=1
252 | filters=128
253 | size=1
254 | stride=1
255 | pad=1
256 | activation=leaky
257 |
258 | [route]
259 | layers = -1,-7
260 |
261 | [convolutional]
262 | batch_normalize=1
263 | filters=256
264 | size=1
265 | stride=1
266 | pad=1
267 | activation=leaky
268 |
269 | # Downsample
270 |
271 | [convolutional]
272 | batch_normalize=1
273 | filters=512
274 | size=3
275 | stride=2
276 | pad=1
277 | activation=leaky
278 |
279 | [convolutional]
280 | batch_normalize=1
281 | filters=256
282 | size=1
283 | stride=1
284 | pad=1
285 | activation=leaky
286 |
287 | [route]
288 | layers = -2
289 |
290 | [convolutional]
291 | batch_normalize=1
292 | filters=256
293 | size=1
294 | stride=1
295 | pad=1
296 | activation=leaky
297 |
298 | [convolutional]
299 | batch_normalize=1
300 | filters=256
301 | size=1
302 | stride=1
303 | pad=1
304 | activation=leaky
305 |
306 | [convolutional]
307 | batch_normalize=1
308 | filters=256
309 | size=3
310 | stride=1
311 | pad=1
312 | activation=leaky
313 |
314 | [shortcut]
315 | from=-3
316 | activation=linear
317 |
318 | [convolutional]
319 | batch_normalize=1
320 | filters=256
321 | size=1
322 | stride=1
323 | pad=1
324 | activation=leaky
325 |
326 | [route]
327 | layers = -1,-7
328 |
329 | [convolutional]
330 | batch_normalize=1
331 | filters=512
332 | size=1
333 | stride=1
334 | pad=1
335 | activation=leaky
336 |
337 | ##########################
338 |
339 | [convolutional]
340 | batch_normalize=1
341 | filters=256
342 | size=1
343 | stride=1
344 | pad=1
345 | activation=leaky
346 |
347 | [route]
348 | layers = -2
349 |
350 | [convolutional]
351 | batch_normalize=1
352 | filters=256
353 | size=1
354 | stride=1
355 | pad=1
356 | activation=leaky
357 |
358 | ### SPP ###
359 | [maxpool]
360 | stride=1
361 | size=5
362 |
363 | [route]
364 | layers=-2
365 |
366 | [maxpool]
367 | stride=1
368 | size=9
369 |
370 | [route]
371 | layers=-4
372 |
373 | [maxpool]
374 | stride=1
375 | size=13
376 |
377 | [route]
378 | layers=-1,-3,-5,-6
379 | ### End SPP ###
380 |
381 | [convolutional]
382 | batch_normalize=1
383 | filters=256
384 | size=1
385 | stride=1
386 | pad=1
387 | activation=leaky
388 |
389 | [convolutional]
390 | batch_normalize=1
391 | size=3
392 | stride=1
393 | pad=1
394 | filters=256
395 | activation=leaky
396 |
397 | [route]
398 | layers = -1, -11
399 |
400 | [convolutional]
401 | batch_normalize=1
402 | filters=256
403 | size=1
404 | stride=1
405 | pad=1
406 | activation=leaky
407 |
408 | [convolutional]
409 | batch_normalize=1
410 | filters=128
411 | size=1
412 | stride=1
413 | pad=1
414 | activation=leaky
415 |
416 | [upsample]
417 | stride=2
418 |
419 | [route]
420 | layers = 34
421 |
422 | [convolutional]
423 | batch_normalize=1
424 | filters=128
425 | size=1
426 | stride=1
427 | pad=1
428 | activation=leaky
429 |
430 | [route]
431 | layers = -1, -3
432 |
433 | [convolutional]
434 | batch_normalize=1
435 | filters=128
436 | size=1
437 | stride=1
438 | pad=1
439 | activation=leaky
440 |
441 | [convolutional]
442 | batch_normalize=1
443 | filters=128
444 | size=1
445 | stride=1
446 | pad=1
447 | activation=leaky
448 |
449 | [route]
450 | layers = -2
451 |
452 | [convolutional]
453 | batch_normalize=1
454 | filters=128
455 | size=1
456 | stride=1
457 | pad=1
458 | activation=leaky
459 |
460 | [convolutional]
461 | batch_normalize=1
462 | size=3
463 | stride=1
464 | pad=1
465 | filters=128
466 | activation=leaky
467 |
468 | [route]
469 | layers = -1, -4
470 |
471 | [convolutional]
472 | batch_normalize=1
473 | filters=128
474 | size=1
475 | stride=1
476 | pad=1
477 | activation=leaky
478 |
479 | [convolutional]
480 | batch_normalize=1
481 | filters=64
482 | size=1
483 | stride=1
484 | pad=1
485 | activation=leaky
486 |
487 | [upsample]
488 | stride=2
489 |
490 | [route]
491 | layers = 24
492 |
493 | [convolutional]
494 | batch_normalize=1
495 | filters=64
496 | size=1
497 | stride=1
498 | pad=1
499 | activation=leaky
500 |
501 | [route]
502 | layers = -1, -3
503 |
504 | [convolutional]
505 | batch_normalize=1
506 | filters=64
507 | size=1
508 | stride=1
509 | pad=1
510 | activation=leaky
511 |
512 | [convolutional]
513 | batch_normalize=1
514 | filters=64
515 | size=1
516 | stride=1
517 | pad=1
518 | activation=leaky
519 |
520 | [route]
521 | layers = -2
522 |
523 | [convolutional]
524 | batch_normalize=1
525 | filters=64
526 | size=1
527 | stride=1
528 | pad=1
529 | activation=leaky
530 |
531 | [convolutional]
532 | batch_normalize=1
533 | size=3
534 | stride=1
535 | pad=1
536 | filters=64
537 | activation=leaky
538 |
539 | [route]
540 | layers = -1, -4
541 |
542 | [convolutional]
543 | batch_normalize=1
544 | filters=64
545 | size=1
546 | stride=1
547 | pad=1
548 | activation=leaky
549 |
550 | ##########################
551 |
552 | [convolutional]
553 | batch_normalize=1
554 | size=3
555 | stride=1
556 | pad=1
557 | filters=128
558 | activation=leaky
559 |
560 | [convolutional]
561 | size=1
562 | stride=1
563 | pad=1
564 | filters=255
565 | activation=linear
566 |
567 |
568 | [yolo]
569 | mask = 0,1,2
570 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
571 | classes=80
572 | num=9
573 | jitter=.3
574 | ignore_thresh = .7
575 | truth_thresh = 1
576 | random=1
577 | scale_x_y = 1.05
578 | iou_thresh=0.213
579 | cls_normalizer=1.0
580 | iou_normalizer=0.07
581 | iou_loss=ciou
582 | nms_kind=greedynms
583 | beta_nms=0.6
584 |
585 | [route]
586 | layers = -4
587 |
588 | [convolutional]
589 | batch_normalize=1
590 | size=3
591 | stride=2
592 | pad=1
593 | filters=128
594 | activation=leaky
595 |
596 | [route]
597 | layers = -1, -18
598 |
599 | [convolutional]
600 | batch_normalize=1
601 | filters=128
602 | size=1
603 | stride=1
604 | pad=1
605 | activation=leaky
606 |
607 | [convolutional]
608 | batch_normalize=1
609 | filters=128
610 | size=1
611 | stride=1
612 | pad=1
613 | activation=leaky
614 |
615 | [route]
616 | layers = -2
617 |
618 | [convolutional]
619 | batch_normalize=1
620 | filters=128
621 | size=1
622 | stride=1
623 | pad=1
624 | activation=leaky
625 |
626 | [convolutional]
627 | batch_normalize=1
628 | size=3
629 | stride=1
630 | pad=1
631 | filters=128
632 | activation=leaky
633 |
634 | [route]
635 | layers = -1,-4
636 |
637 | [convolutional]
638 | batch_normalize=1
639 | filters=128
640 | size=1
641 | stride=1
642 | pad=1
643 | activation=leaky
644 |
645 | [convolutional]
646 | batch_normalize=1
647 | size=3
648 | stride=1
649 | pad=1
650 | filters=256
651 | activation=leaky
652 |
653 | [convolutional]
654 | size=1
655 | stride=1
656 | pad=1
657 | filters=255
658 | activation=linear
659 |
660 |
661 | [yolo]
662 | mask = 3,4,5
663 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
664 | classes=80
665 | num=9
666 | jitter=.3
667 | ignore_thresh = .7
668 | truth_thresh = 1
669 | random=1
670 | scale_x_y = 1.05
671 | iou_thresh=0.213
672 | cls_normalizer=1.0
673 | iou_normalizer=0.07
674 | iou_loss=ciou
675 | nms_kind=greedynms
676 | beta_nms=0.6
677 |
678 | [route]
679 | layers = -4
680 |
681 | [convolutional]
682 | batch_normalize=1
683 | size=3
684 | stride=2
685 | pad=1
686 | filters=256
687 | activation=leaky
688 |
689 | [route]
690 | layers = -1, -43
691 |
692 | [convolutional]
693 | batch_normalize=1
694 | filters=256
695 | size=1
696 | stride=1
697 | pad=1
698 | activation=leaky
699 |
700 | [convolutional]
701 | batch_normalize=1
702 | filters=256
703 | size=1
704 | stride=1
705 | pad=1
706 | activation=leaky
707 |
708 | [route]
709 | layers = -2
710 |
711 | [convolutional]
712 | batch_normalize=1
713 | filters=256
714 | size=1
715 | stride=1
716 | pad=1
717 | activation=leaky
718 |
719 | [convolutional]
720 | batch_normalize=1
721 | size=3
722 | stride=1
723 | pad=1
724 | filters=256
725 | activation=leaky
726 |
727 | [route]
728 | layers = -1,-4
729 |
730 | [convolutional]
731 | batch_normalize=1
732 | filters=256
733 | size=1
734 | stride=1
735 | pad=1
736 | activation=leaky
737 |
738 | [convolutional]
739 | batch_normalize=1
740 | size=3
741 | stride=1
742 | pad=1
743 | filters=512
744 | activation=leaky
745 |
746 | [convolutional]
747 | size=1
748 | stride=1
749 | pad=1
750 | filters=255
751 | activation=linear
752 |
753 |
754 | [yolo]
755 | mask = 6,7,8
756 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
757 | classes=80
758 | num=9
759 | jitter=.3
760 | ignore_thresh = .7
761 | truth_thresh = 1
762 | random=1
763 | scale_x_y = 1.05
764 | iou_thresh=0.213
765 | cls_normalizer=1.0
766 | iou_normalizer=0.07
767 | iou_loss=ciou
768 | nms_kind=greedynms
769 | beta_nms=0.6
770 |
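Every `[yolo]` block lists the full set of nine anchor boxes, but `mask` selects the three each head actually predicts with; at the input sizes used in these cfgs the three heads run at strides 8, 16 and 32, so the smallest anchors go to the highest-resolution head. A small illustration in plain Python, with the values copied from the blocks above:

```python
anchors = [12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401]
pairs = list(zip(anchors[0::2], anchors[1::2]))  # nine (width, height) pairs in pixels

for mask, stride in (((0, 1, 2), 8), ((3, 4, 5), 16), ((6, 7, 8), 32)):
    picked = [pairs[i] for i in mask]
    print(f"stride {stride:2d}: {picked}")
# stride  8: [(12, 16), (19, 36), (40, 28)]
# stride 16: [(36, 75), (76, 55), (72, 146)]
# stride 32: [(142, 110), (192, 243), (459, 401)]
```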
--------------------------------------------------------------------------------
/cfg/yolov4-tiny.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | #batch=1
4 | #subdivisions=1
5 | # Training
6 | batch=64
7 | subdivisions=1
8 | width=416
9 | height=416
10 | channels=3
11 | momentum=0.9
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.00261
19 | burn_in=1000
20 | max_batches = 500200
21 | policy=steps
22 | steps=400000,450000
23 | scales=.1,.1
24 |
25 | [convolutional]
26 | batch_normalize=1
27 | filters=32
28 | size=3
29 | stride=2
30 | pad=1
31 | activation=leaky
32 |
33 | [convolutional]
34 | batch_normalize=1
35 | filters=64
36 | size=3
37 | stride=2
38 | pad=1
39 | activation=leaky
40 |
41 | [convolutional]
42 | batch_normalize=1
43 | filters=64
44 | size=3
45 | stride=1
46 | pad=1
47 | activation=leaky
48 |
49 | [route_lhalf]
50 | layers=-1
51 |
52 | [convolutional]
53 | batch_normalize=1
54 | filters=32
55 | size=3
56 | stride=1
57 | pad=1
58 | activation=leaky
59 |
60 | [convolutional]
61 | batch_normalize=1
62 | filters=32
63 | size=3
64 | stride=1
65 | pad=1
66 | activation=leaky
67 |
68 | [route]
69 | layers = -1,-2
70 |
71 | [convolutional]
72 | batch_normalize=1
73 | filters=64
74 | size=1
75 | stride=1
76 | pad=1
77 | activation=leaky
78 |
79 | [route]
80 | layers = -6,-1
81 |
82 | [maxpool]
83 | size=2
84 | stride=2
85 |
86 | [convolutional]
87 | batch_normalize=1
88 | filters=128
89 | size=3
90 | stride=1
91 | pad=1
92 | activation=leaky
93 |
94 | [route_lhalf]
95 | layers=-1
96 |
97 | [convolutional]
98 | batch_normalize=1
99 | filters=64
100 | size=3
101 | stride=1
102 | pad=1
103 | activation=leaky
104 |
105 | [convolutional]
106 | batch_normalize=1
107 | filters=64
108 | size=3
109 | stride=1
110 | pad=1
111 | activation=leaky
112 |
113 | [route]
114 | layers = -1,-2
115 |
116 | [convolutional]
117 | batch_normalize=1
118 | filters=128
119 | size=1
120 | stride=1
121 | pad=1
122 | activation=leaky
123 |
124 | [route]
125 | layers = -6,-1
126 |
127 | [maxpool]
128 | size=2
129 | stride=2
130 |
131 | [convolutional]
132 | batch_normalize=1
133 | filters=256
134 | size=3
135 | stride=1
136 | pad=1
137 | activation=leaky
138 |
139 | [route_lhalf]
140 | layers=-1
141 |
142 | [convolutional]
143 | batch_normalize=1
144 | filters=128
145 | size=3
146 | stride=1
147 | pad=1
148 | activation=leaky
149 |
150 | [convolutional]
151 | batch_normalize=1
152 | filters=128
153 | size=3
154 | stride=1
155 | pad=1
156 | activation=leaky
157 |
158 | [route]
159 | layers = -1,-2
160 |
161 | [convolutional]
162 | batch_normalize=1
163 | filters=256
164 | size=1
165 | stride=1
166 | pad=1
167 | activation=leaky
168 |
169 | [route]
170 | layers = -6,-1
171 |
172 | [maxpool]
173 | size=2
174 | stride=2
175 |
176 | [convolutional]
177 | batch_normalize=1
178 | filters=512
179 | size=3
180 | stride=1
181 | pad=1
182 | activation=leaky
183 |
184 | ##################################
185 |
186 | [convolutional]
187 | batch_normalize=1
188 | filters=256
189 | size=1
190 | stride=1
191 | pad=1
192 | activation=leaky
193 |
194 | [convolutional]
195 | batch_normalize=1
196 | filters=512
197 | size=3
198 | stride=1
199 | pad=1
200 | activation=leaky
201 |
202 | [convolutional]
203 | size=1
204 | stride=1
205 | pad=1
206 | filters=255
207 | activation=linear
208 |
209 |
210 |
211 | [yolo]
212 | mask = 3,4,5
213 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
214 | classes=80
215 | num=6
216 | jitter=.3
217 | scale_x_y = 1.05
218 | cls_normalizer=1.0
219 | iou_normalizer=0.07
220 | iou_loss=ciou
221 | ignore_thresh = .7
222 | truth_thresh = 1
223 | random=0
224 | nms_kind=greedynms
225 | beta_nms=0.6
226 |
227 | [route]
228 | layers = -4
229 |
230 | [convolutional]
231 | batch_normalize=1
232 | filters=128
233 | size=1
234 | stride=1
235 | pad=1
236 | activation=leaky
237 |
238 | [upsample]
239 | stride=2
240 |
241 | [route]
242 | layers = -1, 23
243 |
244 | [convolutional]
245 | batch_normalize=1
246 | filters=256
247 | size=3
248 | stride=1
249 | pad=1
250 | activation=leaky
251 |
252 | [convolutional]
253 | size=1
254 | stride=1
255 | pad=1
256 | filters=255
257 | activation=linear
258 |
259 | [yolo]
260 | mask = 1,2,3
261 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
262 | classes=80
263 | num=6
264 | jitter=.3
265 | scale_x_y = 1.05
266 | cls_normalizer=1.0
267 | iou_normalizer=0.07
268 | iou_loss=ciou
269 | ignore_thresh = .7
270 | truth_thresh = 1
271 | random=0
272 | nms_kind=greedynms
273 | beta_nms=0.6
274 |
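`[route_lhalf]` is this cfg's shorthand for a grouped route: it forwards only half of the previous layer's channels, which is what creates the CSP-style split in the tiny backbone (the later `[route] layers = -1,-2` and `layers = -6,-1` lines stitch the halves back together). A rough PyTorch equivalent, with the caveat that which half is kept is assumed here for illustration rather than taken from the repo's layers.py:

```python
import torch

def route_half(x, keep_second_half=True):
    # Split the incoming feature map into two channel groups and forward one of
    # them, mirroring Darknet's grouped route (groups=2). Which group is kept
    # is an assumption for this sketch.
    first, second = torch.chunk(x, 2, dim=1)
    return second if keep_second_half else first

x = torch.randn(1, 64, 104, 104)   # e.g. the 64-channel map at 416/4 resolution
print(route_half(x).shape)          # torch.Size([1, 32, 104, 104])
```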
--------------------------------------------------------------------------------
/cfg/yolov4.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | batch=64
3 | subdivisions=8
4 | # Training
5 | #width=512
6 | #height=512
7 | width=608
8 | height=608
9 | channels=3
10 | momentum=0.949
11 | decay=0.0005
12 | angle=0
13 | saturation = 1.5
14 | exposure = 1.5
15 | hue=.1
16 |
17 | learning_rate=0.0013
18 | burn_in=1000
19 | max_batches = 500500
20 | policy=steps
21 | steps=400000,450000
22 | scales=.1,.1
23 |
24 | #cutmix=1
25 | mosaic=1
26 |
27 | #:104x104 54:52x52 85:26x26 104:13x13 for 416
28 |
29 | [convolutional]
30 | batch_normalize=1
31 | filters=32
32 | size=3
33 | stride=1
34 | pad=1
35 | activation=mish
36 |
37 | # Downsample
38 |
39 | [convolutional]
40 | batch_normalize=1
41 | filters=64
42 | size=3
43 | stride=2
44 | pad=1
45 | activation=mish
46 |
47 | [convolutional]
48 | batch_normalize=1
49 | filters=64
50 | size=1
51 | stride=1
52 | pad=1
53 | activation=mish
54 |
55 | [route]
56 | layers = -2
57 |
58 | [convolutional]
59 | batch_normalize=1
60 | filters=64
61 | size=1
62 | stride=1
63 | pad=1
64 | activation=mish
65 |
66 | [convolutional]
67 | batch_normalize=1
68 | filters=32
69 | size=1
70 | stride=1
71 | pad=1
72 | activation=mish
73 |
74 | [convolutional]
75 | batch_normalize=1
76 | filters=64
77 | size=3
78 | stride=1
79 | pad=1
80 | activation=mish
81 |
82 | [shortcut]
83 | from=-3
84 | activation=linear
85 |
86 | [convolutional]
87 | batch_normalize=1
88 | filters=64
89 | size=1
90 | stride=1
91 | pad=1
92 | activation=mish
93 |
94 | [route]
95 | layers = -1,-7
96 |
97 | [convolutional]
98 | batch_normalize=1
99 | filters=64
100 | size=1
101 | stride=1
102 | pad=1
103 | activation=mish
104 |
105 | # Downsample
106 |
107 | [convolutional]
108 | batch_normalize=1
109 | filters=128
110 | size=3
111 | stride=2
112 | pad=1
113 | activation=mish
114 |
115 | [convolutional]
116 | batch_normalize=1
117 | filters=64
118 | size=1
119 | stride=1
120 | pad=1
121 | activation=mish
122 |
123 | [route]
124 | layers = -2
125 |
126 | [convolutional]
127 | batch_normalize=1
128 | filters=64
129 | size=1
130 | stride=1
131 | pad=1
132 | activation=mish
133 |
134 | [convolutional]
135 | batch_normalize=1
136 | filters=64
137 | size=1
138 | stride=1
139 | pad=1
140 | activation=mish
141 |
142 | [convolutional]
143 | batch_normalize=1
144 | filters=64
145 | size=3
146 | stride=1
147 | pad=1
148 | activation=mish
149 |
150 | [shortcut]
151 | from=-3
152 | activation=linear
153 |
154 | [convolutional]
155 | batch_normalize=1
156 | filters=64
157 | size=1
158 | stride=1
159 | pad=1
160 | activation=mish
161 |
162 | [convolutional]
163 | batch_normalize=1
164 | filters=64
165 | size=3
166 | stride=1
167 | pad=1
168 | activation=mish
169 |
170 | [shortcut]
171 | from=-3
172 | activation=linear
173 |
174 | [convolutional]
175 | batch_normalize=1
176 | filters=64
177 | size=1
178 | stride=1
179 | pad=1
180 | activation=mish
181 |
182 | [route]
183 | layers = -1,-10
184 |
185 | [convolutional]
186 | batch_normalize=1
187 | filters=128
188 | size=1
189 | stride=1
190 | pad=1
191 | activation=mish
192 |
193 | # Downsample
194 |
195 | [convolutional]
196 | batch_normalize=1
197 | filters=256
198 | size=3
199 | stride=2
200 | pad=1
201 | activation=mish
202 |
203 | [convolutional]
204 | batch_normalize=1
205 | filters=128
206 | size=1
207 | stride=1
208 | pad=1
209 | activation=mish
210 |
211 | [route]
212 | layers = -2
213 |
214 | [convolutional]
215 | batch_normalize=1
216 | filters=128
217 | size=1
218 | stride=1
219 | pad=1
220 | activation=mish
221 |
222 | [convolutional]
223 | batch_normalize=1
224 | filters=128
225 | size=1
226 | stride=1
227 | pad=1
228 | activation=mish
229 |
230 | [convolutional]
231 | batch_normalize=1
232 | filters=128
233 | size=3
234 | stride=1
235 | pad=1
236 | activation=mish
237 |
238 | [shortcut]
239 | from=-3
240 | activation=linear
241 |
242 | [convolutional]
243 | batch_normalize=1
244 | filters=128
245 | size=1
246 | stride=1
247 | pad=1
248 | activation=mish
249 |
250 | [convolutional]
251 | batch_normalize=1
252 | filters=128
253 | size=3
254 | stride=1
255 | pad=1
256 | activation=mish
257 |
258 | [shortcut]
259 | from=-3
260 | activation=linear
261 |
262 | [convolutional]
263 | batch_normalize=1
264 | filters=128
265 | size=1
266 | stride=1
267 | pad=1
268 | activation=mish
269 |
270 | [convolutional]
271 | batch_normalize=1
272 | filters=128
273 | size=3
274 | stride=1
275 | pad=1
276 | activation=mish
277 |
278 | [shortcut]
279 | from=-3
280 | activation=linear
281 |
282 | [convolutional]
283 | batch_normalize=1
284 | filters=128
285 | size=1
286 | stride=1
287 | pad=1
288 | activation=mish
289 |
290 | [convolutional]
291 | batch_normalize=1
292 | filters=128
293 | size=3
294 | stride=1
295 | pad=1
296 | activation=mish
297 |
298 | [shortcut]
299 | from=-3
300 | activation=linear
301 |
302 |
303 | [convolutional]
304 | batch_normalize=1
305 | filters=128
306 | size=1
307 | stride=1
308 | pad=1
309 | activation=mish
310 |
311 | [convolutional]
312 | batch_normalize=1
313 | filters=128
314 | size=3
315 | stride=1
316 | pad=1
317 | activation=mish
318 |
319 | [shortcut]
320 | from=-3
321 | activation=linear
322 |
323 | [convolutional]
324 | batch_normalize=1
325 | filters=128
326 | size=1
327 | stride=1
328 | pad=1
329 | activation=mish
330 |
331 | [convolutional]
332 | batch_normalize=1
333 | filters=128
334 | size=3
335 | stride=1
336 | pad=1
337 | activation=mish
338 |
339 | [shortcut]
340 | from=-3
341 | activation=linear
342 |
343 | [convolutional]
344 | batch_normalize=1
345 | filters=128
346 | size=1
347 | stride=1
348 | pad=1
349 | activation=mish
350 |
351 | [convolutional]
352 | batch_normalize=1
353 | filters=128
354 | size=3
355 | stride=1
356 | pad=1
357 | activation=mish
358 |
359 | [shortcut]
360 | from=-3
361 | activation=linear
362 |
363 | [convolutional]
364 | batch_normalize=1
365 | filters=128
366 | size=1
367 | stride=1
368 | pad=1
369 | activation=mish
370 |
371 | [convolutional]
372 | batch_normalize=1
373 | filters=128
374 | size=3
375 | stride=1
376 | pad=1
377 | activation=mish
378 |
379 | [shortcut]
380 | from=-3
381 | activation=linear
382 |
383 | [convolutional]
384 | batch_normalize=1
385 | filters=128
386 | size=1
387 | stride=1
388 | pad=1
389 | activation=mish
390 |
391 | [route]
392 | layers = -1,-28
393 |
394 | [convolutional]
395 | batch_normalize=1
396 | filters=256
397 | size=1
398 | stride=1
399 | pad=1
400 | activation=mish
401 |
402 | # Downsample
403 |
404 | [convolutional]
405 | batch_normalize=1
406 | filters=512
407 | size=3
408 | stride=2
409 | pad=1
410 | activation=mish
411 |
412 | [convolutional]
413 | batch_normalize=1
414 | filters=256
415 | size=1
416 | stride=1
417 | pad=1
418 | activation=mish
419 |
420 | [route]
421 | layers = -2
422 |
423 | [convolutional]
424 | batch_normalize=1
425 | filters=256
426 | size=1
427 | stride=1
428 | pad=1
429 | activation=mish
430 |
431 | [convolutional]
432 | batch_normalize=1
433 | filters=256
434 | size=1
435 | stride=1
436 | pad=1
437 | activation=mish
438 |
439 | [convolutional]
440 | batch_normalize=1
441 | filters=256
442 | size=3
443 | stride=1
444 | pad=1
445 | activation=mish
446 |
447 | [shortcut]
448 | from=-3
449 | activation=linear
450 |
451 |
452 | [convolutional]
453 | batch_normalize=1
454 | filters=256
455 | size=1
456 | stride=1
457 | pad=1
458 | activation=mish
459 |
460 | [convolutional]
461 | batch_normalize=1
462 | filters=256
463 | size=3
464 | stride=1
465 | pad=1
466 | activation=mish
467 |
468 | [shortcut]
469 | from=-3
470 | activation=linear
471 |
472 |
473 | [convolutional]
474 | batch_normalize=1
475 | filters=256
476 | size=1
477 | stride=1
478 | pad=1
479 | activation=mish
480 |
481 | [convolutional]
482 | batch_normalize=1
483 | filters=256
484 | size=3
485 | stride=1
486 | pad=1
487 | activation=mish
488 |
489 | [shortcut]
490 | from=-3
491 | activation=linear
492 |
493 |
494 | [convolutional]
495 | batch_normalize=1
496 | filters=256
497 | size=1
498 | stride=1
499 | pad=1
500 | activation=mish
501 |
502 | [convolutional]
503 | batch_normalize=1
504 | filters=256
505 | size=3
506 | stride=1
507 | pad=1
508 | activation=mish
509 |
510 | [shortcut]
511 | from=-3
512 | activation=linear
513 |
514 |
515 | [convolutional]
516 | batch_normalize=1
517 | filters=256
518 | size=1
519 | stride=1
520 | pad=1
521 | activation=mish
522 |
523 | [convolutional]
524 | batch_normalize=1
525 | filters=256
526 | size=3
527 | stride=1
528 | pad=1
529 | activation=mish
530 |
531 | [shortcut]
532 | from=-3
533 | activation=linear
534 |
535 |
536 | [convolutional]
537 | batch_normalize=1
538 | filters=256
539 | size=1
540 | stride=1
541 | pad=1
542 | activation=mish
543 |
544 | [convolutional]
545 | batch_normalize=1
546 | filters=256
547 | size=3
548 | stride=1
549 | pad=1
550 | activation=mish
551 |
552 | [shortcut]
553 | from=-3
554 | activation=linear
555 |
556 |
557 | [convolutional]
558 | batch_normalize=1
559 | filters=256
560 | size=1
561 | stride=1
562 | pad=1
563 | activation=mish
564 |
565 | [convolutional]
566 | batch_normalize=1
567 | filters=256
568 | size=3
569 | stride=1
570 | pad=1
571 | activation=mish
572 |
573 | [shortcut]
574 | from=-3
575 | activation=linear
576 |
577 | [convolutional]
578 | batch_normalize=1
579 | filters=256
580 | size=1
581 | stride=1
582 | pad=1
583 | activation=mish
584 |
585 | [convolutional]
586 | batch_normalize=1
587 | filters=256
588 | size=3
589 | stride=1
590 | pad=1
591 | activation=mish
592 |
593 | [shortcut]
594 | from=-3
595 | activation=linear
596 |
597 | [convolutional]
598 | batch_normalize=1
599 | filters=256
600 | size=1
601 | stride=1
602 | pad=1
603 | activation=mish
604 |
605 | [route]
606 | layers = -1,-28
607 |
608 | [convolutional]
609 | batch_normalize=1
610 | filters=512
611 | size=1
612 | stride=1
613 | pad=1
614 | activation=mish
615 |
616 | # Downsample
617 |
618 | [convolutional]
619 | batch_normalize=1
620 | filters=1024
621 | size=3
622 | stride=2
623 | pad=1
624 | activation=mish
625 |
626 | [convolutional]
627 | batch_normalize=1
628 | filters=512
629 | size=1
630 | stride=1
631 | pad=1
632 | activation=mish
633 |
634 | [route]
635 | layers = -2
636 |
637 | [convolutional]
638 | batch_normalize=1
639 | filters=512
640 | size=1
641 | stride=1
642 | pad=1
643 | activation=mish
644 |
645 | [convolutional]
646 | batch_normalize=1
647 | filters=512
648 | size=1
649 | stride=1
650 | pad=1
651 | activation=mish
652 |
653 | [convolutional]
654 | batch_normalize=1
655 | filters=512
656 | size=3
657 | stride=1
658 | pad=1
659 | activation=mish
660 |
661 | [shortcut]
662 | from=-3
663 | activation=linear
664 |
665 | [convolutional]
666 | batch_normalize=1
667 | filters=512
668 | size=1
669 | stride=1
670 | pad=1
671 | activation=mish
672 |
673 | [convolutional]
674 | batch_normalize=1
675 | filters=512
676 | size=3
677 | stride=1
678 | pad=1
679 | activation=mish
680 |
681 | [shortcut]
682 | from=-3
683 | activation=linear
684 |
685 | [convolutional]
686 | batch_normalize=1
687 | filters=512
688 | size=1
689 | stride=1
690 | pad=1
691 | activation=mish
692 |
693 | [convolutional]
694 | batch_normalize=1
695 | filters=512
696 | size=3
697 | stride=1
698 | pad=1
699 | activation=mish
700 |
701 | [shortcut]
702 | from=-3
703 | activation=linear
704 |
705 | [convolutional]
706 | batch_normalize=1
707 | filters=512
708 | size=1
709 | stride=1
710 | pad=1
711 | activation=mish
712 |
713 | [convolutional]
714 | batch_normalize=1
715 | filters=512
716 | size=3
717 | stride=1
718 | pad=1
719 | activation=mish
720 |
721 | [shortcut]
722 | from=-3
723 | activation=linear
724 |
725 | [convolutional]
726 | batch_normalize=1
727 | filters=512
728 | size=1
729 | stride=1
730 | pad=1
731 | activation=mish
732 |
733 | [route]
734 | layers = -1,-16
735 |
736 | [convolutional]
737 | batch_normalize=1
738 | filters=1024
739 | size=1
740 | stride=1
741 | pad=1
742 | activation=mish
743 |
744 | ##########################
745 |
746 | [convolutional]
747 | batch_normalize=1
748 | filters=512
749 | size=1
750 | stride=1
751 | pad=1
752 | activation=leaky
753 |
754 | [convolutional]
755 | batch_normalize=1
756 | size=3
757 | stride=1
758 | pad=1
759 | filters=1024
760 | activation=leaky
761 |
762 | [convolutional]
763 | batch_normalize=1
764 | filters=512
765 | size=1
766 | stride=1
767 | pad=1
768 | activation=leaky
769 |
770 | ### SPP ###
771 | [maxpool]
772 | stride=1
773 | size=5
774 |
775 | [route]
776 | layers=-2
777 |
778 | [maxpool]
779 | stride=1
780 | size=9
781 |
782 | [route]
783 | layers=-4
784 |
785 | [maxpool]
786 | stride=1
787 | size=13
788 |
789 | [route]
790 | layers=-1,-3,-5,-6
791 | ### End SPP ###
792 |
793 | [convolutional]
794 | batch_normalize=1
795 | filters=512
796 | size=1
797 | stride=1
798 | pad=1
799 | activation=leaky
800 |
801 | [convolutional]
802 | batch_normalize=1
803 | size=3
804 | stride=1
805 | pad=1
806 | filters=1024
807 | activation=leaky
808 |
809 | [convolutional]
810 | batch_normalize=1
811 | filters=512
812 | size=1
813 | stride=1
814 | pad=1
815 | activation=leaky
816 |
817 | [convolutional]
818 | batch_normalize=1
819 | filters=256
820 | size=1
821 | stride=1
822 | pad=1
823 | activation=leaky
824 |
825 | [upsample]
826 | stride=2
827 |
828 | [route]
829 | layers = 85
830 |
831 | [convolutional]
832 | batch_normalize=1
833 | filters=256
834 | size=1
835 | stride=1
836 | pad=1
837 | activation=leaky
838 |
839 | [route]
840 | layers = -1, -3
841 |
842 | [convolutional]
843 | batch_normalize=1
844 | filters=256
845 | size=1
846 | stride=1
847 | pad=1
848 | activation=leaky
849 |
850 | [convolutional]
851 | batch_normalize=1
852 | size=3
853 | stride=1
854 | pad=1
855 | filters=512
856 | activation=leaky
857 |
858 | [convolutional]
859 | batch_normalize=1
860 | filters=256
861 | size=1
862 | stride=1
863 | pad=1
864 | activation=leaky
865 |
866 | [convolutional]
867 | batch_normalize=1
868 | size=3
869 | stride=1
870 | pad=1
871 | filters=512
872 | activation=leaky
873 |
874 | [convolutional]
875 | batch_normalize=1
876 | filters=256
877 | size=1
878 | stride=1
879 | pad=1
880 | activation=leaky
881 |
882 | [convolutional]
883 | batch_normalize=1
884 | filters=128
885 | size=1
886 | stride=1
887 | pad=1
888 | activation=leaky
889 |
890 | [upsample]
891 | stride=2
892 |
893 | [route]
894 | layers = 54
895 |
896 | [convolutional]
897 | batch_normalize=1
898 | filters=128
899 | size=1
900 | stride=1
901 | pad=1
902 | activation=leaky
903 |
904 | [route]
905 | layers = -1, -3
906 |
907 | [convolutional]
908 | batch_normalize=1
909 | filters=128
910 | size=1
911 | stride=1
912 | pad=1
913 | activation=leaky
914 |
915 | [convolutional]
916 | batch_normalize=1
917 | size=3
918 | stride=1
919 | pad=1
920 | filters=256
921 | activation=leaky
922 |
923 | [convolutional]
924 | batch_normalize=1
925 | filters=128
926 | size=1
927 | stride=1
928 | pad=1
929 | activation=leaky
930 |
931 | [convolutional]
932 | batch_normalize=1
933 | size=3
934 | stride=1
935 | pad=1
936 | filters=256
937 | activation=leaky
938 |
939 | [convolutional]
940 | batch_normalize=1
941 | filters=128
942 | size=1
943 | stride=1
944 | pad=1
945 | activation=leaky
946 |
947 | ##########################
948 |
949 | [convolutional]
950 | batch_normalize=1
951 | size=3
952 | stride=1
953 | pad=1
954 | filters=256
955 | activation=leaky
956 |
957 | [convolutional]
958 | size=1
959 | stride=1
960 | pad=1
961 | filters=255
962 | activation=linear
963 |
964 |
965 | [yolo]
966 | mask = 0,1,2
967 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
968 | classes=80
969 | num=9
970 | jitter=.3
971 | ignore_thresh = .7
972 | truth_thresh = 1
973 | scale_x_y = 1.2
974 | iou_thresh=0.213
975 | cls_normalizer=1.0
976 | iou_normalizer=0.07
977 | iou_loss=ciou
978 | nms_kind=greedynms
979 | beta_nms=0.6
980 |
981 |
982 | [route]
983 | layers = -4
984 |
985 | [convolutional]
986 | batch_normalize=1
987 | size=3
988 | stride=2
989 | pad=1
990 | filters=256
991 | activation=leaky
992 |
993 | [route]
994 | layers = -1, -16
995 |
996 | [convolutional]
997 | batch_normalize=1
998 | filters=256
999 | size=1
1000 | stride=1
1001 | pad=1
1002 | activation=leaky
1003 |
1004 | [convolutional]
1005 | batch_normalize=1
1006 | size=3
1007 | stride=1
1008 | pad=1
1009 | filters=512
1010 | activation=leaky
1011 |
1012 | [convolutional]
1013 | batch_normalize=1
1014 | filters=256
1015 | size=1
1016 | stride=1
1017 | pad=1
1018 | activation=leaky
1019 |
1020 | [convolutional]
1021 | batch_normalize=1
1022 | size=3
1023 | stride=1
1024 | pad=1
1025 | filters=512
1026 | activation=leaky
1027 |
1028 | [convolutional]
1029 | batch_normalize=1
1030 | filters=256
1031 | size=1
1032 | stride=1
1033 | pad=1
1034 | activation=leaky
1035 |
1036 | [convolutional]
1037 | batch_normalize=1
1038 | size=3
1039 | stride=1
1040 | pad=1
1041 | filters=512
1042 | activation=leaky
1043 |
1044 | [convolutional]
1045 | size=1
1046 | stride=1
1047 | pad=1
1048 | filters=255
1049 | activation=linear
1050 |
1051 |
1052 | [yolo]
1053 | mask = 3,4,5
1054 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
1055 | classes=80
1056 | num=9
1057 | jitter=.3
1058 | ignore_thresh = .7
1059 | truth_thresh = 1
1060 | scale_x_y = 1.1
1061 | iou_thresh=0.213
1062 | cls_normalizer=1.0
1063 | iou_normalizer=0.07
1064 | iou_loss=ciou
1065 | nms_kind=greedynms
1066 | beta_nms=0.6
1067 |
1068 |
1069 | [route]
1070 | layers = -4
1071 |
1072 | [convolutional]
1073 | batch_normalize=1
1074 | size=3
1075 | stride=2
1076 | pad=1
1077 | filters=512
1078 | activation=leaky
1079 |
1080 | [route]
1081 | layers = -1, -37
1082 |
1083 | [convolutional]
1084 | batch_normalize=1
1085 | filters=512
1086 | size=1
1087 | stride=1
1088 | pad=1
1089 | activation=leaky
1090 |
1091 | [convolutional]
1092 | batch_normalize=1
1093 | size=3
1094 | stride=1
1095 | pad=1
1096 | filters=1024
1097 | activation=leaky
1098 |
1099 | [convolutional]
1100 | batch_normalize=1
1101 | filters=512
1102 | size=1
1103 | stride=1
1104 | pad=1
1105 | activation=leaky
1106 |
1107 | [convolutional]
1108 | batch_normalize=1
1109 | size=3
1110 | stride=1
1111 | pad=1
1112 | filters=1024
1113 | activation=leaky
1114 |
1115 | [convolutional]
1116 | batch_normalize=1
1117 | filters=512
1118 | size=1
1119 | stride=1
1120 | pad=1
1121 | activation=leaky
1122 |
1123 | [convolutional]
1124 | batch_normalize=1
1125 | size=3
1126 | stride=1
1127 | pad=1
1128 | filters=1024
1129 | activation=leaky
1130 |
1131 | [convolutional]
1132 | size=1
1133 | stride=1
1134 | pad=1
1135 | filters=255
1136 | activation=linear
1137 |
1138 |
1139 | [yolo]
1140 | mask = 6,7,8
1141 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
1142 | classes=80
1143 | num=9
1144 | jitter=.3
1145 | ignore_thresh = .7
1146 | truth_thresh = 1
1147 | random=1
1148 | scale_x_y = 1.05
1149 | iou_thresh=0.213
1150 | cls_normalizer=1.0
1151 | iou_normalizer=0.07
1152 | iou_loss=ciou
1153 | nms_kind=greedynms
1154 | beta_nms=0.6
1155 |
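In all of these cfgs the final 1x1 convolution feeding each `[yolo]` layer has `filters=255`. That number is not arbitrary: it is `len(mask) * (classes + 5)`, i.e. three anchors per head, each predicting x, y, w, h, objectness and 80 class scores, so changing `classes` also requires changing these three conv layers. Note also that yolov4.cfg uses per-head `scale_x_y` values of 1.2, 1.1 and 1.05, whereas the other cfgs above keep 1.05 for every head. A quick check of the arithmetic:

```python
classes = 80                      # classes=80 in every [yolo] block
anchors_per_head = 3              # len(mask) for each head
outputs_per_anchor = classes + 5  # x, y, w, h, objectness + class scores
print(anchors_per_head * outputs_per_anchor)  # 255, matching filters=255
```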
--------------------------------------------------------------------------------
/data/coco.data:
--------------------------------------------------------------------------------
1 | classes=80
2 | train=../coco/train2017.txt
3 | valid=../coco/testdev2017.txt
4 | names=data/coco.names
5 |
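The `*.data` files are plain key=value manifests: class count, train/valid image-list paths and the class-names file. A minimal stand-alone reader for this format (a sketch, not the repo's own parser):

```python
def read_data_cfg(path):
    # Minimal reader for the key=value format above (classes, train, valid, names).
    options = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):
                continue
            key, value = line.split('=', 1)
            options[key.strip()] = value.strip()
    return options

cfg = read_data_cfg('data/coco.data')
print(cfg['classes'], cfg['train'], cfg['names'])
# 80 ../coco/train2017.txt data/coco.names
```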
--------------------------------------------------------------------------------
/data/coco.names:
--------------------------------------------------------------------------------
1 | person
2 | bicycle
3 | car
4 | motorcycle
5 | airplane
6 | bus
7 | train
8 | truck
9 | boat
10 | traffic light
11 | fire hydrant
12 | stop sign
13 | parking meter
14 | bench
15 | bird
16 | cat
17 | dog
18 | horse
19 | sheep
20 | cow
21 | elephant
22 | bear
23 | zebra
24 | giraffe
25 | backpack
26 | umbrella
27 | handbag
28 | tie
29 | suitcase
30 | frisbee
31 | skis
32 | snowboard
33 | sports ball
34 | kite
35 | baseball bat
36 | baseball glove
37 | skateboard
38 | surfboard
39 | tennis racket
40 | bottle
41 | wine glass
42 | cup
43 | fork
44 | knife
45 | spoon
46 | bowl
47 | banana
48 | apple
49 | sandwich
50 | orange
51 | broccoli
52 | carrot
53 | hot dog
54 | pizza
55 | donut
56 | cake
57 | chair
58 | couch
59 | potted plant
60 | bed
61 | dining table
62 | toilet
63 | tv
64 | laptop
65 | mouse
66 | remote
67 | keyboard
68 | cell phone
69 | microwave
70 | oven
71 | toaster
72 | sink
73 | refrigerator
74 | book
75 | clock
76 | vase
77 | scissors
78 | teddy bear
79 | hair drier
80 | toothbrush
81 |
--------------------------------------------------------------------------------
/data/coco.yaml:
--------------------------------------------------------------------------------
1 | # train and val datasets (image directory or *.txt file with image paths)
2 | train: ../coco/train2017.txt # 118k images
3 | val: ../coco/val2017.txt # 5k images
4 | test: ../coco/testdev2017.txt # 20k images for submission to https://competitions.codalab.org/competitions/20794
5 |
6 | # number of classes
7 | nc: 80
8 |
9 | # class names
10 | names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
11 | 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
12 | 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
13 | 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
14 | 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
15 | 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
16 | 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
17 | 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
18 | 'hair drier', 'toothbrush']
19 |
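coco.yaml carries the same dataset definition in YAML form for the ultralytics-style tooling. A quick sanity check that the name list agrees with `nc` (requires PyYAML; path as in the repo):

```python
import yaml

with open('data/coco.yaml') as f:
    data = yaml.safe_load(f)

assert data['nc'] == len(data['names']) == 80
print(data['train'], data['val'], data['test'], sep='\n')
```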
--------------------------------------------------------------------------------
/data/coco1.data:
--------------------------------------------------------------------------------
1 | classes=80
2 | train=data/coco1.txt
3 | valid=data/coco1.txt
4 | names=data/coco.names
5 |
--------------------------------------------------------------------------------
/data/coco1.txt:
--------------------------------------------------------------------------------
1 | ../coco/images/train2017/000000109622.jpg
2 |
--------------------------------------------------------------------------------
/data/coco16.data:
--------------------------------------------------------------------------------
1 | classes=80
2 | train=data/coco16.txt
3 | valid=data/coco16.txt
4 | names=data/coco.names
5 |
--------------------------------------------------------------------------------
/data/coco16.txt:
--------------------------------------------------------------------------------
1 | ../coco/images/train2017/000000109622.jpg
2 | ../coco/images/train2017/000000160694.jpg
3 | ../coco/images/train2017/000000308590.jpg
4 | ../coco/images/train2017/000000327573.jpg
5 | ../coco/images/train2017/000000062929.jpg
6 | ../coco/images/train2017/000000512793.jpg
7 | ../coco/images/train2017/000000371735.jpg
8 | ../coco/images/train2017/000000148118.jpg
9 | ../coco/images/train2017/000000309856.jpg
10 | ../coco/images/train2017/000000141882.jpg
11 | ../coco/images/train2017/000000318783.jpg
12 | ../coco/images/train2017/000000337760.jpg
13 | ../coco/images/train2017/000000298197.jpg
14 | ../coco/images/train2017/000000042421.jpg
15 | ../coco/images/train2017/000000328898.jpg
16 | ../coco/images/train2017/000000458856.jpg
17 |
--------------------------------------------------------------------------------
/data/coco1cls.data:
--------------------------------------------------------------------------------
1 | classes=1
2 | train=data/coco1cls.txt
3 | valid=data/coco1cls.txt
4 | names=data/coco.names
5 |
--------------------------------------------------------------------------------
/data/coco1cls.txt:
--------------------------------------------------------------------------------
1 | ../coco/images/train2017/000000000901.jpg
2 | ../coco/images/train2017/000000001464.jpg
3 | ../coco/images/train2017/000000003220.jpg
4 | ../coco/images/train2017/000000003365.jpg
5 | ../coco/images/train2017/000000004772.jpg
6 | ../coco/images/train2017/000000009987.jpg
7 | ../coco/images/train2017/000000010498.jpg
8 | ../coco/images/train2017/000000012455.jpg
9 | ../coco/images/train2017/000000013992.jpg
10 | ../coco/images/train2017/000000014125.jpg
11 | ../coco/images/train2017/000000016314.jpg
12 | ../coco/images/train2017/000000016670.jpg
13 | ../coco/images/train2017/000000018412.jpg
14 | ../coco/images/train2017/000000021212.jpg
15 | ../coco/images/train2017/000000021826.jpg
16 | ../coco/images/train2017/000000030566.jpg
17 |
--------------------------------------------------------------------------------
/data/coco2014.data:
--------------------------------------------------------------------------------
1 | classes=80
2 | train=../coco/trainvalno5k.txt
3 | valid=../coco/5k.txt
4 | names=data/coco.names
5 |
--------------------------------------------------------------------------------
/data/coco2017.data:
--------------------------------------------------------------------------------
1 | classes=80
2 | train=../coco/train2017.txt
3 | valid=../coco/val2017.txt
4 | names=data/coco.names
5 |
--------------------------------------------------------------------------------
/data/coco64.data:
--------------------------------------------------------------------------------
1 | classes=80
2 | train=data/coco64.txt
3 | valid=data/coco64.txt
4 | names=data/coco.names
5 |
--------------------------------------------------------------------------------
/data/coco64.txt:
--------------------------------------------------------------------------------
1 | ../coco/images/train2017/000000109622.jpg
2 | ../coco/images/train2017/000000160694.jpg
3 | ../coco/images/train2017/000000308590.jpg
4 | ../coco/images/train2017/000000327573.jpg
5 | ../coco/images/train2017/000000062929.jpg
6 | ../coco/images/train2017/000000512793.jpg
7 | ../coco/images/train2017/000000371735.jpg
8 | ../coco/images/train2017/000000148118.jpg
9 | ../coco/images/train2017/000000309856.jpg
10 | ../coco/images/train2017/000000141882.jpg
11 | ../coco/images/train2017/000000318783.jpg
12 | ../coco/images/train2017/000000337760.jpg
13 | ../coco/images/train2017/000000298197.jpg
14 | ../coco/images/train2017/000000042421.jpg
15 | ../coco/images/train2017/000000328898.jpg
16 | ../coco/images/train2017/000000458856.jpg
17 | ../coco/images/train2017/000000073824.jpg
18 | ../coco/images/train2017/000000252846.jpg
19 | ../coco/images/train2017/000000459590.jpg
20 | ../coco/images/train2017/000000273650.jpg
21 | ../coco/images/train2017/000000331311.jpg
22 | ../coco/images/train2017/000000156326.jpg
23 | ../coco/images/train2017/000000262985.jpg
24 | ../coco/images/train2017/000000253580.jpg
25 | ../coco/images/train2017/000000447976.jpg
26 | ../coco/images/train2017/000000378077.jpg
27 | ../coco/images/train2017/000000259913.jpg
28 | ../coco/images/train2017/000000424553.jpg
29 | ../coco/images/train2017/000000000612.jpg
30 | ../coco/images/train2017/000000267625.jpg
31 | ../coco/images/train2017/000000566012.jpg
32 | ../coco/images/train2017/000000196664.jpg
33 | ../coco/images/train2017/000000363331.jpg
34 | ../coco/images/train2017/000000057992.jpg
35 | ../coco/images/train2017/000000520047.jpg
36 | ../coco/images/train2017/000000453903.jpg
37 | ../coco/images/train2017/000000162083.jpg
38 | ../coco/images/train2017/000000268516.jpg
39 | ../coco/images/train2017/000000277436.jpg
40 | ../coco/images/train2017/000000189744.jpg
41 | ../coco/images/train2017/000000041128.jpg
42 | ../coco/images/train2017/000000527728.jpg
43 | ../coco/images/train2017/000000465269.jpg
44 | ../coco/images/train2017/000000246833.jpg
45 | ../coco/images/train2017/000000076784.jpg
46 | ../coco/images/train2017/000000323715.jpg
47 | ../coco/images/train2017/000000560463.jpg
48 | ../coco/images/train2017/000000006263.jpg
49 | ../coco/images/train2017/000000094701.jpg
50 | ../coco/images/train2017/000000521359.jpg
51 | ../coco/images/train2017/000000302903.jpg
52 | ../coco/images/train2017/000000047559.jpg
53 | ../coco/images/train2017/000000480583.jpg
54 | ../coco/images/train2017/000000050025.jpg
55 | ../coco/images/train2017/000000084512.jpg
56 | ../coco/images/train2017/000000508913.jpg
57 | ../coco/images/train2017/000000093708.jpg
58 | ../coco/images/train2017/000000070493.jpg
59 | ../coco/images/train2017/000000539270.jpg
60 | ../coco/images/train2017/000000474402.jpg
61 | ../coco/images/train2017/000000209842.jpg
62 | ../coco/images/train2017/000000028820.jpg
63 | ../coco/images/train2017/000000154257.jpg
64 | ../coco/images/train2017/000000342499.jpg
65 |
--------------------------------------------------------------------------------
/data/coco_paper.names:
--------------------------------------------------------------------------------
1 | person
2 | bicycle
3 | car
4 | motorcycle
5 | airplane
6 | bus
7 | train
8 | truck
9 | boat
10 | traffic light
11 | fire hydrant
12 | street sign
13 | stop sign
14 | parking meter
15 | bench
16 | bird
17 | cat
18 | dog
19 | horse
20 | sheep
21 | cow
22 | elephant
23 | bear
24 | zebra
25 | giraffe
26 | hat
27 | backpack
28 | umbrella
29 | shoe
30 | eye glasses
31 | handbag
32 | tie
33 | suitcase
34 | frisbee
35 | skis
36 | snowboard
37 | sports ball
38 | kite
39 | baseball bat
40 | baseball glove
41 | skateboard
42 | surfboard
43 | tennis racket
44 | bottle
45 | plate
46 | wine glass
47 | cup
48 | fork
49 | knife
50 | spoon
51 | bowl
52 | banana
53 | apple
54 | sandwich
55 | orange
56 | broccoli
57 | carrot
58 | hot dog
59 | pizza
60 | donut
61 | cake
62 | chair
63 | couch
64 | potted plant
65 | bed
66 | mirror
67 | dining table
68 | window
69 | desk
70 | toilet
71 | door
72 | tv
73 | laptop
74 | mouse
75 | remote
76 | keyboard
77 | cell phone
78 | microwave
79 | oven
80 | toaster
81 | sink
82 | refrigerator
83 | blender
84 | book
85 | clock
86 | vase
87 | scissors
88 | teddy bear
89 | hair drier
90 | toothbrush
91 | hair brush
--------------------------------------------------------------------------------
/data/get_coco2014.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | # Zip coco folder
3 | # zip -r coco.zip coco
4 | # tar -czvf coco.tar.gz coco
5 |
6 | # Download labels from Google Drive, accepting the download-confirmation prompt
7 | filename="coco2014labels.zip"
8 | fileid="1s6-CmF5_SElM28r52P1OUrCcuXZN-SFo"
9 | curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id=${fileid}" > /dev/null
10 | curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=${fileid}" -o ${filename}
11 | rm ./cookie
12 |
13 | # Unzip labels
14 | unzip -q ${filename} # for coco.zip
15 | # tar -xzf ${filename} # for coco.tar.gz
16 | rm ${filename}
17 |
18 | # Download and unzip images
19 | cd coco/images
20 | f="train2014.zip" && curl http://images.cocodataset.org/zips/$f -o $f && unzip -q $f && rm $f
21 | f="val2014.zip" && curl http://images.cocodataset.org/zips/$f -o $f && unzip -q $f && rm $f
22 |
23 | # cd out
24 | cd ../..
25 |
--------------------------------------------------------------------------------
/data/get_coco2017.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | # Zip coco folder
3 | # zip -r coco.zip coco
4 | # tar -czvf coco.tar.gz coco
5 |
6 | # Download labels from Google Drive, accepting the download-confirmation prompt
7 | filename="coco2017labels.zip"
8 | fileid="1cXZR_ckHki6nddOmcysCuuJFM--T-Q6L"
9 | curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id=${fileid}" > /dev/null
10 | curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=${fileid}" -o ${filename}
11 | rm ./cookie
12 |
13 | # Unzip labels
14 | unzip -q ${filename} # for coco.zip
15 | # tar -xzf ${filename} # for coco.tar.gz
16 | rm ${filename}
17 |
18 | # Download and unzip images
19 | cd coco/images
20 | f="train2017.zip" && curl http://images.cocodataset.org/zips/$f -o $f && unzip -q $f && rm $f
21 | f="val2017.zip" && curl http://images.cocodataset.org/zips/$f -o $f && unzip -q $f && rm $f
22 |
23 | # cd out
24 | cd ../..
25 |
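Both get_coco scripts unpack the label archive into a `coco/` folder in the working directory and then pull the image zips into `coco/images/`. Because the `*.data` files and coco.yaml reference `../coco/...`, the `coco/` folder is expected to sit beside the repository, so the script is normally run from the repository's parent directory (or the folder moved there afterwards). A small layout check under that assumption:

```python
from pathlib import Path

coco = Path('../coco')  # expected location, one level above the repo root
for rel in ('train2017.txt', 'val2017.txt', 'images/train2017', 'images/val2017'):
    p = coco / rel
    print(f"{p}: {'ok' if p.exists() else 'missing'}")
```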
--------------------------------------------------------------------------------
/data/hyp.scratch.s.yaml:
--------------------------------------------------------------------------------
1 | lr0: 0.01 # initial learning rate (SGD=1E-2, Adam=1E-3)
2 | lrf: 0.1 # final OneCycleLR learning rate (lr0 * lrf)
3 | momentum: 0.937 # SGD momentum/Adam beta1
4 | weight_decay: 0.0005 # optimizer weight decay 5e-4
5 | warmup_epochs: 3.0 # warmup epochs (fractions ok)
6 | warmup_momentum: 0.8 # warmup initial momentum
7 | warmup_bias_lr: 0.1 # warmup initial bias lr
8 | box: 0.05 # box loss gain
9 | cls: 0.5 # cls loss gain
10 | cls_pw: 1.0 # cls BCELoss positive_weight
11 | obj: 1.0 # obj loss gain (scale with pixels)
12 | obj_pw: 1.0 # obj BCELoss positive_weight
13 | iou_t: 0.20 # IoU training threshold
14 | anchor_t: 4.0 # anchor-multiple threshold
15 | # anchors: 3 # anchors per output layer (0 to ignore)
16 | fl_gamma: 0.0 # focal loss gamma (efficientDet default gamma=1.5)
17 | hsv_h: 0.015 # image HSV-Hue augmentation (fraction)
18 | hsv_s: 0.7 # image HSV-Saturation augmentation (fraction)
19 | hsv_v: 0.4 # image HSV-Value augmentation (fraction)
20 | degrees: 0.0 # image rotation (+/- deg)
21 | translate: 0.0 # image translation (+/- fraction)
22 | scale: 0.5 # image scale (+/- gain)
23 | shear: 0.0 # image shear (+/- deg)
24 | perspective: 0.0 # image perspective (+/- fraction), range 0-0.001
25 | flipud: 0.0 # image flip up-down (probability)
26 | fliplr: 0.5 # image flip left-right (probability)
27 | mosaic: 1.0 # image mosaic (probability)
28 | mixup: 0.0 # image mixup (probability)
29 |
--------------------------------------------------------------------------------
/data/hyp.scratch.yaml:
--------------------------------------------------------------------------------
1 | lr0: 0.01 # initial learning rate (SGD=1E-2, Adam=1E-3)
2 | lrf: 0.1 # final OneCycleLR learning rate (lr0 * lrf)
3 | momentum: 0.937 # SGD momentum/Adam beta1
4 | weight_decay: 0.0005 # optimizer weight decay 5e-4
5 | warmup_epochs: 3.0 # warmup epochs (fractions ok)
6 | warmup_momentum: 0.8 # warmup initial momentum
7 | warmup_bias_lr: 0.1 # warmup initial bias lr
8 | box: 0.05 # box loss gain
9 | cls: 0.3 # cls loss gain
10 | cls_pw: 1.0 # cls BCELoss positive_weight
11 | obj: 0.6 # obj loss gain (scale with pixels)
12 | obj_pw: 1.0 # obj BCELoss positive_weight
13 | iou_t: 0.20 # IoU training threshold
14 | anchor_t: 4.0 # anchor-multiple threshold
15 | # anchors: 3 # anchors per output layer (0 to ignore)
16 | fl_gamma: 0.0 # focal loss gamma (efficientDet default gamma=1.5)
17 | hsv_h: 0.015 # image HSV-Hue augmentation (fraction)
18 | hsv_s: 0.7 # image HSV-Saturation augmentation (fraction)
19 | hsv_v: 0.4 # image HSV-Value augmentation (fraction)
20 | degrees: 0.0 # image rotation (+/- deg)
21 | translate: 0.1 # image translation (+/- fraction)
22 | scale: 0.9 # image scale (+/- gain)
23 | shear: 0.0 # image shear (+/- deg)
24 | perspective: 0.0 # image perspective (+/- fraction), range 0-0.001
25 | flipud: 0.0 # image flip up-down (probability)
26 | fliplr: 0.5 # image flip left-right (probability)
27 | mosaic: 1.0 # image mosaic (probability)
28 | mixup: 0.0 # image mixup (probability)
29 |
--------------------------------------------------------------------------------
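
The two hyp.scratch YAMLs above are consumed as plain dictionaries; they differ only in the cls/obj loss gains and the translate/scale augmentation strengths. A minimal loading sketch, mirroring the yaml.load(..., Loader=yaml.FullLoader) pattern used elsewhere in this repo (assumes PyYAML is installed):

    import yaml

    with open('data/hyp.scratch.yaml') as f:
        hyp = yaml.load(f, Loader=yaml.FullLoader)  # hyperparameter dict

    print(hyp['lr0'], hyp['lrf'], hyp['scale'])  # 0.01 0.1 0.9
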
/data/samples/bus.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WongKinYiu/PyTorch_YOLOv4/6e88dc21813e614c9848a2767fd0bac13d26fd51/data/samples/bus.jpg
--------------------------------------------------------------------------------
/data/samples/zidane.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WongKinYiu/PyTorch_YOLOv4/6e88dc21813e614c9848a2767fd0bac13d26fd51/data/samples/zidane.jpg
--------------------------------------------------------------------------------
/detect.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import os
3 | import platform
4 | import shutil
5 | import time
6 | from pathlib import Path
7 |
8 | import cv2
9 | import torch
10 | import torch.backends.cudnn as cudnn
11 | from numpy import random
12 |
13 | from utils.google_utils import attempt_load
14 | from utils.datasets import LoadStreams, LoadImages
15 | from utils.general import (
16 | check_img_size, non_max_suppression, apply_classifier, scale_coords, xyxy2xywh, strip_optimizer)
17 | from utils.plots import plot_one_box
18 | from utils.torch_utils import select_device, load_classifier, time_synchronized
19 |
20 | from models.models import *
21 | from utils.datasets import *
22 | from utils.general import *
23 |
24 | def load_classes(path):
25 | # Loads *.names file at 'path'
26 | with open(path, 'r') as f:
27 | names = f.read().split('\n')
28 | return list(filter(None, names)) # filter removes empty strings (such as last line)
29 |
30 | def detect(save_img=False):
31 | out, source, weights, view_img, save_txt, imgsz, cfg, names = \
32 | opt.output, opt.source, opt.weights, opt.view_img, opt.save_txt, opt.img_size, opt.cfg, opt.names
33 | webcam = source == '0' or source.startswith('rtsp') or source.startswith('http') or source.endswith('.txt')
34 |
35 | # Initialize
36 | device = select_device(opt.device)
37 | if os.path.exists(out):
38 | shutil.rmtree(out) # delete output folder
39 | os.makedirs(out) # make new output folder
40 | half = device.type != 'cpu' # half precision only supported on CUDA
41 |
42 | # Load model
43 |     model = Darknet(cfg, imgsz).to(device)
44 | try:
45 | model.load_state_dict(torch.load(weights[0], map_location=device)['model'])
46 | #model = attempt_load(weights, map_location=device) # load FP32 model
47 | #imgsz = check_img_size(imgsz, s=model.stride.max()) # check img_size
48 | except:
49 | load_darknet_weights(model, weights[0])
50 | model.to(device).eval()
51 | if half:
52 | model.half() # to FP16
53 |
54 | # Second-stage classifier
55 | classify = False
56 | if classify:
57 | modelc = load_classifier(name='resnet101', n=2) # initialize
58 | modelc.load_state_dict(torch.load('weights/resnet101.pt', map_location=device)['model']) # load weights
59 | modelc.to(device).eval()
60 |
61 | # Set Dataloader
62 | vid_path, vid_writer = None, None
63 | if webcam:
64 | view_img = True
65 | cudnn.benchmark = True # set True to speed up constant image size inference
66 | dataset = LoadStreams(source, img_size=imgsz)
67 | else:
68 | save_img = True
69 | dataset = LoadImages(source, img_size=imgsz, auto_size=64)
70 |
71 | # Get names and colors
72 | names = load_classes(names)
73 | colors = [[random.randint(0, 255) for _ in range(3)] for _ in range(len(names))]
74 |
75 | # Run inference
76 | t0 = time.time()
77 | img = torch.zeros((1, 3, imgsz, imgsz), device=device) # init img
78 | _ = model(img.half() if half else img) if device.type != 'cpu' else None # run once
79 | for path, img, im0s, vid_cap in dataset:
80 | img = torch.from_numpy(img).to(device)
81 | img = img.half() if half else img.float() # uint8 to fp16/32
82 | img /= 255.0 # 0 - 255 to 0.0 - 1.0
83 | if img.ndimension() == 3:
84 | img = img.unsqueeze(0)
85 |
86 | # Inference
87 | t1 = time_synchronized()
88 | pred = model(img, augment=opt.augment)[0]
89 |
90 | # Apply NMS
91 | pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres, classes=opt.classes, agnostic=opt.agnostic_nms)
92 | t2 = time_synchronized()
93 |
94 | # Apply Classifier
95 | if classify:
96 | pred = apply_classifier(pred, modelc, img, im0s)
97 |
98 | # Process detections
99 | for i, det in enumerate(pred): # detections per image
100 | if webcam: # batch_size >= 1
101 | p, s, im0 = path[i], '%g: ' % i, im0s[i].copy()
102 | else:
103 | p, s, im0 = path, '', im0s
104 |
105 | save_path = str(Path(out) / Path(p).name)
106 | txt_path = str(Path(out) / Path(p).stem) + ('_%g' % dataset.frame if dataset.mode == 'video' else '')
107 | s += '%gx%g ' % img.shape[2:] # print string
108 | gn = torch.tensor(im0.shape)[[1, 0, 1, 0]] # normalization gain whwh
109 | if det is not None and len(det):
110 | # Rescale boxes from img_size to im0 size
111 | det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()
112 |
113 | # Print results
114 | for c in det[:, -1].unique():
115 | n = (det[:, -1] == c).sum() # detections per class
116 | s += '%g %ss, ' % (n, names[int(c)]) # add to string
117 |
118 | # Write results
119 | for *xyxy, conf, cls in det:
120 | if save_txt: # Write to file
121 | xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist() # normalized xywh
122 | with open(txt_path + '.txt', 'a') as f:
123 | f.write(('%g ' * 5 + '\n') % (cls, *xywh)) # label format
124 |
125 | if save_img or view_img: # Add bbox to image
126 | label = '%s %.2f' % (names[int(cls)], conf)
127 | plot_one_box(xyxy, im0, label=label, color=colors[int(cls)], line_thickness=3)
128 |
129 | # Print time (inference + NMS)
130 | print('%sDone. (%.3fs)' % (s, t2 - t1))
131 |
132 | # Stream results
133 | if view_img:
134 | cv2.imshow(p, im0)
135 | if cv2.waitKey(1) == ord('q'): # q to quit
136 | raise StopIteration
137 |
138 | # Save results (image with detections)
139 | if save_img:
140 | if dataset.mode == 'images':
141 | cv2.imwrite(save_path, im0)
142 | else:
143 | if vid_path != save_path: # new video
144 | vid_path = save_path
145 | if isinstance(vid_writer, cv2.VideoWriter):
146 | vid_writer.release() # release previous video writer
147 |
148 | fourcc = 'mp4v' # output video codec
149 | fps = vid_cap.get(cv2.CAP_PROP_FPS)
150 | w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
151 | h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
152 | vid_writer = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*fourcc), fps, (w, h))
153 | vid_writer.write(im0)
154 |
155 | if save_txt or save_img:
156 | print('Results saved to %s' % Path(out))
157 |         if platform.system() == 'Darwin' and not opt.update:  # macOS
158 | os.system('open ' + save_path)
159 |
160 | print('Done. (%.3fs)' % (time.time() - t0))
161 |
162 |
163 | if __name__ == '__main__':
164 | parser = argparse.ArgumentParser()
165 | parser.add_argument('--weights', nargs='+', type=str, default='yolov4.weights', help='model.pt path(s)')
166 | parser.add_argument('--source', type=str, default='inference/images', help='source') # file/folder, 0 for webcam
167 | parser.add_argument('--output', type=str, default='inference/output', help='output folder') # output folder
168 | parser.add_argument('--img-size', type=int, default=640, help='inference size (pixels)')
169 | parser.add_argument('--conf-thres', type=float, default=0.4, help='object confidence threshold')
170 | parser.add_argument('--iou-thres', type=float, default=0.5, help='IOU threshold for NMS')
171 | parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
172 | parser.add_argument('--view-img', action='store_true', help='display results')
173 | parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
174 | parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --class 0, or --class 0 2 3')
175 | parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
176 | parser.add_argument('--augment', action='store_true', help='augmented inference')
177 | parser.add_argument('--update', action='store_true', help='update all models')
178 |     parser.add_argument('--cfg', type=str, default='cfg/yolov4.cfg', help='*.cfg path')
179 |     parser.add_argument('--names', type=str, default='data/coco.names', help='*.names path')
180 | opt = parser.parse_args()
181 | print(opt)
182 |
183 | with torch.no_grad():
184 | if opt.update: # update all models (to fix SourceChangeWarning)
185 | for opt.weights in ['']:
186 | detect()
187 | strip_optimizer(opt.weights)
188 | else:
189 | detect()
190 |
--------------------------------------------------------------------------------
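
One detail of detect.py above that is easy to misread is the --save-txt normalization: `gn` is `[w, h, w, h]` of the original image, so dividing the xywh box by it yields the 0-1 coordinates of the darknet label format. A toy sketch, assuming the repo root is on PYTHONPATH and using made-up box values:

    import torch
    from utils.general import xyxy2xywh  # repo utility used by detect.py

    im0_shape = (480, 640, 3)                      # original image: h, w, c
    gn = torch.tensor(im0_shape)[[1, 0, 1, 0]]     # normalization gain: [640, 480, 640, 480]
    xyxy = torch.tensor([100., 120., 300., 360.])  # hypothetical detection (x1, y1, x2, y2)
    xywh = (xyxy2xywh(xyxy.view(1, 4)) / gn).view(-1).tolist()
    print(xywh)  # [0.3125, 0.5, 0.3125, 0.5] -> normalized x_center, y_center, w, h
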
/images/scalingCSP.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WongKinYiu/PyTorch_YOLOv4/6e88dc21813e614c9848a2767fd0bac13d26fd51/images/scalingCSP.png
--------------------------------------------------------------------------------
/models/export.py:
--------------------------------------------------------------------------------
1 | import argparse
2 |
3 | import torch
4 |
5 | from utils.google_utils import attempt_download
6 |
7 | if __name__ == '__main__':
8 | parser = argparse.ArgumentParser()
9 | parser.add_argument('--weights', type=str, default='./yolov4-csp.pt', help='weights path')
10 | parser.add_argument('--img-size', nargs='+', type=int, default=[640, 640], help='image size')
11 | parser.add_argument('--batch-size', type=int, default=1, help='batch size')
12 | opt = parser.parse_args()
13 | opt.img_size *= 2 if len(opt.img_size) == 1 else 1 # expand
14 | print(opt)
15 |
16 | # Input
17 | img = torch.zeros((opt.batch_size, 3, *opt.img_size)) # image size(1,3,320,192) iDetection
18 |
19 | # Load PyTorch model
20 | attempt_download(opt.weights)
21 | model = torch.load(opt.weights, map_location=torch.device('cpu'))['model'].float()
22 | model.eval()
23 | model.model[-1].export = True # set Detect() layer export=True
24 | y = model(img) # dry run
25 |
26 | # TorchScript export
27 | try:
28 | print('\nStarting TorchScript export with torch %s...' % torch.__version__)
29 | f = opt.weights.replace('.pt', '.torchscript.pt') # filename
30 | ts = torch.jit.trace(model, img)
31 | ts.save(f)
32 | print('TorchScript export success, saved as %s' % f)
33 | except Exception as e:
34 | print('TorchScript export failure: %s' % e)
35 |
36 | # ONNX export
37 | try:
38 | import onnx
39 |
40 | print('\nStarting ONNX export with onnx %s...' % onnx.__version__)
41 | f = opt.weights.replace('.pt', '.onnx') # filename
42 | model.fuse() # only for ONNX
43 | torch.onnx.export(model, img, f, verbose=False, opset_version=12, input_names=['images'],
44 | output_names=['classes', 'boxes'] if y is None else ['output'])
45 |
46 | # Checks
47 | onnx_model = onnx.load(f) # load onnx model
48 | onnx.checker.check_model(onnx_model) # check onnx model
49 | print(onnx.helper.printable_graph(onnx_model.graph)) # print a human readable model
50 | print('ONNX export success, saved as %s' % f)
51 | except Exception as e:
52 | print('ONNX export failure: %s' % e)
53 |
54 | # CoreML export
55 | try:
56 | import coremltools as ct
57 |
58 | print('\nStarting CoreML export with coremltools %s...' % ct.__version__)
59 | # convert model from torchscript and apply pixel scaling as per detect.py
60 | model = ct.convert(ts, inputs=[ct.ImageType(name='images', shape=img.shape, scale=1 / 255.0, bias=[0, 0, 0])])
61 | f = opt.weights.replace('.pt', '.mlmodel') # filename
62 | model.save(f)
63 | print('CoreML export success, saved as %s' % f)
64 | except Exception as e:
65 | print('CoreML export failure: %s' % e)
66 |
67 | # Finish
68 | print('\nExport complete. Visualize with https://github.com/lutzroeder/netron.')
69 |
--------------------------------------------------------------------------------
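
After models/export.py runs, the exported ONNX graph can be sanity-checked outside PyTorch. A sketch, assuming the onnxruntime package is installed and that export produced yolov4-csp.onnx with the input name 'images' set above:

    import numpy as np
    import onnxruntime as ort

    sess = ort.InferenceSession('yolov4-csp.onnx', providers=['CPUExecutionProvider'])
    x = np.zeros((1, 3, 640, 640), dtype=np.float32)  # dummy batch matching --img-size
    outputs = sess.run(None, {'images': x})           # None = fetch all graph outputs
    print([o.shape for o in outputs])
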
/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy == 1.17
2 | opencv-python >= 4.1
3 | torch == 1.6
4 | torchvision
5 | matplotlib
6 | pycocotools
7 | tqdm
8 | pillow
9 | tensorboard >= 1.14
10 |
--------------------------------------------------------------------------------
/test.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import glob
3 | import json
4 | import os
5 | from pathlib import Path
6 |
7 | import numpy as np
8 | import torch
9 | import yaml
10 | from tqdm import tqdm
11 |
12 | from utils.google_utils import attempt_load
13 | from utils.datasets import create_dataloader
14 | from utils.general import coco80_to_coco91_class, check_dataset, check_file, check_img_size, box_iou, \
15 | non_max_suppression, scale_coords, xyxy2xywh, xywh2xyxy, clip_coords, set_logging, increment_path
16 | from utils.loss import compute_loss
17 | from utils.metrics import ap_per_class
18 | from utils.plots import plot_images, output_to_target
19 | from utils.torch_utils import select_device, time_synchronized
20 |
21 | from models.models import *
22 |
23 | def load_classes(path):
24 | # Loads *.names file at 'path'
25 | with open(path, 'r') as f:
26 | names = f.read().split('\n')
27 | return list(filter(None, names)) # filter removes empty strings (such as last line)
28 |
29 |
30 | def test(data,
31 | weights=None,
32 | batch_size=16,
33 | imgsz=640,
34 | conf_thres=0.001,
35 | iou_thres=0.6, # for NMS
36 | save_json=False,
37 | single_cls=False,
38 | augment=False,
39 | verbose=False,
40 | model=None,
41 | dataloader=None,
42 | save_dir=Path(''), # for saving images
43 | save_txt=False, # for auto-labelling
44 | save_conf=False,
45 | plots=True,
46 | log_imgs=0): # number of logged images
47 |
48 | # Initialize/load model and set device
49 | training = model is not None
50 | if training: # called by train.py
51 | device = next(model.parameters()).device # get model device
52 |
53 | else: # called directly
54 | set_logging()
55 | device = select_device(opt.device, batch_size=batch_size)
56 | save_txt = opt.save_txt # save *.txt labels
57 |
58 | # Directories
59 | save_dir = Path(increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok)) # increment run
60 | (save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True) # make dir
61 |
62 | # Load model
63 | model = Darknet(opt.cfg).to(device)
64 |
65 | # load model
66 | try:
67 | ckpt = torch.load(weights[0], map_location=device) # load checkpoint
68 | ckpt['model'] = {k: v for k, v in ckpt['model'].items() if model.state_dict()[k].numel() == v.numel()}
69 | model.load_state_dict(ckpt['model'], strict=False)
70 | except:
71 | load_darknet_weights(model, weights[0])
72 | imgsz = check_img_size(imgsz, s=64) # check img_size
73 |
74 | # Half
75 | half = device.type != 'cpu' # half precision only supported on CUDA
76 | if half:
77 | model.half()
78 |
79 | # Configure
80 | model.eval()
81 | is_coco = data.endswith('coco.yaml') # is COCO dataset
82 | with open(data) as f:
83 | data = yaml.load(f, Loader=yaml.FullLoader) # model dict
84 | check_dataset(data) # check
85 | nc = 1 if single_cls else int(data['nc']) # number of classes
86 | iouv = torch.linspace(0.5, 0.95, 10).to(device) # iou vector for mAP@0.5:0.95
87 | niou = iouv.numel()
88 |
89 | # Logging
90 |     log_imgs, wandb = min(log_imgs, 100), None  # cap the number of logged images at 100
91 | try:
92 | import wandb # Weights & Biases
93 | except ImportError:
94 | log_imgs = 0
95 |
96 | # Dataloader
97 | if not training:
98 | img = torch.zeros((1, 3, imgsz, imgsz), device=device) # init img
99 | _ = model(img.half() if half else img) if device.type != 'cpu' else None # run once
100 | path = data['test'] if opt.task == 'test' else data['val'] # path to val/test images
101 | dataloader = create_dataloader(path, imgsz, batch_size, 64, opt, pad=0.5, rect=True)[0]
102 |
103 | seen = 0
104 | try:
105 | names = model.names if hasattr(model, 'names') else model.module.names
106 | except:
107 | names = load_classes(opt.names)
108 | coco91class = coco80_to_coco91_class()
109 | s = ('%20s' + '%12s' * 6) % ('Class', 'Images', 'Targets', 'P', 'R', 'mAP@.5', 'mAP@.5:.95')
110 | p, r, f1, mp, mr, map50, map, t0, t1 = 0., 0., 0., 0., 0., 0., 0., 0., 0.
111 | loss = torch.zeros(3, device=device)
112 | jdict, stats, ap, ap_class, wandb_images = [], [], [], [], []
113 | for batch_i, (img, targets, paths, shapes) in enumerate(tqdm(dataloader, desc=s)):
114 | img = img.to(device, non_blocking=True)
115 | img = img.half() if half else img.float() # uint8 to fp16/32
116 | img /= 255.0 # 0 - 255 to 0.0 - 1.0
117 | targets = targets.to(device)
118 | nb, _, height, width = img.shape # batch size, channels, height, width
119 | whwh = torch.Tensor([width, height, width, height]).to(device)
120 |
121 | # Disable gradients
122 | with torch.no_grad():
123 | # Run model
124 | t = time_synchronized()
125 | inf_out, train_out = model(img, augment=augment) # inference and training outputs
126 | t0 += time_synchronized() - t
127 |
128 | # Compute loss
129 | if training: # if model has loss hyperparameters
130 | loss += compute_loss([x.float() for x in train_out], targets, model)[1][:3] # box, obj, cls
131 |
132 | # Run NMS
133 | t = time_synchronized()
134 | output = non_max_suppression(inf_out, conf_thres=conf_thres, iou_thres=iou_thres)
135 | t1 += time_synchronized() - t
136 |
137 | # Statistics per image
138 | for si, pred in enumerate(output):
139 | labels = targets[targets[:, 0] == si, 1:]
140 | nl = len(labels)
141 | tcls = labels[:, 0].tolist() if nl else [] # target class
142 | seen += 1
143 |
144 | if len(pred) == 0:
145 | if nl:
146 | stats.append((torch.zeros(0, niou, dtype=torch.bool), torch.Tensor(), torch.Tensor(), tcls))
147 | continue
148 |
149 | # Append to text file
150 | path = Path(paths[si])
151 | if save_txt:
152 | gn = torch.tensor(shapes[si][0])[[1, 0, 1, 0]] # normalization gain whwh
153 | x = pred.clone()
154 | x[:, :4] = scale_coords(img[si].shape[1:], x[:, :4], shapes[si][0], shapes[si][1]) # to original
155 | for *xyxy, conf, cls in x:
156 | xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist() # normalized xywh
157 | line = (cls, *xywh, conf) if save_conf else (cls, *xywh) # label format
158 | with open(save_dir / 'labels' / (path.stem + '.txt'), 'a') as f:
159 | f.write(('%g ' * len(line)).rstrip() % line + '\n')
160 |
161 | # W&B logging
162 | if plots and len(wandb_images) < log_imgs:
163 | box_data = [{"position": {"minX": xyxy[0], "minY": xyxy[1], "maxX": xyxy[2], "maxY": xyxy[3]},
164 | "class_id": int(cls),
165 | "box_caption": "%s %.3f" % (names[cls], conf),
166 | "scores": {"class_score": conf},
167 | "domain": "pixel"} for *xyxy, conf, cls in pred.tolist()]
168 | boxes = {"predictions": {"box_data": box_data, "class_labels": names}}
169 | wandb_images.append(wandb.Image(img[si], boxes=boxes, caption=path.name))
170 |
171 | # Clip boxes to image bounds
172 | clip_coords(pred, (height, width))
173 |
174 | # Append to pycocotools JSON dictionary
175 | if save_json:
176 | # [{"image_id": 42, "category_id": 18, "bbox": [258.15, 41.29, 348.26, 243.78], "score": 0.236}, ...
177 | image_id = int(path.stem) if path.stem.isnumeric() else path.stem
178 | box = pred[:, :4].clone() # xyxy
179 | scale_coords(img[si].shape[1:], box, shapes[si][0], shapes[si][1]) # to original shape
180 | box = xyxy2xywh(box) # xywh
181 | box[:, :2] -= box[:, 2:] / 2 # xy center to top-left corner
182 | for p, b in zip(pred.tolist(), box.tolist()):
183 | jdict.append({'image_id': image_id,
184 | 'category_id': coco91class[int(p[5])] if is_coco else int(p[5]),
185 | 'bbox': [round(x, 3) for x in b],
186 | 'score': round(p[4], 5)})
187 |
188 | # Assign all predictions as incorrect
189 | correct = torch.zeros(pred.shape[0], niou, dtype=torch.bool, device=device)
190 | if nl:
191 | detected = [] # target indices
192 | tcls_tensor = labels[:, 0]
193 |
194 | # target boxes
195 | tbox = xywh2xyxy(labels[:, 1:5]) * whwh
196 |
197 | # Per target class
198 | for cls in torch.unique(tcls_tensor):
199 |                     ti = (cls == tcls_tensor).nonzero(as_tuple=False).view(-1)  # target indices
200 |                     pi = (cls == pred[:, 5]).nonzero(as_tuple=False).view(-1)  # prediction indices
201 |
202 | # Search for detections
203 | if pi.shape[0]:
204 | # Prediction to target ious
205 | ious, i = box_iou(pred[pi, :4], tbox[ti]).max(1) # best ious, indices
206 |
207 | # Append detections
208 | detected_set = set()
209 | for j in (ious > iouv[0]).nonzero(as_tuple=False):
210 | d = ti[i[j]] # detected target
211 | if d.item() not in detected_set:
212 | detected_set.add(d.item())
213 | detected.append(d)
214 | correct[pi[j]] = ious[j] > iouv # iou_thres is 1xn
215 | if len(detected) == nl: # all targets already located in image
216 | break
217 |
218 | # Append statistics (correct, conf, pcls, tcls)
219 | stats.append((correct.cpu(), pred[:, 4].cpu(), pred[:, 5].cpu(), tcls))
220 |
221 | # Plot images
222 | if plots and batch_i < 3:
223 | f = save_dir / f'test_batch{batch_i}_labels.jpg' # filename
224 | plot_images(img, targets, paths, f, names) # labels
225 | f = save_dir / f'test_batch{batch_i}_pred.jpg'
226 | plot_images(img, output_to_target(output, width, height), paths, f, names) # predictions
227 |
228 | # Compute statistics
229 | stats = [np.concatenate(x, 0) for x in zip(*stats)] # to numpy
230 | if len(stats) and stats[0].any():
231 | p, r, ap, f1, ap_class = ap_per_class(*stats, plot=plots, fname=save_dir / 'precision-recall_curve.png')
232 | p, r, ap50, ap = p[:, 0], r[:, 0], ap[:, 0], ap.mean(1) # [P, R, AP@0.5, AP@0.5:0.95]
233 | mp, mr, map50, map = p.mean(), r.mean(), ap50.mean(), ap.mean()
234 | nt = np.bincount(stats[3].astype(np.int64), minlength=nc) # number of targets per class
235 | else:
236 | nt = torch.zeros(1)
237 |
238 | # W&B logging
239 | if plots and wandb:
240 | wandb.log({"Images": wandb_images})
241 | wandb.log({"Validation": [wandb.Image(str(x), caption=x.name) for x in sorted(save_dir.glob('test*.jpg'))]})
242 |
243 | # Print results
244 | pf = '%20s' + '%12.3g' * 6 # print format
245 | print(pf % ('all', seen, nt.sum(), mp, mr, map50, map))
246 |
247 | # Print results per class
248 | if verbose and nc > 1 and len(stats):
249 | for i, c in enumerate(ap_class):
250 | print(pf % (names[c], seen, nt[c], p[i], r[i], ap50[i], ap[i]))
251 |
252 | # Print speeds
253 | t = tuple(x / seen * 1E3 for x in (t0, t1, t0 + t1)) + (imgsz, imgsz, batch_size) # tuple
254 | if not training:
255 | print('Speed: %.1f/%.1f/%.1f ms inference/NMS/total per %gx%g image at batch-size %g' % t)
256 |
257 | # Save JSON
258 | if save_json and len(jdict):
259 | w = Path(weights[0] if isinstance(weights, list) else weights).stem if weights is not None else '' # weights
260 | anno_json = glob.glob('../coco/annotations/instances_val*.json')[0] # annotations json
261 | pred_json = str(save_dir / f"{w}_predictions.json") # predictions json
262 | print('\nEvaluating pycocotools mAP... saving %s...' % pred_json)
263 | with open(pred_json, 'w') as f:
264 | json.dump(jdict, f)
265 |
266 | try: # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb
267 | from pycocotools.coco import COCO
268 | from pycocotools.cocoeval import COCOeval
269 |
270 | anno = COCO(anno_json) # init annotations api
271 | pred = anno.loadRes(pred_json) # init predictions api
272 | eval = COCOeval(anno, pred, 'bbox')
273 | if is_coco:
274 | eval.params.imgIds = [int(Path(x).stem) for x in dataloader.dataset.img_files] # image IDs to evaluate
275 | eval.evaluate()
276 | eval.accumulate()
277 | eval.summarize()
278 | map, map50 = eval.stats[:2] # update results (mAP@0.5:0.95, mAP@0.5)
279 | except Exception as e:
280 | print('ERROR: pycocotools unable to run: %s' % e)
281 |
282 | # Return results
283 | if not training:
284 | print('Results saved to %s' % save_dir)
285 | model.float() # for training
286 | maps = np.zeros(nc) + map
287 | for i, c in enumerate(ap_class):
288 | maps[c] = ap[i]
289 | return (mp, mr, map50, map, *(loss.cpu() / len(dataloader)).tolist()), maps, t
290 |
291 |
292 | if __name__ == '__main__':
293 | parser = argparse.ArgumentParser(prog='test.py')
294 | parser.add_argument('--weights', nargs='+', type=str, default='yolov4.pt', help='model.pt path(s)')
295 | parser.add_argument('--data', type=str, default='data/coco.yaml', help='*.data path')
296 | parser.add_argument('--batch-size', type=int, default=32, help='size of each image batch')
297 | parser.add_argument('--img-size', type=int, default=640, help='inference size (pixels)')
298 | parser.add_argument('--conf-thres', type=float, default=0.001, help='object confidence threshold')
299 | parser.add_argument('--iou-thres', type=float, default=0.65, help='IOU threshold for NMS')
300 | parser.add_argument('--task', default='val', help="'val', 'test', 'study'")
301 | parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
302 | parser.add_argument('--single-cls', action='store_true', help='treat as single-class dataset')
303 | parser.add_argument('--augment', action='store_true', help='augmented inference')
304 | parser.add_argument('--verbose', action='store_true', help='report mAP by class')
305 | parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
306 | parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
307 | parser.add_argument('--save-json', action='store_true', help='save a cocoapi-compatible JSON results file')
308 | parser.add_argument('--project', default='runs/test', help='save to project/name')
309 | parser.add_argument('--name', default='exp', help='save to project/name')
310 | parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
311 | parser.add_argument('--cfg', type=str, default='cfg/yolov4.cfg', help='*.cfg path')
312 |     parser.add_argument('--names', type=str, default='data/coco.names', help='*.names path')
313 | opt = parser.parse_args()
314 | opt.save_json |= opt.data.endswith('coco.yaml')
315 | opt.data = check_file(opt.data) # check file
316 | print(opt)
317 |
318 | if opt.task in ['val', 'test']: # run normally
319 | test(opt.data,
320 | opt.weights,
321 | opt.batch_size,
322 | opt.img_size,
323 | opt.conf_thres,
324 | opt.iou_thres,
325 | opt.save_json,
326 | opt.single_cls,
327 | opt.augment,
328 | opt.verbose,
329 | save_txt=opt.save_txt,
330 | save_conf=opt.save_conf,
331 | )
332 |
333 | elif opt.task == 'study': # run over a range of settings and save/plot
334 |         for weights in ['yolov4-pacsp.weights', 'yolov4-pacsp-x.weights']:
335 | f = 'study_%s_%s.txt' % (Path(opt.data).stem, Path(weights).stem) # filename to save to
336 | x = list(range(320, 800, 64)) # x axis
337 | y = [] # y axis
338 | for i in x: # img-size
339 | print('\nRunning %s point %s...' % (f, i))
340 | r, _, t = test(opt.data, weights, opt.batch_size, i, opt.conf_thres, opt.iou_thres, opt.save_json)
341 | y.append(r + t) # results and times
342 | np.savetxt(f, y, fmt='%10.4g') # save
343 | os.system('zip -r study.zip study_*.txt')
344 | # utils.general.plot_study_txt(f, x) # plot
345 |
--------------------------------------------------------------------------------
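
The heart of the evaluation in test.py above is the `correct` matrix: each prediction gets one boolean per IoU threshold in `iouv = linspace(0.5, 0.95, 10)`, which ap_per_class later turns into mAP@0.5 and mAP@0.5:0.95. A toy sketch with made-up IoUs:

    import torch

    iouv = torch.linspace(0.5, 0.95, 10)     # [0.50, 0.55, ..., 0.95]
    ious = torch.tensor([0.72, 0.49, 0.91])  # hypothetical best IoU of three matched predictions
    correct = ious[:, None] > iouv[None, :]  # (3, 10) booleans, one column per threshold
    print(correct.sum(1).tolist())           # [5, 0, 9] thresholds passed per prediction
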
/utils/__init__.py:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/utils/activations.py:
--------------------------------------------------------------------------------
1 | # Activation functions
2 |
3 | import torch
4 | import torch.nn as nn
5 | import torch.nn.functional as F
6 |
7 |
8 | # Swish https://arxiv.org/pdf/1905.02244.pdf ---------------------------------------------------------------------------
9 | class Swish(nn.Module):  # Swish (SiLU) activation: x * sigmoid(x)
10 | @staticmethod
11 | def forward(x):
12 | return x * torch.sigmoid(x)
13 |
14 |
15 | class Hardswish(nn.Module): # export-friendly version of nn.Hardswish()
16 | @staticmethod
17 | def forward(x):
18 | # return x * F.hardsigmoid(x) # for torchscript and CoreML
19 | return x * F.hardtanh(x + 3, 0., 6.) / 6. # for torchscript, CoreML and ONNX
20 |
21 |
22 | class MemoryEfficientSwish(nn.Module):
23 | class F(torch.autograd.Function):
24 | @staticmethod
25 | def forward(ctx, x):
26 | ctx.save_for_backward(x)
27 | return x * torch.sigmoid(x)
28 |
29 | @staticmethod
30 | def backward(ctx, grad_output):
31 | x = ctx.saved_tensors[0]
32 | sx = torch.sigmoid(x)
33 | return grad_output * (sx * (1 + x * (1 - sx)))
34 |
35 | def forward(self, x):
36 | return self.F.apply(x)
37 |
38 |
39 | # Mish https://github.com/digantamisra98/Mish --------------------------------------------------------------------------
40 | class Mish(nn.Module):
41 | @staticmethod
42 | def forward(x):
43 | return x * F.softplus(x).tanh()
44 |
45 |
46 | class MemoryEfficientMish(nn.Module):
47 | class F(torch.autograd.Function):
48 | @staticmethod
49 | def forward(ctx, x):
50 | ctx.save_for_backward(x)
51 | return x.mul(torch.tanh(F.softplus(x))) # x * tanh(ln(1 + exp(x)))
52 |
53 | @staticmethod
54 | def backward(ctx, grad_output):
55 | x = ctx.saved_tensors[0]
56 | sx = torch.sigmoid(x)
57 | fx = F.softplus(x).tanh()
58 | return grad_output * (fx + x * sx * (1 - fx * fx))
59 |
60 | def forward(self, x):
61 | return self.F.apply(x)
62 |
63 |
64 | # FReLU https://arxiv.org/abs/2007.11824 -------------------------------------------------------------------------------
65 | class FReLU(nn.Module):
66 | def __init__(self, c1, k=3): # ch_in, kernel
67 | super().__init__()
68 | self.conv = nn.Conv2d(c1, c1, k, 1, 1, groups=c1)
69 | self.bn = nn.BatchNorm2d(c1)
70 |
71 | def forward(self, x):
72 | return torch.max(x, self.bn(self.conv(x)))
73 |
--------------------------------------------------------------------------------
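
The memory-efficient variants above recompute the activation in backward instead of storing intermediates; a quick numerical check (a sketch, run from the repo root) that they match the reference implementations and still produce gradients:

    import torch
    from utils.activations import Mish, MemoryEfficientMish, Swish, MemoryEfficientSwish

    x = torch.randn(4, 8, requires_grad=True)
    assert torch.allclose(Mish()(x), MemoryEfficientMish()(x), atol=1e-6)
    assert torch.allclose(Swish()(x), MemoryEfficientSwish()(x), atol=1e-6)
    MemoryEfficientMish()(x).sum().backward()  # gradients flow through the custom Function
    print(x.grad.shape)  # torch.Size([4, 8])
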
/utils/adabound.py:
--------------------------------------------------------------------------------
1 | import math
2 |
3 | import torch
4 | from torch.optim.optimizer import Optimizer
5 |
6 |
7 | class AdaBound(Optimizer):
8 | """Implements AdaBound algorithm.
9 | It has been proposed in `Adaptive Gradient Methods with Dynamic Bound of Learning Rate`_.
10 | Arguments:
11 | params (iterable): iterable of parameters to optimize or dicts defining
12 | parameter groups
13 | lr (float, optional): Adam learning rate (default: 1e-3)
14 | betas (Tuple[float, float], optional): coefficients used for computing
15 | running averages of gradient and its square (default: (0.9, 0.999))
16 | final_lr (float, optional): final (SGD) learning rate (default: 0.1)
17 | gamma (float, optional): convergence speed of the bound functions (default: 1e-3)
18 | eps (float, optional): term added to the denominator to improve
19 | numerical stability (default: 1e-8)
20 | weight_decay (float, optional): weight decay (L2 penalty) (default: 0)
21 | amsbound (boolean, optional): whether to use the AMSBound variant of this algorithm
22 | .. Adaptive Gradient Methods with Dynamic Bound of Learning Rate:
23 | https://openreview.net/forum?id=Bkg3g2R9FX
24 | """
25 |
26 | def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), final_lr=0.1, gamma=1e-3,
27 | eps=1e-8, weight_decay=0, amsbound=False):
28 | if not 0.0 <= lr:
29 | raise ValueError("Invalid learning rate: {}".format(lr))
30 | if not 0.0 <= eps:
31 | raise ValueError("Invalid epsilon value: {}".format(eps))
32 | if not 0.0 <= betas[0] < 1.0:
33 | raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0]))
34 | if not 0.0 <= betas[1] < 1.0:
35 | raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1]))
36 | if not 0.0 <= final_lr:
37 | raise ValueError("Invalid final learning rate: {}".format(final_lr))
38 | if not 0.0 <= gamma < 1.0:
39 | raise ValueError("Invalid gamma parameter: {}".format(gamma))
40 | defaults = dict(lr=lr, betas=betas, final_lr=final_lr, gamma=gamma, eps=eps,
41 | weight_decay=weight_decay, amsbound=amsbound)
42 | super(AdaBound, self).__init__(params, defaults)
43 |
44 | self.base_lrs = list(map(lambda group: group['lr'], self.param_groups))
45 |
46 | def __setstate__(self, state):
47 | super(AdaBound, self).__setstate__(state)
48 | for group in self.param_groups:
49 | group.setdefault('amsbound', False)
50 |
51 | def step(self, closure=None):
52 | """Performs a single optimization step.
53 | Arguments:
54 | closure (callable, optional): A closure that reevaluates the model
55 | and returns the loss.
56 | """
57 | loss = None
58 | if closure is not None:
59 | loss = closure()
60 |
61 | for group, base_lr in zip(self.param_groups, self.base_lrs):
62 | for p in group['params']:
63 | if p.grad is None:
64 | continue
65 | grad = p.grad.data
66 | if grad.is_sparse:
67 | raise RuntimeError(
68 |                         'AdaBound does not support sparse gradients, please consider SparseAdam instead')
69 | amsbound = group['amsbound']
70 |
71 | state = self.state[p]
72 |
73 | # State initialization
74 | if len(state) == 0:
75 | state['step'] = 0
76 | # Exponential moving average of gradient values
77 | state['exp_avg'] = torch.zeros_like(p.data)
78 | # Exponential moving average of squared gradient values
79 | state['exp_avg_sq'] = torch.zeros_like(p.data)
80 | if amsbound:
81 | # Maintains max of all exp. moving avg. of sq. grad. values
82 | state['max_exp_avg_sq'] = torch.zeros_like(p.data)
83 |
84 | exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
85 | if amsbound:
86 | max_exp_avg_sq = state['max_exp_avg_sq']
87 | beta1, beta2 = group['betas']
88 |
89 | state['step'] += 1
90 |
91 | if group['weight_decay'] != 0:
92 | grad = grad.add(group['weight_decay'], p.data)
93 |
94 | # Decay the first and second moment running average coefficient
95 | exp_avg.mul_(beta1).add_(1 - beta1, grad)
96 | exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)
97 | if amsbound:
98 | # Maintains the maximum of all 2nd moment running avg. till now
99 | torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq)
100 | # Use the max. for normalizing running avg. of gradient
101 | denom = max_exp_avg_sq.sqrt().add_(group['eps'])
102 | else:
103 | denom = exp_avg_sq.sqrt().add_(group['eps'])
104 |
105 | bias_correction1 = 1 - beta1 ** state['step']
106 | bias_correction2 = 1 - beta2 ** state['step']
107 | step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1
108 |
109 | # Applies bounds on actual learning rate
110 | # lr_scheduler cannot affect final_lr, this is a workaround to apply lr decay
111 | final_lr = group['final_lr'] * group['lr'] / base_lr
112 | lower_bound = final_lr * (1 - 1 / (group['gamma'] * state['step'] + 1))
113 | upper_bound = final_lr * (1 + 1 / (group['gamma'] * state['step']))
114 | step_size = torch.full_like(denom, step_size)
115 | step_size.div_(denom).clamp_(lower_bound, upper_bound).mul_(exp_avg)
116 |
117 | p.data.add_(-step_size)
118 |
119 | return loss
120 |
121 |
122 | class AdaBoundW(Optimizer):
123 | """Implements AdaBound algorithm with Decoupled Weight Decay (arxiv.org/abs/1711.05101)
124 | It has been proposed in `Adaptive Gradient Methods with Dynamic Bound of Learning Rate`_.
125 | Arguments:
126 | params (iterable): iterable of parameters to optimize or dicts defining
127 | parameter groups
128 | lr (float, optional): Adam learning rate (default: 1e-3)
129 | betas (Tuple[float, float], optional): coefficients used for computing
130 | running averages of gradient and its square (default: (0.9, 0.999))
131 | final_lr (float, optional): final (SGD) learning rate (default: 0.1)
132 | gamma (float, optional): convergence speed of the bound functions (default: 1e-3)
133 | eps (float, optional): term added to the denominator to improve
134 | numerical stability (default: 1e-8)
135 | weight_decay (float, optional): weight decay (L2 penalty) (default: 0)
136 | amsbound (boolean, optional): whether to use the AMSBound variant of this algorithm
137 | .. Adaptive Gradient Methods with Dynamic Bound of Learning Rate:
138 | https://openreview.net/forum?id=Bkg3g2R9FX
139 | """
140 |
141 | def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), final_lr=0.1, gamma=1e-3,
142 | eps=1e-8, weight_decay=0, amsbound=False):
143 | if not 0.0 <= lr:
144 | raise ValueError("Invalid learning rate: {}".format(lr))
145 | if not 0.0 <= eps:
146 | raise ValueError("Invalid epsilon value: {}".format(eps))
147 | if not 0.0 <= betas[0] < 1.0:
148 | raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0]))
149 | if not 0.0 <= betas[1] < 1.0:
150 | raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1]))
151 | if not 0.0 <= final_lr:
152 | raise ValueError("Invalid final learning rate: {}".format(final_lr))
153 | if not 0.0 <= gamma < 1.0:
154 | raise ValueError("Invalid gamma parameter: {}".format(gamma))
155 | defaults = dict(lr=lr, betas=betas, final_lr=final_lr, gamma=gamma, eps=eps,
156 | weight_decay=weight_decay, amsbound=amsbound)
157 | super(AdaBoundW, self).__init__(params, defaults)
158 |
159 | self.base_lrs = list(map(lambda group: group['lr'], self.param_groups))
160 |
161 | def __setstate__(self, state):
162 | super(AdaBoundW, self).__setstate__(state)
163 | for group in self.param_groups:
164 | group.setdefault('amsbound', False)
165 |
166 | def step(self, closure=None):
167 | """Performs a single optimization step.
168 | Arguments:
169 | closure (callable, optional): A closure that reevaluates the model
170 | and returns the loss.
171 | """
172 | loss = None
173 | if closure is not None:
174 | loss = closure()
175 |
176 | for group, base_lr in zip(self.param_groups, self.base_lrs):
177 | for p in group['params']:
178 | if p.grad is None:
179 | continue
180 | grad = p.grad.data
181 | if grad.is_sparse:
182 | raise RuntimeError(
183 |                         'AdaBoundW does not support sparse gradients, please consider SparseAdam instead')
184 | amsbound = group['amsbound']
185 |
186 | state = self.state[p]
187 |
188 | # State initialization
189 | if len(state) == 0:
190 | state['step'] = 0
191 | # Exponential moving average of gradient values
192 | state['exp_avg'] = torch.zeros_like(p.data)
193 | # Exponential moving average of squared gradient values
194 | state['exp_avg_sq'] = torch.zeros_like(p.data)
195 | if amsbound:
196 | # Maintains max of all exp. moving avg. of sq. grad. values
197 | state['max_exp_avg_sq'] = torch.zeros_like(p.data)
198 |
199 | exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
200 | if amsbound:
201 | max_exp_avg_sq = state['max_exp_avg_sq']
202 | beta1, beta2 = group['betas']
203 |
204 | state['step'] += 1
205 |
206 | # Decay the first and second moment running average coefficient
207 | exp_avg.mul_(beta1).add_(1 - beta1, grad)
208 | exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)
209 | if amsbound:
210 | # Maintains the maximum of all 2nd moment running avg. till now
211 | torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq)
212 | # Use the max. for normalizing running avg. of gradient
213 | denom = max_exp_avg_sq.sqrt().add_(group['eps'])
214 | else:
215 | denom = exp_avg_sq.sqrt().add_(group['eps'])
216 |
217 | bias_correction1 = 1 - beta1 ** state['step']
218 | bias_correction2 = 1 - beta2 ** state['step']
219 | step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1
220 |
221 | # Applies bounds on actual learning rate
222 | # lr_scheduler cannot affect final_lr, this is a workaround to apply lr decay
223 | final_lr = group['final_lr'] * group['lr'] / base_lr
224 | lower_bound = final_lr * (1 - 1 / (group['gamma'] * state['step'] + 1))
225 | upper_bound = final_lr * (1 + 1 / (group['gamma'] * state['step']))
226 | step_size = torch.full_like(denom, step_size)
227 | step_size.div_(denom).clamp_(lower_bound, upper_bound).mul_(exp_avg)
228 |
229 | if group['weight_decay'] != 0:
230 | decayed_weights = torch.mul(p.data, group['weight_decay'])
231 | p.data.add_(-step_size)
232 | p.data.sub_(decayed_weights)
233 | else:
234 | p.data.add_(-step_size)
235 |
236 | return loss
237 |
--------------------------------------------------------------------------------
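
A minimal usage sketch for the AdaBound optimizer above, on a toy regression problem. The hyperparameters are illustrative, not values used by this repo's training scripts, and the deprecated `add_(scalar, tensor)` calls in step() assume the torch==1.6 pin from requirements.txt:

    import torch
    import torch.nn as nn
    from utils.adabound import AdaBound

    model = nn.Linear(10, 1)
    optimizer = AdaBound(model.parameters(), lr=1e-3, final_lr=0.1)  # Adam-like start, SGD-like finish
    x, y = torch.randn(64, 10), torch.randn(64, 1)
    for _ in range(100):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
    print(loss.item())
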
/utils/autoanchor.py:
--------------------------------------------------------------------------------
1 | # Auto-anchor utils
2 |
3 | import numpy as np
4 | import torch
5 | import yaml
6 | from scipy.cluster.vq import kmeans
7 | from tqdm import tqdm
8 |
9 |
10 | def check_anchor_order(m):
11 | # Check anchor order against stride order for YOLOv5 Detect() module m, and correct if necessary
12 | a = m.anchor_grid.prod(-1).view(-1) # anchor area
13 | da = a[-1] - a[0] # delta a
14 | ds = m.stride[-1] - m.stride[0] # delta s
15 |     if da.sign() != ds.sign():  # orders differ, reverse anchors to match stride order
16 | print('Reversing anchor order')
17 | m.anchors[:] = m.anchors.flip(0)
18 | m.anchor_grid[:] = m.anchor_grid.flip(0)
19 |
20 |
21 | def check_anchors(dataset, model, thr=4.0, imgsz=640):
22 | # Check anchor fit to data, recompute if necessary
23 | print('\nAnalyzing anchors... ', end='')
24 | m = model.module.model[-1] if hasattr(model, 'module') else model.model[-1] # Detect()
25 | shapes = imgsz * dataset.shapes / dataset.shapes.max(1, keepdims=True)
26 | scale = np.random.uniform(0.9, 1.1, size=(shapes.shape[0], 1)) # augment scale
27 | wh = torch.tensor(np.concatenate([l[:, 3:5] * s for s, l in zip(shapes * scale, dataset.labels)])).float() # wh
28 |
29 | def metric(k): # compute metric
30 | r = wh[:, None] / k[None]
31 | x = torch.min(r, 1. / r).min(2)[0] # ratio metric
32 | best = x.max(1)[0] # best_x
33 | aat = (x > 1. / thr).float().sum(1).mean() # anchors above threshold
34 | bpr = (best > 1. / thr).float().mean() # best possible recall
35 | return bpr, aat
36 |
37 | bpr, aat = metric(m.anchor_grid.clone().cpu().view(-1, 2))
38 | print('anchors/target = %.2f, Best Possible Recall (BPR) = %.4f' % (aat, bpr), end='')
39 | if bpr < 0.98: # threshold to recompute
40 | print('. Attempting to improve anchors, please wait...')
41 | na = m.anchor_grid.numel() // 2 # number of anchors
42 | new_anchors = kmean_anchors(dataset, n=na, img_size=imgsz, thr=thr, gen=1000, verbose=False)
43 | new_bpr = metric(new_anchors.reshape(-1, 2))[0]
44 | if new_bpr > bpr: # replace anchors
45 | new_anchors = torch.tensor(new_anchors, device=m.anchors.device).type_as(m.anchors)
46 | m.anchor_grid[:] = new_anchors.clone().view_as(m.anchor_grid) # for inference
47 | m.anchors[:] = new_anchors.clone().view_as(m.anchors) / m.stride.to(m.anchors.device).view(-1, 1, 1) # loss
48 | check_anchor_order(m)
49 | print('New anchors saved to model. Update model *.yaml to use these anchors in the future.')
50 | else:
51 | print('Original anchors better than new anchors. Proceeding with original anchors.')
52 | print('') # newline
53 |
54 |
55 | def kmean_anchors(path='./data/coco128.yaml', n=9, img_size=640, thr=4.0, gen=1000, verbose=True):
56 | """ Creates kmeans-evolved anchors from training dataset
57 | Arguments:
58 | path: path to dataset *.yaml, or a loaded dataset
59 | n: number of anchors
60 | img_size: image size used for training
61 | thr: anchor-label wh ratio threshold hyperparameter hyp['anchor_t'] used for training, default=4.0
62 | gen: generations to evolve anchors using genetic algorithm
63 | verbose: print all results
64 | Return:
65 | k: kmeans evolved anchors
66 | Usage:
67 |             from utils.autoanchor import *; _ = kmean_anchors()
68 | """
69 | thr = 1. / thr
70 |
71 | def metric(k, wh): # compute metrics
72 | r = wh[:, None] / k[None]
73 | x = torch.min(r, 1. / r).min(2)[0] # ratio metric
74 | # x = wh_iou(wh, torch.tensor(k)) # iou metric
75 | return x, x.max(1)[0] # x, best_x
76 |
77 | def anchor_fitness(k): # mutation fitness
78 | _, best = metric(torch.tensor(k, dtype=torch.float32), wh)
79 | return (best * (best > thr).float()).mean() # fitness
80 |
81 | def print_results(k):
82 | k = k[np.argsort(k.prod(1))] # sort small to large
83 | x, best = metric(k, wh0)
84 | bpr, aat = (best > thr).float().mean(), (x > thr).float().mean() * n # best possible recall, anch > thr
85 | print('thr=%.2f: %.4f best possible recall, %.2f anchors past thr' % (thr, bpr, aat))
86 | print('n=%g, img_size=%s, metric_all=%.3f/%.3f-mean/best, past_thr=%.3f-mean: ' %
87 | (n, img_size, x.mean(), best.mean(), x[x > thr].mean()), end='')
88 | for i, x in enumerate(k):
89 | print('%i,%i' % (round(x[0]), round(x[1])), end=', ' if i < len(k) - 1 else '\n') # use in *.cfg
90 | return k
91 |
92 | if isinstance(path, str): # *.yaml file
93 | with open(path) as f:
94 | data_dict = yaml.load(f, Loader=yaml.FullLoader) # model dict
95 | from utils.datasets import LoadImagesAndLabels
96 | dataset = LoadImagesAndLabels(data_dict['train'], augment=True, rect=True)
97 | else:
98 | dataset = path # dataset
99 |
100 | # Get label wh
101 | shapes = img_size * dataset.shapes / dataset.shapes.max(1, keepdims=True)
102 | wh0 = np.concatenate([l[:, 3:5] * s for s, l in zip(shapes, dataset.labels)]) # wh
103 |
104 | # Filter
105 | i = (wh0 < 3.0).any(1).sum()
106 | if i:
107 | print('WARNING: Extremely small objects found. '
108 | '%g of %g labels are < 3 pixels in width or height.' % (i, len(wh0)))
109 | wh = wh0[(wh0 >= 2.0).any(1)] # filter > 2 pixels
110 |
111 | # Kmeans calculation
112 | print('Running kmeans for %g anchors on %g points...' % (n, len(wh)))
113 | s = wh.std(0) # sigmas for whitening
114 | k, dist = kmeans(wh / s, n, iter=30) # points, mean distance
115 | k *= s
116 | wh = torch.tensor(wh, dtype=torch.float32) # filtered
117 | wh0 = torch.tensor(wh0, dtype=torch.float32) # unfiltered
118 | k = print_results(k)
119 |
120 | # Plot
121 | # k, d = [None] * 20, [None] * 20
122 | # for i in tqdm(range(1, 21)):
123 | # k[i-1], d[i-1] = kmeans(wh / s, i) # points, mean distance
124 | # fig, ax = plt.subplots(1, 2, figsize=(14, 7))
125 | # ax = ax.ravel()
126 | # ax[0].plot(np.arange(1, 21), np.array(d) ** 2, marker='.')
127 | # fig, ax = plt.subplots(1, 2, figsize=(14, 7)) # plot wh
128 | # ax[0].hist(wh[wh[:, 0]<100, 0],400)
129 | # ax[1].hist(wh[wh[:, 1]<100, 1],400)
130 | # fig.tight_layout()
131 | # fig.savefig('wh.png', dpi=200)
132 |
133 | # Evolve
134 | npr = np.random
135 |     f, sh, mp, s = anchor_fitness(k), k.shape, 0.9, 0.1  # fitness, anchor shape, mutation prob, sigma
136 | pbar = tqdm(range(gen), desc='Evolving anchors with Genetic Algorithm') # progress bar
137 | for _ in pbar:
138 | v = np.ones(sh)
139 | while (v == 1).all(): # mutate until a change occurs (prevent duplicates)
140 | v = ((npr.random(sh) < mp) * npr.random() * npr.randn(*sh) * s + 1).clip(0.3, 3.0)
141 | kg = (k.copy() * v).clip(min=2.0)
142 | fg = anchor_fitness(kg)
143 | if fg > f:
144 | f, k = fg, kg.copy()
145 | pbar.desc = 'Evolving anchors with Genetic Algorithm: fitness = %.4f' % f
146 | if verbose:
147 | print_results(k)
148 |
149 | return print_results(k)
150 |
--------------------------------------------------------------------------------
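
Both check_anchors and kmean_anchors above score anchors with the same width/height ratio metric: a label counts as covered when its worst-dimension ratio to its best anchor exceeds 1/thr. A toy sketch with hypothetical label and anchor sizes:

    import torch

    wh = torch.tensor([[30., 60.], [120., 90.]])             # hypothetical label sizes (pixels)
    k = torch.tensor([[10., 13.], [33., 23.], [116., 90.]])  # hypothetical anchors

    r = wh[:, None] / k[None]               # (labels, anchors, 2) width/height ratios
    x = torch.min(r, 1. / r).min(2)[0]      # worst-dimension ratio per label/anchor pair
    best = x.max(1)[0]                      # best anchor per label
    bpr = (best > 1. / 4.0).float().mean()  # best possible recall at thr=4.0
    print(best.tolist(), bpr.item())        # both toy labels are covered -> bpr = 1.0
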
/utils/evolve.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | #for i in 0 1 2 3
3 | #do
4 | # t=ultralytics/yolov3:v139 && sudo docker pull $t && sudo nvidia-docker run -d --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t utils/evolve.sh $i
5 | # sleep 30
6 | #done
7 |
8 | while true; do
9 | # python3 train.py --data ../data/sm4/out.data --img-size 320 --epochs 100 --batch 64 --accum 1 --weights yolov3-tiny.conv.15 --multi --bucket ult/wer --evolve --cache --device $1 --cfg yolov3-tiny3-1cls.cfg --single --adam
10 | # python3 train.py --data ../out/data.data --img-size 608 --epochs 10 --batch 8 --accum 8 --weights ultralytics68.pt --multi --bucket ult/athena --evolve --device $1 --cfg yolov3-spp-1cls.cfg
11 |
12 | python3 train.py --data coco2014.data --img-size 512 608 --epochs 27 --batch 8 --accum 8 --evolve --weights '' --bucket ult/coco/sppa_512 --device $1 --cfg yolov3-sppa.cfg --multi
13 | done
14 |
15 |
16 | # coco epoch times --img-size 416 608 --epochs 27 --batch 16 --accum 4
17 | # 36:34 2080ti
18 | # 21:58 V100
19 | # 63:00 T4
--------------------------------------------------------------------------------
/utils/gcp.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | # New VM
4 | rm -rf sample_data yolov3
5 | git clone https://github.com/ultralytics/yolov3
6 | # git clone -b test --depth 1 https://github.com/ultralytics/yolov3 test # branch
7 | # sudo apt-get install zip
8 | #git clone https://github.com/NVIDIA/apex && cd apex && pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" . --user && cd .. && rm -rf apex
9 | sudo conda install -yc conda-forge scikit-image pycocotools
10 | # python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('193Zp_ye-3qXMonR1nZj3YyxMtQkMy50k','coco2014.zip')"
11 | python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('1WQT6SOktSe8Uw6r10-2JhbEhMY5DJaph','coco2017.zip')"
12 | python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('1C3HewOG9akA3y456SZLBJZfNDPkBwAto','knife.zip')"
13 | python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('13g3LqdpkNE8sPosVJT6KFXlfoMypzRP4','sm4.zip')"
14 | sudo shutdown
15 |
16 | # Mount local SSD
17 | lsblk
18 | sudo mkfs.ext4 -F /dev/nvme0n1
19 | sudo mkdir -p /mnt/disks/nvme0n1
20 | sudo mount /dev/nvme0n1 /mnt/disks/nvme0n1
21 | sudo chmod a+w /mnt/disks/nvme0n1
22 | cp -r coco /mnt/disks/nvme0n1
23 |
24 | # Kill All
25 | t=ultralytics/yolov3:v1
26 | docker kill $(docker ps -a -q --filter ancestor=$t)
27 |
28 | # Evolve coco
29 | sudo -s
30 | t=ultralytics/yolov3:evolve
31 | # docker kill $(docker ps -a -q --filter ancestor=$t)
32 | for i in 0 1 6 7
33 | do
34 | docker pull $t && docker run --gpus all -d --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t bash utils/evolve.sh $i
35 | sleep 30
36 | done
37 |
38 | #COCO training
39 | n=131 && t=ultralytics/coco:v131 && sudo docker pull $t && sudo docker run -it --gpus all --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t python3 train.py --data coco2014.data --img-size 320 640 --epochs 300 --batch 16 --weights '' --device 0 --cfg yolov3-spp.cfg --bucket ult/coco --name $n && sudo shutdown
40 | n=132 && t=ultralytics/coco:v131 && sudo docker pull $t && sudo docker run -it --gpus all --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t python3 train.py --data coco2014.data --img-size 320 640 --epochs 300 --batch 64 --weights '' --device 0 --cfg yolov3-tiny.cfg --bucket ult/coco --name $n && sudo shutdown
41 |
--------------------------------------------------------------------------------
/utils/google_utils.py:
--------------------------------------------------------------------------------
1 | # Google utils: https://cloud.google.com/storage/docs/reference/libraries
2 |
3 | import os
4 | import platform
5 | import subprocess
6 | import time
7 | from pathlib import Path
8 |
9 | import torch
10 | import torch.nn as nn
11 |
12 |
13 | def gsutil_getsize(url=''):
14 | # gs://bucket/file size https://cloud.google.com/storage/docs/gsutil/commands/du
15 | s = subprocess.check_output('gsutil du %s' % url, shell=True).decode('utf-8')
16 | return eval(s.split(' ')[0]) if len(s) else 0 # bytes
17 |
18 |
19 | def attempt_download(weights):
20 | # Attempt to download pretrained weights if not found locally
21 | weights = weights.strip().replace("'", '')
22 | file = Path(weights).name
23 |
24 | msg = weights + ' missing, try downloading from https://github.com/WongKinYiu/ScaledYOLOv4/releases/'
25 | models = ['yolov4-csp.pt', 'yolov4-csp-x.pt'] # available models
26 |
27 | if file in models and not os.path.isfile(weights):
28 |
29 | try: # GitHub
30 | url = 'https://github.com/WongKinYiu/ScaledYOLOv4/releases/download/v1.0/' + file
31 | print('Downloading %s to %s...' % (url, weights))
32 | torch.hub.download_url_to_file(url, weights)
33 | assert os.path.exists(weights) and os.path.getsize(weights) > 1E6 # check
34 |         except Exception as e:  # download failed
35 |             print('ERROR: Download failure: %s' % e)
36 |             print(msg)
37 |
38 |
39 | def attempt_load(weights, map_location=None):
40 | # Loads an ensemble of models weights=[a,b,c] or a single model weights=[a] or weights=a
41 | model = Ensemble()
42 | for w in weights if isinstance(weights, list) else [weights]:
43 | attempt_download(w)
44 | model.append(torch.load(w, map_location=map_location)['model'].float().fuse().eval()) # load FP32 model
45 |
46 | if len(model) == 1:
47 | return model[-1] # return model
48 | else:
49 | print('Ensemble created with %s\n' % weights)
50 | for k in ['names', 'stride']:
51 | setattr(model, k, getattr(model[-1], k))
52 | return model # return ensemble
53 |
54 |
55 | def gdrive_download(id='1n_oKgR81BJtqk75b00eAjdv03qVCQn2f', name='coco128.zip'):
56 | # Downloads a file from Google Drive. from utils.google_utils import *; gdrive_download()
57 | t = time.time()
58 |
59 | print('Downloading https://drive.google.com/uc?export=download&id=%s as %s... ' % (id, name), end='')
60 | os.remove(name) if os.path.exists(name) else None # remove existing
61 | os.remove('cookie') if os.path.exists('cookie') else None
62 |
63 | # Attempt file download
64 | out = "NUL" if platform.system() == "Windows" else "/dev/null"
65 | os.system('curl -c ./cookie -s -L "drive.google.com/uc?export=download&id=%s" > %s ' % (id, out))
66 | if os.path.exists('cookie'): # large file
67 | s = 'curl -Lb ./cookie "drive.google.com/uc?export=download&confirm=%s&id=%s" -o %s' % (get_token(), id, name)
68 | else: # small file
69 | s = 'curl -s -L -o %s "drive.google.com/uc?export=download&id=%s"' % (name, id)
70 | r = os.system(s) # execute, capture return
71 | os.remove('cookie') if os.path.exists('cookie') else None
72 |
73 | # Error check
74 | if r != 0:
75 | os.remove(name) if os.path.exists(name) else None # remove partial
76 | print('Download error ') # raise Exception('Download error')
77 | return r
78 |
79 | # Unzip if archive
80 | if name.endswith('.zip'):
81 | print('unzipping... ', end='')
82 | os.system('unzip -q %s' % name) # unzip
83 | os.remove(name) # remove zip to free space
84 |
85 | print('Done (%.1fs)' % (time.time() - t))
86 | return r
87 |
88 |
89 | def get_token(cookie="./cookie"):
90 | with open(cookie) as f:
91 | for line in f:
92 | if "download" in line:
93 | return line.split()[-1]
94 | return ""
95 |
96 |
97 | class Ensemble(nn.ModuleList):
98 | # Ensemble of models
99 | def __init__(self):
100 | super(Ensemble, self).__init__()
101 |
102 | def forward(self, x, augment=False):
103 | y = []
104 | for module in self:
105 | y.append(module(x, augment)[0])
106 | # y = torch.stack(y).max(0)[0] # max ensemble
107 | # y = torch.cat(y, 1) # nms ensemble
108 | y = torch.stack(y).mean(0) # mean ensemble
109 | return y, None # inference, train output
110 |
111 |
112 | # def upload_blob(bucket_name, source_file_name, destination_blob_name):
113 | # # Uploads a file to a bucket
114 | # # https://cloud.google.com/storage/docs/uploading-objects#storage-upload-object-python
115 | #
116 | # storage_client = storage.Client()
117 | # bucket = storage_client.get_bucket(bucket_name)
118 | # blob = bucket.blob(destination_blob_name)
119 | #
120 | # blob.upload_from_filename(source_file_name)
121 | #
122 | # print('File {} uploaded to {}.'.format(
123 | # source_file_name,
124 | # destination_blob_name))
125 | #
126 | #
127 | # def download_blob(bucket_name, source_blob_name, destination_file_name):
128 | # # Uploads a blob from a bucket
129 | # storage_client = storage.Client()
130 | # bucket = storage_client.get_bucket(bucket_name)
131 | # blob = bucket.blob(source_blob_name)
132 | #
133 | # blob.download_to_filename(destination_file_name)
134 | #
135 | # print('Blob {} downloaded to {}.'.format(
136 | # source_blob_name,
137 | # destination_file_name))
138 |
--------------------------------------------------------------------------------
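
Typical use of the download helpers in google_utils.py above (a sketch): attempt_download only acts when the file is missing and is one of the listed release assets, while gdrive_download always re-fetches the archive, so it is left commented out; the id shown is the labels-archive id from get_coco2017.sh:

    from utils.google_utils import attempt_download, gdrive_download

    attempt_download('yolov4-csp.pt')  # no-op if the weight file already exists locally
    # gdrive_download(id='1cXZR_ckHki6nddOmcysCuuJFM--T-Q6L', name='coco2017labels.zip')
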
/utils/layers.py:
--------------------------------------------------------------------------------
1 | import torch.nn.functional as F
2 |
3 | from utils.general import *
4 |
5 | import torch
6 | from torch import nn
7 |
8 | try:
9 | from mish_cuda import MishCuda as Mish
10 |
11 | except Exception:  # fall back to the pure-PyTorch Mish below if mish_cuda is unavailable
12 | class Mish(nn.Module): # https://github.com/digantamisra98/Mish
13 | def forward(self, x):
14 | return x * F.softplus(x).tanh()
15 |
16 |
17 | class Reorg(nn.Module):
18 | def forward(self, x):
19 | return torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1)
20 |
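# Shape sketch: Reorg is a space-to-depth op that trades resolution for channels,
# mapping (b, c, h, w) -> (b, 4*c, h/2, w/2) for even h and w, e.g.:
#   Reorg()(torch.zeros(1, 64, 160, 160)).shape  # torch.Size([1, 256, 80, 80])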
21 |
22 | def make_divisible(v, divisor):
23 |     # Function ensures all layers have a channel number that is divisible by 'divisor' (rounds up)
24 | # https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
25 | return math.ceil(v / divisor) * divisor
26 |
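# Example: rounds a channel count *up* to the nearest multiple of `divisor`:
#   make_divisible(37, 8)  # -> 40
#   make_divisible(64, 8)  # -> 64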
27 |
28 | class Flatten(nn.Module):
29 | # Use after nn.AdaptiveAvgPool2d(1) to remove last 2 dimensions
30 | def forward(self, x):
31 | return x.view(x.size(0), -1)
32 |
33 |
34 | class Concat(nn.Module):
35 | # Concatenate a list of tensors along dimension
36 | def __init__(self, dimension=1):
37 | super(Concat, self).__init__()
38 | self.d = dimension
39 |
40 | def forward(self, x):
41 | return torch.cat(x, self.d)
42 |
43 |
44 | class FeatureConcat(nn.Module):
45 | def __init__(self, layers):
46 | super(FeatureConcat, self).__init__()
47 | self.layers = layers # layer indices
48 | self.multiple = len(layers) > 1 # multiple layers flag
49 |
50 | def forward(self, x, outputs):
51 | return torch.cat([outputs[i] for i in self.layers], 1) if self.multiple else outputs[self.layers[0]]
52 |
53 |
54 | class FeatureConcat2(nn.Module):
55 | def __init__(self, layers):
56 | super(FeatureConcat2, self).__init__()
57 | self.layers = layers # layer indices
58 | self.multiple = len(layers) > 1 # multiple layers flag
59 |
60 | def forward(self, x, outputs):
61 | return torch.cat([outputs[self.layers[0]], outputs[self.layers[1]].detach()], 1)
62 |
63 |
64 | class FeatureConcat3(nn.Module):
65 | def __init__(self, layers):
66 | super(FeatureConcat3, self).__init__()
67 | self.layers = layers # layer indices
68 | self.multiple = len(layers) > 1 # multiple layers flag
69 |
70 | def forward(self, x, outputs):
71 | return torch.cat([outputs[self.layers[0]], outputs[self.layers[1]].detach(), outputs[self.layers[2]].detach()], 1)
72 |
73 |
74 | class FeatureConcat_l(nn.Module):
75 | def __init__(self, layers):
76 | super(FeatureConcat_l, self).__init__()
77 | self.layers = layers # layer indices
78 | self.multiple = len(layers) > 1 # multiple layers flag
79 |
80 | def forward(self, x, outputs):
81 | return torch.cat([outputs[i][:,:outputs[i].shape[1]//2,:,:] for i in self.layers], 1) if self.multiple else outputs[self.layers[0]][:,:outputs[self.layers[0]].shape[1]//2,:,:]
82 |
83 |
84 | class WeightedFeatureFusion(nn.Module): # weighted sum of 2 or more layers https://arxiv.org/abs/1911.09070
85 | def __init__(self, layers, weight=False):
86 | super(WeightedFeatureFusion, self).__init__()
87 | self.layers = layers # layer indices
88 | self.weight = weight # apply weights boolean
89 | self.n = len(layers) + 1 # number of layers
90 | if weight:
91 | self.w = nn.Parameter(torch.zeros(self.n), requires_grad=True) # layer weights
92 |
93 | def forward(self, x, outputs):
94 | # Weights
95 | if self.weight:
96 | w = torch.sigmoid(self.w) * (2 / self.n) # sigmoid weights (0-1)
97 | x = x * w[0]
98 |
99 | # Fusion
100 | nx = x.shape[1] # input channels
101 | for i in range(self.n - 1):
102 | a = outputs[self.layers[i]] * w[i + 1] if self.weight else outputs[self.layers[i]] # feature to add
103 | na = a.shape[1] # feature channels
104 |
105 | # Adjust channels
106 | if nx == na: # same shape
107 | x = x + a
108 | elif nx > na: # slice input
109 | x[:, :na] = x[:, :na] + a # or a = nn.ZeroPad2d((0, 0, 0, 0, 0, dc))(a); x = x + a
110 | else: # slice feature
111 | x = x + a[:, :nx]
112 |
113 | return x
114 |
115 |
116 | class MixConv2d(nn.Module): # MixConv: Mixed Depthwise Convolutional Kernels https://arxiv.org/abs/1907.09595
117 | def __init__(self, in_ch, out_ch, k=(3, 5, 7), stride=1, dilation=1, bias=True, method='equal_params'):
118 | super(MixConv2d, self).__init__()
119 |
120 | groups = len(k)
121 | if method == 'equal_ch': # equal channels per group
122 | i = torch.linspace(0, groups - 1E-6, out_ch).floor() # out_ch indices
123 | ch = [(i == g).sum() for g in range(groups)]
124 | else: # 'equal_params': equal parameter count per group
125 | b = [out_ch] + [0] * groups
126 | a = np.eye(groups + 1, groups, k=-1)
127 | a -= np.roll(a, 1, axis=1)
128 | a *= np.array(k) ** 2
129 | a[0] = 1
130 | ch = np.linalg.lstsq(a, b, rcond=None)[0].round().astype(int) # solve for equal weight indices, ax = b
131 |
132 | self.m = nn.ModuleList([nn.Conv2d(in_channels=in_ch,
133 | out_channels=ch[g],
134 | kernel_size=k[g],
135 | stride=stride,
136 | padding=k[g] // 2, # 'same' pad
137 | dilation=dilation,
138 | bias=bias) for g in range(groups)])
139 |
140 | def forward(self, x):
141 | return torch.cat([m(x) for m in self.m], 1)
142 |
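# Usage sketch: a "mixed" convolution whose output channels are split across 3x3/5x5/7x7 branches
# and concatenated. 'equal_ch' splits out_ch roughly evenly per branch; the default 'equal_params'
# gives the smaller kernels more channels so per-branch parameter counts are approximately equal
# (that split is solved by least squares and rounded, so the concatenated width is close to out_ch).
#   m = MixConv2d(in_ch=64, out_ch=128, k=(3, 5, 7), stride=1)
#   y = m(torch.zeros(1, 64, 32, 32))  # spatial size preserved by the 'same' padding; channels ~= out_ch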
143 |
144 | # Activation functions below -------------------------------------------------------------------------------------------
145 | class SwishImplementation(torch.autograd.Function):
146 | @staticmethod
147 | def forward(ctx, x):
148 | ctx.save_for_backward(x)
149 | return x * torch.sigmoid(x)
150 |
151 | @staticmethod
152 | def backward(ctx, grad_output):
153 | x = ctx.saved_tensors[0]
154 | sx = torch.sigmoid(x) # sigmoid(ctx)
155 | return grad_output * (sx * (1 + x * (1 - sx)))
156 |
157 |
158 | class MishImplementation(torch.autograd.Function):
159 | @staticmethod
160 | def forward(ctx, x):
161 | ctx.save_for_backward(x)
162 | return x.mul(torch.tanh(F.softplus(x))) # x * tanh(ln(1 + exp(x)))
163 |
164 | @staticmethod
165 | def backward(ctx, grad_output):
166 | x = ctx.saved_tensors[0]
167 | sx = torch.sigmoid(x)
168 | fx = F.softplus(x).tanh()
169 | return grad_output * (fx + x * sx * (1 - fx * fx))
170 |
171 |
172 | class MemoryEfficientSwish(nn.Module):
173 | def forward(self, x):
174 | return SwishImplementation.apply(x)
175 |
176 |
177 | class MemoryEfficientMish(nn.Module):
178 | def forward(self, x):
179 | return MishImplementation.apply(x)
180 |
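# Consistency sketch: both paths implement x * tanh(softplus(x)); the custom autograd Function
# simply recomputes the activation in backward() instead of caching it, trading compute for memory.
#   x = torch.randn(8)
#   torch.allclose(MemoryEfficientMish()(x), x * F.softplus(x).tanh())  # True (up to float precision)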
181 |
182 | class Swish(nn.Module):
183 | def forward(self, x):
184 | return x * torch.sigmoid(x)
185 |
186 |
187 | class HardSwish(nn.Module): # https://arxiv.org/pdf/1905.02244.pdf
188 | def forward(self, x):
189 | return x * F.hardtanh(x + 3, 0., 6., True) / 6.
190 |
191 |
192 | class DeformConv2d(nn.Module):
193 | def __init__(self, inc, outc, kernel_size=3, padding=1, stride=1, bias=None, modulation=False):
194 | """
195 | Args:
196 | modulation (bool, optional): If True, Modulated Defomable Convolution (Deformable ConvNets v2).
197 | """
198 | super(DeformConv2d, self).__init__()
199 | self.kernel_size = kernel_size
200 | self.padding = padding
201 | self.stride = stride
202 | self.zero_padding = nn.ZeroPad2d(padding)
203 | self.conv = nn.Conv2d(inc, outc, kernel_size=kernel_size, stride=kernel_size, bias=bias)
204 |
205 | self.p_conv = nn.Conv2d(inc, 2*kernel_size*kernel_size, kernel_size=3, padding=1, stride=stride)
206 | nn.init.constant_(self.p_conv.weight, 0)
207 | self.p_conv.register_backward_hook(self._set_lr)
208 |
209 | self.modulation = modulation
210 | if modulation:
211 | self.m_conv = nn.Conv2d(inc, kernel_size*kernel_size, kernel_size=3, padding=1, stride=stride)
212 | nn.init.constant_(self.m_conv.weight, 0)
213 | self.m_conv.register_backward_hook(self._set_lr)
214 |
215 | @staticmethod
216 | def _set_lr(module, grad_input, grad_output):
217 |         # scale the offset/modulation gradients by 0.1; the returned tuple replaces grad_input
218 |         return tuple(None if g is None else g * 0.1 for g in grad_input)
219 |
220 | def forward(self, x):
221 | offset = self.p_conv(x)
222 | if self.modulation:
223 | m = torch.sigmoid(self.m_conv(x))
224 |
225 | dtype = offset.data.type()
226 | ks = self.kernel_size
227 | N = offset.size(1) // 2
228 |
229 | if self.padding:
230 | x = self.zero_padding(x)
231 |
232 | # (b, 2N, h, w)
233 | p = self._get_p(offset, dtype)
234 |
235 | # (b, h, w, 2N)
236 | p = p.contiguous().permute(0, 2, 3, 1)
237 | q_lt = p.detach().floor()
238 | q_rb = q_lt + 1
239 |
240 | q_lt = torch.cat([torch.clamp(q_lt[..., :N], 0, x.size(2)-1), torch.clamp(q_lt[..., N:], 0, x.size(3)-1)], dim=-1).long()
241 | q_rb = torch.cat([torch.clamp(q_rb[..., :N], 0, x.size(2)-1), torch.clamp(q_rb[..., N:], 0, x.size(3)-1)], dim=-1).long()
242 | q_lb = torch.cat([q_lt[..., :N], q_rb[..., N:]], dim=-1)
243 | q_rt = torch.cat([q_rb[..., :N], q_lt[..., N:]], dim=-1)
244 |
245 | # clip p
246 | p = torch.cat([torch.clamp(p[..., :N], 0, x.size(2)-1), torch.clamp(p[..., N:], 0, x.size(3)-1)], dim=-1)
247 |
248 | # bilinear kernel (b, h, w, N)
249 | g_lt = (1 + (q_lt[..., :N].type_as(p) - p[..., :N])) * (1 + (q_lt[..., N:].type_as(p) - p[..., N:]))
250 | g_rb = (1 - (q_rb[..., :N].type_as(p) - p[..., :N])) * (1 - (q_rb[..., N:].type_as(p) - p[..., N:]))
251 | g_lb = (1 + (q_lb[..., :N].type_as(p) - p[..., :N])) * (1 - (q_lb[..., N:].type_as(p) - p[..., N:]))
252 | g_rt = (1 - (q_rt[..., :N].type_as(p) - p[..., :N])) * (1 + (q_rt[..., N:].type_as(p) - p[..., N:]))
253 |
254 | # (b, c, h, w, N)
255 | x_q_lt = self._get_x_q(x, q_lt, N)
256 | x_q_rb = self._get_x_q(x, q_rb, N)
257 | x_q_lb = self._get_x_q(x, q_lb, N)
258 | x_q_rt = self._get_x_q(x, q_rt, N)
259 |
260 | # (b, c, h, w, N)
261 | x_offset = g_lt.unsqueeze(dim=1) * x_q_lt + \
262 | g_rb.unsqueeze(dim=1) * x_q_rb + \
263 | g_lb.unsqueeze(dim=1) * x_q_lb + \
264 | g_rt.unsqueeze(dim=1) * x_q_rt
265 |
266 | # modulation
267 | if self.modulation:
268 | m = m.contiguous().permute(0, 2, 3, 1)
269 | m = m.unsqueeze(dim=1)
270 | m = torch.cat([m for _ in range(x_offset.size(1))], dim=1)
271 | x_offset *= m
272 |
273 | x_offset = self._reshape_x_offset(x_offset, ks)
274 | out = self.conv(x_offset)
275 |
276 | return out
277 |
278 | def _get_p_n(self, N, dtype):
279 | p_n_x, p_n_y = torch.meshgrid(
280 | torch.arange(-(self.kernel_size-1)//2, (self.kernel_size-1)//2+1),
281 | torch.arange(-(self.kernel_size-1)//2, (self.kernel_size-1)//2+1))
282 | # (2N, 1)
283 | p_n = torch.cat([torch.flatten(p_n_x), torch.flatten(p_n_y)], 0)
284 | p_n = p_n.view(1, 2*N, 1, 1).type(dtype)
285 |
286 | return p_n
287 |
288 | def _get_p_0(self, h, w, N, dtype):
289 | p_0_x, p_0_y = torch.meshgrid(
290 | torch.arange(1, h*self.stride+1, self.stride),
291 | torch.arange(1, w*self.stride+1, self.stride))
292 | p_0_x = torch.flatten(p_0_x).view(1, 1, h, w).repeat(1, N, 1, 1)
293 | p_0_y = torch.flatten(p_0_y).view(1, 1, h, w).repeat(1, N, 1, 1)
294 | p_0 = torch.cat([p_0_x, p_0_y], 1).type(dtype)
295 |
296 | return p_0
297 |
298 | def _get_p(self, offset, dtype):
299 | N, h, w = offset.size(1)//2, offset.size(2), offset.size(3)
300 |
301 | # (1, 2N, 1, 1)
302 | p_n = self._get_p_n(N, dtype)
303 | # (1, 2N, h, w)
304 | p_0 = self._get_p_0(h, w, N, dtype)
305 | p = p_0 + p_n + offset
306 | return p
307 |
308 | def _get_x_q(self, x, q, N):
309 | b, h, w, _ = q.size()
310 | padded_w = x.size(3)
311 | c = x.size(1)
312 | # (b, c, h*w)
313 | x = x.contiguous().view(b, c, -1)
314 |
315 | # (b, h, w, N)
316 | index = q[..., :N]*padded_w + q[..., N:] # offset_x*w + offset_y
317 | # (b, c, h*w*N)
318 | index = index.contiguous().unsqueeze(dim=1).expand(-1, c, -1, -1, -1).contiguous().view(b, c, -1)
319 |
320 | x_offset = x.gather(dim=-1, index=index).contiguous().view(b, c, h, w, N)
321 |
322 | return x_offset
323 |
324 | @staticmethod
325 | def _reshape_x_offset(x_offset, ks):
326 | b, c, h, w, N = x_offset.size()
327 | x_offset = torch.cat([x_offset[..., s:s+ks].contiguous().view(b, c, h, w*ks) for s in range(0, N, ks)], dim=-1)
328 | x_offset = x_offset.contiguous().view(b, c, h*ks, w*ks)
329 |
330 | return x_offset
331 |
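# Shape sketch: with the defaults (kernel_size=3, padding=1, stride=1) the spatial size is preserved:
#   dcn = DeformConv2d(64, 128, kernel_size=3, padding=1, stride=1, modulation=True)
#   y = dcn(torch.zeros(2, 64, 32, 32))  # -> (2, 128, 32, 32); offsets from p_conv, masks from m_conv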
332 |
333 | class GAP(nn.Module):
334 | def __init__(self):
335 | super(GAP, self).__init__()
336 | self.avg_pool = nn.AdaptiveAvgPool2d(1)
337 | def forward(self, x):
338 | #b, c, _, _ = x.size()
339 | return self.avg_pool(x)#.view(b, c)
340 |
341 |
342 | class Silence(nn.Module):
343 | def __init__(self):
344 | super(Silence, self).__init__()
345 | def forward(self, x):
346 | return x
347 |
348 |
349 | class ScaleChannel(nn.Module): # weighted sum of 2 or more layers https://arxiv.org/abs/1911.09070
350 | def __init__(self, layers):
351 | super(ScaleChannel, self).__init__()
352 | self.layers = layers # layer indices
353 |
354 | def forward(self, x, outputs):
355 | a = outputs[self.layers[0]]
356 | return x.expand_as(a) * a
357 |
358 |
359 | class ScaleSpatial(nn.Module): # weighted sum of 2 or more layers https://arxiv.org/abs/1911.09070
360 | def __init__(self, layers):
361 | super(ScaleSpatial, self).__init__()
362 | self.layers = layers # layer indices
363 |
364 | def forward(self, x, outputs):
365 | a = outputs[self.layers[0]]
366 | return x * a
367 |
--------------------------------------------------------------------------------
/utils/loss.py:
--------------------------------------------------------------------------------
1 | # Loss functions
2 |
3 | import torch
4 | import torch.nn as nn
5 |
6 | from utils.general import bbox_iou
7 | from utils.torch_utils import is_parallel
8 |
9 |
10 | def smooth_BCE(eps=0.1): # https://github.com/ultralytics/yolov3/issues/238#issuecomment-598028441
11 | # return positive, negative label smoothing BCE targets
12 | return 1.0 - 0.5 * eps, 0.5 * eps
13 |
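# Example: with eps=0.1 the positive/negative BCE targets become 0.95/0.05 instead of 1/0:
#   cp, cn = smooth_BCE(eps=0.1)  # cp = 0.95, cn = 0.05
#   cp, cn = smooth_BCE(eps=0.0)  # cp = 1.0,  cn = 0.0 (no smoothing, as used in compute_loss below)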
14 |
15 | class BCEBlurWithLogitsLoss(nn.Module):
16 |     # BCEWithLogitsLoss() with reduced missing-label effects.
17 | def __init__(self, alpha=0.05):
18 | super(BCEBlurWithLogitsLoss, self).__init__()
19 | self.loss_fcn = nn.BCEWithLogitsLoss(reduction='none') # must be nn.BCEWithLogitsLoss()
20 | self.alpha = alpha
21 |
22 | def forward(self, pred, true):
23 | loss = self.loss_fcn(pred, true)
24 | pred = torch.sigmoid(pred) # prob from logits
25 | dx = pred - true # reduce only missing label effects
26 | # dx = (pred - true).abs() # reduce missing label and false label effects
27 | alpha_factor = 1 - torch.exp((dx - 1) / (self.alpha + 1e-4))
28 | loss *= alpha_factor
29 | return loss.mean()
30 |
31 |
32 | class FocalLoss(nn.Module):
33 | # Wraps focal loss around existing loss_fcn(), i.e. criteria = FocalLoss(nn.BCEWithLogitsLoss(), gamma=1.5)
34 | def __init__(self, loss_fcn, gamma=1.5, alpha=0.25):
35 | super(FocalLoss, self).__init__()
36 | self.loss_fcn = loss_fcn # must be nn.BCEWithLogitsLoss()
37 | self.gamma = gamma
38 | self.alpha = alpha
39 | self.reduction = loss_fcn.reduction
40 | self.loss_fcn.reduction = 'none' # required to apply FL to each element
41 |
42 | def forward(self, pred, true):
43 | loss = self.loss_fcn(pred, true)
44 | # p_t = torch.exp(-loss)
45 | # loss *= self.alpha * (1.000001 - p_t) ** self.gamma # non-zero power for gradient stability
46 |
47 | # TF implementation https://github.com/tensorflow/addons/blob/v0.7.1/tensorflow_addons/losses/focal_loss.py
48 | pred_prob = torch.sigmoid(pred) # prob from logits
49 | p_t = true * pred_prob + (1 - true) * (1 - pred_prob)
50 | alpha_factor = true * self.alpha + (1 - true) * (1 - self.alpha)
51 | modulating_factor = (1.0 - p_t) ** self.gamma
52 | loss *= alpha_factor * modulating_factor
53 |
54 | if self.reduction == 'mean':
55 | return loss.mean()
56 | elif self.reduction == 'sum':
57 | return loss.sum()
58 | else: # 'none'
59 | return loss
60 |
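# Usage sketch: wrap an element-wise BCE criterion, as compute_loss() does when hyp['fl_gamma'] > 0:
#   bce = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([1.0]))
#   criterion = FocalLoss(bce, gamma=1.5, alpha=0.25)
#   loss = criterion(torch.randn(4, 80), torch.zeros(4, 80))  # scalar; the original 'mean' reduction is restored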
61 |
62 | def compute_loss(p, targets, model): # predictions, targets, model
63 | device = targets.device
64 | #print(device)
65 | lcls, lbox, lobj = torch.zeros(1, device=device), torch.zeros(1, device=device), torch.zeros(1, device=device)
66 | tcls, tbox, indices, anchors = build_targets(p, targets, model) # targets
67 | h = model.hyp # hyperparameters
68 |
69 | # Define criteria
70 | BCEcls = nn.BCEWithLogitsLoss(pos_weight=torch.Tensor([h['cls_pw']])).to(device)
71 | BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.Tensor([h['obj_pw']])).to(device)
72 |
73 | # Class label smoothing https://arxiv.org/pdf/1902.04103.pdf eqn 3
74 | cp, cn = smooth_BCE(eps=0.0)
75 |
76 | # Focal loss
77 | g = h['fl_gamma'] # focal loss gamma
78 | if g > 0:
79 | BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g)
80 |
81 | # Losses
82 | nt = 0 # number of targets
83 | no = len(p) # number of outputs
84 | balance = [4.0, 1.0, 0.4] if no == 3 else [4.0, 1.0, 0.4, 0.1] # P3-5 or P3-6
85 | balance = [4.0, 1.0, 0.5, 0.4, 0.1] if no == 5 else balance
86 | for i, pi in enumerate(p): # layer index, layer predictions
87 | b, a, gj, gi = indices[i] # image, anchor, gridy, gridx
88 | tobj = torch.zeros_like(pi[..., 0], device=device) # target obj
89 |
90 | n = b.shape[0] # number of targets
91 | if n:
92 | nt += n # cumulative targets
93 | ps = pi[b, a, gj, gi] # prediction subset corresponding to targets
94 |
95 | # Regression
96 | pxy = ps[:, :2].sigmoid() * 2. - 0.5
97 | pwh = (ps[:, 2:4].sigmoid() * 2) ** 2 * anchors[i]
98 | pbox = torch.cat((pxy, pwh), 1).to(device) # predicted box
99 | iou = bbox_iou(pbox.T, tbox[i], x1y1x2y2=False, CIoU=True) # iou(prediction, target)
100 | lbox += (1.0 - iou).mean() # iou loss
101 |
102 | # Objectness
103 | tobj[b, a, gj, gi] = (1.0 - model.gr) + model.gr * iou.detach().clamp(0).type(tobj.dtype) # iou ratio
104 |
105 | # Classification
106 | if model.nc > 1: # cls loss (only if multiple classes)
107 | t = torch.full_like(ps[:, 5:], cn, device=device) # targets
108 | t[range(n), tcls[i]] = cp
109 | lcls += BCEcls(ps[:, 5:], t) # BCE
110 |
111 | # Append targets to text file
112 | # with open('targets.txt', 'a') as file:
113 | # [file.write('%11.5g ' * 4 % tuple(x) + '\n') for x in torch.cat((txy[i], twh[i]), 1)]
114 |
115 | lobj += BCEobj(pi[..., 4], tobj) * balance[i] # obj loss
116 |
117 | s = 3 / no # output count scaling
118 | lbox *= h['box'] * s
119 | lobj *= h['obj'] * s * (1.4 if no >= 4 else 1.)
120 | lcls *= h['cls'] * s
121 | bs = tobj.shape[0] # batch size
122 |
123 | loss = lbox + lobj + lcls
124 | return loss * bs, torch.cat((lbox, lobj, lcls, loss)).detach()
125 |
126 |
127 | def build_targets(p, targets, model):
128 | nt = targets.shape[0] # number of anchors, targets
129 | tcls, tbox, indices, anch = [], [], [], []
130 | gain = torch.ones(6, device=targets.device) # normalized to gridspace gain
131 | off = torch.tensor([[1, 0], [0, 1], [-1, 0], [0, -1]], device=targets.device).float() # overlap offsets
132 |
133 | g = 0.5 # offset
134 | multi_gpu = is_parallel(model)
135 | for i, jj in enumerate(model.module.yolo_layers if multi_gpu else model.yolo_layers):
136 | # get number of grid points and anchor vec for this yolo layer
137 | anchors = model.module.module_list[jj].anchor_vec if multi_gpu else model.module_list[jj].anchor_vec
138 | gain[2:] = torch.tensor(p[i].shape)[[3, 2, 3, 2]] # xyxy gain
139 |
140 | # Match targets to anchors
141 | a, t, offsets = [], targets * gain, 0
142 | if nt:
143 | na = anchors.shape[0] # number of anchors
144 | at = torch.arange(na).view(na, 1).repeat(1, nt) # anchor tensor, same as .repeat_interleave(nt)
145 | r = t[None, :, 4:6] / anchors[:, None] # wh ratio
146 | j = torch.max(r, 1. / r).max(2)[0] < model.hyp['anchor_t'] # compare
147 | # j = wh_iou(anchors, t[:, 4:6]) > model.hyp['iou_t'] # iou(3,n) = wh_iou(anchors(3,2), gwh(n,2))
148 | a, t = at[j], t.repeat(na, 1, 1)[j] # filter
149 |
150 | # overlaps
151 | gxy = t[:, 2:4] # grid xy
152 | z = torch.zeros_like(gxy)
153 | j, k = ((gxy % 1. < g) & (gxy > 1.)).T
154 | l, m = ((gxy % 1. > (1 - g)) & (gxy < (gain[[2, 3]] - 1.))).T
155 | a, t = torch.cat((a, a[j], a[k], a[l], a[m]), 0), torch.cat((t, t[j], t[k], t[l], t[m]), 0)
156 | offsets = torch.cat((z, z[j] + off[0], z[k] + off[1], z[l] + off[2], z[m] + off[3]), 0) * g
157 |
158 | # Define
159 | b, c = t[:, :2].long().T # image, class
160 | gxy = t[:, 2:4] # grid xy
161 | gwh = t[:, 4:6] # grid wh
162 | gij = (gxy - offsets).long()
163 | gi, gj = gij.T # grid xy indices
164 |
165 | # Append
166 | #indices.append((b, a, gj, gi)) # image, anchor, grid indices
167 | indices.append((b, a, gj.clamp_(0, gain[3] - 1), gi.clamp_(0, gain[2] - 1))) # image, anchor, grid indices
168 | tbox.append(torch.cat((gxy - gij, gwh), 1)) # box
169 | anch.append(anchors[a]) # anchors
170 | tcls.append(c) # class
171 |
172 | return tcls, tbox, indices, anch
173 |
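# Input-format sketch: `targets` is an (n, 6) tensor with one row per ground-truth box,
# [image_index_in_batch, class, x, y, w, h] in normalized xywh. build_targets() scales boxes to grid
# units per YOLO layer, keeps anchor matches whose w/h ratio stays below hyp['anchor_t'], and
# duplicates each match to up to two neighbouring cells (the +/-0.5 offsets above). For example:
#   targets = torch.tensor([[0, 17, 0.50, 0.40, 0.20, 0.30]])  # hypothetical single box in image 0
#   tcls, tbox, indices, anchors = build_targets(p, targets, model)  # p, model from a forward pass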
--------------------------------------------------------------------------------
/utils/metrics.py:
--------------------------------------------------------------------------------
1 | # Model validation metrics
2 |
3 | import matplotlib.pyplot as plt
4 | import numpy as np
5 |
6 |
7 | def fitness(x):
8 | # Model fitness as a weighted combination of metrics
9 | w = [0.0, 0.0, 0.1, 0.9] # weights for [P, R, mAP@0.5, mAP@0.5:0.95]
10 | return (x[:, :4] * w).sum(1)
11 |
12 |
13 | def fitness_p(x):
14 | # Model fitness as a weighted combination of metrics
15 | w = [1.0, 0.0, 0.0, 0.0] # weights for [P, R, mAP@0.5, mAP@0.5:0.95]
16 | return (x[:, :4] * w).sum(1)
17 |
18 |
19 | def fitness_r(x):
20 | # Model fitness as a weighted combination of metrics
21 | w = [0.0, 1.0, 0.0, 0.0] # weights for [P, R, mAP@0.5, mAP@0.5:0.95]
22 | return (x[:, :4] * w).sum(1)
23 |
24 |
25 | def fitness_ap50(x):
26 | # Model fitness as a weighted combination of metrics
27 | w = [0.0, 0.0, 1.0, 0.0] # weights for [P, R, mAP@0.5, mAP@0.5:0.95]
28 | return (x[:, :4] * w).sum(1)
29 |
30 |
31 | def fitness_ap(x):
32 | # Model fitness as a weighted combination of metrics
33 | w = [0.0, 0.0, 0.0, 1.0] # weights for [P, R, mAP@0.5, mAP@0.5:0.95]
34 | return (x[:, :4] * w).sum(1)
35 |
36 |
37 | def fitness_f(x):
38 |     # Model fitness proportional to F1: P*R / (P+R), i.e. half the harmonic mean of P and R
40 | return ((x[:, 0]*x[:, 1])/(x[:, 0]+x[:, 1]))
41 |
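# Example: `x` holds result rows whose first four columns are [P, R, mAP@0.5, mAP@0.5:0.95];
# the default fitness() weights mAP@0.5 at 0.1 and mAP@0.5:0.95 at 0.9:
#   x = np.array([[0.70, 0.60, 0.55, 0.35]])
#   fitness(x)  # -> array([0.37])  (0.1*0.55 + 0.9*0.35)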
42 |
43 | def ap_per_class(tp, conf, pred_cls, target_cls, plot=False, fname='precision-recall_curve.png'):
44 | """ Compute the average precision, given the recall and precision curves.
45 | Source: https://github.com/rafaelpadilla/Object-Detection-Metrics.
46 | # Arguments
47 | tp: True positives (nparray, nx1 or nx10).
48 | conf: Objectness value from 0-1 (nparray).
49 | pred_cls: Predicted object classes (nparray).
50 | target_cls: True object classes (nparray).
51 | plot: Plot precision-recall curve at mAP@0.5
52 | fname: Plot filename
53 | # Returns
54 |         Per-class precision, recall, AP, F1 and the unique classes, with AP computed as in py-faster-rcnn.
55 | """
56 |
57 | # Sort by objectness
58 | i = np.argsort(-conf)
59 | tp, conf, pred_cls = tp[i], conf[i], pred_cls[i]
60 |
61 | # Find unique classes
62 | unique_classes = np.unique(target_cls)
63 |
64 | # Create Precision-Recall curve and compute AP for each class
65 | px, py = np.linspace(0, 1, 1000), [] # for plotting
66 | pr_score = 0.1 # score to evaluate P and R https://github.com/ultralytics/yolov3/issues/898
67 |     s = [unique_classes.shape[0], tp.shape[1]]  # number of classes, number of IoU thresholds (i.e. 10 for mAP@0.5:0.95)
68 | ap, p, r = np.zeros(s), np.zeros(s), np.zeros(s)
69 | for ci, c in enumerate(unique_classes):
70 | i = pred_cls == c
71 | n_l = (target_cls == c).sum() # number of labels
72 | n_p = i.sum() # number of predictions
73 |
74 | if n_p == 0 or n_l == 0:
75 | continue
76 | else:
77 | # Accumulate FPs and TPs
78 | fpc = (1 - tp[i]).cumsum(0)
79 | tpc = tp[i].cumsum(0)
80 |
81 | # Recall
82 | recall = tpc / (n_l + 1e-16) # recall curve
83 | r[ci] = np.interp(-pr_score, -conf[i], recall[:, 0]) # r at pr_score, negative x, xp because xp decreases
84 |
85 | # Precision
86 | precision = tpc / (tpc + fpc) # precision curve
87 | p[ci] = np.interp(-pr_score, -conf[i], precision[:, 0]) # p at pr_score
88 |
89 | # AP from recall-precision curve
90 | for j in range(tp.shape[1]):
91 | ap[ci, j], mpre, mrec = compute_ap(recall[:, j], precision[:, j])
92 | if j == 0:
93 | py.append(np.interp(px, mrec, mpre)) # precision at mAP@0.5
94 |
95 | # Compute F1 score (harmonic mean of precision and recall)
96 | f1 = 2 * p * r / (p + r + 1e-16)
97 |
98 | if plot:
99 | py = np.stack(py, axis=1)
100 | fig, ax = plt.subplots(1, 1, figsize=(5, 5))
101 | ax.plot(px, py, linewidth=0.5, color='grey') # plot(recall, precision)
102 | ax.plot(px, py.mean(1), linewidth=2, color='blue', label='all classes %.3f mAP@0.5' % ap[:, 0].mean())
103 | ax.set_xlabel('Recall')
104 | ax.set_ylabel('Precision')
105 | ax.set_xlim(0, 1)
106 | ax.set_ylim(0, 1)
107 | plt.legend()
108 | fig.tight_layout()
109 | fig.savefig(fname, dpi=200)
110 |
111 | return p, r, ap, f1, unique_classes.astype('int32')
112 |
113 |
114 | def compute_ap(recall, precision):
115 | """ Compute the average precision, given the recall and precision curves.
116 | Source: https://github.com/rbgirshick/py-faster-rcnn.
117 | # Arguments
118 | recall: The recall curve (list).
119 | precision: The precision curve (list).
120 | # Returns
121 | The average precision as computed in py-faster-rcnn.
122 | """
123 |
124 |     # Sentinel begin/end values are currently disabled; use the recall/precision curves as provided
125 | mrec = recall # np.concatenate(([0.], recall, [recall[-1] + 1E-3]))
126 | mpre = precision # np.concatenate(([0.], precision, [0.]))
127 |
128 | # Compute the precision envelope
129 | mpre = np.flip(np.maximum.accumulate(np.flip(mpre)))
130 |
131 | # Integrate area under curve
132 | method = 'interp' # methods: 'continuous', 'interp'
133 | if method == 'interp':
134 | x = np.linspace(0, 1, 101) # 101-point interp (COCO)
135 | ap = np.trapz(np.interp(x, mrec, mpre), x) # integrate
136 | else: # 'continuous'
137 | i = np.where(mrec[1:] != mrec[:-1])[0] # points where x axis (recall) changes
138 | ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]) # area under curve
139 |
140 | return ap, mpre, mrec
141 |
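# Toy example (assumes a monotonically increasing recall curve, as produced by ap_per_class above):
#   recall = np.array([0.0, 0.5, 1.0])
#   precision = np.array([1.0, 0.8, 0.6])
#   ap, mpre, mrec = compute_ap(recall, precision)  # ap ~= 0.80 with the 101-point 'interp' method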
--------------------------------------------------------------------------------
/utils/parse_config.py:
--------------------------------------------------------------------------------
1 | import os
2 |
3 | import numpy as np
4 |
5 |
6 | def parse_model_cfg(path):
7 | # Parse the yolo *.cfg file and return module definitions. Path may be 'cfg/yolov3.cfg', 'yolov3.cfg', or 'yolov3'
8 | if not path.endswith('.cfg'): # add .cfg suffix if omitted
9 | path += '.cfg'
10 | if not os.path.exists(path) and os.path.exists('cfg' + os.sep + path): # add cfg/ prefix if omitted
11 | path = 'cfg' + os.sep + path
12 |
13 | with open(path, 'r') as f:
14 | lines = f.read().split('\n')
15 | lines = [x for x in lines if x and not x.startswith('#')]
16 |     lines = [x.strip() for x in lines]  # strip leading/trailing whitespace
17 | mdefs = [] # module definitions
18 | for line in lines:
19 | if line.startswith('['): # This marks the start of a new block
20 | mdefs.append({})
21 | mdefs[-1]['type'] = line[1:-1].rstrip()
22 | if mdefs[-1]['type'] == 'convolutional':
23 | mdefs[-1]['batch_normalize'] = 0 # pre-populate with zeros (may be overwritten later)
24 |
25 | else:
26 | key, val = line.split("=")
27 | key = key.rstrip()
28 |
29 | if key == 'anchors': # return nparray
30 | mdefs[-1][key] = np.array([float(x) for x in val.split(',')]).reshape((-1, 2)) # np anchors
31 | elif (key in ['from', 'layers', 'mask']) or (key == 'size' and ',' in val): # return array
32 | mdefs[-1][key] = [int(x) for x in val.split(',')]
33 | else:
34 | val = val.strip()
35 | if val.isnumeric(): # return int or float
36 | mdefs[-1][key] = int(val) if (int(val) - float(val)) == 0 else float(val)
37 | else:
38 | mdefs[-1][key] = val # return string
39 |
40 | # Check all fields are supported
41 | supported = ['type', 'batch_normalize', 'filters', 'size', 'stride', 'pad', 'activation', 'layers', 'groups',
42 | 'from', 'mask', 'anchors', 'classes', 'num', 'jitter', 'ignore_thresh', 'truth_thresh', 'random',
43 | 'stride_x', 'stride_y', 'weights_type', 'weights_normalization', 'scale_x_y', 'beta_nms', 'nms_kind',
44 | 'iou_loss', 'iou_normalizer', 'cls_normalizer', 'iou_thresh', 'atoms', 'na', 'nc']
45 |
46 | f = [] # fields
47 | for x in mdefs[1:]:
48 | [f.append(k) for k in x if k not in f]
49 | u = [x for x in f if x not in supported] # unsupported fields
50 | assert not any(u), "Unsupported fields %s in %s. See https://github.com/ultralytics/yolov3/issues/631" % (u, path)
51 |
52 | return mdefs
53 |
54 |
55 | def parse_data_cfg(path):
56 | # Parses the data configuration file
57 | if not os.path.exists(path) and os.path.exists('data' + os.sep + path): # add data/ prefix if omitted
58 | path = 'data' + os.sep + path
59 |
60 | with open(path, 'r') as f:
61 | lines = f.readlines()
62 |
63 | options = dict()
64 | for line in lines:
65 | line = line.strip()
66 | if line == '' or line.startswith('#'):
67 | continue
68 | key, val = line.split('=')
69 | options[key.strip()] = val.strip()
70 |
71 | return options
72 |
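# Usage sketch: a *.data file is a flat key=value list (illustrative keys shown), e.g.
#   classes=80
#   train=path/to/train.txt
#   names=data/coco.names
# parse_data_cfg('coco2017.data') returns a plain dict of those pairs, with every value kept as a string.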
--------------------------------------------------------------------------------
/utils/plots.py:
--------------------------------------------------------------------------------
1 | # Plotting utils
2 |
3 | import glob
4 | import math
5 | import os
6 | import random
7 | from copy import copy
8 | from pathlib import Path
9 |
10 | import cv2
11 | import matplotlib
12 | import matplotlib.pyplot as plt
13 | import numpy as np
14 | import torch
15 | import yaml
16 | from PIL import Image
17 | from scipy.signal import butter, filtfilt
18 |
19 | from utils.general import xywh2xyxy, xyxy2xywh
20 | from utils.metrics import fitness
21 |
22 | # Settings
23 | matplotlib.use('Agg') # for writing to files only
24 |
25 |
26 | def color_list():
27 | # Return first 10 plt colors as (r,g,b) https://stackoverflow.com/questions/51350872/python-from-color-name-to-rgb
28 | def hex2rgb(h):
29 | return tuple(int(h[1 + i:1 + i + 2], 16) for i in (0, 2, 4))
30 |
31 | return [hex2rgb(h) for h in plt.rcParams['axes.prop_cycle'].by_key()['color']]
32 |
33 |
34 | def hist2d(x, y, n=100):
35 | # 2d histogram used in labels.png and evolve.png
36 | xedges, yedges = np.linspace(x.min(), x.max(), n), np.linspace(y.min(), y.max(), n)
37 | hist, xedges, yedges = np.histogram2d(x, y, (xedges, yedges))
38 | xidx = np.clip(np.digitize(x, xedges) - 1, 0, hist.shape[0] - 1)
39 | yidx = np.clip(np.digitize(y, yedges) - 1, 0, hist.shape[1] - 1)
40 | return np.log(hist[xidx, yidx])
41 |
42 |
43 | def butter_lowpass_filtfilt(data, cutoff=1500, fs=50000, order=5):
44 | # https://stackoverflow.com/questions/28536191/how-to-filter-smooth-with-scipy-numpy
45 | def butter_lowpass(cutoff, fs, order):
46 | nyq = 0.5 * fs
47 | normal_cutoff = cutoff / nyq
48 | return butter(order, normal_cutoff, btype='low', analog=False)
49 |
50 | b, a = butter_lowpass(cutoff, fs, order=order)
51 | return filtfilt(b, a, data) # forward-backward filter
52 |
53 |
54 | def plot_one_box(x, img, color=None, label=None, line_thickness=None):
55 | # Plots one bounding box on image img
56 | tl = line_thickness or round(0.002 * (img.shape[0] + img.shape[1]) / 2) + 1 # line/font thickness
57 | color = color or [random.randint(0, 255) for _ in range(3)]
58 | c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3]))
59 | cv2.rectangle(img, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA)
60 | if label:
61 | tf = max(tl - 1, 1) # font thickness
62 | t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]
63 | c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3
64 | cv2.rectangle(img, c1, c2, color, -1, cv2.LINE_AA) # filled
65 | cv2.putText(img, label, (c1[0], c1[1] - 2), 0, tl / 3, [225, 255, 255], thickness=tf, lineType=cv2.LINE_AA)
66 |
67 |
68 | def plot_wh_methods(): # from utils.general import *; plot_wh_methods()
69 | # Compares the two methods for width-height anchor multiplication
70 | # https://github.com/ultralytics/yolov3/issues/168
71 | x = np.arange(-4.0, 4.0, .1)
72 | ya = np.exp(x)
73 | yb = torch.sigmoid(torch.from_numpy(x)).numpy() * 2
74 |
75 | fig = plt.figure(figsize=(6, 3), dpi=150)
76 | plt.plot(x, ya, '.-', label='YOLO')
77 | plt.plot(x, yb ** 2, '.-', label='YOLO ^2')
78 | plt.plot(x, yb ** 1.6, '.-', label='YOLO ^1.6')
79 | plt.xlim(left=-4, right=4)
80 | plt.ylim(bottom=0, top=6)
81 | plt.xlabel('input')
82 | plt.ylabel('output')
83 | plt.grid()
84 | plt.legend()
85 | fig.tight_layout()
86 | fig.savefig('comparison.png', dpi=200)
87 |
88 |
89 | def output_to_target(output, width, height):
90 | # Convert model output to target format [batch_id, class_id, x, y, w, h, conf]
91 | if isinstance(output, torch.Tensor):
92 | output = output.cpu().numpy()
93 |
94 | targets = []
95 | for i, o in enumerate(output):
96 | if o is not None:
97 | for pred in o:
98 | box = pred[:4]
99 | w = (box[2] - box[0]) / width
100 | h = (box[3] - box[1]) / height
101 | x = box[0] / width + w / 2
102 | y = box[1] / height + h / 2
103 | conf = pred[4]
104 | cls = int(pred[5])
105 |
106 | targets.append([i, cls, x, y, w, h, conf])
107 |
108 | return np.array(targets)
109 |
110 |
111 | def plot_images(images, targets, paths=None, fname='images.jpg', names=None, max_size=640, max_subplots=16):
112 | # Plot image grid with labels
113 |
114 | if isinstance(images, torch.Tensor):
115 | images = images.cpu().float().numpy()
116 | if isinstance(targets, torch.Tensor):
117 | targets = targets.cpu().numpy()
118 |
119 | # un-normalise
120 | if np.max(images[0]) <= 1:
121 | images *= 255
122 |
123 | tl = 3 # line thickness
124 | tf = max(tl - 1, 1) # font thickness
125 | bs, _, h, w = images.shape # batch size, _, height, width
126 | bs = min(bs, max_subplots) # limit plot images
127 | ns = np.ceil(bs ** 0.5) # number of subplots (square)
128 |
129 | # Check if we should resize
130 | scale_factor = max_size / max(h, w)
131 | if scale_factor < 1:
132 | h = math.ceil(scale_factor * h)
133 | w = math.ceil(scale_factor * w)
134 |
135 | colors = color_list() # list of colors
136 | mosaic = np.full((int(ns * h), int(ns * w), 3), 255, dtype=np.uint8) # init
137 | for i, img in enumerate(images):
138 | if i == max_subplots: # if last batch has fewer images than we expect
139 | break
140 |
141 | block_x = int(w * (i // ns))
142 | block_y = int(h * (i % ns))
143 |
144 | img = img.transpose(1, 2, 0)
145 | if scale_factor < 1:
146 | img = cv2.resize(img, (w, h))
147 |
148 | mosaic[block_y:block_y + h, block_x:block_x + w, :] = img
149 | if len(targets) > 0:
150 | image_targets = targets[targets[:, 0] == i]
151 | boxes = xywh2xyxy(image_targets[:, 2:6]).T
152 | classes = image_targets[:, 1].astype('int')
153 | labels = image_targets.shape[1] == 6 # labels if no conf column
154 | conf = None if labels else image_targets[:, 6] # check for confidence presence (label vs pred)
155 |
156 | boxes[[0, 2]] *= w
157 | boxes[[0, 2]] += block_x
158 | boxes[[1, 3]] *= h
159 | boxes[[1, 3]] += block_y
160 | for j, box in enumerate(boxes.T):
161 | cls = int(classes[j])
162 | color = colors[cls % len(colors)]
163 | cls = names[cls] if names else cls
164 | if labels or conf[j] > 0.25: # 0.25 conf thresh
165 | label = '%s' % cls if labels else '%s %.1f' % (cls, conf[j])
166 | plot_one_box(box, mosaic, label=label, color=color, line_thickness=tl)
167 |
168 | # Draw image filename labels
169 | if paths:
170 | label = Path(paths[i]).name[:40] # trim to 40 char
171 | t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]
172 | cv2.putText(mosaic, label, (block_x + 5, block_y + t_size[1] + 5), 0, tl / 3, [220, 220, 220], thickness=tf,
173 | lineType=cv2.LINE_AA)
174 |
175 | # Image border
176 | cv2.rectangle(mosaic, (block_x, block_y), (block_x + w, block_y + h), (255, 255, 255), thickness=3)
177 |
178 | if fname:
179 | r = min(1280. / max(h, w) / ns, 1.0) # ratio to limit image size
180 | mosaic = cv2.resize(mosaic, (int(ns * w * r), int(ns * h * r)), interpolation=cv2.INTER_AREA)
181 | # cv2.imwrite(fname, cv2.cvtColor(mosaic, cv2.COLOR_BGR2RGB)) # cv2 save
182 | Image.fromarray(mosaic).save(fname) # PIL save
183 | return mosaic
184 |
185 |
186 | def plot_lr_scheduler(optimizer, scheduler, epochs=300, save_dir=''):
187 | # Plot LR simulating training for full epochs
188 | optimizer, scheduler = copy(optimizer), copy(scheduler) # do not modify originals
189 | y = []
190 | for _ in range(epochs):
191 | scheduler.step()
192 | y.append(optimizer.param_groups[0]['lr'])
193 | plt.plot(y, '.-', label='LR')
194 | plt.xlabel('epoch')
195 | plt.ylabel('LR')
196 | plt.grid()
197 | plt.xlim(0, epochs)
198 | plt.ylim(0)
199 | plt.tight_layout()
200 | plt.savefig(Path(save_dir) / 'LR.png', dpi=200)
201 |
202 |
203 | def plot_test_txt(): # from utils.general import *; plot_test()
204 | # Plot test.txt histograms
205 | x = np.loadtxt('test.txt', dtype=np.float32)
206 | box = xyxy2xywh(x[:, :4])
207 | cx, cy = box[:, 0], box[:, 1]
208 |
209 | fig, ax = plt.subplots(1, 1, figsize=(6, 6), tight_layout=True)
210 | ax.hist2d(cx, cy, bins=600, cmax=10, cmin=0)
211 | ax.set_aspect('equal')
212 | plt.savefig('hist2d.png', dpi=300)
213 |
214 | fig, ax = plt.subplots(1, 2, figsize=(12, 6), tight_layout=True)
215 | ax[0].hist(cx, bins=600)
216 | ax[1].hist(cy, bins=600)
217 | plt.savefig('hist1d.png', dpi=200)
218 |
219 |
220 | def plot_targets_txt(): # from utils.general import *; plot_targets_txt()
221 | # Plot targets.txt histograms
222 | x = np.loadtxt('targets.txt', dtype=np.float32).T
223 | s = ['x targets', 'y targets', 'width targets', 'height targets']
224 | fig, ax = plt.subplots(2, 2, figsize=(8, 8), tight_layout=True)
225 | ax = ax.ravel()
226 | for i in range(4):
227 | ax[i].hist(x[i], bins=100, label='%.3g +/- %.3g' % (x[i].mean(), x[i].std()))
228 | ax[i].legend()
229 | ax[i].set_title(s[i])
230 | plt.savefig('targets.jpg', dpi=200)
231 |
232 |
233 | def plot_study_txt(f='study.txt', x=None): # from utils.general import *; plot_study_txt()
234 | # Plot study.txt generated by test.py
235 | fig, ax = plt.subplots(2, 4, figsize=(10, 6), tight_layout=True)
236 | ax = ax.ravel()
237 |
238 | fig2, ax2 = plt.subplots(1, 1, figsize=(8, 4), tight_layout=True)
239 | for f in ['study/study_coco_yolo%s.txt' % x for x in ['s', 'm', 'l', 'x']]:
240 | y = np.loadtxt(f, dtype=np.float32, usecols=[0, 1, 2, 3, 7, 8, 9], ndmin=2).T
241 | x = np.arange(y.shape[1]) if x is None else np.array(x)
242 | s = ['P', 'R', 'mAP@.5', 'mAP@.5:.95', 't_inference (ms/img)', 't_NMS (ms/img)', 't_total (ms/img)']
243 | for i in range(7):
244 | ax[i].plot(x, y[i], '.-', linewidth=2, markersize=8)
245 | ax[i].set_title(s[i])
246 |
247 | j = y[3].argmax() + 1
248 | ax2.plot(y[6, :j], y[3, :j] * 1E2, '.-', linewidth=2, markersize=8,
249 | label=Path(f).stem.replace('study_coco_', '').replace('yolo', 'YOLO'))
250 |
251 | ax2.plot(1E3 / np.array([209, 140, 97, 58, 35, 18]), [34.6, 40.5, 43.0, 47.5, 49.7, 51.5],
252 | 'k.-', linewidth=2, markersize=8, alpha=.25, label='EfficientDet')
253 |
254 | ax2.grid()
255 | ax2.set_xlim(0, 30)
256 | ax2.set_ylim(28, 50)
257 | ax2.set_yticks(np.arange(30, 55, 5))
258 | ax2.set_xlabel('GPU Speed (ms/img)')
259 | ax2.set_ylabel('COCO AP val')
260 | ax2.legend(loc='lower right')
261 | plt.savefig('study_mAP_latency.png', dpi=300)
262 | plt.savefig(f.replace('.txt', '.png'), dpi=300)
263 |
264 |
265 | def plot_labels(labels, save_dir=''):
266 | # plot dataset labels
267 | c, b = labels[:, 0], labels[:, 1:].transpose() # classes, boxes
268 | nc = int(c.max() + 1) # number of classes
269 |
270 | fig, ax = plt.subplots(2, 2, figsize=(8, 8), tight_layout=True)
271 | ax = ax.ravel()
272 | ax[0].hist(c, bins=np.linspace(0, nc, nc + 1) - 0.5, rwidth=0.8)
273 | ax[0].set_xlabel('classes')
274 | ax[1].scatter(b[0], b[1], c=hist2d(b[0], b[1], 90), cmap='jet')
275 | ax[1].set_xlabel('x')
276 | ax[1].set_ylabel('y')
277 | ax[2].scatter(b[2], b[3], c=hist2d(b[2], b[3], 90), cmap='jet')
278 | ax[2].set_xlabel('width')
279 | ax[2].set_ylabel('height')
280 | plt.savefig(Path(save_dir) / 'labels.png', dpi=200)
281 | plt.close()
282 |
283 | # seaborn correlogram
284 | try:
285 | import seaborn as sns
286 | import pandas as pd
287 | x = pd.DataFrame(b.transpose(), columns=['x', 'y', 'width', 'height'])
288 | sns.pairplot(x, corner=True, diag_kind='hist', kind='scatter', markers='o',
289 | plot_kws=dict(s=3, edgecolor=None, linewidth=1, alpha=0.02),
290 | diag_kws=dict(bins=50))
291 | plt.savefig(Path(save_dir) / 'labels_correlogram.png', dpi=200)
292 | plt.close()
293 | except Exception as e:
294 | pass
295 |
296 |
297 | def plot_evolution(yaml_file='data/hyp.finetune.yaml'): # from utils.general import *; plot_evolution()
298 | # Plot hyperparameter evolution results in evolve.txt
299 | with open(yaml_file) as f:
300 | hyp = yaml.load(f, Loader=yaml.FullLoader)
301 | x = np.loadtxt('evolve.txt', ndmin=2)
302 | f = fitness(x)
303 | # weights = (f - f.min()) ** 2 # for weighted results
304 | plt.figure(figsize=(10, 12), tight_layout=True)
305 | matplotlib.rc('font', **{'size': 8})
306 | for i, (k, v) in enumerate(hyp.items()):
307 | y = x[:, i + 7]
308 | # mu = (y * weights).sum() / weights.sum() # best weighted result
309 | mu = y[f.argmax()] # best single result
310 | plt.subplot(6, 5, i + 1)
311 | plt.scatter(y, f, c=hist2d(y, f, 20), cmap='viridis', alpha=.8, edgecolors='none')
312 | plt.plot(mu, f.max(), 'k+', markersize=15)
313 | plt.title('%s = %.3g' % (k, mu), fontdict={'size': 9}) # limit to 40 characters
314 | if i % 5 != 0:
315 | plt.yticks([])
316 | print('%15s: %.3g' % (k, mu))
317 | plt.savefig('evolve.png', dpi=200)
318 | print('\nPlot saved as evolve.png')
319 |
320 |
321 | def plot_results_overlay(start=0, stop=0): # from utils.general import *; plot_results_overlay()
322 | # Plot training 'results*.txt', overlaying train and val losses
323 | s = ['train', 'train', 'train', 'Precision', 'mAP@0.5', 'val', 'val', 'val', 'Recall', 'mAP@0.5:0.95'] # legends
324 | t = ['Box', 'Objectness', 'Classification', 'P-R', 'mAP-F1'] # titles
325 | for f in sorted(glob.glob('results*.txt') + glob.glob('../../Downloads/results*.txt')):
326 | results = np.loadtxt(f, usecols=[2, 3, 4, 8, 9, 12, 13, 14, 10, 11], ndmin=2).T
327 | n = results.shape[1] # number of rows
328 | x = range(start, min(stop, n) if stop else n)
329 | fig, ax = plt.subplots(1, 5, figsize=(14, 3.5), tight_layout=True)
330 | ax = ax.ravel()
331 | for i in range(5):
332 | for j in [i, i + 5]:
333 | y = results[j, x]
334 | ax[i].plot(x, y, marker='.', label=s[j])
335 | # y_smooth = butter_lowpass_filtfilt(y)
336 | # ax[i].plot(x, np.gradient(y_smooth), marker='.', label=s[j])
337 |
338 | ax[i].set_title(t[i])
339 | ax[i].legend()
340 | ax[i].set_ylabel(f) if i == 0 else None # add filename
341 | fig.savefig(f.replace('.txt', '.png'), dpi=200)
342 |
343 |
344 | def plot_results(start=0, stop=0, bucket='', id=(), labels=(), save_dir=''):
345 | # from utils.general import *; plot_results(save_dir='runs/train/exp0')
346 | # Plot training 'results*.txt'
347 | fig, ax = plt.subplots(2, 5, figsize=(12, 6))
348 | ax = ax.ravel()
349 | s = ['Box', 'Objectness', 'Classification', 'Precision', 'Recall',
350 | 'val Box', 'val Objectness', 'val Classification', 'mAP@0.5', 'mAP@0.5:0.95']
351 | if bucket:
352 | # os.system('rm -rf storage.googleapis.com')
353 | # files = ['https://storage.googleapis.com/%s/results%g.txt' % (bucket, x) for x in id]
354 | files = ['%g.txt' % x for x in id]
355 | c = ('gsutil cp ' + '%s ' * len(files) + '.') % tuple('gs://%s/%g.txt' % (bucket, x) for x in id)
356 | os.system(c)
357 | else:
358 | files = glob.glob(str(Path(save_dir) / '*.txt')) + glob.glob('../../Downloads/results*.txt')
359 | assert len(files), 'No results.txt files found in %s, nothing to plot.' % os.path.abspath(save_dir)
360 | for fi, f in enumerate(files):
361 | try:
362 | results = np.loadtxt(f, usecols=[2, 3, 4, 8, 9, 12, 13, 14, 10, 11], ndmin=2).T
363 | n = results.shape[1] # number of rows
364 | x = range(start, min(stop, n) if stop else n)
365 | for i in range(10):
366 | y = results[i, x]
367 | if i in [0, 1, 2, 5, 6, 7]:
368 | y[y == 0] = np.nan # don't show zero loss values
369 | # y /= y[0] # normalize
370 | label = labels[fi] if len(labels) else Path(f).stem
371 | ax[i].plot(x, y, marker='.', label=label, linewidth=1, markersize=6)
372 | ax[i].set_title(s[i])
373 | # if i in [5, 6, 7]: # share train and val loss y axes
374 | # ax[i].get_shared_y_axes().join(ax[i], ax[i - 5])
375 | except Exception as e:
376 | print('Warning: Plotting error for %s; %s' % (f, e))
377 |
378 | fig.tight_layout()
379 | ax[1].legend()
380 | fig.savefig(Path(save_dir) / 'results.png', dpi=200)
381 |
--------------------------------------------------------------------------------
/utils/torch_utils.py:
--------------------------------------------------------------------------------
1 | # PyTorch utils
2 |
3 | import logging
4 | import math
5 | import os
6 | import time
7 | from contextlib import contextmanager
8 | from copy import deepcopy
9 |
10 | import torch
11 | import torch.backends.cudnn as cudnn
12 | import torch.nn as nn
13 | import torch.nn.functional as F
14 | import torchvision
15 |
16 | logger = logging.getLogger(__name__)
17 |
18 |
19 | @contextmanager
20 | def torch_distributed_zero_first(local_rank: int):
21 | """
22 |     Context manager that makes all processes in distributed training wait for the local master to finish a task first.
23 | """
24 | if local_rank not in [-1, 0]:
25 | torch.distributed.barrier()
26 | yield
27 | if local_rank == 0:
28 | torch.distributed.barrier()
29 |
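# Usage sketch (local_rank comes from the DDP launcher, -1 when not distributed):
#   with torch_distributed_zero_first(local_rank):
#       dataset = create_dataset(...)  # hypothetical helper: rank 0 builds/caches first, the rest wait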
30 |
31 | def init_torch_seeds(seed=0):
32 | # Speed-reproducibility tradeoff https://pytorch.org/docs/stable/notes/randomness.html
33 | torch.manual_seed(seed)
34 | if seed == 0: # slower, more reproducible
35 | cudnn.deterministic = True
36 | cudnn.benchmark = False
37 | else: # faster, less reproducible
38 | cudnn.deterministic = False
39 | cudnn.benchmark = True
40 |
41 |
42 | def select_device(device='', batch_size=None):
43 | # device = 'cpu' or '0' or '0,1,2,3'
44 | cpu_request = device.lower() == 'cpu'
45 | if device and not cpu_request: # if device requested other than 'cpu'
46 | os.environ['CUDA_VISIBLE_DEVICES'] = device # set environment variable
47 |         assert torch.cuda.is_available(), 'CUDA unavailable, invalid device %s requested' % device  # check availability
48 |
49 | cuda = False if cpu_request else torch.cuda.is_available()
50 | if cuda:
51 | c = 1024 ** 2 # bytes to MB
52 | ng = torch.cuda.device_count()
53 | if ng > 1 and batch_size: # check that batch_size is compatible with device_count
54 | assert batch_size % ng == 0, 'batch-size %g not multiple of GPU count %g' % (batch_size, ng)
55 | x = [torch.cuda.get_device_properties(i) for i in range(ng)]
56 | s = f'Using torch {torch.__version__} '
57 | for i in range(0, ng):
58 | if i == 1:
59 | s = ' ' * len(s)
60 | logger.info("%sCUDA:%g (%s, %dMB)" % (s, i, x[i].name, x[i].total_memory / c))
61 | else:
62 | logger.info(f'Using torch {torch.__version__} CPU')
63 |
64 | logger.info('') # skip a line
65 | return torch.device('cuda:0' if cuda else 'cpu')
66 |
67 |
68 | def time_synchronized():
69 | torch.cuda.synchronize() if torch.cuda.is_available() else None
70 | return time.time()
71 |
72 |
73 | def is_parallel(model):
74 | return type(model) in (nn.parallel.DataParallel, nn.parallel.DistributedDataParallel)
75 |
76 |
77 | def intersect_dicts(da, db, exclude=()):
78 | # Dictionary intersection of matching keys and shapes, omitting 'exclude' keys, using da values
79 | return {k: v for k, v in da.items() if k in db and not any(x in k for x in exclude) and v.shape == db[k].shape}
80 |
81 |
82 | def initialize_weights(model):
83 | for m in model.modules():
84 | t = type(m)
85 | if t is nn.Conv2d:
86 | pass # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
87 | elif t is nn.BatchNorm2d:
88 | m.eps = 1e-3
89 | m.momentum = 0.03
90 | elif t in [nn.Hardswish, nn.LeakyReLU, nn.ReLU, nn.ReLU6]:
91 | m.inplace = True
92 |
93 |
94 | def find_modules(model, mclass=nn.Conv2d):
95 | # Finds layer indices matching module class 'mclass'
96 | return [i for i, m in enumerate(model.module_list) if isinstance(m, mclass)]
97 |
98 |
99 | def sparsity(model):
100 | # Return global model sparsity
101 | a, b = 0., 0.
102 | for p in model.parameters():
103 | a += p.numel()
104 | b += (p == 0).sum()
105 | return b / a
106 |
107 |
108 | def prune(model, amount=0.3):
109 | # Prune model to requested global sparsity
110 | import torch.nn.utils.prune as prune
111 | print('Pruning model... ', end='')
112 | for name, m in model.named_modules():
113 | if isinstance(m, nn.Conv2d):
114 | prune.l1_unstructured(m, name='weight', amount=amount) # prune
115 | prune.remove(m, 'weight') # make permanent
116 | print(' %.3g global sparsity' % sparsity(model))
117 |
118 |
119 | def fuse_conv_and_bn(conv, bn):
120 | # Fuse convolution and batchnorm layers https://tehnokv.com/posts/fusing-batchnorm-and-conv/
121 | fusedconv = nn.Conv2d(conv.in_channels,
122 | conv.out_channels,
123 | kernel_size=conv.kernel_size,
124 | stride=conv.stride,
125 | padding=conv.padding,
126 | groups=conv.groups,
127 | bias=True).requires_grad_(False).to(conv.weight.device)
128 |
129 | # prepare filters
130 | w_conv = conv.weight.clone().view(conv.out_channels, -1)
131 | w_bn = torch.diag(bn.weight.div(torch.sqrt(bn.eps + bn.running_var)))
132 | fusedconv.weight.copy_(torch.mm(w_bn, w_conv).view(fusedconv.weight.size()))
133 |
134 | # prepare spatial bias
135 | b_conv = torch.zeros(conv.weight.size(0), device=conv.weight.device) if conv.bias is None else conv.bias
136 | b_bn = bn.bias - bn.weight.mul(bn.running_mean).div(torch.sqrt(bn.running_var + bn.eps))
137 | fusedconv.bias.copy_(torch.mm(w_bn, b_conv.reshape(-1, 1)).reshape(-1) + b_bn)
138 |
139 | return fusedconv
140 |
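# Equivalence sketch (inference only, BN in eval mode so running statistics are used):
#   conv = nn.Conv2d(8, 16, 3, padding=1, bias=False)
#   bn = nn.BatchNorm2d(16).eval()
#   x = torch.randn(1, 8, 32, 32)
#   torch.allclose(bn(conv(x)), fuse_conv_and_bn(conv, bn)(x), atol=1e-5)  # True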
141 |
142 | def model_info(model, verbose=False, img_size=640):
143 | # Model information. img_size may be int or list, i.e. img_size=640 or img_size=[640, 320]
144 | n_p = sum(x.numel() for x in model.parameters()) # number parameters
145 | n_g = sum(x.numel() for x in model.parameters() if x.requires_grad) # number gradients
146 | if verbose:
147 | print('%5s %40s %9s %12s %20s %10s %10s' % ('layer', 'name', 'gradient', 'parameters', 'shape', 'mu', 'sigma'))
148 | for i, (name, p) in enumerate(model.named_parameters()):
149 | name = name.replace('module_list.', '')
150 | print('%5g %40s %9s %12g %20s %10.3g %10.3g' %
151 | (i, name, p.requires_grad, p.numel(), list(p.shape), p.mean(), p.std()))
152 |
153 | try: # FLOPS
154 | from thop import profile
155 | flops = profile(deepcopy(model), inputs=(torch.zeros(1, 3, img_size, img_size),), verbose=False)[0] / 1E9 * 2
156 | img_size = img_size if isinstance(img_size, list) else [img_size, img_size] # expand if int/float
157 |         fs = ', %.1f GFLOPS' % flops  # GFLOPS at the given img_size
158 |     except Exception:  # thop not installed or profiling failed
159 | fs = ''
160 |
161 | logger.info(f"Model Summary: {len(list(model.modules()))} layers, {n_p} parameters, {n_g} gradients{fs}")
162 |
163 |
164 | def load_classifier(name='resnet101', n=2):
165 | # Loads a pretrained model reshaped to n-class output
166 | model = torchvision.models.__dict__[name](pretrained=True)
167 |
168 | # ResNet model properties
169 | # input_size = [3, 224, 224]
170 | # input_space = 'RGB'
171 | # input_range = [0, 1]
172 | # mean = [0.485, 0.456, 0.406]
173 | # std = [0.229, 0.224, 0.225]
174 |
175 | # Reshape output to n classes
176 | filters = model.fc.weight.shape[1]
177 | model.fc.bias = nn.Parameter(torch.zeros(n), requires_grad=True)
178 | model.fc.weight = nn.Parameter(torch.zeros(n, filters), requires_grad=True)
179 | model.fc.out_features = n
180 | return model
181 |
182 |
183 | def scale_img(img, ratio=1.0, same_shape=False): # img(16,3,256,416), r=ratio
184 | # scales img(bs,3,y,x) by ratio
185 | if ratio == 1.0:
186 | return img
187 | else:
188 | h, w = img.shape[2:]
189 | s = (int(h * ratio), int(w * ratio)) # new size
190 | img = F.interpolate(img, size=s, mode='bilinear', align_corners=False) # resize
191 | if not same_shape: # pad/crop img
192 | gs = 32 # (pixels) grid size
193 | h, w = [math.ceil(x * ratio / gs) * gs for x in (h, w)]
194 | return F.pad(img, [0, w - s[1], 0, h - s[0]], value=0.447) # value = imagenet mean
195 |
196 |
197 | def copy_attr(a, b, include=(), exclude=()):
198 | # Copy attributes from b to a, options to only include [...] and to exclude [...]
199 | for k, v in b.__dict__.items():
200 | if (len(include) and k not in include) or k.startswith('_') or k in exclude:
201 | continue
202 | else:
203 | setattr(a, k, v)
204 |
205 |
206 | class ModelEMA:
207 | """ Model Exponential Moving Average from https://github.com/rwightman/pytorch-image-models
208 | Keep a moving average of everything in the model state_dict (parameters and buffers).
209 | This is intended to allow functionality like
210 | https://www.tensorflow.org/api_docs/python/tf/train/ExponentialMovingAverage
211 | A smoothed version of the weights is necessary for some training schemes to perform well.
212 |     This class is sensitive to where it is initialized in the sequence of model init,
213 | GPU assignment and distributed training wrappers.
214 | """
215 |
216 | def __init__(self, model, decay=0.9999, updates=0):
217 | # Create EMA
218 | self.ema = deepcopy(model.module if is_parallel(model) else model).eval() # FP32 EMA
219 | # if next(model.parameters()).device.type != 'cpu':
220 | # self.ema.half() # FP16 EMA
221 | self.updates = updates # number of EMA updates
222 | self.decay = lambda x: decay * (1 - math.exp(-x / 2000)) # decay exponential ramp (to help early epochs)
223 | for p in self.ema.parameters():
224 | p.requires_grad_(False)
225 |
226 | def update(self, model):
227 | # Update EMA parameters
228 | with torch.no_grad():
229 | self.updates += 1
230 | d = self.decay(self.updates)
231 |
232 | msd = model.module.state_dict() if is_parallel(model) else model.state_dict() # model state_dict
233 | for k, v in self.ema.state_dict().items():
234 | if v.dtype.is_floating_point:
235 | v *= d
236 | v += (1. - d) * msd[k].detach()
237 |
238 | def update_attr(self, model, include=(), exclude=('process_group', 'reducer')):
239 | # Update EMA attributes
240 | copy_attr(self.ema, model, include, exclude)
241 |
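# Usage sketch inside a training loop (the EMA weights, not the live weights, are what get evaluated/saved):
#   ema = ModelEMA(model)
#   for imgs, targets in dataloader:
#       ...                       # forward, backward, optimizer.step()
#       ema.update(model)         # blend the current weights into the moving average
#   ema.update_attr(model)        # copy non-tensor attributes (names, hyp, ...) before eval
#   results = evaluate(ema.ema)   # hypothetical eval call on the smoothed FP32 copy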
--------------------------------------------------------------------------------
/weights/put your weights file here.txt:
--------------------------------------------------------------------------------
1 | yolov4-paspp.pt
2 | yolov4-pacsp-s.pt
3 | yolov4-pacsp.pt
4 | yolov4-pacsp-x.pt
--------------------------------------------------------------------------------