├── README.md
├── cfg
│   ├── yolov4-csp-leaky.cfg
│   ├── yolov4-csp-mish.cfg
│   ├── yolov4-csp-s-leaky.cfg
│   ├── yolov4-csp-s-mish.cfg
│   ├── yolov4-csp-x-leaky.cfg
│   ├── yolov4-csp-x-mish.cfg
│   ├── yolov4-pacsp-mish.cfg
│   ├── yolov4-pacsp-s-mish.cfg
│   ├── yolov4-pacsp-s.cfg
│   ├── yolov4-pacsp-x-mish.cfg
│   ├── yolov4-pacsp-x.cfg
│   ├── yolov4-pacsp.cfg
│   ├── yolov4-paspp.cfg
│   ├── yolov4-tiny.cfg
│   └── yolov4.cfg
├── data
│   ├── coco.data
│   ├── coco.names
│   ├── coco.yaml
│   ├── coco1.data
│   ├── coco1.txt
│   ├── coco16.data
│   ├── coco16.txt
│   ├── coco1cls.data
│   ├── coco1cls.txt
│   ├── coco2014.data
│   ├── coco2017.data
│   ├── coco64.data
│   ├── coco64.txt
│   ├── coco_paper.names
│   ├── get_coco2014.sh
│   ├── get_coco2017.sh
│   ├── hyp.scratch.s.yaml
│   ├── hyp.scratch.yaml
│   └── samples
│       ├── bus.jpg
│       └── zidane.jpg
├── detect.py
├── images
│   └── scalingCSP.png
├── models
│   ├── export.py
│   └── models.py
├── requirements.txt
├── test.py
├── train.py
├── utils
│   ├── __init__.py
│   ├── activations.py
│   ├── adabound.py
│   ├── autoanchor.py
│   ├── datasets.py
│   ├── evolve.sh
│   ├── gcp.sh
│   ├── general.py
│   ├── google_utils.py
│   ├── layers.py
│   ├── loss.py
│   ├── metrics.py
│   ├── parse_config.py
│   ├── plots.py
│   ├── torch_utils.py
│   └── utils.py
└── weights
    └── put your weights file here.txt

/README.md:
--------------------------------------------------------------------------------
1 | # YOLOv4
2 | 
3 | This is a PyTorch implementation of [YOLOv4](https://github.com/AlexeyAB/darknet), based on [ultralytics/yolov3](https://github.com/ultralytics/yolov3).
4 | 
5 | * [[original Darknet implementation of YOLOv4]](https://github.com/AlexeyAB/darknet)
6 | 
7 | * [[ultralytics/yolov5 based PyTorch implementation of YOLOv4]](https://github.com/WongKinYiu/PyTorch_YOLOv4/tree/u5)
8 | 
9 | ### development log
10 | 
11 | 
12 | 
13 | * `2021-10-31` - support [RS loss](https://arxiv.org/abs/2107.11669), [aLRP loss](https://arxiv.org/abs/2009.13592), [AP loss](https://arxiv.org/abs/2008.07294).
14 | * `2021-10-30` - support [alpha IoU](https://arxiv.org/abs/2110.13675).
15 | * `2021-10-20` - design resolution calibration methods.
16 | * `2021-10-15` - support joint detection, instance segmentation, and semantic segmentation. [`seg-yolo`]()
17 | * `2021-10-13` - design ratio yolo.
18 | * `2021-09-22` - pytorch 1.9 compatibility.
19 | * `2021-09-21` - support [DIM](https://arxiv.org/abs/1808.06670).
20 | * `2021-09-16` - support [Dynamic Head](https://arxiv.org/abs/2106.08322).
21 | * `2021-08-28` - design domain adaptive training.
22 | * `2021-08-22` - design re-balance models.
23 | * `2021-08-21` - support [simOTA](https://arxiv.org/abs/2107.08430).
24 | * `2021-08-14` - design approximation-based methods.
25 | * `2021-07-27` - design new decoders.
26 | * `2021-07-22` - support 1) decoupled head, 2) anchor-free, and 3) multi positives in [yolox](https://arxiv.org/abs/2107.08430).
27 | * `2021-07-10` - design distribution-based implicit modeling.
28 | * `2021-07-06` - support outlooker attention. [`volo`](https://arxiv.org/abs/2106.13112)
29 | * `2021-07-06` - design self-ensemble training method.
30 | * `2021-06-23` - design cross multi-stage correlation module.
31 | * `2021-06-18` - design cross stage cross correlation module.
32 | * `2021-06-17` - support cross correlation module. [`ccn`](https://arxiv.org/abs/2010.12138)
33 | * `2021-06-17` - support attention modules. [`cbam`](https://arxiv.org/abs/1807.06521) [`saan`](https://arxiv.org/abs/2010.12138)
34 | * `2021-04-20` - support swin transformer. [`swin`](https://arxiv.org/abs/2103.14030)
35 | * `2021-03-16` - design new stem layers.
36 | * `2021-03-13` - design implicit modeling. [`nn`]() [`mf`]() [`lc`]()
37 | * `2021-01-26` - support vision transformer. [`tr`](https://arxiv.org/abs/2010.11929)
38 | * `2021-01-26` - design mask objectness.
39 | * `2021-01-25` - design rotate augmentation.
40 | * `2021-01-23` - design collage augmentation.
41 | * `2021-01-22` - support [VoVNet](https://arxiv.org/abs/1904.09730), [VoVNetv2](https://arxiv.org/abs/1911.06667).
42 | * `2021-01-22` - support [EIoU](https://arxiv.org/abs/2101.08158).
43 | * `2021-01-19` - support instance segmentation. [`mask-yolo`]()
44 | * `2021-01-17` - support anchor-free-based methods. [`center-yolo`]()
45 | * `2021-01-14` - support joint detection and classification. [`classify-yolo`]()
46 | * `2021-01-02` - design new [PRN](https://github.com/WongKinYiu/PartialResidualNetworks) and [CSP](https://github.com/WongKinYiu/CrossStagePartialNetworks)-based models.
47 | * `2020-12-22` - support transfer learning.
48 | * `2020-12-18` - support non-local series self-attention blocks. [`gc`](https://arxiv.org/abs/1904.11492) [`dnl`](https://arxiv.org/abs/2006.06668)
49 | * `2020-12-16` - support down-sampling blocks from the CSPNet paper. [`down-c`]() [`down-d`](https://arxiv.org/abs/1812.01187)
50 | * `2020-12-03` - support imitation learning.
51 | * `2020-12-02` - support [squeeze and excitation](https://arxiv.org/abs/1709.01507).
52 | * `2020-11-26` - support multi-class multi-anchor joint detection and embedding.
53 | * `2020-11-25` - support [joint detection and embedding](https://arxiv.org/abs/1909.12605). [`track-yolo`]()
54 | * `2020-11-23` - support teacher-student learning.
55 | * `2020-11-17` - pytorch 1.7 compatibility.
56 | * `2020-11-06` - support inference with initial weights.
57 | * `2020-10-21` - fully supported by darknet.
58 | * `2020-09-18` - design fine-tune methods.
59 | * `2020-08-29` - support [deformable kernel](https://arxiv.org/abs/1910.02940).
60 | * `2020-08-25` - pytorch 1.6 compatibility.
61 | * `2020-08-24` - support channels-last training/testing.
62 | * `2020-08-16` - design CSPPRN.
63 | * `2020-08-15` - design deeper model. [`csp-p6-mish`]()
64 | * `2020-08-11` - support [HarDNet](https://arxiv.org/abs/1909.00948). [`hard39-pacsp`]() [`hard68-pacsp`]() [`hard85-pacsp`]()
65 | * `2020-08-10` - add DDP training.
66 | * `2020-08-06` - support [DCN](https://arxiv.org/abs/1703.06211), [DCNv2](https://arxiv.org/abs/1811.11168). [`yolov4-dcn`]()
67 | * `2020-08-01` - add pytorch hub.
68 | * `2020-07-31` - support [ResNet](https://arxiv.org/abs/1512.03385), [ResNeXt](https://arxiv.org/abs/1611.05431), [CSPResNet](https://github.com/WongKinYiu/CrossStagePartialNetworks), [CSPResNeXt](https://github.com/WongKinYiu/CrossStagePartialNetworks). [`r50-pacsp`]() [`x50-pacsp`]() [`cspr50-pacsp`]() [`cspx50-pacsp`]()
69 | * `2020-07-28` - support [SAM](https://arxiv.org/abs/2004.10934). [`yolov4-pacsp-sam`]()
70 | * `2020-07-24` - update api.
71 | * `2020-07-23` - support the CUDA-accelerated Mish activation function (see the sketch after this list). 
72 | * `2020-07-19` - support and train tiny YOLOv4. [`yolov4-tiny`]()
73 | * `2020-07-15` - design and train conditional YOLOv4. [`yolov4-pacsp-conditional`]()
74 | * `2020-07-13` - support [MixUp](https://arxiv.org/abs/1710.09412) data augmentation.
75 | * `2020-07-03` - design new stem layers.
76 | * `2020-06-16` - support FP16 (half-precision) GPU inference.
77 | * `2020-06-14` - convert .pt to .weights for darknet fine-tuning.
78 | * `2020-06-13` - update multi-scale training strategy.
79 | * `2020-06-12` - design scaled YOLOv4 following [ultralytics](https://github.com/ultralytics/yolov5). [`yolov4-pacsp-s`]() [`yolov4-pacsp-m`]() [`yolov4-pacsp-l`]() [`yolov4-pacsp-x`]()
80 | * `2020-06-07` - design [scaling methods](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/images/scalingCSP.png) for CSP-based models. [`yolov4-pacsp-25`]() [`yolov4-pacsp-75`]()
81 | * `2020-06-03` - update COCO2014 to COCO2017.
82 | * `2020-05-30` - update FPN neck to CSPFPN. [`yolov4-yocsp`]() [`yolov4-yocsp-mish`]()
83 | * `2020-05-24` - update neck of YOLOv4 to CSPPAN. [`yolov4-pacsp`]() [`yolov4-pacsp-mish`]()
84 | * `2020-05-15` - train YOLOv4 with the Mish activation function. [`yolov4-yospp-mish`]() [`yolov4-paspp-mish`]()
85 | * `2020-05-08` - design and train YOLOv4 with an [FPN](https://arxiv.org/abs/1612.03144) neck. [`yolov4-yospp`]()
86 | * `2020-05-01` - train YOLOv4 with the Leaky activation function using PyTorch. [`yolov4-paspp`]() [`PAN`](https://arxiv.org/abs/1803.01534)
87 | 
88 | 
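Several of the `-mish` entries above rely on the Mish activation. For reference, a minimal PyTorch sketch of Mish is shown below; the repository ships its own version (see `utils/activations.py` and the optional `mish-cuda` package), so treat this as an illustrative equivalent rather than the exact code used here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    """Mish activation: x * tanh(softplus(x))."""
    def forward(self, x):
        # softplus(x) = ln(1 + exp(x)); PyTorch computes it in a numerically stable way
        return x * torch.tanh(F.softplus(x))

if __name__ == "__main__":
    x = torch.randn(2, 3)
    print(Mish()(x))                        # element-wise activation
    print(x * torch.tanh(F.softplus(x)))    # identical result
```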
89 | 90 | ## Pretrained Models & Comparison 91 | 92 | 93 | | Model | Test Size | APtest | AP50test | AP75test | APStest | APMtest | APLtest | cfg | weights | 94 | | :-- | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | 95 | | **YOLOv4** | 640 | 50.0% | 68.4% | 54.7% | 30.5% | 54.3% | 63.3% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4.weights) | 96 | | | | | | | | | 97 | | **YOLOv4**pacsp-s | 640 | 39.0% | 57.8% | 42.4% | 20.6% | 42.6% | 50.0% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-csp-s-leaky.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4-pacsp-s.weights) | 98 | | **YOLOv4**pacsp | 640 | 49.8% | 68.4% | 54.3% | 30.1% | 54.0% | 63.4% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-csp-leaky.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4-pacsp.weights) | 99 | | **YOLOv4**pacsp-x | 640 | **52.2%** | **70.5%** | **56.8%** | **32.7%** | **56.3%** | **65.9%** | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-csp-x-leaky.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4-pacsp-x.weights) | 100 | | | | | | | | | 101 | | **YOLOv4**pacsp-s-mish | 640 | 40.8% | 59.5% | 44.3% | 22.4% | 44.6% | 51.8% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-csp-s-mish.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4-pacsp-s-mish.weights) | 102 | | **YOLOv4**pacsp-mish | 640 | 50.9% | 69.4% | 55.5% | 31.2% | 55.0% | 64.7% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-csp-mish.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4-pacsp-mish.weights) | 103 | | **YOLOv4**pacsp-x-mish | 640 | 52.8% | 71.1% | 57.5% | 33.6% | 56.9% | 66.6% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-csp-x-mish.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4-pacsp-x-mish.weights) | 104 | 105 | | Model | Test Size | APval | AP50val | AP75val | APSval | APMval | APLval | cfg | weights | 106 | | :-- | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | 107 | | **YOLOv4** | 640 | 49.7% | 68.2% | 54.3% | 32.9% | 54.8% | 63.7% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4.weights) | 108 | | | | | | | | | 109 | | **YOLOv4**pacsp-s | 640 | 38.9% | 57.7% | 42.2% | 21.9% | 43.3% | 51.9% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-csp-s-leaky.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4-pacsp-s.weights) | 110 | | **YOLOv4**pacsp | 640 | 49.4% | 68.1% | 53.8% | 32.7% | 54.2% | 64.0% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-csp-leaky.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4-pacsp.weights) | 111 | | **YOLOv4**pacsp-x | 640 | **51.6%** | **70.1%** | **56.2%** | **35.3%** | **56.4%** | **66.9%** | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-csp-x-leaky.cfg) | 
[weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4-pacsp-x.weights) | 112 | | | | | | | | | 113 | | **YOLOv4**pacsp-s-mish | 640 | 40.7% | 59.5% | 44.2% | 25.3% | 45.1% | 53.4% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-csp-s-mish.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4-pacsp-s-mish.weights) | 114 | | **YOLOv4**pacsp-mish | 640 | 50.8% | 69.4% | 55.4% | 34.3% | 55.5% | 65.7% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-csp-mish.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4-pacsp-mish.weights) | 115 | | **YOLOv4**pacsp-x-mish | 640 | 52.6% | 71.0% | 57.2% | 36.4% | 57.3% | 67.6% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-csp-x-mish.cfg) | [weights](https://github.com/WongKinYiu/PyTorch_YOLOv4/releases/download/weights/yolov4-pacsp-x-mish.weights) | 116 | 117 |
archive 118 | 119 | | Model | Test Size | APval | AP50val | AP75val | APSval | APMval | APLval | cfg | weights | 120 | | :-- | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | 121 | | **YOLOv4** | 640 | 48.4% | 67.1% | 52.9% | 31.7% | 53.8% | 62.0% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4.cfg) | [weights](https://drive.google.com/file/d/14zPRaYxMOe7hXi6N-Vs_QbWs6ue_CZPd/view?usp=sharing) | 122 | | | | | | | | | 123 | | **YOLOv4**pacsp-s | 640 | 37.0% | 55.7% | 40.0% | 20.2% | 41.6% | 48.4% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-s.cfg) | [weights](https://drive.google.com/file/d/1PiS9pF4tsydPN4-vMjiJPHjIOJMeRwWS/view?usp=sharing) | 124 | | **YOLOv4**pacsp | 640 | 47.7% | 66.4% | 52.0% | 32.3% | 53.0% | 61.7% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp.cfg) | [weights](https://drive.google.com/file/d/1C7xwfYzPF4dKFAmDNCetdTCB_cPvsuwf/view?usp=sharing) | 125 | | **YOLOv4**pacsp-x | 640 | **50.0%** | **68.3%** | **54.5%** | **33.9%** | **55.4%** | **63.7%** | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-x.cfg) | [weights](https://drive.google.com/file/d/1kWzJk5DJNlW9Xf2xR89OfmrEoeY9Szzj/view?usp=sharing) | 126 | | | | | | | | | 127 | | **YOLOv4**pacsp-s-mish | 640 | 38.8% | 57.8% | 42.0% | 21.6% | 43.7% | 51.1% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-s-mish.cfg) | [weights](https://drive.google.com/file/d/1OiDhQqYH23GrP6f5vU2j_DvA8PqL0pcF/view?usp=sharing) | 128 | | **YOLOv4**pacsp-mish | 640 | 48.8% | 67.2% | 53.4% | 31.5% | 54.4% | 62.2% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-mish.cfg) | [weights](https://drive.google.com/file/d/1mk9mkM0_B9e_QgPxF6pBIB6uXDxZENsk/view?usp=sharing) | 129 | | **YOLOv4**pacsp-x-mish | 640 | 51.2% | 69.4% | 55.9% | 35.0% | 56.5% | 65.0% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-x-mish.cfg) | [weights](https://drive.google.com/file/d/1kZee29alFFnm1rlJieAyHzB3Niywew_0/view?usp=sharing) | 130 | 131 | | Model | Test Size | APval | AP50val | AP75val | APSval | APMval | APLval | cfg | weights | 132 | | :-- | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | 133 | | **YOLOv4** | 672 | 47.7% | 66.7% | 52.1% | 30.5% | 52.6% | 61.4% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4.cfg) | [weights](https://drive.google.com/file/d/137U-oLekAu-J-fe0E_seTblVxnU3tlNC/view?usp=sharing) | 134 | | | | | | | | | 135 | | **YOLOv4**pacsp-s | 672 | 36.6% | 55.5% | 39.6% | 21.2% | 41.1% | 47.0% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-s.cfg) | [weights](https://drive.google.com/file/d/1-QZc043NMNa_O0oLaB3r0XYKFRSktfsd/view?usp=sharing) | 136 | | **YOLOv4**pacsp | 672 | 47.2% | 66.2% | 51.6% | 30.4% | 52.3% | 60.8% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp.cfg) | [weights](https://drive.google.com/file/d/1sIpu29jEBZ3VI_1uy2Q1f3iEzvIpBZbP/view?usp=sharing) | 137 | | **YOLOv4**pacsp-x | 672 | **49.3%** | **68.1%** | **53.6%** | **31.8%** | **54.5%** | **63.6%** | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-x.cfg) | [weights](https://drive.google.com/file/d/1aZRfA2CD9SdIwmscbyp6rXZjGysDvaYv/view?usp=sharing) | 138 | | | | | | | | | 139 | | **YOLOv4**pacsp-s-mish | 672 | 38.6% | 57.7% | 41.8% | 22.3% | 43.5% | 49.3% | 
[cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-s-mish.cfg) | [weights](https://drive.google.com/file/d/1q0zbQKcSNSf_AxWQv6DAUPXeaTywPqVB/view?usp=sharing) | 140 | | (+BoF) | 640 | 39.9% | 59.1% | 43.1% | 24.4% | 45.2% | 51.4% | | [weights](https://drive.google.com/file/d/1-8PqBaI8oYb7TB9L-KMzvjZcK_VaGXCF/view?usp=sharing) | 141 | | **YOLOv4**pacsp-mish | 672 | 48.1% | 66.9% | 52.3% | 30.8% | 53.4% | 61.7% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-mish.cfg) | [weights](https://drive.google.com/file/d/116yreAUTK_dTJErDuDVX2WTIBcd5YPSI/view?usp=sharing) | 142 | | (+BoF) | 640 | 49.3% | 68.2% | 53.8% | 31.9% | 54.9% | 62.8% | | [weights](https://drive.google.com/file/d/12qRrqDRlUElsR_TI97j4qkrttrNKKG3k/view?usp=sharing) | 143 | | **YOLOv4**pacsp-x-mish | 672 | 50.0% | 68.5% | 54.4% | 32.9% | 54.9% | 64.0% | [cfg](https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/cfg/yolov4-pacsp-x-mish.cfg) | [weights](https://drive.google.com/file/d/1GGCrokkRZ06CZ5MUCVokbX1FF2e1DbPF/view?usp=sharing) | 144 | | (+BoF) | 640 | **51.0%** | **69.7%** | **55.5%** | **33.3%** | **56.2%** | **65.5%** | | [weights](https://drive.google.com/file/d/1lVmSqItSKywg6yk1qiCvgOYw55O03Qgj/view?usp=sharing) | 145 | | | | | | | | | 146 | 147 |
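Each row above pairs a `cfg` file with a Darknet-format `.weights` file; the Training and Testing commands below show the intended way to use them. If you want to load a pair directly in Python, the sketch below is one way it could look. It assumes the `Darknet` model class and `load_darknet_weights` helper that this codebase inherits from ultralytics/yolov3 (`models/models.py`), so treat it as a rough outline rather than the exact API.

```python
import torch
# Assumed import path, following the ultralytics/yolov3 lineage this repo is based on.
from models.models import Darknet, load_darknet_weights

cfg = 'cfg/yolov4-pacsp.cfg'
weights = 'weights/yolov4-pacsp.weights'   # Darknet-format weights from the table above

model = Darknet(cfg, img_size=640)         # build the network from the cfg definition
load_darknet_weights(model, weights)       # .pt checkpoints would instead go through torch.load()
model.eval()

with torch.no_grad():
    img = torch.zeros(1, 3, 640, 640)      # placeholder; use a letterboxed, 0-1 normalized image
    out = model(img)                        # raw head outputs; run NMS on them to obtain boxes
```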
148 | 149 | ## Requirements 150 | 151 | docker (recommanded): 152 | ``` 153 | # create the docker container, you can change the share memory size if you have more. 154 | nvidia-docker run --name yolov4 -it -v your_coco_path/:/coco/ -v your_code_path/:/yolo --shm-size=64g nvcr.io/nvidia/pytorch:20.11-py3 155 | 156 | # apt install required packages 157 | apt update 158 | apt install -y zip htop screen libgl1-mesa-glx 159 | 160 | # pip install required packages 161 | pip install seaborn thop 162 | 163 | # install mish-cuda if you want to use mish activation 164 | # https://github.com/thomasbrandon/mish-cuda 165 | # https://github.com/JunnYu/mish-cuda 166 | cd / 167 | git clone https://github.com/JunnYu/mish-cuda 168 | cd mish-cuda 169 | python setup.py build install 170 | 171 | # go to code folder 172 | cd /yolo 173 | ``` 174 | 175 | local: 176 | ``` 177 | pip install -r requirements.txt 178 | ``` 179 | ※ For running Mish models, please install https://github.com/thomasbrandon/mish-cuda 180 | 181 | ## Training 182 | 183 | ``` 184 | python train.py --device 0 --batch-size 16 --img 640 640 --data coco.yaml --cfg cfg/yolov4-pacsp.cfg --weights '' --name yolov4-pacsp 185 | ``` 186 | 187 | ## Testing 188 | 189 | ``` 190 | python test.py --img 640 --conf 0.001 --batch 8 --device 0 --data coco.yaml --cfg cfg/yolov4-pacsp.cfg --weights weights/yolov4-pacsp.pt 191 | ``` 192 | 193 | ## Citation 194 | 195 | ``` 196 | @article{bochkovskiy2020yolov4, 197 | title={{YOLOv4}: Optimal Speed and Accuracy of Object Detection}, 198 | author={Bochkovskiy, Alexey and Wang, Chien-Yao and Liao, Hong-Yuan Mark}, 199 | journal={arXiv preprint arXiv:2004.10934}, 200 | year={2020} 201 | } 202 | ``` 203 | 204 | ``` 205 | @inproceedings{wang2020cspnet, 206 | title={{CSPNet}: A New Backbone That Can Enhance Learning Capability of {CNN}}, 207 | author={Wang, Chien-Yao and Mark Liao, Hong-Yuan and Wu, Yueh-Hua and Chen, Ping-Yang and Hsieh, Jun-Wei and Yeh, I-Hau}, 208 | booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops}, 209 | pages={390--391}, 210 | year={2020} 211 | } 212 | ``` 213 | 214 | ## Acknowledgements 215 | 216 | * [https://github.com/AlexeyAB/darknet](https://github.com/AlexeyAB/darknet) 217 | * [https://github.com/ultralytics/yolov3](https://github.com/ultralytics/yolov3) 218 | * [https://github.com/ultralytics/yolov5](https://github.com/ultralytics/yolov5) 219 | -------------------------------------------------------------------------------- /cfg/yolov4-csp-s-leaky.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=64 7 | subdivisions=8 8 | width=512 9 | height=512 10 | channels=3 11 | momentum=0.949 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.00261 19 | burn_in=1000 20 | max_batches = 500500 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | #cutmix=1 26 | mosaic=1 27 | 28 | 29 | # ============ Backbone ============ # 30 | 31 | # Stem 32 | 33 | # 0 34 | [convolutional] 35 | batch_normalize=1 36 | filters=32 37 | size=3 38 | stride=1 39 | pad=1 40 | activation=leaky 41 | 42 | # P1 43 | 44 | # Downsample 45 | 46 | [convolutional] 47 | batch_normalize=1 48 | filters=32 49 | size=3 50 | stride=2 51 | pad=1 52 | activation=leaky 53 | 54 | # Residual Block 55 | 56 | [convolutional] 57 | batch_normalize=1 58 | filters=32 59 | size=1 60 | stride=1 61 | pad=1 62 | 
activation=leaky 63 | 64 | [convolutional] 65 | batch_normalize=1 66 | filters=32 67 | size=3 68 | stride=1 69 | pad=1 70 | activation=leaky 71 | 72 | # 4 (previous+1+3k) 73 | [shortcut] 74 | from=-3 75 | activation=linear 76 | 77 | # P2 78 | 79 | # Downsample 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=64 84 | size=3 85 | stride=2 86 | pad=1 87 | activation=leaky 88 | 89 | # Split 90 | 91 | [convolutional] 92 | batch_normalize=1 93 | filters=32 94 | size=1 95 | stride=1 96 | pad=1 97 | activation=leaky 98 | 99 | [route] 100 | layers = -2 101 | 102 | [convolutional] 103 | batch_normalize=1 104 | filters=32 105 | size=1 106 | stride=1 107 | pad=1 108 | activation=leaky 109 | 110 | # Residual Block 111 | 112 | [convolutional] 113 | batch_normalize=1 114 | filters=32 115 | size=1 116 | stride=1 117 | pad=1 118 | activation=leaky 119 | 120 | [convolutional] 121 | batch_normalize=1 122 | filters=32 123 | size=3 124 | stride=1 125 | pad=1 126 | activation=leaky 127 | 128 | [shortcut] 129 | from=-3 130 | activation=linear 131 | 132 | # Transition first 133 | 134 | [convolutional] 135 | batch_normalize=1 136 | filters=32 137 | size=1 138 | stride=1 139 | pad=1 140 | activation=leaky 141 | 142 | # Merge [-1, -(3k+4)] 143 | 144 | [route] 145 | layers = -1,-7 146 | 147 | # Transition last 148 | 149 | # 14 (previous+7+3k) 150 | [convolutional] 151 | batch_normalize=1 152 | filters=64 153 | size=1 154 | stride=1 155 | pad=1 156 | activation=leaky 157 | 158 | # P3 159 | 160 | # Downsample 161 | 162 | [convolutional] 163 | batch_normalize=1 164 | filters=128 165 | size=3 166 | stride=2 167 | pad=1 168 | activation=leaky 169 | 170 | # Split 171 | 172 | [convolutional] 173 | batch_normalize=1 174 | filters=64 175 | size=1 176 | stride=1 177 | pad=1 178 | activation=leaky 179 | 180 | [route] 181 | layers = -2 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=64 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | # Residual Block 192 | 193 | [convolutional] 194 | batch_normalize=1 195 | filters=64 196 | size=1 197 | stride=1 198 | pad=1 199 | activation=leaky 200 | 201 | [convolutional] 202 | batch_normalize=1 203 | filters=64 204 | size=3 205 | stride=1 206 | pad=1 207 | activation=leaky 208 | 209 | [shortcut] 210 | from=-3 211 | activation=linear 212 | 213 | # Transition first 214 | 215 | [convolutional] 216 | batch_normalize=1 217 | filters=64 218 | size=1 219 | stride=1 220 | pad=1 221 | activation=leaky 222 | 223 | # Merge [-1 -(4+3k)] 224 | 225 | [route] 226 | layers = -1,-7 227 | 228 | # Transition last 229 | 230 | # 24 (previous+7+3k) 231 | [convolutional] 232 | batch_normalize=1 233 | filters=128 234 | size=1 235 | stride=1 236 | pad=1 237 | activation=leaky 238 | 239 | # P4 240 | 241 | # Downsample 242 | 243 | [convolutional] 244 | batch_normalize=1 245 | filters=256 246 | size=3 247 | stride=2 248 | pad=1 249 | activation=leaky 250 | 251 | # Split 252 | 253 | [convolutional] 254 | batch_normalize=1 255 | filters=128 256 | size=1 257 | stride=1 258 | pad=1 259 | activation=leaky 260 | 261 | [route] 262 | layers = -2 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | # Residual Block 273 | 274 | [convolutional] 275 | batch_normalize=1 276 | filters=128 277 | size=1 278 | stride=1 279 | pad=1 280 | activation=leaky 281 | 282 | [convolutional] 283 | batch_normalize=1 284 | filters=128 285 | size=3 286 | stride=1 287 | pad=1 288 | activation=leaky 289 | 
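# The [shortcut] below adds the 3x3 conv output (-1) element-wise to the
# 128-filter 1x1 conv three layers back (from=-3); activation=linear means
# no extra non-linearity is applied after the residual addition.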
290 | [shortcut] 291 | from=-3 292 | activation=linear 293 | 294 | # Transition first 295 | 296 | [convolutional] 297 | batch_normalize=1 298 | filters=128 299 | size=1 300 | stride=1 301 | pad=1 302 | activation=leaky 303 | 304 | # Merge [-1 -(3k+4)] 305 | 306 | [route] 307 | layers = -1,-7 308 | 309 | # Transition last 310 | 311 | # 34 (previous+7+3k) 312 | [convolutional] 313 | batch_normalize=1 314 | filters=256 315 | size=1 316 | stride=1 317 | pad=1 318 | activation=leaky 319 | 320 | # P5 321 | 322 | # Downsample 323 | 324 | [convolutional] 325 | batch_normalize=1 326 | filters=512 327 | size=3 328 | stride=2 329 | pad=1 330 | activation=leaky 331 | 332 | # Split 333 | 334 | [convolutional] 335 | batch_normalize=1 336 | filters=256 337 | size=1 338 | stride=1 339 | pad=1 340 | activation=leaky 341 | 342 | [route] 343 | layers = -2 344 | 345 | [convolutional] 346 | batch_normalize=1 347 | filters=256 348 | size=1 349 | stride=1 350 | pad=1 351 | activation=leaky 352 | 353 | # Residual Block 354 | 355 | [convolutional] 356 | batch_normalize=1 357 | filters=256 358 | size=1 359 | stride=1 360 | pad=1 361 | activation=leaky 362 | 363 | [convolutional] 364 | batch_normalize=1 365 | filters=256 366 | size=3 367 | stride=1 368 | pad=1 369 | activation=leaky 370 | 371 | [shortcut] 372 | from=-3 373 | activation=linear 374 | 375 | # Transition first 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | # Merge [-1 -(3k+4)] 386 | 387 | [route] 388 | layers = -1,-7 389 | 390 | # Transition last 391 | 392 | # 44 (previous+7+3k) 393 | [convolutional] 394 | batch_normalize=1 395 | filters=512 396 | size=1 397 | stride=1 398 | pad=1 399 | activation=leaky 400 | 401 | # ============ End of Backbone ============ # 402 | 403 | # ============ Neck ============ # 404 | 405 | # CSPSPP 406 | 407 | [convolutional] 408 | batch_normalize=1 409 | filters=256 410 | size=1 411 | stride=1 412 | pad=1 413 | activation=leaky 414 | 415 | [route] 416 | layers = -2 417 | 418 | [convolutional] 419 | batch_normalize=1 420 | filters=256 421 | size=1 422 | stride=1 423 | pad=1 424 | activation=leaky 425 | 426 | ### SPP ### 427 | [maxpool] 428 | stride=1 429 | size=5 430 | 431 | [route] 432 | layers=-2 433 | 434 | [maxpool] 435 | stride=1 436 | size=9 437 | 438 | [route] 439 | layers=-4 440 | 441 | [maxpool] 442 | stride=1 443 | size=13 444 | 445 | [route] 446 | layers=-1,-3,-5,-6 447 | ### End SPP ### 448 | 449 | [convolutional] 450 | batch_normalize=1 451 | filters=256 452 | size=1 453 | stride=1 454 | pad=1 455 | activation=leaky 456 | 457 | [convolutional] 458 | batch_normalize=1 459 | size=3 460 | stride=1 461 | pad=1 462 | filters=256 463 | activation=leaky 464 | 465 | [route] 466 | layers = -1, -11 467 | 468 | # 57 (previous+6+5+2k) 469 | [convolutional] 470 | batch_normalize=1 471 | filters=256 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | # End of CSPSPP 478 | 479 | 480 | # FPN-4 481 | 482 | [convolutional] 483 | batch_normalize=1 484 | filters=128 485 | size=1 486 | stride=1 487 | pad=1 488 | activation=leaky 489 | 490 | [upsample] 491 | stride=2 492 | 493 | [route] 494 | layers = 34 495 | 496 | [convolutional] 497 | batch_normalize=1 498 | filters=128 499 | size=1 500 | stride=1 501 | pad=1 502 | activation=leaky 503 | 504 | [route] 505 | layers = -1, -3 506 | 507 | [convolutional] 508 | batch_normalize=1 509 | filters=128 510 | size=1 511 | stride=1 512 | pad=1 513 | activation=leaky 514 | 
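# CSP-style split used throughout the neck: the 1x1 conv under "# Split" is held
# aside as a cross-stage branch, [route] layers=-2 re-selects the pre-split
# features for the plain 1x1/3x3 block, and the later [route] layers=-1,-4
# concatenates the processed path with the held branch before the transition conv.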
515 | # Split 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=128 520 | size=1 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [route] 526 | layers = -2 527 | 528 | # Plain Block 529 | 530 | [convolutional] 531 | batch_normalize=1 532 | filters=128 533 | size=1 534 | stride=1 535 | pad=1 536 | activation=leaky 537 | 538 | [convolutional] 539 | batch_normalize=1 540 | size=3 541 | stride=1 542 | pad=1 543 | filters=128 544 | activation=leaky 545 | 546 | # Merge [-1, -(2k+2)] 547 | 548 | [route] 549 | layers = -1, -4 550 | 551 | # Transition last 552 | 553 | # 69 (previous+6+4+2k) 554 | [convolutional] 555 | batch_normalize=1 556 | filters=128 557 | size=1 558 | stride=1 559 | pad=1 560 | activation=leaky 561 | 562 | 563 | # FPN-3 564 | 565 | [convolutional] 566 | batch_normalize=1 567 | filters=64 568 | size=1 569 | stride=1 570 | pad=1 571 | activation=leaky 572 | 573 | [upsample] 574 | stride=2 575 | 576 | [route] 577 | layers = 24 578 | 579 | [convolutional] 580 | batch_normalize=1 581 | filters=64 582 | size=1 583 | stride=1 584 | pad=1 585 | activation=leaky 586 | 587 | [route] 588 | layers = -1, -3 589 | 590 | [convolutional] 591 | batch_normalize=1 592 | filters=64 593 | size=1 594 | stride=1 595 | pad=1 596 | activation=leaky 597 | 598 | # Split 599 | 600 | [convolutional] 601 | batch_normalize=1 602 | filters=64 603 | size=1 604 | stride=1 605 | pad=1 606 | activation=leaky 607 | 608 | [route] 609 | layers = -2 610 | 611 | # Plain Block 612 | 613 | [convolutional] 614 | batch_normalize=1 615 | filters=64 616 | size=1 617 | stride=1 618 | pad=1 619 | activation=leaky 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | size=3 624 | stride=1 625 | pad=1 626 | filters=64 627 | activation=leaky 628 | 629 | # Merge [-1, -(2k+2)] 630 | 631 | [route] 632 | layers = -1, -4 633 | 634 | # Transition last 635 | 636 | # 81 (previous+6+4+2k) 637 | [convolutional] 638 | batch_normalize=1 639 | filters=64 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | 646 | # PAN-4 647 | 648 | [convolutional] 649 | batch_normalize=1 650 | size=3 651 | stride=2 652 | pad=1 653 | filters=128 654 | activation=leaky 655 | 656 | [route] 657 | layers = -1, 69 658 | 659 | [convolutional] 660 | batch_normalize=1 661 | filters=128 662 | size=1 663 | stride=1 664 | pad=1 665 | activation=leaky 666 | 667 | # Split 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=128 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [route] 678 | layers = -2 679 | 680 | # Plain Block 681 | 682 | [convolutional] 683 | batch_normalize=1 684 | filters=128 685 | size=1 686 | stride=1 687 | pad=1 688 | activation=leaky 689 | 690 | [convolutional] 691 | batch_normalize=1 692 | size=3 693 | stride=1 694 | pad=1 695 | filters=128 696 | activation=leaky 697 | 698 | [route] 699 | layers = -1,-4 700 | 701 | # Transition last 702 | 703 | # 90 (previous+3+4+2k) 704 | [convolutional] 705 | batch_normalize=1 706 | filters=128 707 | size=1 708 | stride=1 709 | pad=1 710 | activation=leaky 711 | 712 | 713 | # PAN-5 714 | 715 | [convolutional] 716 | batch_normalize=1 717 | size=3 718 | stride=2 719 | pad=1 720 | filters=256 721 | activation=leaky 722 | 723 | [route] 724 | layers = -1, 57 725 | 726 | [convolutional] 727 | batch_normalize=1 728 | filters=256 729 | size=1 730 | stride=1 731 | pad=1 732 | activation=leaky 733 | 734 | # Split 735 | 736 | [convolutional] 737 | batch_normalize=1 738 | filters=256 739 | size=1 740 | stride=1 741 | pad=1 742 | 
activation=leaky 743 | 744 | [route] 745 | layers = -2 746 | 747 | # Plain Block 748 | 749 | [convolutional] 750 | batch_normalize=1 751 | filters=256 752 | size=1 753 | stride=1 754 | pad=1 755 | activation=leaky 756 | 757 | [convolutional] 758 | batch_normalize=1 759 | size=3 760 | stride=1 761 | pad=1 762 | filters=256 763 | activation=leaky 764 | 765 | [route] 766 | layers = -1,-4 767 | 768 | # Transition last 769 | 770 | # 99 (previous+3+4+2k) 771 | [convolutional] 772 | batch_normalize=1 773 | filters=256 774 | size=1 775 | stride=1 776 | pad=1 777 | activation=leaky 778 | 779 | # ============ End of Neck ============ # 780 | 781 | # ============ Head ============ # 782 | 783 | # YOLO-3 784 | 785 | [route] 786 | layers = 81 787 | 788 | [convolutional] 789 | batch_normalize=1 790 | size=3 791 | stride=1 792 | pad=1 793 | filters=128 794 | activation=leaky 795 | 796 | [convolutional] 797 | size=1 798 | stride=1 799 | pad=1 800 | filters=255 801 | activation=linear 802 | 803 | [yolo] 804 | mask = 0,1,2 805 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 806 | classes=80 807 | num=9 808 | jitter=.3 809 | ignore_thresh = .7 810 | truth_thresh = 1 811 | random=1 812 | scale_x_y = 1.05 813 | iou_thresh=0.213 814 | cls_normalizer=1.0 815 | iou_normalizer=0.07 816 | iou_loss=ciou 817 | nms_kind=greedynms 818 | beta_nms=0.6 819 | 820 | 821 | # YOLO-4 822 | 823 | [route] 824 | layers = 90 825 | 826 | [convolutional] 827 | batch_normalize=1 828 | size=3 829 | stride=1 830 | pad=1 831 | filters=256 832 | activation=leaky 833 | 834 | [convolutional] 835 | size=1 836 | stride=1 837 | pad=1 838 | filters=255 839 | activation=linear 840 | 841 | [yolo] 842 | mask = 3,4,5 843 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 844 | classes=80 845 | num=9 846 | jitter=.3 847 | ignore_thresh = .7 848 | truth_thresh = 1 849 | random=1 850 | scale_x_y = 1.05 851 | iou_thresh=0.213 852 | cls_normalizer=1.0 853 | iou_normalizer=0.07 854 | iou_loss=ciou 855 | nms_kind=greedynms 856 | beta_nms=0.6 857 | 858 | 859 | # YOLO-5 860 | 861 | [route] 862 | layers = 99 863 | 864 | [convolutional] 865 | batch_normalize=1 866 | size=3 867 | stride=1 868 | pad=1 869 | filters=512 870 | activation=leaky 871 | 872 | [convolutional] 873 | size=1 874 | stride=1 875 | pad=1 876 | filters=255 877 | activation=linear 878 | 879 | [yolo] 880 | mask = 6,7,8 881 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 882 | classes=80 883 | num=9 884 | jitter=.3 885 | ignore_thresh = .7 886 | truth_thresh = 1 887 | random=1 888 | scale_x_y = 1.05 889 | iou_thresh=0.213 890 | cls_normalizer=1.0 891 | iou_normalizer=0.07 892 | iou_loss=ciou 893 | nms_kind=greedynms 894 | beta_nms=0.6 -------------------------------------------------------------------------------- /cfg/yolov4-csp-s-mish.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=64 7 | subdivisions=8 8 | width=512 9 | height=512 10 | channels=3 11 | momentum=0.949 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.00261 19 | burn_in=1000 20 | max_batches = 500500 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | #cutmix=1 26 | mosaic=1 27 | 28 | 29 | # ============ Backbone ============ # 30 | 31 | # Stem 32 | 33 | # 0 34 | [convolutional] 35 | batch_normalize=1 36 | 
filters=32 37 | size=3 38 | stride=1 39 | pad=1 40 | activation=mish 41 | 42 | # P1 43 | 44 | # Downsample 45 | 46 | [convolutional] 47 | batch_normalize=1 48 | filters=32 49 | size=3 50 | stride=2 51 | pad=1 52 | activation=mish 53 | 54 | # Residual Block 55 | 56 | [convolutional] 57 | batch_normalize=1 58 | filters=32 59 | size=1 60 | stride=1 61 | pad=1 62 | activation=mish 63 | 64 | [convolutional] 65 | batch_normalize=1 66 | filters=32 67 | size=3 68 | stride=1 69 | pad=1 70 | activation=mish 71 | 72 | # 4 (previous+1+3k) 73 | [shortcut] 74 | from=-3 75 | activation=linear 76 | 77 | # P2 78 | 79 | # Downsample 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=64 84 | size=3 85 | stride=2 86 | pad=1 87 | activation=mish 88 | 89 | # Split 90 | 91 | [convolutional] 92 | batch_normalize=1 93 | filters=32 94 | size=1 95 | stride=1 96 | pad=1 97 | activation=mish 98 | 99 | [route] 100 | layers = -2 101 | 102 | [convolutional] 103 | batch_normalize=1 104 | filters=32 105 | size=1 106 | stride=1 107 | pad=1 108 | activation=mish 109 | 110 | # Residual Block 111 | 112 | [convolutional] 113 | batch_normalize=1 114 | filters=32 115 | size=1 116 | stride=1 117 | pad=1 118 | activation=mish 119 | 120 | [convolutional] 121 | batch_normalize=1 122 | filters=32 123 | size=3 124 | stride=1 125 | pad=1 126 | activation=mish 127 | 128 | [shortcut] 129 | from=-3 130 | activation=linear 131 | 132 | # Transition first 133 | 134 | [convolutional] 135 | batch_normalize=1 136 | filters=32 137 | size=1 138 | stride=1 139 | pad=1 140 | activation=mish 141 | 142 | # Merge [-1, -(3k+4)] 143 | 144 | [route] 145 | layers = -1,-7 146 | 147 | # Transition last 148 | 149 | # 14 (previous+7+3k) 150 | [convolutional] 151 | batch_normalize=1 152 | filters=64 153 | size=1 154 | stride=1 155 | pad=1 156 | activation=mish 157 | 158 | # P3 159 | 160 | # Downsample 161 | 162 | [convolutional] 163 | batch_normalize=1 164 | filters=128 165 | size=3 166 | stride=2 167 | pad=1 168 | activation=mish 169 | 170 | # Split 171 | 172 | [convolutional] 173 | batch_normalize=1 174 | filters=64 175 | size=1 176 | stride=1 177 | pad=1 178 | activation=mish 179 | 180 | [route] 181 | layers = -2 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=64 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=mish 190 | 191 | # Residual Block 192 | 193 | [convolutional] 194 | batch_normalize=1 195 | filters=64 196 | size=1 197 | stride=1 198 | pad=1 199 | activation=mish 200 | 201 | [convolutional] 202 | batch_normalize=1 203 | filters=64 204 | size=3 205 | stride=1 206 | pad=1 207 | activation=mish 208 | 209 | [shortcut] 210 | from=-3 211 | activation=linear 212 | 213 | # Transition first 214 | 215 | [convolutional] 216 | batch_normalize=1 217 | filters=64 218 | size=1 219 | stride=1 220 | pad=1 221 | activation=mish 222 | 223 | # Merge [-1 -(4+3k)] 224 | 225 | [route] 226 | layers = -1,-7 227 | 228 | # Transition last 229 | 230 | # 24 (previous+7+3k) 231 | [convolutional] 232 | batch_normalize=1 233 | filters=128 234 | size=1 235 | stride=1 236 | pad=1 237 | activation=mish 238 | 239 | # P4 240 | 241 | # Downsample 242 | 243 | [convolutional] 244 | batch_normalize=1 245 | filters=256 246 | size=3 247 | stride=2 248 | pad=1 249 | activation=mish 250 | 251 | # Split 252 | 253 | [convolutional] 254 | batch_normalize=1 255 | filters=128 256 | size=1 257 | stride=1 258 | pad=1 259 | activation=mish 260 | 261 | [route] 262 | layers = -2 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | 
stride=1 269 | pad=1 270 | activation=mish 271 | 272 | # Residual Block 273 | 274 | [convolutional] 275 | batch_normalize=1 276 | filters=128 277 | size=1 278 | stride=1 279 | pad=1 280 | activation=mish 281 | 282 | [convolutional] 283 | batch_normalize=1 284 | filters=128 285 | size=3 286 | stride=1 287 | pad=1 288 | activation=mish 289 | 290 | [shortcut] 291 | from=-3 292 | activation=linear 293 | 294 | # Transition first 295 | 296 | [convolutional] 297 | batch_normalize=1 298 | filters=128 299 | size=1 300 | stride=1 301 | pad=1 302 | activation=mish 303 | 304 | # Merge [-1 -(3k+4)] 305 | 306 | [route] 307 | layers = -1,-7 308 | 309 | # Transition last 310 | 311 | # 34 (previous+7+3k) 312 | [convolutional] 313 | batch_normalize=1 314 | filters=256 315 | size=1 316 | stride=1 317 | pad=1 318 | activation=mish 319 | 320 | # P5 321 | 322 | # Downsample 323 | 324 | [convolutional] 325 | batch_normalize=1 326 | filters=512 327 | size=3 328 | stride=2 329 | pad=1 330 | activation=mish 331 | 332 | # Split 333 | 334 | [convolutional] 335 | batch_normalize=1 336 | filters=256 337 | size=1 338 | stride=1 339 | pad=1 340 | activation=mish 341 | 342 | [route] 343 | layers = -2 344 | 345 | [convolutional] 346 | batch_normalize=1 347 | filters=256 348 | size=1 349 | stride=1 350 | pad=1 351 | activation=mish 352 | 353 | # Residual Block 354 | 355 | [convolutional] 356 | batch_normalize=1 357 | filters=256 358 | size=1 359 | stride=1 360 | pad=1 361 | activation=mish 362 | 363 | [convolutional] 364 | batch_normalize=1 365 | filters=256 366 | size=3 367 | stride=1 368 | pad=1 369 | activation=mish 370 | 371 | [shortcut] 372 | from=-3 373 | activation=linear 374 | 375 | # Transition first 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=mish 384 | 385 | # Merge [-1 -(3k+4)] 386 | 387 | [route] 388 | layers = -1,-7 389 | 390 | # Transition last 391 | 392 | # 44 (previous+7+3k) 393 | [convolutional] 394 | batch_normalize=1 395 | filters=512 396 | size=1 397 | stride=1 398 | pad=1 399 | activation=mish 400 | 401 | # ============ End of Backbone ============ # 402 | 403 | # ============ Neck ============ # 404 | 405 | # CSPSPP 406 | 407 | [convolutional] 408 | batch_normalize=1 409 | filters=256 410 | size=1 411 | stride=1 412 | pad=1 413 | activation=mish 414 | 415 | [route] 416 | layers = -2 417 | 418 | [convolutional] 419 | batch_normalize=1 420 | filters=256 421 | size=1 422 | stride=1 423 | pad=1 424 | activation=mish 425 | 426 | ### SPP ### 427 | [maxpool] 428 | stride=1 429 | size=5 430 | 431 | [route] 432 | layers=-2 433 | 434 | [maxpool] 435 | stride=1 436 | size=9 437 | 438 | [route] 439 | layers=-4 440 | 441 | [maxpool] 442 | stride=1 443 | size=13 444 | 445 | [route] 446 | layers=-1,-3,-5,-6 447 | ### End SPP ### 448 | 449 | [convolutional] 450 | batch_normalize=1 451 | filters=256 452 | size=1 453 | stride=1 454 | pad=1 455 | activation=mish 456 | 457 | [convolutional] 458 | batch_normalize=1 459 | size=3 460 | stride=1 461 | pad=1 462 | filters=256 463 | activation=mish 464 | 465 | [route] 466 | layers = -1, -11 467 | 468 | # 57 (previous+6+5+2k) 469 | [convolutional] 470 | batch_normalize=1 471 | filters=256 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=mish 476 | 477 | # End of CSPSPP 478 | 479 | 480 | # FPN-4 481 | 482 | [convolutional] 483 | batch_normalize=1 484 | filters=128 485 | size=1 486 | stride=1 487 | pad=1 488 | activation=mish 489 | 490 | [upsample] 491 | stride=2 492 | 493 | [route] 494 | 
layers = 34 495 | 496 | [convolutional] 497 | batch_normalize=1 498 | filters=128 499 | size=1 500 | stride=1 501 | pad=1 502 | activation=mish 503 | 504 | [route] 505 | layers = -1, -3 506 | 507 | [convolutional] 508 | batch_normalize=1 509 | filters=128 510 | size=1 511 | stride=1 512 | pad=1 513 | activation=mish 514 | 515 | # Split 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=128 520 | size=1 521 | stride=1 522 | pad=1 523 | activation=mish 524 | 525 | [route] 526 | layers = -2 527 | 528 | # Plain Block 529 | 530 | [convolutional] 531 | batch_normalize=1 532 | filters=128 533 | size=1 534 | stride=1 535 | pad=1 536 | activation=mish 537 | 538 | [convolutional] 539 | batch_normalize=1 540 | size=3 541 | stride=1 542 | pad=1 543 | filters=128 544 | activation=mish 545 | 546 | # Merge [-1, -(2k+2)] 547 | 548 | [route] 549 | layers = -1, -4 550 | 551 | # Transition last 552 | 553 | # 69 (previous+6+4+2k) 554 | [convolutional] 555 | batch_normalize=1 556 | filters=128 557 | size=1 558 | stride=1 559 | pad=1 560 | activation=mish 561 | 562 | 563 | # FPN-3 564 | 565 | [convolutional] 566 | batch_normalize=1 567 | filters=64 568 | size=1 569 | stride=1 570 | pad=1 571 | activation=mish 572 | 573 | [upsample] 574 | stride=2 575 | 576 | [route] 577 | layers = 24 578 | 579 | [convolutional] 580 | batch_normalize=1 581 | filters=64 582 | size=1 583 | stride=1 584 | pad=1 585 | activation=mish 586 | 587 | [route] 588 | layers = -1, -3 589 | 590 | [convolutional] 591 | batch_normalize=1 592 | filters=64 593 | size=1 594 | stride=1 595 | pad=1 596 | activation=mish 597 | 598 | # Split 599 | 600 | [convolutional] 601 | batch_normalize=1 602 | filters=64 603 | size=1 604 | stride=1 605 | pad=1 606 | activation=mish 607 | 608 | [route] 609 | layers = -2 610 | 611 | # Plain Block 612 | 613 | [convolutional] 614 | batch_normalize=1 615 | filters=64 616 | size=1 617 | stride=1 618 | pad=1 619 | activation=mish 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | size=3 624 | stride=1 625 | pad=1 626 | filters=64 627 | activation=mish 628 | 629 | # Merge [-1, -(2k+2)] 630 | 631 | [route] 632 | layers = -1, -4 633 | 634 | # Transition last 635 | 636 | # 81 (previous+6+4+2k) 637 | [convolutional] 638 | batch_normalize=1 639 | filters=64 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=mish 644 | 645 | 646 | # PAN-4 647 | 648 | [convolutional] 649 | batch_normalize=1 650 | size=3 651 | stride=2 652 | pad=1 653 | filters=128 654 | activation=mish 655 | 656 | [route] 657 | layers = -1, 69 658 | 659 | [convolutional] 660 | batch_normalize=1 661 | filters=128 662 | size=1 663 | stride=1 664 | pad=1 665 | activation=mish 666 | 667 | # Split 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=128 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=mish 676 | 677 | [route] 678 | layers = -2 679 | 680 | # Plain Block 681 | 682 | [convolutional] 683 | batch_normalize=1 684 | filters=128 685 | size=1 686 | stride=1 687 | pad=1 688 | activation=mish 689 | 690 | [convolutional] 691 | batch_normalize=1 692 | size=3 693 | stride=1 694 | pad=1 695 | filters=128 696 | activation=mish 697 | 698 | [route] 699 | layers = -1,-4 700 | 701 | # Transition last 702 | 703 | # 90 (previous+3+4+2k) 704 | [convolutional] 705 | batch_normalize=1 706 | filters=128 707 | size=1 708 | stride=1 709 | pad=1 710 | activation=mish 711 | 712 | 713 | # PAN-5 714 | 715 | [convolutional] 716 | batch_normalize=1 717 | size=3 718 | stride=2 719 | pad=1 720 | filters=256 721 | activation=mish 722 | 
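# PAN bottom-up path: the stride-2 conv above downsamples the P4-level features,
# and the [route] below concatenates them with layer 57 (the CSPSPP output at the
# P5 scale) before the following CSP block refines the merged result.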
723 | [route] 724 | layers = -1, 57 725 | 726 | [convolutional] 727 | batch_normalize=1 728 | filters=256 729 | size=1 730 | stride=1 731 | pad=1 732 | activation=mish 733 | 734 | # Split 735 | 736 | [convolutional] 737 | batch_normalize=1 738 | filters=256 739 | size=1 740 | stride=1 741 | pad=1 742 | activation=mish 743 | 744 | [route] 745 | layers = -2 746 | 747 | # Plain Block 748 | 749 | [convolutional] 750 | batch_normalize=1 751 | filters=256 752 | size=1 753 | stride=1 754 | pad=1 755 | activation=mish 756 | 757 | [convolutional] 758 | batch_normalize=1 759 | size=3 760 | stride=1 761 | pad=1 762 | filters=256 763 | activation=mish 764 | 765 | [route] 766 | layers = -1,-4 767 | 768 | # Transition last 769 | 770 | # 99 (previous+3+4+2k) 771 | [convolutional] 772 | batch_normalize=1 773 | filters=256 774 | size=1 775 | stride=1 776 | pad=1 777 | activation=mish 778 | 779 | # ============ End of Neck ============ # 780 | 781 | # ============ Head ============ # 782 | 783 | # YOLO-3 784 | 785 | [route] 786 | layers = 81 787 | 788 | [convolutional] 789 | batch_normalize=1 790 | size=3 791 | stride=1 792 | pad=1 793 | filters=128 794 | activation=mish 795 | 796 | [convolutional] 797 | size=1 798 | stride=1 799 | pad=1 800 | filters=255 801 | activation=linear 802 | 803 | [yolo] 804 | mask = 0,1,2 805 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 806 | classes=80 807 | num=9 808 | jitter=.3 809 | ignore_thresh = .7 810 | truth_thresh = 1 811 | random=1 812 | scale_x_y = 1.05 813 | iou_thresh=0.213 814 | cls_normalizer=1.0 815 | iou_normalizer=0.07 816 | iou_loss=ciou 817 | nms_kind=greedynms 818 | beta_nms=0.6 819 | 820 | 821 | # YOLO-4 822 | 823 | [route] 824 | layers = 90 825 | 826 | [convolutional] 827 | batch_normalize=1 828 | size=3 829 | stride=1 830 | pad=1 831 | filters=256 832 | activation=mish 833 | 834 | [convolutional] 835 | size=1 836 | stride=1 837 | pad=1 838 | filters=255 839 | activation=linear 840 | 841 | [yolo] 842 | mask = 3,4,5 843 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 844 | classes=80 845 | num=9 846 | jitter=.3 847 | ignore_thresh = .7 848 | truth_thresh = 1 849 | random=1 850 | scale_x_y = 1.05 851 | iou_thresh=0.213 852 | cls_normalizer=1.0 853 | iou_normalizer=0.07 854 | iou_loss=ciou 855 | nms_kind=greedynms 856 | beta_nms=0.6 857 | 858 | 859 | # YOLO-5 860 | 861 | [route] 862 | layers = 99 863 | 864 | [convolutional] 865 | batch_normalize=1 866 | size=3 867 | stride=1 868 | pad=1 869 | filters=512 870 | activation=mish 871 | 872 | [convolutional] 873 | size=1 874 | stride=1 875 | pad=1 876 | filters=255 877 | activation=linear 878 | 879 | [yolo] 880 | mask = 6,7,8 881 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 882 | classes=80 883 | num=9 884 | jitter=.3 885 | ignore_thresh = .7 886 | truth_thresh = 1 887 | random=1 888 | scale_x_y = 1.05 889 | iou_thresh=0.213 890 | cls_normalizer=1.0 891 | iou_normalizer=0.07 892 | iou_loss=ciou 893 | nms_kind=greedynms 894 | beta_nms=0.6 -------------------------------------------------------------------------------- /cfg/yolov4-pacsp-s-mish.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=64 7 | subdivisions=8 8 | width=512 9 | height=512 10 | channels=3 11 | momentum=0.949 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | 
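# Darknet-style schedule: learning_rate ramps up over the first burn_in
# iterations, training runs for max_batches iterations in total, and the steps
# policy multiplies the rate by each entry of scales at the listed steps.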
learning_rate=0.00261 19 | burn_in=1000 20 | max_batches = 500500 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | mosaic=1 26 | 27 | [convolutional] 28 | batch_normalize=1 29 | filters=32 30 | size=3 31 | stride=1 32 | pad=1 33 | activation=mish 34 | 35 | # Downsample 36 | 37 | [convolutional] 38 | batch_normalize=1 39 | filters=32 40 | size=3 41 | stride=2 42 | pad=1 43 | activation=mish 44 | 45 | [convolutional] 46 | batch_normalize=1 47 | filters=32 48 | size=1 49 | stride=1 50 | pad=1 51 | activation=mish 52 | 53 | [convolutional] 54 | batch_normalize=1 55 | filters=32 56 | size=3 57 | stride=1 58 | pad=1 59 | activation=mish 60 | 61 | [shortcut] 62 | from=-3 63 | activation=linear 64 | 65 | # Downsample 66 | 67 | [convolutional] 68 | batch_normalize=1 69 | filters=64 70 | size=3 71 | stride=2 72 | pad=1 73 | activation=mish 74 | 75 | [convolutional] 76 | batch_normalize=1 77 | filters=32 78 | size=1 79 | stride=1 80 | pad=1 81 | activation=mish 82 | 83 | [route] 84 | layers = -2 85 | 86 | [convolutional] 87 | batch_normalize=1 88 | filters=32 89 | size=1 90 | stride=1 91 | pad=1 92 | activation=mish 93 | 94 | [convolutional] 95 | batch_normalize=1 96 | filters=32 97 | size=1 98 | stride=1 99 | pad=1 100 | activation=mish 101 | 102 | [convolutional] 103 | batch_normalize=1 104 | filters=32 105 | size=3 106 | stride=1 107 | pad=1 108 | activation=mish 109 | 110 | [shortcut] 111 | from=-3 112 | activation=linear 113 | 114 | [convolutional] 115 | batch_normalize=1 116 | filters=32 117 | size=1 118 | stride=1 119 | pad=1 120 | activation=mish 121 | 122 | [route] 123 | layers = -1,-7 124 | 125 | [convolutional] 126 | batch_normalize=1 127 | filters=64 128 | size=1 129 | stride=1 130 | pad=1 131 | activation=mish 132 | 133 | # Downsample 134 | 135 | [convolutional] 136 | batch_normalize=1 137 | filters=128 138 | size=3 139 | stride=2 140 | pad=1 141 | activation=mish 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=64 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=mish 150 | 151 | [route] 152 | layers = -2 153 | 154 | [convolutional] 155 | batch_normalize=1 156 | filters=64 157 | size=1 158 | stride=1 159 | pad=1 160 | activation=mish 161 | 162 | [convolutional] 163 | batch_normalize=1 164 | filters=64 165 | size=1 166 | stride=1 167 | pad=1 168 | activation=mish 169 | 170 | [convolutional] 171 | batch_normalize=1 172 | filters=64 173 | size=3 174 | stride=1 175 | pad=1 176 | activation=mish 177 | 178 | [shortcut] 179 | from=-3 180 | activation=linear 181 | 182 | [convolutional] 183 | batch_normalize=1 184 | filters=64 185 | size=1 186 | stride=1 187 | pad=1 188 | activation=mish 189 | 190 | [route] 191 | layers = -1,-7 192 | 193 | [convolutional] 194 | batch_normalize=1 195 | filters=128 196 | size=1 197 | stride=1 198 | pad=1 199 | activation=mish 200 | 201 | # Downsample 202 | 203 | [convolutional] 204 | batch_normalize=1 205 | filters=256 206 | size=3 207 | stride=2 208 | pad=1 209 | activation=mish 210 | 211 | [convolutional] 212 | batch_normalize=1 213 | filters=128 214 | size=1 215 | stride=1 216 | pad=1 217 | activation=mish 218 | 219 | [route] 220 | layers = -2 221 | 222 | [convolutional] 223 | batch_normalize=1 224 | filters=128 225 | size=1 226 | stride=1 227 | pad=1 228 | activation=mish 229 | 230 | [convolutional] 231 | batch_normalize=1 232 | filters=128 233 | size=1 234 | stride=1 235 | pad=1 236 | activation=mish 237 | 238 | [convolutional] 239 | batch_normalize=1 240 | filters=128 241 | size=3 242 | stride=1 243 | pad=1 244 | 
activation=mish 245 | 246 | [shortcut] 247 | from=-3 248 | activation=linear 249 | 250 | [convolutional] 251 | batch_normalize=1 252 | filters=128 253 | size=1 254 | stride=1 255 | pad=1 256 | activation=mish 257 | 258 | [route] 259 | layers = -1,-7 260 | 261 | [convolutional] 262 | batch_normalize=1 263 | filters=256 264 | size=1 265 | stride=1 266 | pad=1 267 | activation=mish 268 | 269 | # Downsample 270 | 271 | [convolutional] 272 | batch_normalize=1 273 | filters=512 274 | size=3 275 | stride=2 276 | pad=1 277 | activation=mish 278 | 279 | [convolutional] 280 | batch_normalize=1 281 | filters=256 282 | size=1 283 | stride=1 284 | pad=1 285 | activation=mish 286 | 287 | [route] 288 | layers = -2 289 | 290 | [convolutional] 291 | batch_normalize=1 292 | filters=256 293 | size=1 294 | stride=1 295 | pad=1 296 | activation=mish 297 | 298 | [convolutional] 299 | batch_normalize=1 300 | filters=256 301 | size=1 302 | stride=1 303 | pad=1 304 | activation=mish 305 | 306 | [convolutional] 307 | batch_normalize=1 308 | filters=256 309 | size=3 310 | stride=1 311 | pad=1 312 | activation=mish 313 | 314 | [shortcut] 315 | from=-3 316 | activation=linear 317 | 318 | [convolutional] 319 | batch_normalize=1 320 | filters=256 321 | size=1 322 | stride=1 323 | pad=1 324 | activation=mish 325 | 326 | [route] 327 | layers = -1,-7 328 | 329 | [convolutional] 330 | batch_normalize=1 331 | filters=512 332 | size=1 333 | stride=1 334 | pad=1 335 | activation=mish 336 | 337 | ########################## 338 | 339 | [convolutional] 340 | batch_normalize=1 341 | filters=256 342 | size=1 343 | stride=1 344 | pad=1 345 | activation=mish 346 | 347 | [route] 348 | layers = -2 349 | 350 | [convolutional] 351 | batch_normalize=1 352 | filters=256 353 | size=1 354 | stride=1 355 | pad=1 356 | activation=mish 357 | 358 | ### SPP ### 359 | [maxpool] 360 | stride=1 361 | size=5 362 | 363 | [route] 364 | layers=-2 365 | 366 | [maxpool] 367 | stride=1 368 | size=9 369 | 370 | [route] 371 | layers=-4 372 | 373 | [maxpool] 374 | stride=1 375 | size=13 376 | 377 | [route] 378 | layers=-1,-3,-5,-6 379 | ### End SPP ### 380 | 381 | [convolutional] 382 | batch_normalize=1 383 | filters=256 384 | size=1 385 | stride=1 386 | pad=1 387 | activation=mish 388 | 389 | [convolutional] 390 | batch_normalize=1 391 | size=3 392 | stride=1 393 | pad=1 394 | filters=256 395 | activation=mish 396 | 397 | [route] 398 | layers = -1, -11 399 | 400 | [convolutional] 401 | batch_normalize=1 402 | filters=256 403 | size=1 404 | stride=1 405 | pad=1 406 | activation=mish 407 | 408 | [convolutional] 409 | batch_normalize=1 410 | filters=128 411 | size=1 412 | stride=1 413 | pad=1 414 | activation=mish 415 | 416 | [upsample] 417 | stride=2 418 | 419 | [route] 420 | layers = 34 421 | 422 | [convolutional] 423 | batch_normalize=1 424 | filters=128 425 | size=1 426 | stride=1 427 | pad=1 428 | activation=mish 429 | 430 | [route] 431 | layers = -1, -3 432 | 433 | [convolutional] 434 | batch_normalize=1 435 | filters=128 436 | size=1 437 | stride=1 438 | pad=1 439 | activation=mish 440 | 441 | [convolutional] 442 | batch_normalize=1 443 | filters=128 444 | size=1 445 | stride=1 446 | pad=1 447 | activation=mish 448 | 449 | [route] 450 | layers = -2 451 | 452 | [convolutional] 453 | batch_normalize=1 454 | filters=128 455 | size=1 456 | stride=1 457 | pad=1 458 | activation=mish 459 | 460 | [convolutional] 461 | batch_normalize=1 462 | size=3 463 | stride=1 464 | pad=1 465 | filters=128 466 | activation=mish 467 | 468 | [route] 469 | layers = -1, -4 470 | 
471 | [convolutional] 472 | batch_normalize=1 473 | filters=128 474 | size=1 475 | stride=1 476 | pad=1 477 | activation=mish 478 | 479 | [convolutional] 480 | batch_normalize=1 481 | filters=64 482 | size=1 483 | stride=1 484 | pad=1 485 | activation=mish 486 | 487 | [upsample] 488 | stride=2 489 | 490 | [route] 491 | layers = 24 492 | 493 | [convolutional] 494 | batch_normalize=1 495 | filters=64 496 | size=1 497 | stride=1 498 | pad=1 499 | activation=mish 500 | 501 | [route] 502 | layers = -1, -3 503 | 504 | [convolutional] 505 | batch_normalize=1 506 | filters=64 507 | size=1 508 | stride=1 509 | pad=1 510 | activation=mish 511 | 512 | [convolutional] 513 | batch_normalize=1 514 | filters=64 515 | size=1 516 | stride=1 517 | pad=1 518 | activation=mish 519 | 520 | [route] 521 | layers = -2 522 | 523 | [convolutional] 524 | batch_normalize=1 525 | filters=64 526 | size=1 527 | stride=1 528 | pad=1 529 | activation=mish 530 | 531 | [convolutional] 532 | batch_normalize=1 533 | size=3 534 | stride=1 535 | pad=1 536 | filters=64 537 | activation=mish 538 | 539 | [route] 540 | layers = -1, -4 541 | 542 | [convolutional] 543 | batch_normalize=1 544 | filters=64 545 | size=1 546 | stride=1 547 | pad=1 548 | activation=mish 549 | 550 | ########################## 551 | 552 | [convolutional] 553 | batch_normalize=1 554 | size=3 555 | stride=1 556 | pad=1 557 | filters=128 558 | activation=mish 559 | 560 | [convolutional] 561 | size=1 562 | stride=1 563 | pad=1 564 | filters=255 565 | activation=linear 566 | 567 | 568 | [yolo] 569 | mask = 0,1,2 570 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 571 | classes=80 572 | num=9 573 | jitter=.3 574 | ignore_thresh = .7 575 | truth_thresh = 1 576 | random=1 577 | scale_x_y = 1.05 578 | iou_thresh=0.213 579 | cls_normalizer=1.0 580 | iou_normalizer=0.07 581 | iou_loss=ciou 582 | nms_kind=greedynms 583 | beta_nms=0.6 584 | 585 | [route] 586 | layers = -4 587 | 588 | [convolutional] 589 | batch_normalize=1 590 | size=3 591 | stride=2 592 | pad=1 593 | filters=128 594 | activation=mish 595 | 596 | [route] 597 | layers = -1, -18 598 | 599 | [convolutional] 600 | batch_normalize=1 601 | filters=128 602 | size=1 603 | stride=1 604 | pad=1 605 | activation=mish 606 | 607 | [convolutional] 608 | batch_normalize=1 609 | filters=128 610 | size=1 611 | stride=1 612 | pad=1 613 | activation=mish 614 | 615 | [route] 616 | layers = -2 617 | 618 | [convolutional] 619 | batch_normalize=1 620 | filters=128 621 | size=1 622 | stride=1 623 | pad=1 624 | activation=mish 625 | 626 | [convolutional] 627 | batch_normalize=1 628 | size=3 629 | stride=1 630 | pad=1 631 | filters=128 632 | activation=mish 633 | 634 | [route] 635 | layers = -1,-4 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=128 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=mish 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=256 651 | activation=mish 652 | 653 | [convolutional] 654 | size=1 655 | stride=1 656 | pad=1 657 | filters=255 658 | activation=linear 659 | 660 | 661 | [yolo] 662 | mask = 3,4,5 663 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 664 | classes=80 665 | num=9 666 | jitter=.3 667 | ignore_thresh = .7 668 | truth_thresh = 1 669 | random=1 670 | scale_x_y = 1.05 671 | iou_thresh=0.213 672 | cls_normalizer=1.0 673 | iou_normalizer=0.07 674 | iou_loss=ciou 675 | nms_kind=greedynms 676 | beta_nms=0.6 677 | 678 | [route] 679 
| layers = -4 680 | 681 | [convolutional] 682 | batch_normalize=1 683 | size=3 684 | stride=2 685 | pad=1 686 | filters=256 687 | activation=mish 688 | 689 | [route] 690 | layers = -1, -43 691 | 692 | [convolutional] 693 | batch_normalize=1 694 | filters=256 695 | size=1 696 | stride=1 697 | pad=1 698 | activation=mish 699 | 700 | [convolutional] 701 | batch_normalize=1 702 | filters=256 703 | size=1 704 | stride=1 705 | pad=1 706 | activation=mish 707 | 708 | [route] 709 | layers = -2 710 | 711 | [convolutional] 712 | batch_normalize=1 713 | filters=256 714 | size=1 715 | stride=1 716 | pad=1 717 | activation=mish 718 | 719 | [convolutional] 720 | batch_normalize=1 721 | size=3 722 | stride=1 723 | pad=1 724 | filters=256 725 | activation=mish 726 | 727 | [route] 728 | layers = -1,-4 729 | 730 | [convolutional] 731 | batch_normalize=1 732 | filters=256 733 | size=1 734 | stride=1 735 | pad=1 736 | activation=mish 737 | 738 | [convolutional] 739 | batch_normalize=1 740 | size=3 741 | stride=1 742 | pad=1 743 | filters=512 744 | activation=mish 745 | 746 | [convolutional] 747 | size=1 748 | stride=1 749 | pad=1 750 | filters=255 751 | activation=linear 752 | 753 | 754 | [yolo] 755 | mask = 6,7,8 756 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 757 | classes=80 758 | num=9 759 | jitter=.3 760 | ignore_thresh = .7 761 | truth_thresh = 1 762 | random=1 763 | scale_x_y = 1.05 764 | iou_thresh=0.213 765 | cls_normalizer=1.0 766 | iou_normalizer=0.07 767 | iou_loss=ciou 768 | nms_kind=greedynms 769 | beta_nms=0.6 770 | -------------------------------------------------------------------------------- /cfg/yolov4-pacsp-s.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=64 7 | subdivisions=8 8 | width=512 9 | height=512 10 | channels=3 11 | momentum=0.949 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.00261 19 | burn_in=1000 20 | max_batches = 500500 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | mosaic=1 26 | 27 | [convolutional] 28 | batch_normalize=1 29 | filters=32 30 | size=3 31 | stride=1 32 | pad=1 33 | activation=leaky 34 | 35 | # Downsample 36 | 37 | [convolutional] 38 | batch_normalize=1 39 | filters=32 40 | size=3 41 | stride=2 42 | pad=1 43 | activation=leaky 44 | 45 | [convolutional] 46 | batch_normalize=1 47 | filters=32 48 | size=1 49 | stride=1 50 | pad=1 51 | activation=leaky 52 | 53 | [convolutional] 54 | batch_normalize=1 55 | filters=32 56 | size=3 57 | stride=1 58 | pad=1 59 | activation=leaky 60 | 61 | [shortcut] 62 | from=-3 63 | activation=linear 64 | 65 | # Downsample 66 | 67 | [convolutional] 68 | batch_normalize=1 69 | filters=64 70 | size=3 71 | stride=2 72 | pad=1 73 | activation=leaky 74 | 75 | [convolutional] 76 | batch_normalize=1 77 | filters=32 78 | size=1 79 | stride=1 80 | pad=1 81 | activation=leaky 82 | 83 | [route] 84 | layers = -2 85 | 86 | [convolutional] 87 | batch_normalize=1 88 | filters=32 89 | size=1 90 | stride=1 91 | pad=1 92 | activation=leaky 93 | 94 | [convolutional] 95 | batch_normalize=1 96 | filters=32 97 | size=1 98 | stride=1 99 | pad=1 100 | activation=leaky 101 | 102 | [convolutional] 103 | batch_normalize=1 104 | filters=32 105 | size=3 106 | stride=1 107 | pad=1 108 | activation=leaky 109 | 110 | [shortcut] 111 | from=-3 112 | activation=linear 113 | 114 | [convolutional] 115 | 
batch_normalize=1 116 | filters=32 117 | size=1 118 | stride=1 119 | pad=1 120 | activation=leaky 121 | 122 | [route] 123 | layers = -1,-7 124 | 125 | [convolutional] 126 | batch_normalize=1 127 | filters=64 128 | size=1 129 | stride=1 130 | pad=1 131 | activation=leaky 132 | 133 | # Downsample 134 | 135 | [convolutional] 136 | batch_normalize=1 137 | filters=128 138 | size=3 139 | stride=2 140 | pad=1 141 | activation=leaky 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=64 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [route] 152 | layers = -2 153 | 154 | [convolutional] 155 | batch_normalize=1 156 | filters=64 157 | size=1 158 | stride=1 159 | pad=1 160 | activation=leaky 161 | 162 | [convolutional] 163 | batch_normalize=1 164 | filters=64 165 | size=1 166 | stride=1 167 | pad=1 168 | activation=leaky 169 | 170 | [convolutional] 171 | batch_normalize=1 172 | filters=64 173 | size=3 174 | stride=1 175 | pad=1 176 | activation=leaky 177 | 178 | [shortcut] 179 | from=-3 180 | activation=linear 181 | 182 | [convolutional] 183 | batch_normalize=1 184 | filters=64 185 | size=1 186 | stride=1 187 | pad=1 188 | activation=leaky 189 | 190 | [route] 191 | layers = -1,-7 192 | 193 | [convolutional] 194 | batch_normalize=1 195 | filters=128 196 | size=1 197 | stride=1 198 | pad=1 199 | activation=leaky 200 | 201 | # Downsample 202 | 203 | [convolutional] 204 | batch_normalize=1 205 | filters=256 206 | size=3 207 | stride=2 208 | pad=1 209 | activation=leaky 210 | 211 | [convolutional] 212 | batch_normalize=1 213 | filters=128 214 | size=1 215 | stride=1 216 | pad=1 217 | activation=leaky 218 | 219 | [route] 220 | layers = -2 221 | 222 | [convolutional] 223 | batch_normalize=1 224 | filters=128 225 | size=1 226 | stride=1 227 | pad=1 228 | activation=leaky 229 | 230 | [convolutional] 231 | batch_normalize=1 232 | filters=128 233 | size=1 234 | stride=1 235 | pad=1 236 | activation=leaky 237 | 238 | [convolutional] 239 | batch_normalize=1 240 | filters=128 241 | size=3 242 | stride=1 243 | pad=1 244 | activation=leaky 245 | 246 | [shortcut] 247 | from=-3 248 | activation=linear 249 | 250 | [convolutional] 251 | batch_normalize=1 252 | filters=128 253 | size=1 254 | stride=1 255 | pad=1 256 | activation=leaky 257 | 258 | [route] 259 | layers = -1,-7 260 | 261 | [convolutional] 262 | batch_normalize=1 263 | filters=256 264 | size=1 265 | stride=1 266 | pad=1 267 | activation=leaky 268 | 269 | # Downsample 270 | 271 | [convolutional] 272 | batch_normalize=1 273 | filters=512 274 | size=3 275 | stride=2 276 | pad=1 277 | activation=leaky 278 | 279 | [convolutional] 280 | batch_normalize=1 281 | filters=256 282 | size=1 283 | stride=1 284 | pad=1 285 | activation=leaky 286 | 287 | [route] 288 | layers = -2 289 | 290 | [convolutional] 291 | batch_normalize=1 292 | filters=256 293 | size=1 294 | stride=1 295 | pad=1 296 | activation=leaky 297 | 298 | [convolutional] 299 | batch_normalize=1 300 | filters=256 301 | size=1 302 | stride=1 303 | pad=1 304 | activation=leaky 305 | 306 | [convolutional] 307 | batch_normalize=1 308 | filters=256 309 | size=3 310 | stride=1 311 | pad=1 312 | activation=leaky 313 | 314 | [shortcut] 315 | from=-3 316 | activation=linear 317 | 318 | [convolutional] 319 | batch_normalize=1 320 | filters=256 321 | size=1 322 | stride=1 323 | pad=1 324 | activation=leaky 325 | 326 | [route] 327 | layers = -1,-7 328 | 329 | [convolutional] 330 | batch_normalize=1 331 | filters=512 332 | size=1 333 | stride=1 334 | pad=1 335 | activation=leaky 
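The `[convolutional]` entries above follow the standard Darknet cfg schema (`filters`, `size`, `stride`, `pad`, `batch_normalize`, `activation`), which `models/models.py` parses into PyTorch layers. As a rough illustration only — `conv_block` below is a hypothetical helper and a minimal sketch, not the repo's actual parser — one such block maps onto PyTorch roughly like this:

```python
# Minimal sketch of how a single Darknet [convolutional] block could be built
# in PyTorch. Hypothetical helper -- not the parser used in models/models.py.
import torch.nn as nn

def conv_block(in_ch, filters, size, stride, pad, batch_normalize, activation):
    padding = (size - 1) // 2 if pad else 0
    layers = [nn.Conv2d(in_ch, filters, size, stride, padding,
                        bias=not batch_normalize)]      # bias is dropped when BN follows
    if batch_normalize:
        layers.append(nn.BatchNorm2d(filters))
    if activation == 'leaky':
        layers.append(nn.LeakyReLU(0.1, inplace=True))  # Darknet leaky slope is 0.1
    elif activation == 'mish':
        layers.append(nn.Mish())                        # needs torch >= 1.9
    return nn.Sequential(*layers)

# e.g. a 1x1 leaky block like the ones above (the input channel count is illustrative)
m = conv_block(in_ch=512, filters=512, size=1, stride=1, pad=1,
               batch_normalize=1, activation='leaky')
```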
336 | 337 | ########################## 338 | 339 | [convolutional] 340 | batch_normalize=1 341 | filters=256 342 | size=1 343 | stride=1 344 | pad=1 345 | activation=leaky 346 | 347 | [route] 348 | layers = -2 349 | 350 | [convolutional] 351 | batch_normalize=1 352 | filters=256 353 | size=1 354 | stride=1 355 | pad=1 356 | activation=leaky 357 | 358 | ### SPP ### 359 | [maxpool] 360 | stride=1 361 | size=5 362 | 363 | [route] 364 | layers=-2 365 | 366 | [maxpool] 367 | stride=1 368 | size=9 369 | 370 | [route] 371 | layers=-4 372 | 373 | [maxpool] 374 | stride=1 375 | size=13 376 | 377 | [route] 378 | layers=-1,-3,-5,-6 379 | ### End SPP ### 380 | 381 | [convolutional] 382 | batch_normalize=1 383 | filters=256 384 | size=1 385 | stride=1 386 | pad=1 387 | activation=leaky 388 | 389 | [convolutional] 390 | batch_normalize=1 391 | size=3 392 | stride=1 393 | pad=1 394 | filters=256 395 | activation=leaky 396 | 397 | [route] 398 | layers = -1, -11 399 | 400 | [convolutional] 401 | batch_normalize=1 402 | filters=256 403 | size=1 404 | stride=1 405 | pad=1 406 | activation=leaky 407 | 408 | [convolutional] 409 | batch_normalize=1 410 | filters=128 411 | size=1 412 | stride=1 413 | pad=1 414 | activation=leaky 415 | 416 | [upsample] 417 | stride=2 418 | 419 | [route] 420 | layers = 34 421 | 422 | [convolutional] 423 | batch_normalize=1 424 | filters=128 425 | size=1 426 | stride=1 427 | pad=1 428 | activation=leaky 429 | 430 | [route] 431 | layers = -1, -3 432 | 433 | [convolutional] 434 | batch_normalize=1 435 | filters=128 436 | size=1 437 | stride=1 438 | pad=1 439 | activation=leaky 440 | 441 | [convolutional] 442 | batch_normalize=1 443 | filters=128 444 | size=1 445 | stride=1 446 | pad=1 447 | activation=leaky 448 | 449 | [route] 450 | layers = -2 451 | 452 | [convolutional] 453 | batch_normalize=1 454 | filters=128 455 | size=1 456 | stride=1 457 | pad=1 458 | activation=leaky 459 | 460 | [convolutional] 461 | batch_normalize=1 462 | size=3 463 | stride=1 464 | pad=1 465 | filters=128 466 | activation=leaky 467 | 468 | [route] 469 | layers = -1, -4 470 | 471 | [convolutional] 472 | batch_normalize=1 473 | filters=128 474 | size=1 475 | stride=1 476 | pad=1 477 | activation=leaky 478 | 479 | [convolutional] 480 | batch_normalize=1 481 | filters=64 482 | size=1 483 | stride=1 484 | pad=1 485 | activation=leaky 486 | 487 | [upsample] 488 | stride=2 489 | 490 | [route] 491 | layers = 24 492 | 493 | [convolutional] 494 | batch_normalize=1 495 | filters=64 496 | size=1 497 | stride=1 498 | pad=1 499 | activation=leaky 500 | 501 | [route] 502 | layers = -1, -3 503 | 504 | [convolutional] 505 | batch_normalize=1 506 | filters=64 507 | size=1 508 | stride=1 509 | pad=1 510 | activation=leaky 511 | 512 | [convolutional] 513 | batch_normalize=1 514 | filters=64 515 | size=1 516 | stride=1 517 | pad=1 518 | activation=leaky 519 | 520 | [route] 521 | layers = -2 522 | 523 | [convolutional] 524 | batch_normalize=1 525 | filters=64 526 | size=1 527 | stride=1 528 | pad=1 529 | activation=leaky 530 | 531 | [convolutional] 532 | batch_normalize=1 533 | size=3 534 | stride=1 535 | pad=1 536 | filters=64 537 | activation=leaky 538 | 539 | [route] 540 | layers = -1, -4 541 | 542 | [convolutional] 543 | batch_normalize=1 544 | filters=64 545 | size=1 546 | stride=1 547 | pad=1 548 | activation=leaky 549 | 550 | ########################## 551 | 552 | [convolutional] 553 | batch_normalize=1 554 | size=3 555 | stride=1 556 | pad=1 557 | filters=128 558 | activation=leaky 559 | 560 | [convolutional] 561 | 
size=1 562 | stride=1 563 | pad=1 564 | filters=255 565 | activation=linear 566 | 567 | 568 | [yolo] 569 | mask = 0,1,2 570 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 571 | classes=80 572 | num=9 573 | jitter=.3 574 | ignore_thresh = .7 575 | truth_thresh = 1 576 | random=1 577 | scale_x_y = 1.05 578 | iou_thresh=0.213 579 | cls_normalizer=1.0 580 | iou_normalizer=0.07 581 | iou_loss=ciou 582 | nms_kind=greedynms 583 | beta_nms=0.6 584 | 585 | [route] 586 | layers = -4 587 | 588 | [convolutional] 589 | batch_normalize=1 590 | size=3 591 | stride=2 592 | pad=1 593 | filters=128 594 | activation=leaky 595 | 596 | [route] 597 | layers = -1, -18 598 | 599 | [convolutional] 600 | batch_normalize=1 601 | filters=128 602 | size=1 603 | stride=1 604 | pad=1 605 | activation=leaky 606 | 607 | [convolutional] 608 | batch_normalize=1 609 | filters=128 610 | size=1 611 | stride=1 612 | pad=1 613 | activation=leaky 614 | 615 | [route] 616 | layers = -2 617 | 618 | [convolutional] 619 | batch_normalize=1 620 | filters=128 621 | size=1 622 | stride=1 623 | pad=1 624 | activation=leaky 625 | 626 | [convolutional] 627 | batch_normalize=1 628 | size=3 629 | stride=1 630 | pad=1 631 | filters=128 632 | activation=leaky 633 | 634 | [route] 635 | layers = -1,-4 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=128 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=256 651 | activation=leaky 652 | 653 | [convolutional] 654 | size=1 655 | stride=1 656 | pad=1 657 | filters=255 658 | activation=linear 659 | 660 | 661 | [yolo] 662 | mask = 3,4,5 663 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 664 | classes=80 665 | num=9 666 | jitter=.3 667 | ignore_thresh = .7 668 | truth_thresh = 1 669 | random=1 670 | scale_x_y = 1.05 671 | iou_thresh=0.213 672 | cls_normalizer=1.0 673 | iou_normalizer=0.07 674 | iou_loss=ciou 675 | nms_kind=greedynms 676 | beta_nms=0.6 677 | 678 | [route] 679 | layers = -4 680 | 681 | [convolutional] 682 | batch_normalize=1 683 | size=3 684 | stride=2 685 | pad=1 686 | filters=256 687 | activation=leaky 688 | 689 | [route] 690 | layers = -1, -43 691 | 692 | [convolutional] 693 | batch_normalize=1 694 | filters=256 695 | size=1 696 | stride=1 697 | pad=1 698 | activation=leaky 699 | 700 | [convolutional] 701 | batch_normalize=1 702 | filters=256 703 | size=1 704 | stride=1 705 | pad=1 706 | activation=leaky 707 | 708 | [route] 709 | layers = -2 710 | 711 | [convolutional] 712 | batch_normalize=1 713 | filters=256 714 | size=1 715 | stride=1 716 | pad=1 717 | activation=leaky 718 | 719 | [convolutional] 720 | batch_normalize=1 721 | size=3 722 | stride=1 723 | pad=1 724 | filters=256 725 | activation=leaky 726 | 727 | [route] 728 | layers = -1,-4 729 | 730 | [convolutional] 731 | batch_normalize=1 732 | filters=256 733 | size=1 734 | stride=1 735 | pad=1 736 | activation=leaky 737 | 738 | [convolutional] 739 | batch_normalize=1 740 | size=3 741 | stride=1 742 | pad=1 743 | filters=512 744 | activation=leaky 745 | 746 | [convolutional] 747 | size=1 748 | stride=1 749 | pad=1 750 | filters=255 751 | activation=linear 752 | 753 | 754 | [yolo] 755 | mask = 6,7,8 756 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 757 | classes=80 758 | num=9 759 | jitter=.3 760 | ignore_thresh = .7 761 | truth_thresh = 1 762 | random=1 763 | scale_x_y 
= 1.05 764 | iou_thresh=0.213 765 | cls_normalizer=1.0 766 | iou_normalizer=0.07 767 | iou_loss=ciou 768 | nms_kind=greedynms 769 | beta_nms=0.6 770 | -------------------------------------------------------------------------------- /cfg/yolov4-tiny.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=64 7 | subdivisions=1 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.00261 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=2 30 | pad=1 31 | activation=leaky 32 | 33 | [convolutional] 34 | batch_normalize=1 35 | filters=64 36 | size=3 37 | stride=2 38 | pad=1 39 | activation=leaky 40 | 41 | [convolutional] 42 | batch_normalize=1 43 | filters=64 44 | size=3 45 | stride=1 46 | pad=1 47 | activation=leaky 48 | 49 | [route_lhalf] 50 | layers=-1 51 | 52 | [convolutional] 53 | batch_normalize=1 54 | filters=32 55 | size=3 56 | stride=1 57 | pad=1 58 | activation=leaky 59 | 60 | [convolutional] 61 | batch_normalize=1 62 | filters=32 63 | size=3 64 | stride=1 65 | pad=1 66 | activation=leaky 67 | 68 | [route] 69 | layers = -1,-2 70 | 71 | [convolutional] 72 | batch_normalize=1 73 | filters=64 74 | size=1 75 | stride=1 76 | pad=1 77 | activation=leaky 78 | 79 | [route] 80 | layers = -6,-1 81 | 82 | [maxpool] 83 | size=2 84 | stride=2 85 | 86 | [convolutional] 87 | batch_normalize=1 88 | filters=128 89 | size=3 90 | stride=1 91 | pad=1 92 | activation=leaky 93 | 94 | [route_lhalf] 95 | layers=-1 96 | 97 | [convolutional] 98 | batch_normalize=1 99 | filters=64 100 | size=3 101 | stride=1 102 | pad=1 103 | activation=leaky 104 | 105 | [convolutional] 106 | batch_normalize=1 107 | filters=64 108 | size=3 109 | stride=1 110 | pad=1 111 | activation=leaky 112 | 113 | [route] 114 | layers = -1,-2 115 | 116 | [convolutional] 117 | batch_normalize=1 118 | filters=128 119 | size=1 120 | stride=1 121 | pad=1 122 | activation=leaky 123 | 124 | [route] 125 | layers = -6,-1 126 | 127 | [maxpool] 128 | size=2 129 | stride=2 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [route_lhalf] 140 | layers=-1 141 | 142 | [convolutional] 143 | batch_normalize=1 144 | filters=128 145 | size=3 146 | stride=1 147 | pad=1 148 | activation=leaky 149 | 150 | [convolutional] 151 | batch_normalize=1 152 | filters=128 153 | size=3 154 | stride=1 155 | pad=1 156 | activation=leaky 157 | 158 | [route] 159 | layers = -1,-2 160 | 161 | [convolutional] 162 | batch_normalize=1 163 | filters=256 164 | size=1 165 | stride=1 166 | pad=1 167 | activation=leaky 168 | 169 | [route] 170 | layers = -6,-1 171 | 172 | [maxpool] 173 | size=2 174 | stride=2 175 | 176 | [convolutional] 177 | batch_normalize=1 178 | filters=512 179 | size=3 180 | stride=1 181 | pad=1 182 | activation=leaky 183 | 184 | ################################## 185 | 186 | [convolutional] 187 | batch_normalize=1 188 | filters=256 189 | size=1 190 | stride=1 191 | pad=1 192 | activation=leaky 193 | 194 | [convolutional] 195 | batch_normalize=1 196 | filters=512 197 | size=3 198 | stride=1 199 | pad=1 200 | activation=leaky 201 | 202 | [convolutional] 203 | size=1 204 | stride=1 205 | pad=1 206 | 
filters=255 207 | activation=linear 208 | 209 | 210 | 211 | [yolo] 212 | mask = 3,4,5 213 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 214 | classes=80 215 | num=6 216 | jitter=.3 217 | scale_x_y = 1.05 218 | cls_normalizer=1.0 219 | iou_normalizer=0.07 220 | iou_loss=ciou 221 | ignore_thresh = .7 222 | truth_thresh = 1 223 | random=0 224 | nms_kind=greedynms 225 | beta_nms=0.6 226 | 227 | [route] 228 | layers = -4 229 | 230 | [convolutional] 231 | batch_normalize=1 232 | filters=128 233 | size=1 234 | stride=1 235 | pad=1 236 | activation=leaky 237 | 238 | [upsample] 239 | stride=2 240 | 241 | [route] 242 | layers = -1, 23 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=256 247 | size=3 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | size=1 254 | stride=1 255 | pad=1 256 | filters=255 257 | activation=linear 258 | 259 | [yolo] 260 | mask = 1,2,3 261 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 262 | classes=80 263 | num=6 264 | jitter=.3 265 | scale_x_y = 1.05 266 | cls_normalizer=1.0 267 | iou_normalizer=0.07 268 | iou_loss=ciou 269 | ignore_thresh = .7 270 | truth_thresh = 1 271 | random=0 272 | nms_kind=greedynms 273 | beta_nms=0.6 274 | -------------------------------------------------------------------------------- /cfg/yolov4.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | batch=64 3 | subdivisions=8 4 | # Training 5 | #width=512 6 | #height=512 7 | width=608 8 | height=608 9 | channels=3 10 | momentum=0.949 11 | decay=0.0005 12 | angle=0 13 | saturation = 1.5 14 | exposure = 1.5 15 | hue=.1 16 | 17 | learning_rate=0.0013 18 | burn_in=1000 19 | max_batches = 500500 20 | policy=steps 21 | steps=400000,450000 22 | scales=.1,.1 23 | 24 | #cutmix=1 25 | mosaic=1 26 | 27 | #:104x104 54:52x52 85:26x26 104:13x13 for 416 28 | 29 | [convolutional] 30 | batch_normalize=1 31 | filters=32 32 | size=3 33 | stride=1 34 | pad=1 35 | activation=mish 36 | 37 | # Downsample 38 | 39 | [convolutional] 40 | batch_normalize=1 41 | filters=64 42 | size=3 43 | stride=2 44 | pad=1 45 | activation=mish 46 | 47 | [convolutional] 48 | batch_normalize=1 49 | filters=64 50 | size=1 51 | stride=1 52 | pad=1 53 | activation=mish 54 | 55 | [route] 56 | layers = -2 57 | 58 | [convolutional] 59 | batch_normalize=1 60 | filters=64 61 | size=1 62 | stride=1 63 | pad=1 64 | activation=mish 65 | 66 | [convolutional] 67 | batch_normalize=1 68 | filters=32 69 | size=1 70 | stride=1 71 | pad=1 72 | activation=mish 73 | 74 | [convolutional] 75 | batch_normalize=1 76 | filters=64 77 | size=3 78 | stride=1 79 | pad=1 80 | activation=mish 81 | 82 | [shortcut] 83 | from=-3 84 | activation=linear 85 | 86 | [convolutional] 87 | batch_normalize=1 88 | filters=64 89 | size=1 90 | stride=1 91 | pad=1 92 | activation=mish 93 | 94 | [route] 95 | layers = -1,-7 96 | 97 | [convolutional] 98 | batch_normalize=1 99 | filters=64 100 | size=1 101 | stride=1 102 | pad=1 103 | activation=mish 104 | 105 | # Downsample 106 | 107 | [convolutional] 108 | batch_normalize=1 109 | filters=128 110 | size=3 111 | stride=2 112 | pad=1 113 | activation=mish 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=64 118 | size=1 119 | stride=1 120 | pad=1 121 | activation=mish 122 | 123 | [route] 124 | layers = -2 125 | 126 | [convolutional] 127 | batch_normalize=1 128 | filters=64 129 | size=1 130 | stride=1 131 | pad=1 132 | activation=mish 133 | 134 | [convolutional] 135 | batch_normalize=1 136 | filters=64 137 | 
size=1 138 | stride=1 139 | pad=1 140 | activation=mish 141 | 142 | [convolutional] 143 | batch_normalize=1 144 | filters=64 145 | size=3 146 | stride=1 147 | pad=1 148 | activation=mish 149 | 150 | [shortcut] 151 | from=-3 152 | activation=linear 153 | 154 | [convolutional] 155 | batch_normalize=1 156 | filters=64 157 | size=1 158 | stride=1 159 | pad=1 160 | activation=mish 161 | 162 | [convolutional] 163 | batch_normalize=1 164 | filters=64 165 | size=3 166 | stride=1 167 | pad=1 168 | activation=mish 169 | 170 | [shortcut] 171 | from=-3 172 | activation=linear 173 | 174 | [convolutional] 175 | batch_normalize=1 176 | filters=64 177 | size=1 178 | stride=1 179 | pad=1 180 | activation=mish 181 | 182 | [route] 183 | layers = -1,-10 184 | 185 | [convolutional] 186 | batch_normalize=1 187 | filters=128 188 | size=1 189 | stride=1 190 | pad=1 191 | activation=mish 192 | 193 | # Downsample 194 | 195 | [convolutional] 196 | batch_normalize=1 197 | filters=256 198 | size=3 199 | stride=2 200 | pad=1 201 | activation=mish 202 | 203 | [convolutional] 204 | batch_normalize=1 205 | filters=128 206 | size=1 207 | stride=1 208 | pad=1 209 | activation=mish 210 | 211 | [route] 212 | layers = -2 213 | 214 | [convolutional] 215 | batch_normalize=1 216 | filters=128 217 | size=1 218 | stride=1 219 | pad=1 220 | activation=mish 221 | 222 | [convolutional] 223 | batch_normalize=1 224 | filters=128 225 | size=1 226 | stride=1 227 | pad=1 228 | activation=mish 229 | 230 | [convolutional] 231 | batch_normalize=1 232 | filters=128 233 | size=3 234 | stride=1 235 | pad=1 236 | activation=mish 237 | 238 | [shortcut] 239 | from=-3 240 | activation=linear 241 | 242 | [convolutional] 243 | batch_normalize=1 244 | filters=128 245 | size=1 246 | stride=1 247 | pad=1 248 | activation=mish 249 | 250 | [convolutional] 251 | batch_normalize=1 252 | filters=128 253 | size=3 254 | stride=1 255 | pad=1 256 | activation=mish 257 | 258 | [shortcut] 259 | from=-3 260 | activation=linear 261 | 262 | [convolutional] 263 | batch_normalize=1 264 | filters=128 265 | size=1 266 | stride=1 267 | pad=1 268 | activation=mish 269 | 270 | [convolutional] 271 | batch_normalize=1 272 | filters=128 273 | size=3 274 | stride=1 275 | pad=1 276 | activation=mish 277 | 278 | [shortcut] 279 | from=-3 280 | activation=linear 281 | 282 | [convolutional] 283 | batch_normalize=1 284 | filters=128 285 | size=1 286 | stride=1 287 | pad=1 288 | activation=mish 289 | 290 | [convolutional] 291 | batch_normalize=1 292 | filters=128 293 | size=3 294 | stride=1 295 | pad=1 296 | activation=mish 297 | 298 | [shortcut] 299 | from=-3 300 | activation=linear 301 | 302 | 303 | [convolutional] 304 | batch_normalize=1 305 | filters=128 306 | size=1 307 | stride=1 308 | pad=1 309 | activation=mish 310 | 311 | [convolutional] 312 | batch_normalize=1 313 | filters=128 314 | size=3 315 | stride=1 316 | pad=1 317 | activation=mish 318 | 319 | [shortcut] 320 | from=-3 321 | activation=linear 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=128 326 | size=1 327 | stride=1 328 | pad=1 329 | activation=mish 330 | 331 | [convolutional] 332 | batch_normalize=1 333 | filters=128 334 | size=3 335 | stride=1 336 | pad=1 337 | activation=mish 338 | 339 | [shortcut] 340 | from=-3 341 | activation=linear 342 | 343 | [convolutional] 344 | batch_normalize=1 345 | filters=128 346 | size=1 347 | stride=1 348 | pad=1 349 | activation=mish 350 | 351 | [convolutional] 352 | batch_normalize=1 353 | filters=128 354 | size=3 355 | stride=1 356 | pad=1 357 | activation=mish 
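Throughout this backbone, `[shortcut] from=-3` adds the current feature map to the one three layers back (a residual connection), while `[route] layers=...` selects or concatenates the listed outputs along the channel dimension (negative indices are relative to the current layer, non-negative ones are absolute layer indices). The sketch below shows only the tensor semantics of those two entries; it is illustrative and is not the code in `models/models.py`:

```python
# Illustrative semantics of [shortcut] and [route]; not the repo's implementation.
import torch

outputs = []  # feature map produced by each already-built layer, appended in order
              # (the current layer's output is not in the list yet)

def shortcut(from_idx):
    # [shortcut] from=-3  ->  element-wise sum of the previous output
    # and the output three layers back
    return outputs[-1] + outputs[from_idx]

def route(layer_idxs):
    # [route] layers=-1,-7  ->  concatenate the listed outputs along channels
    return torch.cat([outputs[i] for i in layer_idxs], dim=1)
```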
358 | 359 | [shortcut] 360 | from=-3 361 | activation=linear 362 | 363 | [convolutional] 364 | batch_normalize=1 365 | filters=128 366 | size=1 367 | stride=1 368 | pad=1 369 | activation=mish 370 | 371 | [convolutional] 372 | batch_normalize=1 373 | filters=128 374 | size=3 375 | stride=1 376 | pad=1 377 | activation=mish 378 | 379 | [shortcut] 380 | from=-3 381 | activation=linear 382 | 383 | [convolutional] 384 | batch_normalize=1 385 | filters=128 386 | size=1 387 | stride=1 388 | pad=1 389 | activation=mish 390 | 391 | [route] 392 | layers = -1,-28 393 | 394 | [convolutional] 395 | batch_normalize=1 396 | filters=256 397 | size=1 398 | stride=1 399 | pad=1 400 | activation=mish 401 | 402 | # Downsample 403 | 404 | [convolutional] 405 | batch_normalize=1 406 | filters=512 407 | size=3 408 | stride=2 409 | pad=1 410 | activation=mish 411 | 412 | [convolutional] 413 | batch_normalize=1 414 | filters=256 415 | size=1 416 | stride=1 417 | pad=1 418 | activation=mish 419 | 420 | [route] 421 | layers = -2 422 | 423 | [convolutional] 424 | batch_normalize=1 425 | filters=256 426 | size=1 427 | stride=1 428 | pad=1 429 | activation=mish 430 | 431 | [convolutional] 432 | batch_normalize=1 433 | filters=256 434 | size=1 435 | stride=1 436 | pad=1 437 | activation=mish 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=3 443 | stride=1 444 | pad=1 445 | activation=mish 446 | 447 | [shortcut] 448 | from=-3 449 | activation=linear 450 | 451 | 452 | [convolutional] 453 | batch_normalize=1 454 | filters=256 455 | size=1 456 | stride=1 457 | pad=1 458 | activation=mish 459 | 460 | [convolutional] 461 | batch_normalize=1 462 | filters=256 463 | size=3 464 | stride=1 465 | pad=1 466 | activation=mish 467 | 468 | [shortcut] 469 | from=-3 470 | activation=linear 471 | 472 | 473 | [convolutional] 474 | batch_normalize=1 475 | filters=256 476 | size=1 477 | stride=1 478 | pad=1 479 | activation=mish 480 | 481 | [convolutional] 482 | batch_normalize=1 483 | filters=256 484 | size=3 485 | stride=1 486 | pad=1 487 | activation=mish 488 | 489 | [shortcut] 490 | from=-3 491 | activation=linear 492 | 493 | 494 | [convolutional] 495 | batch_normalize=1 496 | filters=256 497 | size=1 498 | stride=1 499 | pad=1 500 | activation=mish 501 | 502 | [convolutional] 503 | batch_normalize=1 504 | filters=256 505 | size=3 506 | stride=1 507 | pad=1 508 | activation=mish 509 | 510 | [shortcut] 511 | from=-3 512 | activation=linear 513 | 514 | 515 | [convolutional] 516 | batch_normalize=1 517 | filters=256 518 | size=1 519 | stride=1 520 | pad=1 521 | activation=mish 522 | 523 | [convolutional] 524 | batch_normalize=1 525 | filters=256 526 | size=3 527 | stride=1 528 | pad=1 529 | activation=mish 530 | 531 | [shortcut] 532 | from=-3 533 | activation=linear 534 | 535 | 536 | [convolutional] 537 | batch_normalize=1 538 | filters=256 539 | size=1 540 | stride=1 541 | pad=1 542 | activation=mish 543 | 544 | [convolutional] 545 | batch_normalize=1 546 | filters=256 547 | size=3 548 | stride=1 549 | pad=1 550 | activation=mish 551 | 552 | [shortcut] 553 | from=-3 554 | activation=linear 555 | 556 | 557 | [convolutional] 558 | batch_normalize=1 559 | filters=256 560 | size=1 561 | stride=1 562 | pad=1 563 | activation=mish 564 | 565 | [convolutional] 566 | batch_normalize=1 567 | filters=256 568 | size=3 569 | stride=1 570 | pad=1 571 | activation=mish 572 | 573 | [shortcut] 574 | from=-3 575 | activation=linear 576 | 577 | [convolutional] 578 | batch_normalize=1 579 | filters=256 580 | size=1 581 | 
stride=1 582 | pad=1 583 | activation=mish 584 | 585 | [convolutional] 586 | batch_normalize=1 587 | filters=256 588 | size=3 589 | stride=1 590 | pad=1 591 | activation=mish 592 | 593 | [shortcut] 594 | from=-3 595 | activation=linear 596 | 597 | [convolutional] 598 | batch_normalize=1 599 | filters=256 600 | size=1 601 | stride=1 602 | pad=1 603 | activation=mish 604 | 605 | [route] 606 | layers = -1,-28 607 | 608 | [convolutional] 609 | batch_normalize=1 610 | filters=512 611 | size=1 612 | stride=1 613 | pad=1 614 | activation=mish 615 | 616 | # Downsample 617 | 618 | [convolutional] 619 | batch_normalize=1 620 | filters=1024 621 | size=3 622 | stride=2 623 | pad=1 624 | activation=mish 625 | 626 | [convolutional] 627 | batch_normalize=1 628 | filters=512 629 | size=1 630 | stride=1 631 | pad=1 632 | activation=mish 633 | 634 | [route] 635 | layers = -2 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=512 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=mish 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | filters=512 648 | size=1 649 | stride=1 650 | pad=1 651 | activation=mish 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=512 656 | size=3 657 | stride=1 658 | pad=1 659 | activation=mish 660 | 661 | [shortcut] 662 | from=-3 663 | activation=linear 664 | 665 | [convolutional] 666 | batch_normalize=1 667 | filters=512 668 | size=1 669 | stride=1 670 | pad=1 671 | activation=mish 672 | 673 | [convolutional] 674 | batch_normalize=1 675 | filters=512 676 | size=3 677 | stride=1 678 | pad=1 679 | activation=mish 680 | 681 | [shortcut] 682 | from=-3 683 | activation=linear 684 | 685 | [convolutional] 686 | batch_normalize=1 687 | filters=512 688 | size=1 689 | stride=1 690 | pad=1 691 | activation=mish 692 | 693 | [convolutional] 694 | batch_normalize=1 695 | filters=512 696 | size=3 697 | stride=1 698 | pad=1 699 | activation=mish 700 | 701 | [shortcut] 702 | from=-3 703 | activation=linear 704 | 705 | [convolutional] 706 | batch_normalize=1 707 | filters=512 708 | size=1 709 | stride=1 710 | pad=1 711 | activation=mish 712 | 713 | [convolutional] 714 | batch_normalize=1 715 | filters=512 716 | size=3 717 | stride=1 718 | pad=1 719 | activation=mish 720 | 721 | [shortcut] 722 | from=-3 723 | activation=linear 724 | 725 | [convolutional] 726 | batch_normalize=1 727 | filters=512 728 | size=1 729 | stride=1 730 | pad=1 731 | activation=mish 732 | 733 | [route] 734 | layers = -1,-16 735 | 736 | [convolutional] 737 | batch_normalize=1 738 | filters=1024 739 | size=1 740 | stride=1 741 | pad=1 742 | activation=mish 743 | 744 | ########################## 745 | 746 | [convolutional] 747 | batch_normalize=1 748 | filters=512 749 | size=1 750 | stride=1 751 | pad=1 752 | activation=leaky 753 | 754 | [convolutional] 755 | batch_normalize=1 756 | size=3 757 | stride=1 758 | pad=1 759 | filters=1024 760 | activation=leaky 761 | 762 | [convolutional] 763 | batch_normalize=1 764 | filters=512 765 | size=1 766 | stride=1 767 | pad=1 768 | activation=leaky 769 | 770 | ### SPP ### 771 | [maxpool] 772 | stride=1 773 | size=5 774 | 775 | [route] 776 | layers=-2 777 | 778 | [maxpool] 779 | stride=1 780 | size=9 781 | 782 | [route] 783 | layers=-4 784 | 785 | [maxpool] 786 | stride=1 787 | size=13 788 | 789 | [route] 790 | layers=-1,-3,-5,-6 791 | ### End SPP ### 792 | 793 | [convolutional] 794 | batch_normalize=1 795 | filters=512 796 | size=1 797 | stride=1 798 | pad=1 799 | activation=leaky 800 | 801 | [convolutional] 802 | batch_normalize=1 803 | size=3 
804 | stride=1 805 | pad=1 806 | filters=1024 807 | activation=leaky 808 | 809 | [convolutional] 810 | batch_normalize=1 811 | filters=512 812 | size=1 813 | stride=1 814 | pad=1 815 | activation=leaky 816 | 817 | [convolutional] 818 | batch_normalize=1 819 | filters=256 820 | size=1 821 | stride=1 822 | pad=1 823 | activation=leaky 824 | 825 | [upsample] 826 | stride=2 827 | 828 | [route] 829 | layers = 85 830 | 831 | [convolutional] 832 | batch_normalize=1 833 | filters=256 834 | size=1 835 | stride=1 836 | pad=1 837 | activation=leaky 838 | 839 | [route] 840 | layers = -1, -3 841 | 842 | [convolutional] 843 | batch_normalize=1 844 | filters=256 845 | size=1 846 | stride=1 847 | pad=1 848 | activation=leaky 849 | 850 | [convolutional] 851 | batch_normalize=1 852 | size=3 853 | stride=1 854 | pad=1 855 | filters=512 856 | activation=leaky 857 | 858 | [convolutional] 859 | batch_normalize=1 860 | filters=256 861 | size=1 862 | stride=1 863 | pad=1 864 | activation=leaky 865 | 866 | [convolutional] 867 | batch_normalize=1 868 | size=3 869 | stride=1 870 | pad=1 871 | filters=512 872 | activation=leaky 873 | 874 | [convolutional] 875 | batch_normalize=1 876 | filters=256 877 | size=1 878 | stride=1 879 | pad=1 880 | activation=leaky 881 | 882 | [convolutional] 883 | batch_normalize=1 884 | filters=128 885 | size=1 886 | stride=1 887 | pad=1 888 | activation=leaky 889 | 890 | [upsample] 891 | stride=2 892 | 893 | [route] 894 | layers = 54 895 | 896 | [convolutional] 897 | batch_normalize=1 898 | filters=128 899 | size=1 900 | stride=1 901 | pad=1 902 | activation=leaky 903 | 904 | [route] 905 | layers = -1, -3 906 | 907 | [convolutional] 908 | batch_normalize=1 909 | filters=128 910 | size=1 911 | stride=1 912 | pad=1 913 | activation=leaky 914 | 915 | [convolutional] 916 | batch_normalize=1 917 | size=3 918 | stride=1 919 | pad=1 920 | filters=256 921 | activation=leaky 922 | 923 | [convolutional] 924 | batch_normalize=1 925 | filters=128 926 | size=1 927 | stride=1 928 | pad=1 929 | activation=leaky 930 | 931 | [convolutional] 932 | batch_normalize=1 933 | size=3 934 | stride=1 935 | pad=1 936 | filters=256 937 | activation=leaky 938 | 939 | [convolutional] 940 | batch_normalize=1 941 | filters=128 942 | size=1 943 | stride=1 944 | pad=1 945 | activation=leaky 946 | 947 | ########################## 948 | 949 | [convolutional] 950 | batch_normalize=1 951 | size=3 952 | stride=1 953 | pad=1 954 | filters=256 955 | activation=leaky 956 | 957 | [convolutional] 958 | size=1 959 | stride=1 960 | pad=1 961 | filters=255 962 | activation=linear 963 | 964 | 965 | [yolo] 966 | mask = 0,1,2 967 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 968 | classes=80 969 | num=9 970 | jitter=.3 971 | ignore_thresh = .7 972 | truth_thresh = 1 973 | scale_x_y = 1.2 974 | iou_thresh=0.213 975 | cls_normalizer=1.0 976 | iou_normalizer=0.07 977 | iou_loss=ciou 978 | nms_kind=greedynms 979 | beta_nms=0.6 980 | 981 | 982 | [route] 983 | layers = -4 984 | 985 | [convolutional] 986 | batch_normalize=1 987 | size=3 988 | stride=2 989 | pad=1 990 | filters=256 991 | activation=leaky 992 | 993 | [route] 994 | layers = -1, -16 995 | 996 | [convolutional] 997 | batch_normalize=1 998 | filters=256 999 | size=1 1000 | stride=1 1001 | pad=1 1002 | activation=leaky 1003 | 1004 | [convolutional] 1005 | batch_normalize=1 1006 | size=3 1007 | stride=1 1008 | pad=1 1009 | filters=512 1010 | activation=leaky 1011 | 1012 | [convolutional] 1013 | batch_normalize=1 1014 | filters=256 1015 | 
size=1 1016 | stride=1 1017 | pad=1 1018 | activation=leaky 1019 | 1020 | [convolutional] 1021 | batch_normalize=1 1022 | size=3 1023 | stride=1 1024 | pad=1 1025 | filters=512 1026 | activation=leaky 1027 | 1028 | [convolutional] 1029 | batch_normalize=1 1030 | filters=256 1031 | size=1 1032 | stride=1 1033 | pad=1 1034 | activation=leaky 1035 | 1036 | [convolutional] 1037 | batch_normalize=1 1038 | size=3 1039 | stride=1 1040 | pad=1 1041 | filters=512 1042 | activation=leaky 1043 | 1044 | [convolutional] 1045 | size=1 1046 | stride=1 1047 | pad=1 1048 | filters=255 1049 | activation=linear 1050 | 1051 | 1052 | [yolo] 1053 | mask = 3,4,5 1054 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 1055 | classes=80 1056 | num=9 1057 | jitter=.3 1058 | ignore_thresh = .7 1059 | truth_thresh = 1 1060 | scale_x_y = 1.1 1061 | iou_thresh=0.213 1062 | cls_normalizer=1.0 1063 | iou_normalizer=0.07 1064 | iou_loss=ciou 1065 | nms_kind=greedynms 1066 | beta_nms=0.6 1067 | 1068 | 1069 | [route] 1070 | layers = -4 1071 | 1072 | [convolutional] 1073 | batch_normalize=1 1074 | size=3 1075 | stride=2 1076 | pad=1 1077 | filters=512 1078 | activation=leaky 1079 | 1080 | [route] 1081 | layers = -1, -37 1082 | 1083 | [convolutional] 1084 | batch_normalize=1 1085 | filters=512 1086 | size=1 1087 | stride=1 1088 | pad=1 1089 | activation=leaky 1090 | 1091 | [convolutional] 1092 | batch_normalize=1 1093 | size=3 1094 | stride=1 1095 | pad=1 1096 | filters=1024 1097 | activation=leaky 1098 | 1099 | [convolutional] 1100 | batch_normalize=1 1101 | filters=512 1102 | size=1 1103 | stride=1 1104 | pad=1 1105 | activation=leaky 1106 | 1107 | [convolutional] 1108 | batch_normalize=1 1109 | size=3 1110 | stride=1 1111 | pad=1 1112 | filters=1024 1113 | activation=leaky 1114 | 1115 | [convolutional] 1116 | batch_normalize=1 1117 | filters=512 1118 | size=1 1119 | stride=1 1120 | pad=1 1121 | activation=leaky 1122 | 1123 | [convolutional] 1124 | batch_normalize=1 1125 | size=3 1126 | stride=1 1127 | pad=1 1128 | filters=1024 1129 | activation=leaky 1130 | 1131 | [convolutional] 1132 | size=1 1133 | stride=1 1134 | pad=1 1135 | filters=255 1136 | activation=linear 1137 | 1138 | 1139 | [yolo] 1140 | mask = 6,7,8 1141 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 1142 | classes=80 1143 | num=9 1144 | jitter=.3 1145 | ignore_thresh = .7 1146 | truth_thresh = 1 1147 | random=1 1148 | scale_x_y = 1.05 1149 | iou_thresh=0.213 1150 | cls_normalizer=1.0 1151 | iou_normalizer=0.07 1152 | iou_loss=ciou 1153 | nms_kind=greedynms 1154 | beta_nms=0.6 1155 | -------------------------------------------------------------------------------- /data/coco.data: -------------------------------------------------------------------------------- 1 | classes=80 2 | train=../coco/train2017.txt 3 | valid=../coco/testdev2017.txt 4 | names=data/coco.names 5 | -------------------------------------------------------------------------------- /data/coco.names: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorcycle 5 | airplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | stop sign 13 | parking meter 14 | bench 15 | bird 16 | cat 17 | dog 18 | horse 19 | sheep 20 | cow 21 | elephant 22 | bear 23 | zebra 24 | giraffe 25 | backpack 26 | umbrella 27 | handbag 28 | tie 29 | suitcase 30 | frisbee 31 | skis 32 | snowboard 33 | sports ball 34 | kite 35 | baseball bat 36 | 
baseball glove 37 | skateboard 38 | surfboard 39 | tennis racket 40 | bottle 41 | wine glass 42 | cup 43 | fork 44 | knife 45 | spoon 46 | bowl 47 | banana 48 | apple 49 | sandwich 50 | orange 51 | broccoli 52 | carrot 53 | hot dog 54 | pizza 55 | donut 56 | cake 57 | chair 58 | couch 59 | potted plant 60 | bed 61 | dining table 62 | toilet 63 | tv 64 | laptop 65 | mouse 66 | remote 67 | keyboard 68 | cell phone 69 | microwave 70 | oven 71 | toaster 72 | sink 73 | refrigerator 74 | book 75 | clock 76 | vase 77 | scissors 78 | teddy bear 79 | hair drier 80 | toothbrush 81 | -------------------------------------------------------------------------------- /data/coco.yaml: -------------------------------------------------------------------------------- 1 | # train and val datasets (image directory or *.txt file with image paths) 2 | train: ../coco/train2017.txt # 118k images 3 | val: ../coco/val2017.txt # 5k images 4 | test: ../coco/testdev2017.txt # 20k images for submission to https://competitions.codalab.org/competitions/20794 5 | 6 | # number of classes 7 | nc: 80 8 | 9 | # class names 10 | names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 11 | 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 12 | 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 13 | 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 14 | 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 15 | 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 16 | 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 17 | 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 18 | 'hair drier', 'toothbrush'] 19 | -------------------------------------------------------------------------------- /data/coco1.data: -------------------------------------------------------------------------------- 1 | classes=80 2 | train=data/coco1.txt 3 | valid=data/coco1.txt 4 | names=data/coco.names 5 | -------------------------------------------------------------------------------- /data/coco1.txt: -------------------------------------------------------------------------------- 1 | ../coco/images/train2017/000000109622.jpg 2 | -------------------------------------------------------------------------------- /data/coco16.data: -------------------------------------------------------------------------------- 1 | classes=80 2 | train=data/coco16.txt 3 | valid=data/coco16.txt 4 | names=data/coco.names 5 | -------------------------------------------------------------------------------- /data/coco16.txt: -------------------------------------------------------------------------------- 1 | ../coco/images/train2017/000000109622.jpg 2 | ../coco/images/train2017/000000160694.jpg 3 | ../coco/images/train2017/000000308590.jpg 4 | ../coco/images/train2017/000000327573.jpg 5 | ../coco/images/train2017/000000062929.jpg 6 | ../coco/images/train2017/000000512793.jpg 7 | ../coco/images/train2017/000000371735.jpg 8 | ../coco/images/train2017/000000148118.jpg 9 | ../coco/images/train2017/000000309856.jpg 10 | ../coco/images/train2017/000000141882.jpg 11 | ../coco/images/train2017/000000318783.jpg 12 | ../coco/images/train2017/000000337760.jpg 13 | 
../coco/images/train2017/000000298197.jpg 14 | ../coco/images/train2017/000000042421.jpg 15 | ../coco/images/train2017/000000328898.jpg 16 | ../coco/images/train2017/000000458856.jpg 17 | -------------------------------------------------------------------------------- /data/coco1cls.data: -------------------------------------------------------------------------------- 1 | classes=1 2 | train=data/coco1cls.txt 3 | valid=data/coco1cls.txt 4 | names=data/coco.names 5 | -------------------------------------------------------------------------------- /data/coco1cls.txt: -------------------------------------------------------------------------------- 1 | ../coco/images/train2017/000000000901.jpg 2 | ../coco/images/train2017/000000001464.jpg 3 | ../coco/images/train2017/000000003220.jpg 4 | ../coco/images/train2017/000000003365.jpg 5 | ../coco/images/train2017/000000004772.jpg 6 | ../coco/images/train2017/000000009987.jpg 7 | ../coco/images/train2017/000000010498.jpg 8 | ../coco/images/train2017/000000012455.jpg 9 | ../coco/images/train2017/000000013992.jpg 10 | ../coco/images/train2017/000000014125.jpg 11 | ../coco/images/train2017/000000016314.jpg 12 | ../coco/images/train2017/000000016670.jpg 13 | ../coco/images/train2017/000000018412.jpg 14 | ../coco/images/train2017/000000021212.jpg 15 | ../coco/images/train2017/000000021826.jpg 16 | ../coco/images/train2017/000000030566.jpg 17 | -------------------------------------------------------------------------------- /data/coco2014.data: -------------------------------------------------------------------------------- 1 | classes=80 2 | train=../coco/trainvalno5k.txt 3 | valid=../coco/5k.txt 4 | names=data/coco.names 5 | -------------------------------------------------------------------------------- /data/coco2017.data: -------------------------------------------------------------------------------- 1 | classes=80 2 | train=../coco/train2017.txt 3 | valid=../coco/val2017.txt 4 | names=data/coco.names 5 | -------------------------------------------------------------------------------- /data/coco64.data: -------------------------------------------------------------------------------- 1 | classes=80 2 | train=data/coco64.txt 3 | valid=data/coco64.txt 4 | names=data/coco.names 5 | -------------------------------------------------------------------------------- /data/coco64.txt: -------------------------------------------------------------------------------- 1 | ../coco/images/train2017/000000109622.jpg 2 | ../coco/images/train2017/000000160694.jpg 3 | ../coco/images/train2017/000000308590.jpg 4 | ../coco/images/train2017/000000327573.jpg 5 | ../coco/images/train2017/000000062929.jpg 6 | ../coco/images/train2017/000000512793.jpg 7 | ../coco/images/train2017/000000371735.jpg 8 | ../coco/images/train2017/000000148118.jpg 9 | ../coco/images/train2017/000000309856.jpg 10 | ../coco/images/train2017/000000141882.jpg 11 | ../coco/images/train2017/000000318783.jpg 12 | ../coco/images/train2017/000000337760.jpg 13 | ../coco/images/train2017/000000298197.jpg 14 | ../coco/images/train2017/000000042421.jpg 15 | ../coco/images/train2017/000000328898.jpg 16 | ../coco/images/train2017/000000458856.jpg 17 | ../coco/images/train2017/000000073824.jpg 18 | ../coco/images/train2017/000000252846.jpg 19 | ../coco/images/train2017/000000459590.jpg 20 | ../coco/images/train2017/000000273650.jpg 21 | ../coco/images/train2017/000000331311.jpg 22 | ../coco/images/train2017/000000156326.jpg 23 | ../coco/images/train2017/000000262985.jpg 24 | 
../coco/images/train2017/000000253580.jpg 25 | ../coco/images/train2017/000000447976.jpg 26 | ../coco/images/train2017/000000378077.jpg 27 | ../coco/images/train2017/000000259913.jpg 28 | ../coco/images/train2017/000000424553.jpg 29 | ../coco/images/train2017/000000000612.jpg 30 | ../coco/images/train2017/000000267625.jpg 31 | ../coco/images/train2017/000000566012.jpg 32 | ../coco/images/train2017/000000196664.jpg 33 | ../coco/images/train2017/000000363331.jpg 34 | ../coco/images/train2017/000000057992.jpg 35 | ../coco/images/train2017/000000520047.jpg 36 | ../coco/images/train2017/000000453903.jpg 37 | ../coco/images/train2017/000000162083.jpg 38 | ../coco/images/train2017/000000268516.jpg 39 | ../coco/images/train2017/000000277436.jpg 40 | ../coco/images/train2017/000000189744.jpg 41 | ../coco/images/train2017/000000041128.jpg 42 | ../coco/images/train2017/000000527728.jpg 43 | ../coco/images/train2017/000000465269.jpg 44 | ../coco/images/train2017/000000246833.jpg 45 | ../coco/images/train2017/000000076784.jpg 46 | ../coco/images/train2017/000000323715.jpg 47 | ../coco/images/train2017/000000560463.jpg 48 | ../coco/images/train2017/000000006263.jpg 49 | ../coco/images/train2017/000000094701.jpg 50 | ../coco/images/train2017/000000521359.jpg 51 | ../coco/images/train2017/000000302903.jpg 52 | ../coco/images/train2017/000000047559.jpg 53 | ../coco/images/train2017/000000480583.jpg 54 | ../coco/images/train2017/000000050025.jpg 55 | ../coco/images/train2017/000000084512.jpg 56 | ../coco/images/train2017/000000508913.jpg 57 | ../coco/images/train2017/000000093708.jpg 58 | ../coco/images/train2017/000000070493.jpg 59 | ../coco/images/train2017/000000539270.jpg 60 | ../coco/images/train2017/000000474402.jpg 61 | ../coco/images/train2017/000000209842.jpg 62 | ../coco/images/train2017/000000028820.jpg 63 | ../coco/images/train2017/000000154257.jpg 64 | ../coco/images/train2017/000000342499.jpg 65 | -------------------------------------------------------------------------------- /data/coco_paper.names: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorcycle 5 | airplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | street sign 13 | stop sign 14 | parking meter 15 | bench 16 | bird 17 | cat 18 | dog 19 | horse 20 | sheep 21 | cow 22 | elephant 23 | bear 24 | zebra 25 | giraffe 26 | hat 27 | backpack 28 | umbrella 29 | shoe 30 | eye glasses 31 | handbag 32 | tie 33 | suitcase 34 | frisbee 35 | skis 36 | snowboard 37 | sports ball 38 | kite 39 | baseball bat 40 | baseball glove 41 | skateboard 42 | surfboard 43 | tennis racket 44 | bottle 45 | plate 46 | wine glass 47 | cup 48 | fork 49 | knife 50 | spoon 51 | bowl 52 | banana 53 | apple 54 | sandwich 55 | orange 56 | broccoli 57 | carrot 58 | hot dog 59 | pizza 60 | donut 61 | cake 62 | chair 63 | couch 64 | potted plant 65 | bed 66 | mirror 67 | dining table 68 | window 69 | desk 70 | toilet 71 | door 72 | tv 73 | laptop 74 | mouse 75 | remote 76 | keyboard 77 | cell phone 78 | microwave 79 | oven 80 | toaster 81 | sink 82 | refrigerator 83 | blender 84 | book 85 | clock 86 | vase 87 | scissors 88 | teddy bear 89 | hair drier 90 | toothbrush 91 | hair brush -------------------------------------------------------------------------------- /data/get_coco2014.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Zip coco folder 3 | # zip -r coco.zip coco 4 | # tar -czvf coco.tar.gz coco 5 | 6 | # 
Download labels from Google Drive, accepting presented query 7 | filename="coco2014labels.zip" 8 | fileid="1s6-CmF5_SElM28r52P1OUrCcuXZN-SFo" 9 | curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id=${fileid}" > /dev/null 10 | curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=${fileid}" -o ${filename} 11 | rm ./cookie 12 | 13 | # Unzip labels 14 | unzip -q ${filename} # for coco.zip 15 | # tar -xzf ${filename} # for coco.tar.gz 16 | rm ${filename} 17 | 18 | # Download and unzip images 19 | cd coco/images 20 | f="train2014.zip" && curl http://images.cocodataset.org/zips/$f -o $f && unzip -q $f && rm $f 21 | f="val2014.zip" && curl http://images.cocodataset.org/zips/$f -o $f && unzip -q $f && rm $f 22 | 23 | # cd out 24 | cd ../.. 25 | -------------------------------------------------------------------------------- /data/get_coco2017.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Zip coco folder 3 | # zip -r coco.zip coco 4 | # tar -czvf coco.tar.gz coco 5 | 6 | # Download labels from Google Drive, accepting presented query 7 | filename="coco2017labels.zip" 8 | fileid="1cXZR_ckHki6nddOmcysCuuJFM--T-Q6L" 9 | curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id=${fileid}" > /dev/null 10 | curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=${fileid}" -o ${filename} 11 | rm ./cookie 12 | 13 | # Unzip labels 14 | unzip -q ${filename} # for coco.zip 15 | # tar -xzf ${filename} # for coco.tar.gz 16 | rm ${filename} 17 | 18 | # Download and unzip images 19 | cd coco/images 20 | f="train2017.zip" && curl http://images.cocodataset.org/zips/$f -o $f && unzip -q $f && rm $f 21 | f="val2017.zip" && curl http://images.cocodataset.org/zips/$f -o $f && unzip -q $f && rm $f 22 | 23 | # cd out 24 | cd ../.. 
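Both `get_coco2014.sh` and `get_coco2017.sh` fetch the label archive from Google Drive, unzip it into a `coco/` folder in the current directory, then download and extract the image zips into `coco/images`. Since the `data/*.data` files and `data/coco.yaml` reference `../coco/...`, the scripts should be run so that `coco/` ends up next to the repository. A quick sanity check afterwards (a sketch, not part of the repo; the listed paths are the ones the data files expect to exist):

```python
# Sanity-check sketch (not part of the repo): after running data/get_coco2017.sh
# so the dataset lands next to the repository, the paths referenced by
# data/coco2017.data and data/coco.yaml should resolve.
from pathlib import Path

coco = Path('../coco')  # expected location, relative to the repo root
for p in ['train2017.txt', 'val2017.txt', 'images/train2017', 'images/val2017']:
    print(p, (coco / p).exists())

# data/coco.yaml documents ~118k train and 5k val images in those lists
print(sum(1 for _ in open(coco / 'train2017.txt')), 'training image paths')
```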
25 | -------------------------------------------------------------------------------- /data/hyp.scratch.s.yaml: -------------------------------------------------------------------------------- 1 | lr0: 0.01 # initial learning rate (SGD=1E-2, Adam=1E-3) 2 | lrf: 0.1 # final OneCycleLR learning rate (lr0 * lrf) 3 | momentum: 0.937 # SGD momentum/Adam beta1 4 | weight_decay: 0.0005 # optimizer weight decay 5e-4 5 | warmup_epochs: 3.0 # warmup epochs (fractions ok) 6 | warmup_momentum: 0.8 # warmup initial momentum 7 | warmup_bias_lr: 0.1 # warmup initial bias lr 8 | box: 0.05 # box loss gain 9 | cls: 0.5 # cls loss gain 10 | cls_pw: 1.0 # cls BCELoss positive_weight 11 | obj: 1.0 # obj loss gain (scale with pixels) 12 | obj_pw: 1.0 # obj BCELoss positive_weight 13 | iou_t: 0.20 # IoU training threshold 14 | anchor_t: 4.0 # anchor-multiple threshold 15 | # anchors: 3 # anchors per output layer (0 to ignore) 16 | fl_gamma: 0.0 # focal loss gamma (efficientDet default gamma=1.5) 17 | hsv_h: 0.015 # image HSV-Hue augmentation (fraction) 18 | hsv_s: 0.7 # image HSV-Saturation augmentation (fraction) 19 | hsv_v: 0.4 # image HSV-Value augmentation (fraction) 20 | degrees: 0.0 # image rotation (+/- deg) 21 | translate: 0.0 # image translation (+/- fraction) 22 | scale: 0.5 # image scale (+/- gain) 23 | shear: 0.0 # image shear (+/- deg) 24 | perspective: 0.0 # image perspective (+/- fraction), range 0-0.001 25 | flipud: 0.0 # image flip up-down (probability) 26 | fliplr: 0.5 # image flip left-right (probability) 27 | mosaic: 1.0 # image mosaic (probability) 28 | mixup: 0.0 # image mixup (probability) 29 | -------------------------------------------------------------------------------- /data/hyp.scratch.yaml: -------------------------------------------------------------------------------- 1 | lr0: 0.01 # initial learning rate (SGD=1E-2, Adam=1E-3) 2 | lrf: 0.1 # final OneCycleLR learning rate (lr0 * lrf) 3 | momentum: 0.937 # SGD momentum/Adam beta1 4 | weight_decay: 0.0005 # optimizer weight decay 5e-4 5 | warmup_epochs: 3.0 # warmup epochs (fractions ok) 6 | warmup_momentum: 0.8 # warmup initial momentum 7 | warmup_bias_lr: 0.1 # warmup initial bias lr 8 | box: 0.05 # box loss gain 9 | cls: 0.3 # cls loss gain 10 | cls_pw: 1.0 # cls BCELoss positive_weight 11 | obj: 0.6 # obj loss gain (scale with pixels) 12 | obj_pw: 1.0 # obj BCELoss positive_weight 13 | iou_t: 0.20 # IoU training threshold 14 | anchor_t: 4.0 # anchor-multiple threshold 15 | # anchors: 3 # anchors per output layer (0 to ignore) 16 | fl_gamma: 0.0 # focal loss gamma (efficientDet default gamma=1.5) 17 | hsv_h: 0.015 # image HSV-Hue augmentation (fraction) 18 | hsv_s: 0.7 # image HSV-Saturation augmentation (fraction) 19 | hsv_v: 0.4 # image HSV-Value augmentation (fraction) 20 | degrees: 0.0 # image rotation (+/- deg) 21 | translate: 0.1 # image translation (+/- fraction) 22 | scale: 0.9 # image scale (+/- gain) 23 | shear: 0.0 # image shear (+/- deg) 24 | perspective: 0.0 # image perspective (+/- fraction), range 0-0.001 25 | flipud: 0.0 # image flip up-down (probability) 26 | fliplr: 0.5 # image flip left-right (probability) 27 | mosaic: 1.0 # image mosaic (probability) 28 | mixup: 0.0 # image mixup (probability) 29 | -------------------------------------------------------------------------------- /data/samples/bus.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/WongKinYiu/PyTorch_YOLOv4/6e88dc21813e614c9848a2767fd0bac13d26fd51/data/samples/bus.jpg 
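The two `hyp.scratch*.yaml` files above are plain YAML dictionaries of training hyperparameters (learning-rate schedule, loss gains, augmentation settings); `hyp.scratch.s.yaml` differs mainly in the cls/obj loss gains and the milder translate/scale augmentation. They can be inspected like any YAML file — a usage sketch follows; exactly how `train.py` consumes the resulting dict is assumed here, not shown in this excerpt:

```python
# Usage sketch: read data/hyp.scratch.yaml into a dict and inspect a few values.
# (How train.py consumes the dict is assumed, not shown in this excerpt.)
import yaml

with open('data/hyp.scratch.yaml') as f:
    hyp = yaml.safe_load(f)

print(hyp['lr0'], hyp['lrf'], hyp['momentum'])  # 0.01 0.1 0.937
print(hyp['scale'], hyp['translate'])           # 0.9 0.1
```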
-------------------------------------------------------------------------------- /data/samples/zidane.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/WongKinYiu/PyTorch_YOLOv4/6e88dc21813e614c9848a2767fd0bac13d26fd51/data/samples/zidane.jpg -------------------------------------------------------------------------------- /detect.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | import platform 4 | import shutil 5 | import time 6 | from pathlib import Path 7 | 8 | import cv2 9 | import torch 10 | import torch.backends.cudnn as cudnn 11 | from numpy import random 12 | 13 | from utils.google_utils import attempt_load 14 | from utils.datasets import LoadStreams, LoadImages 15 | from utils.general import ( 16 | check_img_size, non_max_suppression, apply_classifier, scale_coords, xyxy2xywh, strip_optimizer) 17 | from utils.plots import plot_one_box 18 | from utils.torch_utils import select_device, load_classifier, time_synchronized 19 | 20 | from models.models import * 21 | from utils.datasets import * 22 | from utils.general import * 23 | 24 | def load_classes(path): 25 | # Loads *.names file at 'path' 26 | with open(path, 'r') as f: 27 | names = f.read().split('\n') 28 | return list(filter(None, names)) # filter removes empty strings (such as last line) 29 | 30 | def detect(save_img=False): 31 | out, source, weights, view_img, save_txt, imgsz, cfg, names = \ 32 | opt.output, opt.source, opt.weights, opt.view_img, opt.save_txt, opt.img_size, opt.cfg, opt.names 33 | webcam = source == '0' or source.startswith('rtsp') or source.startswith('http') or source.endswith('.txt') 34 | 35 | # Initialize 36 | device = select_device(opt.device) 37 | if os.path.exists(out): 38 | shutil.rmtree(out) # delete output folder 39 | os.makedirs(out) # make new output folder 40 | half = device.type != 'cpu' # half precision only supported on CUDA 41 | 42 | # Load model 43 | model = Darknet(cfg, imgsz).cuda() 44 | try: 45 | model.load_state_dict(torch.load(weights[0], map_location=device)['model']) 46 | #model = attempt_load(weights, map_location=device) # load FP32 model 47 | #imgsz = check_img_size(imgsz, s=model.stride.max()) # check img_size 48 | except: 49 | load_darknet_weights(model, weights[0]) 50 | model.to(device).eval() 51 | if half: 52 | model.half() # to FP16 53 | 54 | # Second-stage classifier 55 | classify = False 56 | if classify: 57 | modelc = load_classifier(name='resnet101', n=2) # initialize 58 | modelc.load_state_dict(torch.load('weights/resnet101.pt', map_location=device)['model']) # load weights 59 | modelc.to(device).eval() 60 | 61 | # Set Dataloader 62 | vid_path, vid_writer = None, None 63 | if webcam: 64 | view_img = True 65 | cudnn.benchmark = True # set True to speed up constant image size inference 66 | dataset = LoadStreams(source, img_size=imgsz) 67 | else: 68 | save_img = True 69 | dataset = LoadImages(source, img_size=imgsz, auto_size=64) 70 | 71 | # Get names and colors 72 | names = load_classes(names) 73 | colors = [[random.randint(0, 255) for _ in range(3)] for _ in range(len(names))] 74 | 75 | # Run inference 76 | t0 = time.time() 77 | img = torch.zeros((1, 3, imgsz, imgsz), device=device) # init img 78 | _ = model(img.half() if half else img) if device.type != 'cpu' else None # run once 79 | for path, img, im0s, vid_cap in dataset: 80 | img = torch.from_numpy(img).to(device) 81 | img = img.half() if half else img.float() # uint8 to 
fp16/32 82 | img /= 255.0 # 0 - 255 to 0.0 - 1.0 83 | if img.ndimension() == 3: 84 | img = img.unsqueeze(0) 85 | 86 | # Inference 87 | t1 = time_synchronized() 88 | pred = model(img, augment=opt.augment)[0] 89 | 90 | # Apply NMS 91 | pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres, classes=opt.classes, agnostic=opt.agnostic_nms) 92 | t2 = time_synchronized() 93 | 94 | # Apply Classifier 95 | if classify: 96 | pred = apply_classifier(pred, modelc, img, im0s) 97 | 98 | # Process detections 99 | for i, det in enumerate(pred): # detections per image 100 | if webcam: # batch_size >= 1 101 | p, s, im0 = path[i], '%g: ' % i, im0s[i].copy() 102 | else: 103 | p, s, im0 = path, '', im0s 104 | 105 | save_path = str(Path(out) / Path(p).name) 106 | txt_path = str(Path(out) / Path(p).stem) + ('_%g' % dataset.frame if dataset.mode == 'video' else '') 107 | s += '%gx%g ' % img.shape[2:] # print string 108 | gn = torch.tensor(im0.shape)[[1, 0, 1, 0]] # normalization gain whwh 109 | if det is not None and len(det): 110 | # Rescale boxes from img_size to im0 size 111 | det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round() 112 | 113 | # Print results 114 | for c in det[:, -1].unique(): 115 | n = (det[:, -1] == c).sum() # detections per class 116 | s += '%g %ss, ' % (n, names[int(c)]) # add to string 117 | 118 | # Write results 119 | for *xyxy, conf, cls in det: 120 | if save_txt: # Write to file 121 | xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist() # normalized xywh 122 | with open(txt_path + '.txt', 'a') as f: 123 | f.write(('%g ' * 5 + '\n') % (cls, *xywh)) # label format 124 | 125 | if save_img or view_img: # Add bbox to image 126 | label = '%s %.2f' % (names[int(cls)], conf) 127 | plot_one_box(xyxy, im0, label=label, color=colors[int(cls)], line_thickness=3) 128 | 129 | # Print time (inference + NMS) 130 | print('%sDone. (%.3fs)' % (s, t2 - t1)) 131 | 132 | # Stream results 133 | if view_img: 134 | cv2.imshow(p, im0) 135 | if cv2.waitKey(1) == ord('q'): # q to quit 136 | raise StopIteration 137 | 138 | # Save results (image with detections) 139 | if save_img: 140 | if dataset.mode == 'images': 141 | cv2.imwrite(save_path, im0) 142 | else: 143 | if vid_path != save_path: # new video 144 | vid_path = save_path 145 | if isinstance(vid_writer, cv2.VideoWriter): 146 | vid_writer.release() # release previous video writer 147 | 148 | fourcc = 'mp4v' # output video codec 149 | fps = vid_cap.get(cv2.CAP_PROP_FPS) 150 | w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH)) 151 | h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) 152 | vid_writer = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*fourcc), fps, (w, h)) 153 | vid_writer.write(im0) 154 | 155 | if save_txt or save_img: 156 | print('Results saved to %s' % Path(out)) 157 | if platform == 'darwin' and not opt.update: # MacOS 158 | os.system('open ' + save_path) 159 | 160 | print('Done. 
(%.3fs)' % (time.time() - t0)) 161 | 162 | 163 | if __name__ == '__main__': 164 | parser = argparse.ArgumentParser() 165 | parser.add_argument('--weights', nargs='+', type=str, default='yolov4.weights', help='model.pt path(s)') 166 | parser.add_argument('--source', type=str, default='inference/images', help='source') # file/folder, 0 for webcam 167 | parser.add_argument('--output', type=str, default='inference/output', help='output folder') # output folder 168 | parser.add_argument('--img-size', type=int, default=640, help='inference size (pixels)') 169 | parser.add_argument('--conf-thres', type=float, default=0.4, help='object confidence threshold') 170 | parser.add_argument('--iou-thres', type=float, default=0.5, help='IOU threshold for NMS') 171 | parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu') 172 | parser.add_argument('--view-img', action='store_true', help='display results') 173 | parser.add_argument('--save-txt', action='store_true', help='save results to *.txt') 174 | parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --class 0, or --class 0 2 3') 175 | parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS') 176 | parser.add_argument('--augment', action='store_true', help='augmented inference') 177 | parser.add_argument('--update', action='store_true', help='update all models') 178 | parser.add_argument('--cfg', type=str, default='models/yolov4.cfg', help='*.cfg path') 179 | parser.add_argument('--names', type=str, default='data/coco.names', help='*.cfg path') 180 | opt = parser.parse_args() 181 | print(opt) 182 | 183 | with torch.no_grad(): 184 | if opt.update: # update all models (to fix SourceChangeWarning) 185 | for opt.weights in ['']: 186 | detect() 187 | strip_optimizer(opt.weights) 188 | else: 189 | detect() 190 | -------------------------------------------------------------------------------- /images/scalingCSP.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/WongKinYiu/PyTorch_YOLOv4/6e88dc21813e614c9848a2767fd0bac13d26fd51/images/scalingCSP.png -------------------------------------------------------------------------------- /models/export.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | 3 | import torch 4 | 5 | from utils.google_utils import attempt_download 6 | 7 | if __name__ == '__main__': 8 | parser = argparse.ArgumentParser() 9 | parser.add_argument('--weights', type=str, default='./yolov4-csp.pt', help='weights path') 10 | parser.add_argument('--img-size', nargs='+', type=int, default=[640, 640], help='image size') 11 | parser.add_argument('--batch-size', type=int, default=1, help='batch size') 12 | opt = parser.parse_args() 13 | opt.img_size *= 2 if len(opt.img_size) == 1 else 1 # expand 14 | print(opt) 15 | 16 | # Input 17 | img = torch.zeros((opt.batch_size, 3, *opt.img_size)) # image size(1,3,320,192) iDetection 18 | 19 | # Load PyTorch model 20 | attempt_download(opt.weights) 21 | model = torch.load(opt.weights, map_location=torch.device('cpu'))['model'].float() 22 | model.eval() 23 | model.model[-1].export = True # set Detect() layer export=True 24 | y = model(img) # dry run 25 | 26 | # TorchScript export 27 | try: 28 | print('\nStarting TorchScript export with torch %s...' 
% torch.__version__) 29 | f = opt.weights.replace('.pt', '.torchscript.pt') # filename 30 | ts = torch.jit.trace(model, img) 31 | ts.save(f) 32 | print('TorchScript export success, saved as %s' % f) 33 | except Exception as e: 34 | print('TorchScript export failure: %s' % e) 35 | 36 | # ONNX export 37 | try: 38 | import onnx 39 | 40 | print('\nStarting ONNX export with onnx %s...' % onnx.__version__) 41 | f = opt.weights.replace('.pt', '.onnx') # filename 42 | model.fuse() # only for ONNX 43 | torch.onnx.export(model, img, f, verbose=False, opset_version=12, input_names=['images'], 44 | output_names=['classes', 'boxes'] if y is None else ['output']) 45 | 46 | # Checks 47 | onnx_model = onnx.load(f) # load onnx model 48 | onnx.checker.check_model(onnx_model) # check onnx model 49 | print(onnx.helper.printable_graph(onnx_model.graph)) # print a human readable model 50 | print('ONNX export success, saved as %s' % f) 51 | except Exception as e: 52 | print('ONNX export failure: %s' % e) 53 | 54 | # CoreML export 55 | try: 56 | import coremltools as ct 57 | 58 | print('\nStarting CoreML export with coremltools %s...' % ct.__version__) 59 | # convert model from torchscript and apply pixel scaling as per detect.py 60 | model = ct.convert(ts, inputs=[ct.ImageType(name='images', shape=img.shape, scale=1 / 255.0, bias=[0, 0, 0])]) 61 | f = opt.weights.replace('.pt', '.mlmodel') # filename 62 | model.save(f) 63 | print('CoreML export success, saved as %s' % f) 64 | except Exception as e: 65 | print('CoreML export failure: %s' % e) 66 | 67 | # Finish 68 | print('\nExport complete. Visualize with https://github.com/lutzroeder/netron.') 69 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | numpy == 1.17 2 | opencv-python >= 4.1 3 | torch == 1.6 4 | torchvision 5 | matplotlib 6 | pycocotools 7 | tqdm 8 | pillow 9 | tensorboard >= 1.14 10 | -------------------------------------------------------------------------------- /test.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import glob 3 | import json 4 | import os 5 | from pathlib import Path 6 | 7 | import numpy as np 8 | import torch 9 | import yaml 10 | from tqdm import tqdm 11 | 12 | from utils.google_utils import attempt_load 13 | from utils.datasets import create_dataloader 14 | from utils.general import coco80_to_coco91_class, check_dataset, check_file, check_img_size, box_iou, \ 15 | non_max_suppression, scale_coords, xyxy2xywh, xywh2xyxy, clip_coords, set_logging, increment_path 16 | from utils.loss import compute_loss 17 | from utils.metrics import ap_per_class 18 | from utils.plots import plot_images, output_to_target 19 | from utils.torch_utils import select_device, time_synchronized 20 | 21 | from models.models import * 22 | 23 | def load_classes(path): 24 | # Loads *.names file at 'path' 25 | with open(path, 'r') as f: 26 | names = f.read().split('\n') 27 | return list(filter(None, names)) # filter removes empty strings (such as last line) 28 | 29 | 30 | def test(data, 31 | weights=None, 32 | batch_size=16, 33 | imgsz=640, 34 | conf_thres=0.001, 35 | iou_thres=0.6, # for NMS 36 | save_json=False, 37 | single_cls=False, 38 | augment=False, 39 | verbose=False, 40 | model=None, 41 | dataloader=None, 42 | save_dir=Path(''), # for saving images 43 | save_txt=False, # for auto-labelling 44 | save_conf=False, 45 | plots=True, 46 | log_imgs=0): # number of 
logged images 47 | 48 | # Initialize/load model and set device 49 | training = model is not None 50 | if training: # called by train.py 51 | device = next(model.parameters()).device # get model device 52 | 53 | else: # called directly 54 | set_logging() 55 | device = select_device(opt.device, batch_size=batch_size) 56 | save_txt = opt.save_txt # save *.txt labels 57 | 58 | # Directories 59 | save_dir = Path(increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok)) # increment run 60 | (save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True) # make dir 61 | 62 | # Load model 63 | model = Darknet(opt.cfg).to(device) 64 | 65 | # load model 66 | try: 67 | ckpt = torch.load(weights[0], map_location=device) # load checkpoint 68 | ckpt['model'] = {k: v for k, v in ckpt['model'].items() if model.state_dict()[k].numel() == v.numel()} 69 | model.load_state_dict(ckpt['model'], strict=False) 70 | except: 71 | load_darknet_weights(model, weights[0]) 72 | imgsz = check_img_size(imgsz, s=64) # check img_size 73 | 74 | # Half 75 | half = device.type != 'cpu' # half precision only supported on CUDA 76 | if half: 77 | model.half() 78 | 79 | # Configure 80 | model.eval() 81 | is_coco = data.endswith('coco.yaml') # is COCO dataset 82 | with open(data) as f: 83 | data = yaml.load(f, Loader=yaml.FullLoader) # model dict 84 | check_dataset(data) # check 85 | nc = 1 if single_cls else int(data['nc']) # number of classes 86 | iouv = torch.linspace(0.5, 0.95, 10).to(device) # iou vector for mAP@0.5:0.95 87 | niou = iouv.numel() 88 | 89 | # Logging 90 | log_imgs, wandb = min(log_imgs, 100), None # ceil 91 | try: 92 | import wandb # Weights & Biases 93 | except ImportError: 94 | log_imgs = 0 95 | 96 | # Dataloader 97 | if not training: 98 | img = torch.zeros((1, 3, imgsz, imgsz), device=device) # init img 99 | _ = model(img.half() if half else img) if device.type != 'cpu' else None # run once 100 | path = data['test'] if opt.task == 'test' else data['val'] # path to val/test images 101 | dataloader = create_dataloader(path, imgsz, batch_size, 64, opt, pad=0.5, rect=True)[0] 102 | 103 | seen = 0 104 | try: 105 | names = model.names if hasattr(model, 'names') else model.module.names 106 | except: 107 | names = load_classes(opt.names) 108 | coco91class = coco80_to_coco91_class() 109 | s = ('%20s' + '%12s' * 6) % ('Class', 'Images', 'Targets', 'P', 'R', 'mAP@.5', 'mAP@.5:.95') 110 | p, r, f1, mp, mr, map50, map, t0, t1 = 0., 0., 0., 0., 0., 0., 0., 0., 0. 
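    # The scalars above accumulate the summary metrics: per-class precision/recall/F1
    # arrays (p, r, f1), their means (mp, mr), mAP@0.5 (map50), mAP@0.5:0.95 (map),
    # and the total inference and NMS times (t0, t1) used for the speed report below.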
111 | loss = torch.zeros(3, device=device) 112 | jdict, stats, ap, ap_class, wandb_images = [], [], [], [], [] 113 | for batch_i, (img, targets, paths, shapes) in enumerate(tqdm(dataloader, desc=s)): 114 | img = img.to(device, non_blocking=True) 115 | img = img.half() if half else img.float() # uint8 to fp16/32 116 | img /= 255.0 # 0 - 255 to 0.0 - 1.0 117 | targets = targets.to(device) 118 | nb, _, height, width = img.shape # batch size, channels, height, width 119 | whwh = torch.Tensor([width, height, width, height]).to(device) 120 | 121 | # Disable gradients 122 | with torch.no_grad(): 123 | # Run model 124 | t = time_synchronized() 125 | inf_out, train_out = model(img, augment=augment) # inference and training outputs 126 | t0 += time_synchronized() - t 127 | 128 | # Compute loss 129 | if training: # if model has loss hyperparameters 130 | loss += compute_loss([x.float() for x in train_out], targets, model)[1][:3] # box, obj, cls 131 | 132 | # Run NMS 133 | t = time_synchronized() 134 | output = non_max_suppression(inf_out, conf_thres=conf_thres, iou_thres=iou_thres) 135 | t1 += time_synchronized() - t 136 | 137 | # Statistics per image 138 | for si, pred in enumerate(output): 139 | labels = targets[targets[:, 0] == si, 1:] 140 | nl = len(labels) 141 | tcls = labels[:, 0].tolist() if nl else [] # target class 142 | seen += 1 143 | 144 | if len(pred) == 0: 145 | if nl: 146 | stats.append((torch.zeros(0, niou, dtype=torch.bool), torch.Tensor(), torch.Tensor(), tcls)) 147 | continue 148 | 149 | # Append to text file 150 | path = Path(paths[si]) 151 | if save_txt: 152 | gn = torch.tensor(shapes[si][0])[[1, 0, 1, 0]] # normalization gain whwh 153 | x = pred.clone() 154 | x[:, :4] = scale_coords(img[si].shape[1:], x[:, :4], shapes[si][0], shapes[si][1]) # to original 155 | for *xyxy, conf, cls in x: 156 | xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist() # normalized xywh 157 | line = (cls, *xywh, conf) if save_conf else (cls, *xywh) # label format 158 | with open(save_dir / 'labels' / (path.stem + '.txt'), 'a') as f: 159 | f.write(('%g ' * len(line)).rstrip() % line + '\n') 160 | 161 | # W&B logging 162 | if plots and len(wandb_images) < log_imgs: 163 | box_data = [{"position": {"minX": xyxy[0], "minY": xyxy[1], "maxX": xyxy[2], "maxY": xyxy[3]}, 164 | "class_id": int(cls), 165 | "box_caption": "%s %.3f" % (names[cls], conf), 166 | "scores": {"class_score": conf}, 167 | "domain": "pixel"} for *xyxy, conf, cls in pred.tolist()] 168 | boxes = {"predictions": {"box_data": box_data, "class_labels": names}} 169 | wandb_images.append(wandb.Image(img[si], boxes=boxes, caption=path.name)) 170 | 171 | # Clip boxes to image bounds 172 | clip_coords(pred, (height, width)) 173 | 174 | # Append to pycocotools JSON dictionary 175 | if save_json: 176 | # [{"image_id": 42, "category_id": 18, "bbox": [258.15, 41.29, 348.26, 243.78], "score": 0.236}, ... 
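                # The block below builds those records: boxes are rescaled from the network
                # input size back to the original image, converted from xyxy to xywh, and the
                # xy centre is shifted to the top-left corner because COCO expects
                # [x_min, y_min, width, height]; class indices are remapped from the
                # contiguous 80-class order to COCO's 91-category ids when evaluating COCO.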
177 | image_id = int(path.stem) if path.stem.isnumeric() else path.stem 178 | box = pred[:, :4].clone() # xyxy 179 | scale_coords(img[si].shape[1:], box, shapes[si][0], shapes[si][1]) # to original shape 180 | box = xyxy2xywh(box) # xywh 181 | box[:, :2] -= box[:, 2:] / 2 # xy center to top-left corner 182 | for p, b in zip(pred.tolist(), box.tolist()): 183 | jdict.append({'image_id': image_id, 184 | 'category_id': coco91class[int(p[5])] if is_coco else int(p[5]), 185 | 'bbox': [round(x, 3) for x in b], 186 | 'score': round(p[4], 5)}) 187 | 188 | # Assign all predictions as incorrect 189 | correct = torch.zeros(pred.shape[0], niou, dtype=torch.bool, device=device) 190 | if nl: 191 | detected = [] # target indices 192 | tcls_tensor = labels[:, 0] 193 | 194 | # target boxes 195 | tbox = xywh2xyxy(labels[:, 1:5]) * whwh 196 | 197 | # Per target class 198 | for cls in torch.unique(tcls_tensor): 199 | ti = (cls == tcls_tensor).nonzero(as_tuple=False).view(-1) # prediction indices 200 | pi = (cls == pred[:, 5]).nonzero(as_tuple=False).view(-1) # target indices 201 | 202 | # Search for detections 203 | if pi.shape[0]: 204 | # Prediction to target ious 205 | ious, i = box_iou(pred[pi, :4], tbox[ti]).max(1) # best ious, indices 206 | 207 | # Append detections 208 | detected_set = set() 209 | for j in (ious > iouv[0]).nonzero(as_tuple=False): 210 | d = ti[i[j]] # detected target 211 | if d.item() not in detected_set: 212 | detected_set.add(d.item()) 213 | detected.append(d) 214 | correct[pi[j]] = ious[j] > iouv # iou_thres is 1xn 215 | if len(detected) == nl: # all targets already located in image 216 | break 217 | 218 | # Append statistics (correct, conf, pcls, tcls) 219 | stats.append((correct.cpu(), pred[:, 4].cpu(), pred[:, 5].cpu(), tcls)) 220 | 221 | # Plot images 222 | if plots and batch_i < 3: 223 | f = save_dir / f'test_batch{batch_i}_labels.jpg' # filename 224 | plot_images(img, targets, paths, f, names) # labels 225 | f = save_dir / f'test_batch{batch_i}_pred.jpg' 226 | plot_images(img, output_to_target(output, width, height), paths, f, names) # predictions 227 | 228 | # Compute statistics 229 | stats = [np.concatenate(x, 0) for x in zip(*stats)] # to numpy 230 | if len(stats) and stats[0].any(): 231 | p, r, ap, f1, ap_class = ap_per_class(*stats, plot=plots, fname=save_dir / 'precision-recall_curve.png') 232 | p, r, ap50, ap = p[:, 0], r[:, 0], ap[:, 0], ap.mean(1) # [P, R, AP@0.5, AP@0.5:0.95] 233 | mp, mr, map50, map = p.mean(), r.mean(), ap50.mean(), ap.mean() 234 | nt = np.bincount(stats[3].astype(np.int64), minlength=nc) # number of targets per class 235 | else: 236 | nt = torch.zeros(1) 237 | 238 | # W&B logging 239 | if plots and wandb: 240 | wandb.log({"Images": wandb_images}) 241 | wandb.log({"Validation": [wandb.Image(str(x), caption=x.name) for x in sorted(save_dir.glob('test*.jpg'))]}) 242 | 243 | # Print results 244 | pf = '%20s' + '%12.3g' * 6 # print format 245 | print(pf % ('all', seen, nt.sum(), mp, mr, map50, map)) 246 | 247 | # Print results per class 248 | if verbose and nc > 1 and len(stats): 249 | for i, c in enumerate(ap_class): 250 | print(pf % (names[c], seen, nt[c], p[i], r[i], ap50[i], ap[i])) 251 | 252 | # Print speeds 253 | t = tuple(x / seen * 1E3 for x in (t0, t1, t0 + t1)) + (imgsz, imgsz, batch_size) # tuple 254 | if not training: 255 | print('Speed: %.1f/%.1f/%.1f ms inference/NMS/total per %gx%g image at batch-size %g' % t) 256 | 257 | # Save JSON 258 | if save_json and len(jdict): 259 | w = Path(weights[0] if isinstance(weights, list) else 
weights).stem if weights is not None else '' # weights 260 | anno_json = glob.glob('../coco/annotations/instances_val*.json')[0] # annotations json 261 | pred_json = str(save_dir / f"{w}_predictions.json") # predictions json 262 | print('\nEvaluating pycocotools mAP... saving %s...' % pred_json) 263 | with open(pred_json, 'w') as f: 264 | json.dump(jdict, f) 265 | 266 | try: # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb 267 | from pycocotools.coco import COCO 268 | from pycocotools.cocoeval import COCOeval 269 | 270 | anno = COCO(anno_json) # init annotations api 271 | pred = anno.loadRes(pred_json) # init predictions api 272 | eval = COCOeval(anno, pred, 'bbox') 273 | if is_coco: 274 | eval.params.imgIds = [int(Path(x).stem) for x in dataloader.dataset.img_files] # image IDs to evaluate 275 | eval.evaluate() 276 | eval.accumulate() 277 | eval.summarize() 278 | map, map50 = eval.stats[:2] # update results (mAP@0.5:0.95, mAP@0.5) 279 | except Exception as e: 280 | print('ERROR: pycocotools unable to run: %s' % e) 281 | 282 | # Return results 283 | if not training: 284 | print('Results saved to %s' % save_dir) 285 | model.float() # for training 286 | maps = np.zeros(nc) + map 287 | for i, c in enumerate(ap_class): 288 | maps[c] = ap[i] 289 | return (mp, mr, map50, map, *(loss.cpu() / len(dataloader)).tolist()), maps, t 290 | 291 | 292 | if __name__ == '__main__': 293 | parser = argparse.ArgumentParser(prog='test.py') 294 | parser.add_argument('--weights', nargs='+', type=str, default='yolov4.pt', help='model.pt path(s)') 295 | parser.add_argument('--data', type=str, default='data/coco.yaml', help='*.data path') 296 | parser.add_argument('--batch-size', type=int, default=32, help='size of each image batch') 297 | parser.add_argument('--img-size', type=int, default=640, help='inference size (pixels)') 298 | parser.add_argument('--conf-thres', type=float, default=0.001, help='object confidence threshold') 299 | parser.add_argument('--iou-thres', type=float, default=0.65, help='IOU threshold for NMS') 300 | parser.add_argument('--task', default='val', help="'val', 'test', 'study'") 301 | parser.add_argument('--device', default='', help='cuda device, i.e. 
0 or 0,1,2,3 or cpu') 302 | parser.add_argument('--single-cls', action='store_true', help='treat as single-class dataset') 303 | parser.add_argument('--augment', action='store_true', help='augmented inference') 304 | parser.add_argument('--verbose', action='store_true', help='report mAP by class') 305 | parser.add_argument('--save-txt', action='store_true', help='save results to *.txt') 306 | parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels') 307 | parser.add_argument('--save-json', action='store_true', help='save a cocoapi-compatible JSON results file') 308 | parser.add_argument('--project', default='runs/test', help='save to project/name') 309 | parser.add_argument('--name', default='exp', help='save to project/name') 310 | parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment') 311 | parser.add_argument('--cfg', type=str, default='cfg/yolov4.cfg', help='*.cfg path') 312 | parser.add_argument('--names', type=str, default='data/coco.names', help='*.cfg path') 313 | opt = parser.parse_args() 314 | opt.save_json |= opt.data.endswith('coco.yaml') 315 | opt.data = check_file(opt.data) # check file 316 | print(opt) 317 | 318 | if opt.task in ['val', 'test']: # run normally 319 | test(opt.data, 320 | opt.weights, 321 | opt.batch_size, 322 | opt.img_size, 323 | opt.conf_thres, 324 | opt.iou_thres, 325 | opt.save_json, 326 | opt.single_cls, 327 | opt.augment, 328 | opt.verbose, 329 | save_txt=opt.save_txt, 330 | save_conf=opt.save_conf, 331 | ) 332 | 333 | elif opt.task == 'study': # run over a range of settings and save/plot 334 | for weights in ['yolov4-pacsp.weights', 'yolov4-pacsp-x.weishts']: 335 | f = 'study_%s_%s.txt' % (Path(opt.data).stem, Path(weights).stem) # filename to save to 336 | x = list(range(320, 800, 64)) # x axis 337 | y = [] # y axis 338 | for i in x: # img-size 339 | print('\nRunning %s point %s...' % (f, i)) 340 | r, _, t = test(opt.data, weights, opt.batch_size, i, opt.conf_thres, opt.iou_thres, opt.save_json) 341 | y.append(r + t) # results and times 342 | np.savetxt(f, y, fmt='%10.4g') # save 343 | os.system('zip -r study.zip study_*.txt') 344 | # utils.general.plot_study_txt(f, x) # plot 345 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /utils/activations.py: -------------------------------------------------------------------------------- 1 | # Activation functions 2 | 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | 7 | 8 | # Swish https://arxiv.org/pdf/1905.02244.pdf --------------------------------------------------------------------------- 9 | class Swish(nn.Module): # 10 | @staticmethod 11 | def forward(x): 12 | return x * torch.sigmoid(x) 13 | 14 | 15 | class Hardswish(nn.Module): # export-friendly version of nn.Hardswish() 16 | @staticmethod 17 | def forward(x): 18 | # return x * F.hardsigmoid(x) # for torchscript and CoreML 19 | return x * F.hardtanh(x + 3, 0., 6.) / 6. 
# for torchscript, CoreML and ONNX 20 | 21 | 22 | class MemoryEfficientSwish(nn.Module): 23 | class F(torch.autograd.Function): 24 | @staticmethod 25 | def forward(ctx, x): 26 | ctx.save_for_backward(x) 27 | return x * torch.sigmoid(x) 28 | 29 | @staticmethod 30 | def backward(ctx, grad_output): 31 | x = ctx.saved_tensors[0] 32 | sx = torch.sigmoid(x) 33 | return grad_output * (sx * (1 + x * (1 - sx))) 34 | 35 | def forward(self, x): 36 | return self.F.apply(x) 37 | 38 | 39 | # Mish https://github.com/digantamisra98/Mish -------------------------------------------------------------------------- 40 | class Mish(nn.Module): 41 | @staticmethod 42 | def forward(x): 43 | return x * F.softplus(x).tanh() 44 | 45 | 46 | class MemoryEfficientMish(nn.Module): 47 | class F(torch.autograd.Function): 48 | @staticmethod 49 | def forward(ctx, x): 50 | ctx.save_for_backward(x) 51 | return x.mul(torch.tanh(F.softplus(x))) # x * tanh(ln(1 + exp(x))) 52 | 53 | @staticmethod 54 | def backward(ctx, grad_output): 55 | x = ctx.saved_tensors[0] 56 | sx = torch.sigmoid(x) 57 | fx = F.softplus(x).tanh() 58 | return grad_output * (fx + x * sx * (1 - fx * fx)) 59 | 60 | def forward(self, x): 61 | return self.F.apply(x) 62 | 63 | 64 | # FReLU https://arxiv.org/abs/2007.11824 ------------------------------------------------------------------------------- 65 | class FReLU(nn.Module): 66 | def __init__(self, c1, k=3): # ch_in, kernel 67 | super().__init__() 68 | self.conv = nn.Conv2d(c1, c1, k, 1, 1, groups=c1) 69 | self.bn = nn.BatchNorm2d(c1) 70 | 71 | def forward(self, x): 72 | return torch.max(x, self.bn(self.conv(x))) 73 | -------------------------------------------------------------------------------- /utils/adabound.py: -------------------------------------------------------------------------------- 1 | import math 2 | 3 | import torch 4 | from torch.optim.optimizer import Optimizer 5 | 6 | 7 | class AdaBound(Optimizer): 8 | """Implements AdaBound algorithm. 9 | It has been proposed in `Adaptive Gradient Methods with Dynamic Bound of Learning Rate`_. 10 | Arguments: 11 | params (iterable): iterable of parameters to optimize or dicts defining 12 | parameter groups 13 | lr (float, optional): Adam learning rate (default: 1e-3) 14 | betas (Tuple[float, float], optional): coefficients used for computing 15 | running averages of gradient and its square (default: (0.9, 0.999)) 16 | final_lr (float, optional): final (SGD) learning rate (default: 0.1) 17 | gamma (float, optional): convergence speed of the bound functions (default: 1e-3) 18 | eps (float, optional): term added to the denominator to improve 19 | numerical stability (default: 1e-8) 20 | weight_decay (float, optional): weight decay (L2 penalty) (default: 0) 21 | amsbound (boolean, optional): whether to use the AMSBound variant of this algorithm 22 | .. 
Adaptive Gradient Methods with Dynamic Bound of Learning Rate: 23 | https://openreview.net/forum?id=Bkg3g2R9FX 24 | """ 25 | 26 | def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), final_lr=0.1, gamma=1e-3, 27 | eps=1e-8, weight_decay=0, amsbound=False): 28 | if not 0.0 <= lr: 29 | raise ValueError("Invalid learning rate: {}".format(lr)) 30 | if not 0.0 <= eps: 31 | raise ValueError("Invalid epsilon value: {}".format(eps)) 32 | if not 0.0 <= betas[0] < 1.0: 33 | raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0])) 34 | if not 0.0 <= betas[1] < 1.0: 35 | raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1])) 36 | if not 0.0 <= final_lr: 37 | raise ValueError("Invalid final learning rate: {}".format(final_lr)) 38 | if not 0.0 <= gamma < 1.0: 39 | raise ValueError("Invalid gamma parameter: {}".format(gamma)) 40 | defaults = dict(lr=lr, betas=betas, final_lr=final_lr, gamma=gamma, eps=eps, 41 | weight_decay=weight_decay, amsbound=amsbound) 42 | super(AdaBound, self).__init__(params, defaults) 43 | 44 | self.base_lrs = list(map(lambda group: group['lr'], self.param_groups)) 45 | 46 | def __setstate__(self, state): 47 | super(AdaBound, self).__setstate__(state) 48 | for group in self.param_groups: 49 | group.setdefault('amsbound', False) 50 | 51 | def step(self, closure=None): 52 | """Performs a single optimization step. 53 | Arguments: 54 | closure (callable, optional): A closure that reevaluates the model 55 | and returns the loss. 56 | """ 57 | loss = None 58 | if closure is not None: 59 | loss = closure() 60 | 61 | for group, base_lr in zip(self.param_groups, self.base_lrs): 62 | for p in group['params']: 63 | if p.grad is None: 64 | continue 65 | grad = p.grad.data 66 | if grad.is_sparse: 67 | raise RuntimeError( 68 | 'Adam does not support sparse gradients, please consider SparseAdam instead') 69 | amsbound = group['amsbound'] 70 | 71 | state = self.state[p] 72 | 73 | # State initialization 74 | if len(state) == 0: 75 | state['step'] = 0 76 | # Exponential moving average of gradient values 77 | state['exp_avg'] = torch.zeros_like(p.data) 78 | # Exponential moving average of squared gradient values 79 | state['exp_avg_sq'] = torch.zeros_like(p.data) 80 | if amsbound: 81 | # Maintains max of all exp. moving avg. of sq. grad. values 82 | state['max_exp_avg_sq'] = torch.zeros_like(p.data) 83 | 84 | exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq'] 85 | if amsbound: 86 | max_exp_avg_sq = state['max_exp_avg_sq'] 87 | beta1, beta2 = group['betas'] 88 | 89 | state['step'] += 1 90 | 91 | if group['weight_decay'] != 0: 92 | grad = grad.add(group['weight_decay'], p.data) 93 | 94 | # Decay the first and second moment running average coefficient 95 | exp_avg.mul_(beta1).add_(1 - beta1, grad) 96 | exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad) 97 | if amsbound: 98 | # Maintains the maximum of all 2nd moment running avg. till now 99 | torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq) 100 | # Use the max. for normalizing running avg. 
of gradient 101 | denom = max_exp_avg_sq.sqrt().add_(group['eps']) 102 | else: 103 | denom = exp_avg_sq.sqrt().add_(group['eps']) 104 | 105 | bias_correction1 = 1 - beta1 ** state['step'] 106 | bias_correction2 = 1 - beta2 ** state['step'] 107 | step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1 108 | 109 | # Applies bounds on actual learning rate 110 | # lr_scheduler cannot affect final_lr, this is a workaround to apply lr decay 111 | final_lr = group['final_lr'] * group['lr'] / base_lr 112 | lower_bound = final_lr * (1 - 1 / (group['gamma'] * state['step'] + 1)) 113 | upper_bound = final_lr * (1 + 1 / (group['gamma'] * state['step'])) 114 | step_size = torch.full_like(denom, step_size) 115 | step_size.div_(denom).clamp_(lower_bound, upper_bound).mul_(exp_avg) 116 | 117 | p.data.add_(-step_size) 118 | 119 | return loss 120 | 121 | 122 | class AdaBoundW(Optimizer): 123 | """Implements AdaBound algorithm with Decoupled Weight Decay (arxiv.org/abs/1711.05101) 124 | It has been proposed in `Adaptive Gradient Methods with Dynamic Bound of Learning Rate`_. 125 | Arguments: 126 | params (iterable): iterable of parameters to optimize or dicts defining 127 | parameter groups 128 | lr (float, optional): Adam learning rate (default: 1e-3) 129 | betas (Tuple[float, float], optional): coefficients used for computing 130 | running averages of gradient and its square (default: (0.9, 0.999)) 131 | final_lr (float, optional): final (SGD) learning rate (default: 0.1) 132 | gamma (float, optional): convergence speed of the bound functions (default: 1e-3) 133 | eps (float, optional): term added to the denominator to improve 134 | numerical stability (default: 1e-8) 135 | weight_decay (float, optional): weight decay (L2 penalty) (default: 0) 136 | amsbound (boolean, optional): whether to use the AMSBound variant of this algorithm 137 | .. Adaptive Gradient Methods with Dynamic Bound of Learning Rate: 138 | https://openreview.net/forum?id=Bkg3g2R9FX 139 | """ 140 | 141 | def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), final_lr=0.1, gamma=1e-3, 142 | eps=1e-8, weight_decay=0, amsbound=False): 143 | if not 0.0 <= lr: 144 | raise ValueError("Invalid learning rate: {}".format(lr)) 145 | if not 0.0 <= eps: 146 | raise ValueError("Invalid epsilon value: {}".format(eps)) 147 | if not 0.0 <= betas[0] < 1.0: 148 | raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0])) 149 | if not 0.0 <= betas[1] < 1.0: 150 | raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1])) 151 | if not 0.0 <= final_lr: 152 | raise ValueError("Invalid final learning rate: {}".format(final_lr)) 153 | if not 0.0 <= gamma < 1.0: 154 | raise ValueError("Invalid gamma parameter: {}".format(gamma)) 155 | defaults = dict(lr=lr, betas=betas, final_lr=final_lr, gamma=gamma, eps=eps, 156 | weight_decay=weight_decay, amsbound=amsbound) 157 | super(AdaBoundW, self).__init__(params, defaults) 158 | 159 | self.base_lrs = list(map(lambda group: group['lr'], self.param_groups)) 160 | 161 | def __setstate__(self, state): 162 | super(AdaBoundW, self).__setstate__(state) 163 | for group in self.param_groups: 164 | group.setdefault('amsbound', False) 165 | 166 | def step(self, closure=None): 167 | """Performs a single optimization step. 168 | Arguments: 169 | closure (callable, optional): A closure that reevaluates the model 170 | and returns the loss. 
171 | """ 172 | loss = None 173 | if closure is not None: 174 | loss = closure() 175 | 176 | for group, base_lr in zip(self.param_groups, self.base_lrs): 177 | for p in group['params']: 178 | if p.grad is None: 179 | continue 180 | grad = p.grad.data 181 | if grad.is_sparse: 182 | raise RuntimeError( 183 | 'Adam does not support sparse gradients, please consider SparseAdam instead') 184 | amsbound = group['amsbound'] 185 | 186 | state = self.state[p] 187 | 188 | # State initialization 189 | if len(state) == 0: 190 | state['step'] = 0 191 | # Exponential moving average of gradient values 192 | state['exp_avg'] = torch.zeros_like(p.data) 193 | # Exponential moving average of squared gradient values 194 | state['exp_avg_sq'] = torch.zeros_like(p.data) 195 | if amsbound: 196 | # Maintains max of all exp. moving avg. of sq. grad. values 197 | state['max_exp_avg_sq'] = torch.zeros_like(p.data) 198 | 199 | exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq'] 200 | if amsbound: 201 | max_exp_avg_sq = state['max_exp_avg_sq'] 202 | beta1, beta2 = group['betas'] 203 | 204 | state['step'] += 1 205 | 206 | # Decay the first and second moment running average coefficient 207 | exp_avg.mul_(beta1).add_(1 - beta1, grad) 208 | exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad) 209 | if amsbound: 210 | # Maintains the maximum of all 2nd moment running avg. till now 211 | torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq) 212 | # Use the max. for normalizing running avg. of gradient 213 | denom = max_exp_avg_sq.sqrt().add_(group['eps']) 214 | else: 215 | denom = exp_avg_sq.sqrt().add_(group['eps']) 216 | 217 | bias_correction1 = 1 - beta1 ** state['step'] 218 | bias_correction2 = 1 - beta2 ** state['step'] 219 | step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1 220 | 221 | # Applies bounds on actual learning rate 222 | # lr_scheduler cannot affect final_lr, this is a workaround to apply lr decay 223 | final_lr = group['final_lr'] * group['lr'] / base_lr 224 | lower_bound = final_lr * (1 - 1 / (group['gamma'] * state['step'] + 1)) 225 | upper_bound = final_lr * (1 + 1 / (group['gamma'] * state['step'])) 226 | step_size = torch.full_like(denom, step_size) 227 | step_size.div_(denom).clamp_(lower_bound, upper_bound).mul_(exp_avg) 228 | 229 | if group['weight_decay'] != 0: 230 | decayed_weights = torch.mul(p.data, group['weight_decay']) 231 | p.data.add_(-step_size) 232 | p.data.sub_(decayed_weights) 233 | else: 234 | p.data.add_(-step_size) 235 | 236 | return loss 237 | -------------------------------------------------------------------------------- /utils/autoanchor.py: -------------------------------------------------------------------------------- 1 | # Auto-anchor utils 2 | 3 | import numpy as np 4 | import torch 5 | import yaml 6 | from scipy.cluster.vq import kmeans 7 | from tqdm import tqdm 8 | 9 | 10 | def check_anchor_order(m): 11 | # Check anchor order against stride order for YOLOv5 Detect() module m, and correct if necessary 12 | a = m.anchor_grid.prod(-1).view(-1) # anchor area 13 | da = a[-1] - a[0] # delta a 14 | ds = m.stride[-1] - m.stride[0] # delta s 15 | if da.sign() != ds.sign(): # same order 16 | print('Reversing anchor order') 17 | m.anchors[:] = m.anchors.flip(0) 18 | m.anchor_grid[:] = m.anchor_grid.flip(0) 19 | 20 | 21 | def check_anchors(dataset, model, thr=4.0, imgsz=640): 22 | # Check anchor fit to data, recompute if necessary 23 | print('\nAnalyzing anchors... 
', end='') 24 | m = model.module.model[-1] if hasattr(model, 'module') else model.model[-1] # Detect() 25 | shapes = imgsz * dataset.shapes / dataset.shapes.max(1, keepdims=True) 26 | scale = np.random.uniform(0.9, 1.1, size=(shapes.shape[0], 1)) # augment scale 27 | wh = torch.tensor(np.concatenate([l[:, 3:5] * s for s, l in zip(shapes * scale, dataset.labels)])).float() # wh 28 | 29 | def metric(k): # compute metric 30 | r = wh[:, None] / k[None] 31 | x = torch.min(r, 1. / r).min(2)[0] # ratio metric 32 | best = x.max(1)[0] # best_x 33 | aat = (x > 1. / thr).float().sum(1).mean() # anchors above threshold 34 | bpr = (best > 1. / thr).float().mean() # best possible recall 35 | return bpr, aat 36 | 37 | bpr, aat = metric(m.anchor_grid.clone().cpu().view(-1, 2)) 38 | print('anchors/target = %.2f, Best Possible Recall (BPR) = %.4f' % (aat, bpr), end='') 39 | if bpr < 0.98: # threshold to recompute 40 | print('. Attempting to improve anchors, please wait...') 41 | na = m.anchor_grid.numel() // 2 # number of anchors 42 | new_anchors = kmean_anchors(dataset, n=na, img_size=imgsz, thr=thr, gen=1000, verbose=False) 43 | new_bpr = metric(new_anchors.reshape(-1, 2))[0] 44 | if new_bpr > bpr: # replace anchors 45 | new_anchors = torch.tensor(new_anchors, device=m.anchors.device).type_as(m.anchors) 46 | m.anchor_grid[:] = new_anchors.clone().view_as(m.anchor_grid) # for inference 47 | m.anchors[:] = new_anchors.clone().view_as(m.anchors) / m.stride.to(m.anchors.device).view(-1, 1, 1) # loss 48 | check_anchor_order(m) 49 | print('New anchors saved to model. Update model *.yaml to use these anchors in the future.') 50 | else: 51 | print('Original anchors better than new anchors. Proceeding with original anchors.') 52 | print('') # newline 53 | 54 | 55 | def kmean_anchors(path='./data/coco128.yaml', n=9, img_size=640, thr=4.0, gen=1000, verbose=True): 56 | """ Creates kmeans-evolved anchors from training dataset 57 | Arguments: 58 | path: path to dataset *.yaml, or a loaded dataset 59 | n: number of anchors 60 | img_size: image size used for training 61 | thr: anchor-label wh ratio threshold hyperparameter hyp['anchor_t'] used for training, default=4.0 62 | gen: generations to evolve anchors using genetic algorithm 63 | verbose: print all results 64 | Return: 65 | k: kmeans evolved anchors 66 | Usage: 67 | from utils.general import *; _ = kmean_anchors() 68 | """ 69 | thr = 1. / thr 70 | 71 | def metric(k, wh): # compute metrics 72 | r = wh[:, None] / k[None] 73 | x = torch.min(r, 1. 
/ r).min(2)[0] # ratio metric 74 | # x = wh_iou(wh, torch.tensor(k)) # iou metric 75 | return x, x.max(1)[0] # x, best_x 76 | 77 | def anchor_fitness(k): # mutation fitness 78 | _, best = metric(torch.tensor(k, dtype=torch.float32), wh) 79 | return (best * (best > thr).float()).mean() # fitness 80 | 81 | def print_results(k): 82 | k = k[np.argsort(k.prod(1))] # sort small to large 83 | x, best = metric(k, wh0) 84 | bpr, aat = (best > thr).float().mean(), (x > thr).float().mean() * n # best possible recall, anch > thr 85 | print('thr=%.2f: %.4f best possible recall, %.2f anchors past thr' % (thr, bpr, aat)) 86 | print('n=%g, img_size=%s, metric_all=%.3f/%.3f-mean/best, past_thr=%.3f-mean: ' % 87 | (n, img_size, x.mean(), best.mean(), x[x > thr].mean()), end='') 88 | for i, x in enumerate(k): 89 | print('%i,%i' % (round(x[0]), round(x[1])), end=', ' if i < len(k) - 1 else '\n') # use in *.cfg 90 | return k 91 | 92 | if isinstance(path, str): # *.yaml file 93 | with open(path) as f: 94 | data_dict = yaml.load(f, Loader=yaml.FullLoader) # model dict 95 | from utils.datasets import LoadImagesAndLabels 96 | dataset = LoadImagesAndLabels(data_dict['train'], augment=True, rect=True) 97 | else: 98 | dataset = path # dataset 99 | 100 | # Get label wh 101 | shapes = img_size * dataset.shapes / dataset.shapes.max(1, keepdims=True) 102 | wh0 = np.concatenate([l[:, 3:5] * s for s, l in zip(shapes, dataset.labels)]) # wh 103 | 104 | # Filter 105 | i = (wh0 < 3.0).any(1).sum() 106 | if i: 107 | print('WARNING: Extremely small objects found. ' 108 | '%g of %g labels are < 3 pixels in width or height.' % (i, len(wh0))) 109 | wh = wh0[(wh0 >= 2.0).any(1)] # filter > 2 pixels 110 | 111 | # Kmeans calculation 112 | print('Running kmeans for %g anchors on %g points...' % (n, len(wh))) 113 | s = wh.std(0) # sigmas for whitening 114 | k, dist = kmeans(wh / s, n, iter=30) # points, mean distance 115 | k *= s 116 | wh = torch.tensor(wh, dtype=torch.float32) # filtered 117 | wh0 = torch.tensor(wh0, dtype=torch.float32) # unfiltered 118 | k = print_results(k) 119 | 120 | # Plot 121 | # k, d = [None] * 20, [None] * 20 122 | # for i in tqdm(range(1, 21)): 123 | # k[i-1], d[i-1] = kmeans(wh / s, i) # points, mean distance 124 | # fig, ax = plt.subplots(1, 2, figsize=(14, 7)) 125 | # ax = ax.ravel() 126 | # ax[0].plot(np.arange(1, 21), np.array(d) ** 2, marker='.') 127 | # fig, ax = plt.subplots(1, 2, figsize=(14, 7)) # plot wh 128 | # ax[0].hist(wh[wh[:, 0]<100, 0],400) 129 | # ax[1].hist(wh[wh[:, 1]<100, 1],400) 130 | # fig.tight_layout() 131 | # fig.savefig('wh.png', dpi=200) 132 | 133 | # Evolve 134 | npr = np.random 135 | f, sh, mp, s = anchor_fitness(k), k.shape, 0.9, 0.1 # fitness, generations, mutation prob, sigma 136 | pbar = tqdm(range(gen), desc='Evolving anchors with Genetic Algorithm') # progress bar 137 | for _ in pbar: 138 | v = np.ones(sh) 139 | while (v == 1).all(): # mutate until a change occurs (prevent duplicates) 140 | v = ((npr.random(sh) < mp) * npr.random() * npr.randn(*sh) * s + 1).clip(0.3, 3.0) 141 | kg = (k.copy() * v).clip(min=2.0) 142 | fg = anchor_fitness(kg) 143 | if fg > f: 144 | f, k = fg, kg.copy() 145 | pbar.desc = 'Evolving anchors with Genetic Algorithm: fitness = %.4f' % f 146 | if verbose: 147 | print_results(k) 148 | 149 | return print_results(k) 150 | -------------------------------------------------------------------------------- /utils/evolve.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | #for i in 0 1 2 3 3 | #do 4 | # 
t=ultralytics/yolov3:v139 && sudo docker pull $t && sudo nvidia-docker run -d --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t utils/evolve.sh $i 5 | # sleep 30 6 | #done 7 | 8 | while true; do 9 | # python3 train.py --data ../data/sm4/out.data --img-size 320 --epochs 100 --batch 64 --accum 1 --weights yolov3-tiny.conv.15 --multi --bucket ult/wer --evolve --cache --device $1 --cfg yolov3-tiny3-1cls.cfg --single --adam 10 | # python3 train.py --data ../out/data.data --img-size 608 --epochs 10 --batch 8 --accum 8 --weights ultralytics68.pt --multi --bucket ult/athena --evolve --device $1 --cfg yolov3-spp-1cls.cfg 11 | 12 | python3 train.py --data coco2014.data --img-size 512 608 --epochs 27 --batch 8 --accum 8 --evolve --weights '' --bucket ult/coco/sppa_512 --device $1 --cfg yolov3-sppa.cfg --multi 13 | done 14 | 15 | 16 | # coco epoch times --img-size 416 608 --epochs 27 --batch 16 --accum 4 17 | # 36:34 2080ti 18 | # 21:58 V100 19 | # 63:00 T4 -------------------------------------------------------------------------------- /utils/gcp.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # New VM 4 | rm -rf sample_data yolov3 5 | git clone https://github.com/ultralytics/yolov3 6 | # git clone -b test --depth 1 https://github.com/ultralytics/yolov3 test # branch 7 | # sudo apt-get install zip 8 | #git clone https://github.com/NVIDIA/apex && cd apex && pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" . --user && cd .. && rm -rf apex 9 | sudo conda install -yc conda-forge scikit-image pycocotools 10 | # python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('193Zp_ye-3qXMonR1nZj3YyxMtQkMy50k','coco2014.zip')" 11 | python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('1WQT6SOktSe8Uw6r10-2JhbEhMY5DJaph','coco2017.zip')" 12 | python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('1C3HewOG9akA3y456SZLBJZfNDPkBwAto','knife.zip')" 13 | python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('13g3LqdpkNE8sPosVJT6KFXlfoMypzRP4','sm4.zip')" 14 | sudo shutdown 15 | 16 | # Mount local SSD 17 | lsblk 18 | sudo mkfs.ext4 -F /dev/nvme0n1 19 | sudo mkdir -p /mnt/disks/nvme0n1 20 | sudo mount /dev/nvme0n1 /mnt/disks/nvme0n1 21 | sudo chmod a+w /mnt/disks/nvme0n1 22 | cp -r coco /mnt/disks/nvme0n1 23 | 24 | # Kill All 25 | t=ultralytics/yolov3:v1 26 | docker kill $(docker ps -a -q --filter ancestor=$t) 27 | 28 | # Evolve coco 29 | sudo -s 30 | t=ultralytics/yolov3:evolve 31 | # docker kill $(docker ps -a -q --filter ancestor=$t) 32 | for i in 0 1 6 7 33 | do 34 | docker pull $t && docker run --gpus all -d --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t bash utils/evolve.sh $i 35 | sleep 30 36 | done 37 | 38 | #COCO training 39 | n=131 && t=ultralytics/coco:v131 && sudo docker pull $t && sudo docker run -it --gpus all --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t python3 train.py --data coco2014.data --img-size 320 640 --epochs 300 --batch 16 --weights '' --device 0 --cfg yolov3-spp.cfg --bucket ult/coco --name $n && sudo shutdown 40 | n=132 && t=ultralytics/coco:v131 && sudo docker pull $t && sudo docker run -it --gpus all --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t python3 train.py --data coco2014.data --img-size 320 640 --epochs 300 --batch 64 --weights '' --device 0 --cfg yolov3-tiny.cfg --bucket ult/coco --name $n && sudo shutdown 41 | 
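
The dataset downloads in gcp.sh above go through the Google Drive helper defined in utils/google_utils.py below. A minimal sketch of the same call made directly from Python, assuming it is run from the repository root so that `utils` is importable (the Drive file id is the coco2017.zip id already used in gcp.sh):

```python
# Minimal usage sketch of gdrive_download from utils/google_utils.py.
# Assumes execution from the repository root; the id below is the one used in gcp.sh.
from utils.google_utils import gdrive_download

r = gdrive_download(id='1WQT6SOktSe8Uw6r10-2JhbEhMY5DJaph', name='coco2017.zip')
if r != 0:
    print('Download failed (curl return code %s)' % r)
```

The helper returns the curl exit code, and it unzips *.zip archives automatically before deleting the archive to free space.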
-------------------------------------------------------------------------------- /utils/google_utils.py: -------------------------------------------------------------------------------- 1 | # Google utils: https://cloud.google.com/storage/docs/reference/libraries 2 | 3 | import os 4 | import platform 5 | import subprocess 6 | import time 7 | from pathlib import Path 8 | 9 | import torch 10 | import torch.nn as nn 11 | 12 | 13 | def gsutil_getsize(url=''): 14 | # gs://bucket/file size https://cloud.google.com/storage/docs/gsutil/commands/du 15 | s = subprocess.check_output('gsutil du %s' % url, shell=True).decode('utf-8') 16 | return eval(s.split(' ')[0]) if len(s) else 0 # bytes 17 | 18 | 19 | def attempt_download(weights): 20 | # Attempt to download pretrained weights if not found locally 21 | weights = weights.strip().replace("'", '') 22 | file = Path(weights).name 23 | 24 | msg = weights + ' missing, try downloading from https://github.com/WongKinYiu/ScaledYOLOv4/releases/' 25 | models = ['yolov4-csp.pt', 'yolov4-csp-x.pt'] # available models 26 | 27 | if file in models and not os.path.isfile(weights): 28 | 29 | try: # GitHub 30 | url = 'https://github.com/WongKinYiu/ScaledYOLOv4/releases/download/v1.0/' + file 31 | print('Downloading %s to %s...' % (url, weights)) 32 | torch.hub.download_url_to_file(url, weights) 33 | assert os.path.exists(weights) and os.path.getsize(weights) > 1E6 # check 34 | except Exception as e: # GCP 35 | print('ERROR: Download failure.') 36 | print('') 37 | 38 | 39 | def attempt_load(weights, map_location=None): 40 | # Loads an ensemble of models weights=[a,b,c] or a single model weights=[a] or weights=a 41 | model = Ensemble() 42 | for w in weights if isinstance(weights, list) else [weights]: 43 | attempt_download(w) 44 | model.append(torch.load(w, map_location=map_location)['model'].float().fuse().eval()) # load FP32 model 45 | 46 | if len(model) == 1: 47 | return model[-1] # return model 48 | else: 49 | print('Ensemble created with %s\n' % weights) 50 | for k in ['names', 'stride']: 51 | setattr(model, k, getattr(model[-1], k)) 52 | return model # return ensemble 53 | 54 | 55 | def gdrive_download(id='1n_oKgR81BJtqk75b00eAjdv03qVCQn2f', name='coco128.zip'): 56 | # Downloads a file from Google Drive. from utils.google_utils import *; gdrive_download() 57 | t = time.time() 58 | 59 | print('Downloading https://drive.google.com/uc?export=download&id=%s as %s... ' % (id, name), end='') 60 | os.remove(name) if os.path.exists(name) else None # remove existing 61 | os.remove('cookie') if os.path.exists('cookie') else None 62 | 63 | # Attempt file download 64 | out = "NUL" if platform.system() == "Windows" else "/dev/null" 65 | os.system('curl -c ./cookie -s -L "drive.google.com/uc?export=download&id=%s" > %s ' % (id, out)) 66 | if os.path.exists('cookie'): # large file 67 | s = 'curl -Lb ./cookie "drive.google.com/uc?export=download&confirm=%s&id=%s" -o %s' % (get_token(), id, name) 68 | else: # small file 69 | s = 'curl -s -L -o %s "drive.google.com/uc?export=download&id=%s"' % (name, id) 70 | r = os.system(s) # execute, capture return 71 | os.remove('cookie') if os.path.exists('cookie') else None 72 | 73 | # Error check 74 | if r != 0: 75 | os.remove(name) if os.path.exists(name) else None # remove partial 76 | print('Download error ') # raise Exception('Download error') 77 | return r 78 | 79 | # Unzip if archive 80 | if name.endswith('.zip'): 81 | print('unzipping... 
', end='') 82 | os.system('unzip -q %s' % name) # unzip 83 | os.remove(name) # remove zip to free space 84 | 85 | print('Done (%.1fs)' % (time.time() - t)) 86 | return r 87 | 88 | 89 | def get_token(cookie="./cookie"): 90 | with open(cookie) as f: 91 | for line in f: 92 | if "download" in line: 93 | return line.split()[-1] 94 | return "" 95 | 96 | 97 | class Ensemble(nn.ModuleList): 98 | # Ensemble of models 99 | def __init__(self): 100 | super(Ensemble, self).__init__() 101 | 102 | def forward(self, x, augment=False): 103 | y = [] 104 | for module in self: 105 | y.append(module(x, augment)[0]) 106 | # y = torch.stack(y).max(0)[0] # max ensemble 107 | # y = torch.cat(y, 1) # nms ensemble 108 | y = torch.stack(y).mean(0) # mean ensemble 109 | return y, None # inference, train output 110 | 111 | 112 | # def upload_blob(bucket_name, source_file_name, destination_blob_name): 113 | # # Uploads a file to a bucket 114 | # # https://cloud.google.com/storage/docs/uploading-objects#storage-upload-object-python 115 | # 116 | # storage_client = storage.Client() 117 | # bucket = storage_client.get_bucket(bucket_name) 118 | # blob = bucket.blob(destination_blob_name) 119 | # 120 | # blob.upload_from_filename(source_file_name) 121 | # 122 | # print('File {} uploaded to {}.'.format( 123 | # source_file_name, 124 | # destination_blob_name)) 125 | # 126 | # 127 | # def download_blob(bucket_name, source_blob_name, destination_file_name): 128 | # # Uploads a blob from a bucket 129 | # storage_client = storage.Client() 130 | # bucket = storage_client.get_bucket(bucket_name) 131 | # blob = bucket.blob(source_blob_name) 132 | # 133 | # blob.download_to_filename(destination_file_name) 134 | # 135 | # print('Blob {} downloaded to {}.'.format( 136 | # source_blob_name, 137 | # destination_file_name)) 138 | -------------------------------------------------------------------------------- /utils/layers.py: -------------------------------------------------------------------------------- 1 | import torch.nn.functional as F 2 | 3 | from utils.general import * 4 | 5 | import torch 6 | from torch import nn 7 | 8 | try: 9 | from mish_cuda import MishCuda as Mish 10 | 11 | except: 12 | class Mish(nn.Module): # https://github.com/digantamisra98/Mish 13 | def forward(self, x): 14 | return x * F.softplus(x).tanh() 15 | 16 | 17 | class Reorg(nn.Module): 18 | def forward(self, x): 19 | return torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1) 20 | 21 | 22 | def make_divisible(v, divisor): 23 | # Function ensures all layers have a channel number that is divisible by 8 24 | # https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py 25 | return math.ceil(v / divisor) * divisor 26 | 27 | 28 | class Flatten(nn.Module): 29 | # Use after nn.AdaptiveAvgPool2d(1) to remove last 2 dimensions 30 | def forward(self, x): 31 | return x.view(x.size(0), -1) 32 | 33 | 34 | class Concat(nn.Module): 35 | # Concatenate a list of tensors along dimension 36 | def __init__(self, dimension=1): 37 | super(Concat, self).__init__() 38 | self.d = dimension 39 | 40 | def forward(self, x): 41 | return torch.cat(x, self.d) 42 | 43 | 44 | class FeatureConcat(nn.Module): 45 | def __init__(self, layers): 46 | super(FeatureConcat, self).__init__() 47 | self.layers = layers # layer indices 48 | self.multiple = len(layers) > 1 # multiple layers flag 49 | 50 | def forward(self, x, outputs): 51 | return torch.cat([outputs[i] for i in self.layers], 1) if self.multiple else outputs[self.layers[0]] 
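# The FeatureConcat variants below implement Darknet-style route layers:
# FeatureConcat concatenates the cached outputs of the given layer indices along the
# channel dimension (or passes a single output through), FeatureConcat2/FeatureConcat3
# additionally detach every input except the first so gradients only flow through the
# first branch, and FeatureConcat_l concatenates only the first half of each input's
# channels.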
52 | 53 | 54 | class FeatureConcat2(nn.Module): 55 | def __init__(self, layers): 56 | super(FeatureConcat2, self).__init__() 57 | self.layers = layers # layer indices 58 | self.multiple = len(layers) > 1 # multiple layers flag 59 | 60 | def forward(self, x, outputs): 61 | return torch.cat([outputs[self.layers[0]], outputs[self.layers[1]].detach()], 1) 62 | 63 | 64 | class FeatureConcat3(nn.Module): 65 | def __init__(self, layers): 66 | super(FeatureConcat3, self).__init__() 67 | self.layers = layers # layer indices 68 | self.multiple = len(layers) > 1 # multiple layers flag 69 | 70 | def forward(self, x, outputs): 71 | return torch.cat([outputs[self.layers[0]], outputs[self.layers[1]].detach(), outputs[self.layers[2]].detach()], 1) 72 | 73 | 74 | class FeatureConcat_l(nn.Module): 75 | def __init__(self, layers): 76 | super(FeatureConcat_l, self).__init__() 77 | self.layers = layers # layer indices 78 | self.multiple = len(layers) > 1 # multiple layers flag 79 | 80 | def forward(self, x, outputs): 81 | return torch.cat([outputs[i][:,:outputs[i].shape[1]//2,:,:] for i in self.layers], 1) if self.multiple else outputs[self.layers[0]][:,:outputs[self.layers[0]].shape[1]//2,:,:] 82 | 83 | 84 | class WeightedFeatureFusion(nn.Module): # weighted sum of 2 or more layers https://arxiv.org/abs/1911.09070 85 | def __init__(self, layers, weight=False): 86 | super(WeightedFeatureFusion, self).__init__() 87 | self.layers = layers # layer indices 88 | self.weight = weight # apply weights boolean 89 | self.n = len(layers) + 1 # number of layers 90 | if weight: 91 | self.w = nn.Parameter(torch.zeros(self.n), requires_grad=True) # layer weights 92 | 93 | def forward(self, x, outputs): 94 | # Weights 95 | if self.weight: 96 | w = torch.sigmoid(self.w) * (2 / self.n) # sigmoid weights (0-1) 97 | x = x * w[0] 98 | 99 | # Fusion 100 | nx = x.shape[1] # input channels 101 | for i in range(self.n - 1): 102 | a = outputs[self.layers[i]] * w[i + 1] if self.weight else outputs[self.layers[i]] # feature to add 103 | na = a.shape[1] # feature channels 104 | 105 | # Adjust channels 106 | if nx == na: # same shape 107 | x = x + a 108 | elif nx > na: # slice input 109 | x[:, :na] = x[:, :na] + a # or a = nn.ZeroPad2d((0, 0, 0, 0, 0, dc))(a); x = x + a 110 | else: # slice feature 111 | x = x + a[:, :nx] 112 | 113 | return x 114 | 115 | 116 | class MixConv2d(nn.Module): # MixConv: Mixed Depthwise Convolutional Kernels https://arxiv.org/abs/1907.09595 117 | def __init__(self, in_ch, out_ch, k=(3, 5, 7), stride=1, dilation=1, bias=True, method='equal_params'): 118 | super(MixConv2d, self).__init__() 119 | 120 | groups = len(k) 121 | if method == 'equal_ch': # equal channels per group 122 | i = torch.linspace(0, groups - 1E-6, out_ch).floor() # out_ch indices 123 | ch = [(i == g).sum() for g in range(groups)] 124 | else: # 'equal_params': equal parameter count per group 125 | b = [out_ch] + [0] * groups 126 | a = np.eye(groups + 1, groups, k=-1) 127 | a -= np.roll(a, 1, axis=1) 128 | a *= np.array(k) ** 2 129 | a[0] = 1 130 | ch = np.linalg.lstsq(a, b, rcond=None)[0].round().astype(int) # solve for equal weight indices, ax = b 131 | 132 | self.m = nn.ModuleList([nn.Conv2d(in_channels=in_ch, 133 | out_channels=ch[g], 134 | kernel_size=k[g], 135 | stride=stride, 136 | padding=k[g] // 2, # 'same' pad 137 | dilation=dilation, 138 | bias=bias) for g in range(groups)]) 139 | 140 | def forward(self, x): 141 | return torch.cat([m(x) for m in self.m], 1) 142 | 143 | 144 | # Activation functions below 
------------------------------------------------------------------------------------------- 145 | class SwishImplementation(torch.autograd.Function): 146 | @staticmethod 147 | def forward(ctx, x): 148 | ctx.save_for_backward(x) 149 | return x * torch.sigmoid(x) 150 | 151 | @staticmethod 152 | def backward(ctx, grad_output): 153 | x = ctx.saved_tensors[0] 154 | sx = torch.sigmoid(x) # sigmoid(ctx) 155 | return grad_output * (sx * (1 + x * (1 - sx))) 156 | 157 | 158 | class MishImplementation(torch.autograd.Function): 159 | @staticmethod 160 | def forward(ctx, x): 161 | ctx.save_for_backward(x) 162 | return x.mul(torch.tanh(F.softplus(x))) # x * tanh(ln(1 + exp(x))) 163 | 164 | @staticmethod 165 | def backward(ctx, grad_output): 166 | x = ctx.saved_tensors[0] 167 | sx = torch.sigmoid(x) 168 | fx = F.softplus(x).tanh() 169 | return grad_output * (fx + x * sx * (1 - fx * fx)) 170 | 171 | 172 | class MemoryEfficientSwish(nn.Module): 173 | def forward(self, x): 174 | return SwishImplementation.apply(x) 175 | 176 | 177 | class MemoryEfficientMish(nn.Module): 178 | def forward(self, x): 179 | return MishImplementation.apply(x) 180 | 181 | 182 | class Swish(nn.Module): 183 | def forward(self, x): 184 | return x * torch.sigmoid(x) 185 | 186 | 187 | class HardSwish(nn.Module): # https://arxiv.org/pdf/1905.02244.pdf 188 | def forward(self, x): 189 | return x * F.hardtanh(x + 3, 0., 6., True) / 6. 190 | 191 | 192 | class DeformConv2d(nn.Module): 193 | def __init__(self, inc, outc, kernel_size=3, padding=1, stride=1, bias=None, modulation=False): 194 | """ 195 | Args: 196 | modulation (bool, optional): If True, Modulated Defomable Convolution (Deformable ConvNets v2). 197 | """ 198 | super(DeformConv2d, self).__init__() 199 | self.kernel_size = kernel_size 200 | self.padding = padding 201 | self.stride = stride 202 | self.zero_padding = nn.ZeroPad2d(padding) 203 | self.conv = nn.Conv2d(inc, outc, kernel_size=kernel_size, stride=kernel_size, bias=bias) 204 | 205 | self.p_conv = nn.Conv2d(inc, 2*kernel_size*kernel_size, kernel_size=3, padding=1, stride=stride) 206 | nn.init.constant_(self.p_conv.weight, 0) 207 | self.p_conv.register_backward_hook(self._set_lr) 208 | 209 | self.modulation = modulation 210 | if modulation: 211 | self.m_conv = nn.Conv2d(inc, kernel_size*kernel_size, kernel_size=3, padding=1, stride=stride) 212 | nn.init.constant_(self.m_conv.weight, 0) 213 | self.m_conv.register_backward_hook(self._set_lr) 214 | 215 | @staticmethod 216 | def _set_lr(module, grad_input, grad_output): 217 | grad_input = (grad_input[i] * 0.1 for i in range(len(grad_input))) 218 | grad_output = (grad_output[i] * 0.1 for i in range(len(grad_output))) 219 | 220 | def forward(self, x): 221 | offset = self.p_conv(x) 222 | if self.modulation: 223 | m = torch.sigmoid(self.m_conv(x)) 224 | 225 | dtype = offset.data.type() 226 | ks = self.kernel_size 227 | N = offset.size(1) // 2 228 | 229 | if self.padding: 230 | x = self.zero_padding(x) 231 | 232 | # (b, 2N, h, w) 233 | p = self._get_p(offset, dtype) 234 | 235 | # (b, h, w, 2N) 236 | p = p.contiguous().permute(0, 2, 3, 1) 237 | q_lt = p.detach().floor() 238 | q_rb = q_lt + 1 239 | 240 | q_lt = torch.cat([torch.clamp(q_lt[..., :N], 0, x.size(2)-1), torch.clamp(q_lt[..., N:], 0, x.size(3)-1)], dim=-1).long() 241 | q_rb = torch.cat([torch.clamp(q_rb[..., :N], 0, x.size(2)-1), torch.clamp(q_rb[..., N:], 0, x.size(3)-1)], dim=-1).long() 242 | q_lb = torch.cat([q_lt[..., :N], q_rb[..., N:]], dim=-1) 243 | q_rt = torch.cat([q_rb[..., :N], q_lt[..., N:]], dim=-1) 244 | 245 | 
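        # q_lt, q_rb, q_lb, q_rt are the four integer neighbours (left-top, right-bottom,
        # left-bottom, right-top) of each fractional sampling position p; the g_* terms
        # computed next are the corresponding bilinear weights, so every deformed sample
        # is a bilinear interpolation of x at p rather than a nearest-neighbour lookup.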
# clip p 246 | p = torch.cat([torch.clamp(p[..., :N], 0, x.size(2)-1), torch.clamp(p[..., N:], 0, x.size(3)-1)], dim=-1) 247 | 248 | # bilinear kernel (b, h, w, N) 249 | g_lt = (1 + (q_lt[..., :N].type_as(p) - p[..., :N])) * (1 + (q_lt[..., N:].type_as(p) - p[..., N:])) 250 | g_rb = (1 - (q_rb[..., :N].type_as(p) - p[..., :N])) * (1 - (q_rb[..., N:].type_as(p) - p[..., N:])) 251 | g_lb = (1 + (q_lb[..., :N].type_as(p) - p[..., :N])) * (1 - (q_lb[..., N:].type_as(p) - p[..., N:])) 252 | g_rt = (1 - (q_rt[..., :N].type_as(p) - p[..., :N])) * (1 + (q_rt[..., N:].type_as(p) - p[..., N:])) 253 | 254 | # (b, c, h, w, N) 255 | x_q_lt = self._get_x_q(x, q_lt, N) 256 | x_q_rb = self._get_x_q(x, q_rb, N) 257 | x_q_lb = self._get_x_q(x, q_lb, N) 258 | x_q_rt = self._get_x_q(x, q_rt, N) 259 | 260 | # (b, c, h, w, N) 261 | x_offset = g_lt.unsqueeze(dim=1) * x_q_lt + \ 262 | g_rb.unsqueeze(dim=1) * x_q_rb + \ 263 | g_lb.unsqueeze(dim=1) * x_q_lb + \ 264 | g_rt.unsqueeze(dim=1) * x_q_rt 265 | 266 | # modulation 267 | if self.modulation: 268 | m = m.contiguous().permute(0, 2, 3, 1) 269 | m = m.unsqueeze(dim=1) 270 | m = torch.cat([m for _ in range(x_offset.size(1))], dim=1) 271 | x_offset *= m 272 | 273 | x_offset = self._reshape_x_offset(x_offset, ks) 274 | out = self.conv(x_offset) 275 | 276 | return out 277 | 278 | def _get_p_n(self, N, dtype): 279 | p_n_x, p_n_y = torch.meshgrid( 280 | torch.arange(-(self.kernel_size-1)//2, (self.kernel_size-1)//2+1), 281 | torch.arange(-(self.kernel_size-1)//2, (self.kernel_size-1)//2+1)) 282 | # (2N, 1) 283 | p_n = torch.cat([torch.flatten(p_n_x), torch.flatten(p_n_y)], 0) 284 | p_n = p_n.view(1, 2*N, 1, 1).type(dtype) 285 | 286 | return p_n 287 | 288 | def _get_p_0(self, h, w, N, dtype): 289 | p_0_x, p_0_y = torch.meshgrid( 290 | torch.arange(1, h*self.stride+1, self.stride), 291 | torch.arange(1, w*self.stride+1, self.stride)) 292 | p_0_x = torch.flatten(p_0_x).view(1, 1, h, w).repeat(1, N, 1, 1) 293 | p_0_y = torch.flatten(p_0_y).view(1, 1, h, w).repeat(1, N, 1, 1) 294 | p_0 = torch.cat([p_0_x, p_0_y], 1).type(dtype) 295 | 296 | return p_0 297 | 298 | def _get_p(self, offset, dtype): 299 | N, h, w = offset.size(1)//2, offset.size(2), offset.size(3) 300 | 301 | # (1, 2N, 1, 1) 302 | p_n = self._get_p_n(N, dtype) 303 | # (1, 2N, h, w) 304 | p_0 = self._get_p_0(h, w, N, dtype) 305 | p = p_0 + p_n + offset 306 | return p 307 | 308 | def _get_x_q(self, x, q, N): 309 | b, h, w, _ = q.size() 310 | padded_w = x.size(3) 311 | c = x.size(1) 312 | # (b, c, h*w) 313 | x = x.contiguous().view(b, c, -1) 314 | 315 | # (b, h, w, N) 316 | index = q[..., :N]*padded_w + q[..., N:] # offset_x*w + offset_y 317 | # (b, c, h*w*N) 318 | index = index.contiguous().unsqueeze(dim=1).expand(-1, c, -1, -1, -1).contiguous().view(b, c, -1) 319 | 320 | x_offset = x.gather(dim=-1, index=index).contiguous().view(b, c, h, w, N) 321 | 322 | return x_offset 323 | 324 | @staticmethod 325 | def _reshape_x_offset(x_offset, ks): 326 | b, c, h, w, N = x_offset.size() 327 | x_offset = torch.cat([x_offset[..., s:s+ks].contiguous().view(b, c, h, w*ks) for s in range(0, N, ks)], dim=-1) 328 | x_offset = x_offset.contiguous().view(b, c, h*ks, w*ks) 329 | 330 | return x_offset 331 | 332 | 333 | class GAP(nn.Module): 334 | def __init__(self): 335 | super(GAP, self).__init__() 336 | self.avg_pool = nn.AdaptiveAvgPool2d(1) 337 | def forward(self, x): 338 | #b, c, _, _ = x.size() 339 | return self.avg_pool(x)#.view(b, c) 340 | 341 | 342 | class Silence(nn.Module): 343 | def __init__(self): 344 | super(Silence, 
self).__init__() 345 | def forward(self, x): 346 | return x 347 | 348 | 349 | class ScaleChannel(nn.Module): # weighted sum of 2 or more layers https://arxiv.org/abs/1911.09070 350 | def __init__(self, layers): 351 | super(ScaleChannel, self).__init__() 352 | self.layers = layers # layer indices 353 | 354 | def forward(self, x, outputs): 355 | a = outputs[self.layers[0]] 356 | return x.expand_as(a) * a 357 | 358 | 359 | class ScaleSpatial(nn.Module): # weighted sum of 2 or more layers https://arxiv.org/abs/1911.09070 360 | def __init__(self, layers): 361 | super(ScaleSpatial, self).__init__() 362 | self.layers = layers # layer indices 363 | 364 | def forward(self, x, outputs): 365 | a = outputs[self.layers[0]] 366 | return x * a 367 | -------------------------------------------------------------------------------- /utils/loss.py: -------------------------------------------------------------------------------- 1 | # Loss functions 2 | 3 | import torch 4 | import torch.nn as nn 5 | 6 | from utils.general import bbox_iou 7 | from utils.torch_utils import is_parallel 8 | 9 | 10 | def smooth_BCE(eps=0.1): # https://github.com/ultralytics/yolov3/issues/238#issuecomment-598028441 11 | # return positive, negative label smoothing BCE targets 12 | return 1.0 - 0.5 * eps, 0.5 * eps 13 | 14 | 15 | class BCEBlurWithLogitsLoss(nn.Module): 16 | # BCEwithLogitLoss() with reduced missing label effects. 17 | def __init__(self, alpha=0.05): 18 | super(BCEBlurWithLogitsLoss, self).__init__() 19 | self.loss_fcn = nn.BCEWithLogitsLoss(reduction='none') # must be nn.BCEWithLogitsLoss() 20 | self.alpha = alpha 21 | 22 | def forward(self, pred, true): 23 | loss = self.loss_fcn(pred, true) 24 | pred = torch.sigmoid(pred) # prob from logits 25 | dx = pred - true # reduce only missing label effects 26 | # dx = (pred - true).abs() # reduce missing label and false label effects 27 | alpha_factor = 1 - torch.exp((dx - 1) / (self.alpha + 1e-4)) 28 | loss *= alpha_factor 29 | return loss.mean() 30 | 31 | 32 | class FocalLoss(nn.Module): 33 | # Wraps focal loss around existing loss_fcn(), i.e. 
criteria = FocalLoss(nn.BCEWithLogitsLoss(), gamma=1.5) 34 | def __init__(self, loss_fcn, gamma=1.5, alpha=0.25): 35 | super(FocalLoss, self).__init__() 36 | self.loss_fcn = loss_fcn # must be nn.BCEWithLogitsLoss() 37 | self.gamma = gamma 38 | self.alpha = alpha 39 | self.reduction = loss_fcn.reduction 40 | self.loss_fcn.reduction = 'none' # required to apply FL to each element 41 | 42 | def forward(self, pred, true): 43 | loss = self.loss_fcn(pred, true) 44 | # p_t = torch.exp(-loss) 45 | # loss *= self.alpha * (1.000001 - p_t) ** self.gamma # non-zero power for gradient stability 46 | 47 | # TF implementation https://github.com/tensorflow/addons/blob/v0.7.1/tensorflow_addons/losses/focal_loss.py 48 | pred_prob = torch.sigmoid(pred) # prob from logits 49 | p_t = true * pred_prob + (1 - true) * (1 - pred_prob) 50 | alpha_factor = true * self.alpha + (1 - true) * (1 - self.alpha) 51 | modulating_factor = (1.0 - p_t) ** self.gamma 52 | loss *= alpha_factor * modulating_factor 53 | 54 | if self.reduction == 'mean': 55 | return loss.mean() 56 | elif self.reduction == 'sum': 57 | return loss.sum() 58 | else: # 'none' 59 | return loss 60 | 61 | 62 | def compute_loss(p, targets, model): # predictions, targets, model 63 | device = targets.device 64 | #print(device) 65 | lcls, lbox, lobj = torch.zeros(1, device=device), torch.zeros(1, device=device), torch.zeros(1, device=device) 66 | tcls, tbox, indices, anchors = build_targets(p, targets, model) # targets 67 | h = model.hyp # hyperparameters 68 | 69 | # Define criteria 70 | BCEcls = nn.BCEWithLogitsLoss(pos_weight=torch.Tensor([h['cls_pw']])).to(device) 71 | BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.Tensor([h['obj_pw']])).to(device) 72 | 73 | # Class label smoothing https://arxiv.org/pdf/1902.04103.pdf eqn 3 74 | cp, cn = smooth_BCE(eps=0.0) 75 | 76 | # Focal loss 77 | g = h['fl_gamma'] # focal loss gamma 78 | if g > 0: 79 | BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g) 80 | 81 | # Losses 82 | nt = 0 # number of targets 83 | no = len(p) # number of outputs 84 | balance = [4.0, 1.0, 0.4] if no == 3 else [4.0, 1.0, 0.4, 0.1] # P3-5 or P3-6 85 | balance = [4.0, 1.0, 0.5, 0.4, 0.1] if no == 5 else balance 86 | for i, pi in enumerate(p): # layer index, layer predictions 87 | b, a, gj, gi = indices[i] # image, anchor, gridy, gridx 88 | tobj = torch.zeros_like(pi[..., 0], device=device) # target obj 89 | 90 | n = b.shape[0] # number of targets 91 | if n: 92 | nt += n # cumulative targets 93 | ps = pi[b, a, gj, gi] # prediction subset corresponding to targets 94 | 95 | # Regression 96 | pxy = ps[:, :2].sigmoid() * 2. 
- 0.5 97 | pwh = (ps[:, 2:4].sigmoid() * 2) ** 2 * anchors[i] 98 | pbox = torch.cat((pxy, pwh), 1).to(device) # predicted box 99 | iou = bbox_iou(pbox.T, tbox[i], x1y1x2y2=False, CIoU=True) # iou(prediction, target) 100 | lbox += (1.0 - iou).mean() # iou loss 101 | 102 | # Objectness 103 | tobj[b, a, gj, gi] = (1.0 - model.gr) + model.gr * iou.detach().clamp(0).type(tobj.dtype) # iou ratio 104 | 105 | # Classification 106 | if model.nc > 1: # cls loss (only if multiple classes) 107 | t = torch.full_like(ps[:, 5:], cn, device=device) # targets 108 | t[range(n), tcls[i]] = cp 109 | lcls += BCEcls(ps[:, 5:], t) # BCE 110 | 111 | # Append targets to text file 112 | # with open('targets.txt', 'a') as file: 113 | # [file.write('%11.5g ' * 4 % tuple(x) + '\n') for x in torch.cat((txy[i], twh[i]), 1)] 114 | 115 | lobj += BCEobj(pi[..., 4], tobj) * balance[i] # obj loss 116 | 117 | s = 3 / no # output count scaling 118 | lbox *= h['box'] * s 119 | lobj *= h['obj'] * s * (1.4 if no >= 4 else 1.) 120 | lcls *= h['cls'] * s 121 | bs = tobj.shape[0] # batch size 122 | 123 | loss = lbox + lobj + lcls 124 | return loss * bs, torch.cat((lbox, lobj, lcls, loss)).detach() 125 | 126 | 127 | def build_targets(p, targets, model): 128 | nt = targets.shape[0] # number of anchors, targets 129 | tcls, tbox, indices, anch = [], [], [], [] 130 | gain = torch.ones(6, device=targets.device) # normalized to gridspace gain 131 | off = torch.tensor([[1, 0], [0, 1], [-1, 0], [0, -1]], device=targets.device).float() # overlap offsets 132 | 133 | g = 0.5 # offset 134 | multi_gpu = is_parallel(model) 135 | for i, jj in enumerate(model.module.yolo_layers if multi_gpu else model.yolo_layers): 136 | # get number of grid points and anchor vec for this yolo layer 137 | anchors = model.module.module_list[jj].anchor_vec if multi_gpu else model.module_list[jj].anchor_vec 138 | gain[2:] = torch.tensor(p[i].shape)[[3, 2, 3, 2]] # xyxy gain 139 | 140 | # Match targets to anchors 141 | a, t, offsets = [], targets * gain, 0 142 | if nt: 143 | na = anchors.shape[0] # number of anchors 144 | at = torch.arange(na).view(na, 1).repeat(1, nt) # anchor tensor, same as .repeat_interleave(nt) 145 | r = t[None, :, 4:6] / anchors[:, None] # wh ratio 146 | j = torch.max(r, 1. / r).max(2)[0] < model.hyp['anchor_t'] # compare 147 | # j = wh_iou(anchors, t[:, 4:6]) > model.hyp['iou_t'] # iou(3,n) = wh_iou(anchors(3,2), gwh(n,2)) 148 | a, t = at[j], t.repeat(na, 1, 1)[j] # filter 149 | 150 | # overlaps 151 | gxy = t[:, 2:4] # grid xy 152 | z = torch.zeros_like(gxy) 153 | j, k = ((gxy % 1. < g) & (gxy > 1.)).T 154 | l, m = ((gxy % 1. 
> (1 - g)) & (gxy < (gain[[2, 3]] - 1.))).T 155 | a, t = torch.cat((a, a[j], a[k], a[l], a[m]), 0), torch.cat((t, t[j], t[k], t[l], t[m]), 0) 156 | offsets = torch.cat((z, z[j] + off[0], z[k] + off[1], z[l] + off[2], z[m] + off[3]), 0) * g 157 | 158 | # Define 159 | b, c = t[:, :2].long().T # image, class 160 | gxy = t[:, 2:4] # grid xy 161 | gwh = t[:, 4:6] # grid wh 162 | gij = (gxy - offsets).long() 163 | gi, gj = gij.T # grid xy indices 164 | 165 | # Append 166 | #indices.append((b, a, gj, gi)) # image, anchor, grid indices 167 | indices.append((b, a, gj.clamp_(0, gain[3] - 1), gi.clamp_(0, gain[2] - 1))) # image, anchor, grid indices 168 | tbox.append(torch.cat((gxy - gij, gwh), 1)) # box 169 | anch.append(anchors[a]) # anchors 170 | tcls.append(c) # class 171 | 172 | return tcls, tbox, indices, anch 173 | -------------------------------------------------------------------------------- /utils/metrics.py: -------------------------------------------------------------------------------- 1 | # Model validation metrics 2 | 3 | import matplotlib.pyplot as plt 4 | import numpy as np 5 | 6 | 7 | def fitness(x): 8 | # Model fitness as a weighted combination of metrics 9 | w = [0.0, 0.0, 0.1, 0.9] # weights for [P, R, mAP@0.5, mAP@0.5:0.95] 10 | return (x[:, :4] * w).sum(1) 11 | 12 | 13 | def fitness_p(x): 14 | # Model fitness as a weighted combination of metrics 15 | w = [1.0, 0.0, 0.0, 0.0] # weights for [P, R, mAP@0.5, mAP@0.5:0.95] 16 | return (x[:, :4] * w).sum(1) 17 | 18 | 19 | def fitness_r(x): 20 | # Model fitness as a weighted combination of metrics 21 | w = [0.0, 1.0, 0.0, 0.0] # weights for [P, R, mAP@0.5, mAP@0.5:0.95] 22 | return (x[:, :4] * w).sum(1) 23 | 24 | 25 | def fitness_ap50(x): 26 | # Model fitness as a weighted combination of metrics 27 | w = [0.0, 0.0, 1.0, 0.0] # weights for [P, R, mAP@0.5, mAP@0.5:0.95] 28 | return (x[:, :4] * w).sum(1) 29 | 30 | 31 | def fitness_ap(x): 32 | # Model fitness as a weighted combination of metrics 33 | w = [0.0, 0.0, 0.0, 1.0] # weights for [P, R, mAP@0.5, mAP@0.5:0.95] 34 | return (x[:, :4] * w).sum(1) 35 | 36 | 37 | def fitness_f(x): 38 | # Model fitness as a weighted combination of metrics 39 | #w = [0.0, 0.0, 0.0, 1.0] # weights for [P, R, mAP@0.5, mAP@0.5:0.95] 40 | return ((x[:, 0]*x[:, 1])/(x[:, 0]+x[:, 1])) 41 | 42 | 43 | def ap_per_class(tp, conf, pred_cls, target_cls, plot=False, fname='precision-recall_curve.png'): 44 | """ Compute the average precision, given the recall and precision curves. 45 | Source: https://github.com/rafaelpadilla/Object-Detection-Metrics. 46 | # Arguments 47 | tp: True positives (nparray, nx1 or nx10). 48 | conf: Objectness value from 0-1 (nparray). 49 | pred_cls: Predicted object classes (nparray). 50 | target_cls: True object classes (nparray). 51 | plot: Plot precision-recall curve at mAP@0.5 52 | fname: Plot filename 53 | # Returns 54 | The average precision as computed in py-faster-rcnn. 55 | """ 56 | 57 | # Sort by objectness 58 | i = np.argsort(-conf) 59 | tp, conf, pred_cls = tp[i], conf[i], pred_cls[i] 60 | 61 | # Find unique classes 62 | unique_classes = np.unique(target_cls) 63 | 64 | # Create Precision-Recall curve and compute AP for each class 65 | px, py = np.linspace(0, 1, 1000), [] # for plotting 66 | pr_score = 0.1 # score to evaluate P and R https://github.com/ultralytics/yolov3/issues/898 67 | s = [unique_classes.shape[0], tp.shape[1]] # number class, number iou thresholds (i.e. 
10 for mAP0.5...0.95) 68 | ap, p, r = np.zeros(s), np.zeros(s), np.zeros(s) 69 | for ci, c in enumerate(unique_classes): 70 | i = pred_cls == c 71 | n_l = (target_cls == c).sum() # number of labels 72 | n_p = i.sum() # number of predictions 73 | 74 | if n_p == 0 or n_l == 0: 75 | continue 76 | else: 77 | # Accumulate FPs and TPs 78 | fpc = (1 - tp[i]).cumsum(0) 79 | tpc = tp[i].cumsum(0) 80 | 81 | # Recall 82 | recall = tpc / (n_l + 1e-16) # recall curve 83 | r[ci] = np.interp(-pr_score, -conf[i], recall[:, 0]) # r at pr_score, negative x, xp because xp decreases 84 | 85 | # Precision 86 | precision = tpc / (tpc + fpc) # precision curve 87 | p[ci] = np.interp(-pr_score, -conf[i], precision[:, 0]) # p at pr_score 88 | 89 | # AP from recall-precision curve 90 | for j in range(tp.shape[1]): 91 | ap[ci, j], mpre, mrec = compute_ap(recall[:, j], precision[:, j]) 92 | if j == 0: 93 | py.append(np.interp(px, mrec, mpre)) # precision at mAP@0.5 94 | 95 | # Compute F1 score (harmonic mean of precision and recall) 96 | f1 = 2 * p * r / (p + r + 1e-16) 97 | 98 | if plot: 99 | py = np.stack(py, axis=1) 100 | fig, ax = plt.subplots(1, 1, figsize=(5, 5)) 101 | ax.plot(px, py, linewidth=0.5, color='grey') # plot(recall, precision) 102 | ax.plot(px, py.mean(1), linewidth=2, color='blue', label='all classes %.3f mAP@0.5' % ap[:, 0].mean()) 103 | ax.set_xlabel('Recall') 104 | ax.set_ylabel('Precision') 105 | ax.set_xlim(0, 1) 106 | ax.set_ylim(0, 1) 107 | plt.legend() 108 | fig.tight_layout() 109 | fig.savefig(fname, dpi=200) 110 | 111 | return p, r, ap, f1, unique_classes.astype('int32') 112 | 113 | 114 | def compute_ap(recall, precision): 115 | """ Compute the average precision, given the recall and precision curves. 116 | Source: https://github.com/rbgirshick/py-faster-rcnn. 117 | # Arguments 118 | recall: The recall curve (list). 119 | precision: The precision curve (list). 120 | # Returns 121 | The average precision as computed in py-faster-rcnn. 
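        Also returns mpre (the precision envelope) and mrec (the recall points) used for the integration.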
122 | """ 123 | 124 | # Append sentinel values to beginning and end 125 | mrec = recall # np.concatenate(([0.], recall, [recall[-1] + 1E-3])) 126 | mpre = precision # np.concatenate(([0.], precision, [0.])) 127 | 128 | # Compute the precision envelope 129 | mpre = np.flip(np.maximum.accumulate(np.flip(mpre))) 130 | 131 | # Integrate area under curve 132 | method = 'interp' # methods: 'continuous', 'interp' 133 | if method == 'interp': 134 | x = np.linspace(0, 1, 101) # 101-point interp (COCO) 135 | ap = np.trapz(np.interp(x, mrec, mpre), x) # integrate 136 | else: # 'continuous' 137 | i = np.where(mrec[1:] != mrec[:-1])[0] # points where x axis (recall) changes 138 | ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]) # area under curve 139 | 140 | return ap, mpre, mrec 141 | -------------------------------------------------------------------------------- /utils/parse_config.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | import numpy as np 4 | 5 | 6 | def parse_model_cfg(path): 7 | # Parse the yolo *.cfg file and return module definitions path may be 'cfg/yolov3.cfg', 'yolov3.cfg', or 'yolov3' 8 | if not path.endswith('.cfg'): # add .cfg suffix if omitted 9 | path += '.cfg' 10 | if not os.path.exists(path) and os.path.exists('cfg' + os.sep + path): # add cfg/ prefix if omitted 11 | path = 'cfg' + os.sep + path 12 | 13 | with open(path, 'r') as f: 14 | lines = f.read().split('\n') 15 | lines = [x for x in lines if x and not x.startswith('#')] 16 | lines = [x.rstrip().lstrip() for x in lines] # get rid of fringe whitespaces 17 | mdefs = [] # module definitions 18 | for line in lines: 19 | if line.startswith('['): # This marks the start of a new block 20 | mdefs.append({}) 21 | mdefs[-1]['type'] = line[1:-1].rstrip() 22 | if mdefs[-1]['type'] == 'convolutional': 23 | mdefs[-1]['batch_normalize'] = 0 # pre-populate with zeros (may be overwritten later) 24 | 25 | else: 26 | key, val = line.split("=") 27 | key = key.rstrip() 28 | 29 | if key == 'anchors': # return nparray 30 | mdefs[-1][key] = np.array([float(x) for x in val.split(',')]).reshape((-1, 2)) # np anchors 31 | elif (key in ['from', 'layers', 'mask']) or (key == 'size' and ',' in val): # return array 32 | mdefs[-1][key] = [int(x) for x in val.split(',')] 33 | else: 34 | val = val.strip() 35 | if val.isnumeric(): # return int or float 36 | mdefs[-1][key] = int(val) if (int(val) - float(val)) == 0 else float(val) 37 | else: 38 | mdefs[-1][key] = val # return string 39 | 40 | # Check all fields are supported 41 | supported = ['type', 'batch_normalize', 'filters', 'size', 'stride', 'pad', 'activation', 'layers', 'groups', 42 | 'from', 'mask', 'anchors', 'classes', 'num', 'jitter', 'ignore_thresh', 'truth_thresh', 'random', 43 | 'stride_x', 'stride_y', 'weights_type', 'weights_normalization', 'scale_x_y', 'beta_nms', 'nms_kind', 44 | 'iou_loss', 'iou_normalizer', 'cls_normalizer', 'iou_thresh', 'atoms', 'na', 'nc'] 45 | 46 | f = [] # fields 47 | for x in mdefs[1:]: 48 | [f.append(k) for k in x if k not in f] 49 | u = [x for x in f if x not in supported] # unsupported fields 50 | assert not any(u), "Unsupported fields %s in %s. 
See https://github.com/ultralytics/yolov3/issues/631" % (u, path) 51 | 52 | return mdefs 53 | 54 | 55 | def parse_data_cfg(path): 56 | # Parses the data configuration file 57 | if not os.path.exists(path) and os.path.exists('data' + os.sep + path): # add data/ prefix if omitted 58 | path = 'data' + os.sep + path 59 | 60 | with open(path, 'r') as f: 61 | lines = f.readlines() 62 | 63 | options = dict() 64 | for line in lines: 65 | line = line.strip() 66 | if line == '' or line.startswith('#'): 67 | continue 68 | key, val = line.split('=') 69 | options[key.strip()] = val.strip() 70 | 71 | return options 72 | -------------------------------------------------------------------------------- /utils/plots.py: -------------------------------------------------------------------------------- 1 | # Plotting utils 2 | 3 | import glob 4 | import math 5 | import os 6 | import random 7 | from copy import copy 8 | from pathlib import Path 9 | 10 | import cv2 11 | import matplotlib 12 | import matplotlib.pyplot as plt 13 | import numpy as np 14 | import torch 15 | import yaml 16 | from PIL import Image 17 | from scipy.signal import butter, filtfilt 18 | 19 | from utils.general import xywh2xyxy, xyxy2xywh 20 | from utils.metrics import fitness 21 | 22 | # Settings 23 | matplotlib.use('Agg') # for writing to files only 24 | 25 | 26 | def color_list(): 27 | # Return first 10 plt colors as (r,g,b) https://stackoverflow.com/questions/51350872/python-from-color-name-to-rgb 28 | def hex2rgb(h): 29 | return tuple(int(h[1 + i:1 + i + 2], 16) for i in (0, 2, 4)) 30 | 31 | return [hex2rgb(h) for h in plt.rcParams['axes.prop_cycle'].by_key()['color']] 32 | 33 | 34 | def hist2d(x, y, n=100): 35 | # 2d histogram used in labels.png and evolve.png 36 | xedges, yedges = np.linspace(x.min(), x.max(), n), np.linspace(y.min(), y.max(), n) 37 | hist, xedges, yedges = np.histogram2d(x, y, (xedges, yedges)) 38 | xidx = np.clip(np.digitize(x, xedges) - 1, 0, hist.shape[0] - 1) 39 | yidx = np.clip(np.digitize(y, yedges) - 1, 0, hist.shape[1] - 1) 40 | return np.log(hist[xidx, yidx]) 41 | 42 | 43 | def butter_lowpass_filtfilt(data, cutoff=1500, fs=50000, order=5): 44 | # https://stackoverflow.com/questions/28536191/how-to-filter-smooth-with-scipy-numpy 45 | def butter_lowpass(cutoff, fs, order): 46 | nyq = 0.5 * fs 47 | normal_cutoff = cutoff / nyq 48 | return butter(order, normal_cutoff, btype='low', analog=False) 49 | 50 | b, a = butter_lowpass(cutoff, fs, order=order) 51 | return filtfilt(b, a, data) # forward-backward filter 52 | 53 | 54 | def plot_one_box(x, img, color=None, label=None, line_thickness=None): 55 | # Plots one bounding box on image img 56 | tl = line_thickness or round(0.002 * (img.shape[0] + img.shape[1]) / 2) + 1 # line/font thickness 57 | color = color or [random.randint(0, 255) for _ in range(3)] 58 | c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3])) 59 | cv2.rectangle(img, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA) 60 | if label: 61 | tf = max(tl - 1, 1) # font thickness 62 | t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0] 63 | c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3 64 | cv2.rectangle(img, c1, c2, color, -1, cv2.LINE_AA) # filled 65 | cv2.putText(img, label, (c1[0], c1[1] - 2), 0, tl / 3, [225, 255, 255], thickness=tf, lineType=cv2.LINE_AA) 66 | 67 | 68 | def plot_wh_methods(): # from utils.general import *; plot_wh_methods() 69 | # Compares the two methods for width-height anchor multiplication 70 | # https://github.com/ultralytics/yolov3/issues/168 71 | x = 
np.arange(-4.0, 4.0, .1) 72 | ya = np.exp(x) 73 | yb = torch.sigmoid(torch.from_numpy(x)).numpy() * 2 74 | 75 | fig = plt.figure(figsize=(6, 3), dpi=150) 76 | plt.plot(x, ya, '.-', label='YOLO') 77 | plt.plot(x, yb ** 2, '.-', label='YOLO ^2') 78 | plt.plot(x, yb ** 1.6, '.-', label='YOLO ^1.6') 79 | plt.xlim(left=-4, right=4) 80 | plt.ylim(bottom=0, top=6) 81 | plt.xlabel('input') 82 | plt.ylabel('output') 83 | plt.grid() 84 | plt.legend() 85 | fig.tight_layout() 86 | fig.savefig('comparison.png', dpi=200) 87 | 88 | 89 | def output_to_target(output, width, height): 90 | # Convert model output to target format [batch_id, class_id, x, y, w, h, conf] 91 | if isinstance(output, torch.Tensor): 92 | output = output.cpu().numpy() 93 | 94 | targets = [] 95 | for i, o in enumerate(output): 96 | if o is not None: 97 | for pred in o: 98 | box = pred[:4] 99 | w = (box[2] - box[0]) / width 100 | h = (box[3] - box[1]) / height 101 | x = box[0] / width + w / 2 102 | y = box[1] / height + h / 2 103 | conf = pred[4] 104 | cls = int(pred[5]) 105 | 106 | targets.append([i, cls, x, y, w, h, conf]) 107 | 108 | return np.array(targets) 109 | 110 | 111 | def plot_images(images, targets, paths=None, fname='images.jpg', names=None, max_size=640, max_subplots=16): 112 | # Plot image grid with labels 113 | 114 | if isinstance(images, torch.Tensor): 115 | images = images.cpu().float().numpy() 116 | if isinstance(targets, torch.Tensor): 117 | targets = targets.cpu().numpy() 118 | 119 | # un-normalise 120 | if np.max(images[0]) <= 1: 121 | images *= 255 122 | 123 | tl = 3 # line thickness 124 | tf = max(tl - 1, 1) # font thickness 125 | bs, _, h, w = images.shape # batch size, _, height, width 126 | bs = min(bs, max_subplots) # limit plot images 127 | ns = np.ceil(bs ** 0.5) # number of subplots (square) 128 | 129 | # Check if we should resize 130 | scale_factor = max_size / max(h, w) 131 | if scale_factor < 1: 132 | h = math.ceil(scale_factor * h) 133 | w = math.ceil(scale_factor * w) 134 | 135 | colors = color_list() # list of colors 136 | mosaic = np.full((int(ns * h), int(ns * w), 3), 255, dtype=np.uint8) # init 137 | for i, img in enumerate(images): 138 | if i == max_subplots: # if last batch has fewer images than we expect 139 | break 140 | 141 | block_x = int(w * (i // ns)) 142 | block_y = int(h * (i % ns)) 143 | 144 | img = img.transpose(1, 2, 0) 145 | if scale_factor < 1: 146 | img = cv2.resize(img, (w, h)) 147 | 148 | mosaic[block_y:block_y + h, block_x:block_x + w, :] = img 149 | if len(targets) > 0: 150 | image_targets = targets[targets[:, 0] == i] 151 | boxes = xywh2xyxy(image_targets[:, 2:6]).T 152 | classes = image_targets[:, 1].astype('int') 153 | labels = image_targets.shape[1] == 6 # labels if no conf column 154 | conf = None if labels else image_targets[:, 6] # check for confidence presence (label vs pred) 155 | 156 | boxes[[0, 2]] *= w 157 | boxes[[0, 2]] += block_x 158 | boxes[[1, 3]] *= h 159 | boxes[[1, 3]] += block_y 160 | for j, box in enumerate(boxes.T): 161 | cls = int(classes[j]) 162 | color = colors[cls % len(colors)] 163 | cls = names[cls] if names else cls 164 | if labels or conf[j] > 0.25: # 0.25 conf thresh 165 | label = '%s' % cls if labels else '%s %.1f' % (cls, conf[j]) 166 | plot_one_box(box, mosaic, label=label, color=color, line_thickness=tl) 167 | 168 | # Draw image filename labels 169 | if paths: 170 | label = Path(paths[i]).name[:40] # trim to 40 char 171 | t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0] 172 | cv2.putText(mosaic, label, (block_x + 5, 
block_y + t_size[1] + 5), 0, tl / 3, [220, 220, 220], thickness=tf, 173 | lineType=cv2.LINE_AA) 174 | 175 | # Image border 176 | cv2.rectangle(mosaic, (block_x, block_y), (block_x + w, block_y + h), (255, 255, 255), thickness=3) 177 | 178 | if fname: 179 | r = min(1280. / max(h, w) / ns, 1.0) # ratio to limit image size 180 | mosaic = cv2.resize(mosaic, (int(ns * w * r), int(ns * h * r)), interpolation=cv2.INTER_AREA) 181 | # cv2.imwrite(fname, cv2.cvtColor(mosaic, cv2.COLOR_BGR2RGB)) # cv2 save 182 | Image.fromarray(mosaic).save(fname) # PIL save 183 | return mosaic 184 | 185 | 186 | def plot_lr_scheduler(optimizer, scheduler, epochs=300, save_dir=''): 187 | # Plot LR simulating training for full epochs 188 | optimizer, scheduler = copy(optimizer), copy(scheduler) # do not modify originals 189 | y = [] 190 | for _ in range(epochs): 191 | scheduler.step() 192 | y.append(optimizer.param_groups[0]['lr']) 193 | plt.plot(y, '.-', label='LR') 194 | plt.xlabel('epoch') 195 | plt.ylabel('LR') 196 | plt.grid() 197 | plt.xlim(0, epochs) 198 | plt.ylim(0) 199 | plt.tight_layout() 200 | plt.savefig(Path(save_dir) / 'LR.png', dpi=200) 201 | 202 | 203 | def plot_test_txt(): # from utils.general import *; plot_test() 204 | # Plot test.txt histograms 205 | x = np.loadtxt('test.txt', dtype=np.float32) 206 | box = xyxy2xywh(x[:, :4]) 207 | cx, cy = box[:, 0], box[:, 1] 208 | 209 | fig, ax = plt.subplots(1, 1, figsize=(6, 6), tight_layout=True) 210 | ax.hist2d(cx, cy, bins=600, cmax=10, cmin=0) 211 | ax.set_aspect('equal') 212 | plt.savefig('hist2d.png', dpi=300) 213 | 214 | fig, ax = plt.subplots(1, 2, figsize=(12, 6), tight_layout=True) 215 | ax[0].hist(cx, bins=600) 216 | ax[1].hist(cy, bins=600) 217 | plt.savefig('hist1d.png', dpi=200) 218 | 219 | 220 | def plot_targets_txt(): # from utils.general import *; plot_targets_txt() 221 | # Plot targets.txt histograms 222 | x = np.loadtxt('targets.txt', dtype=np.float32).T 223 | s = ['x targets', 'y targets', 'width targets', 'height targets'] 224 | fig, ax = plt.subplots(2, 2, figsize=(8, 8), tight_layout=True) 225 | ax = ax.ravel() 226 | for i in range(4): 227 | ax[i].hist(x[i], bins=100, label='%.3g +/- %.3g' % (x[i].mean(), x[i].std())) 228 | ax[i].legend() 229 | ax[i].set_title(s[i]) 230 | plt.savefig('targets.jpg', dpi=200) 231 | 232 | 233 | def plot_study_txt(f='study.txt', x=None): # from utils.general import *; plot_study_txt() 234 | # Plot study.txt generated by test.py 235 | fig, ax = plt.subplots(2, 4, figsize=(10, 6), tight_layout=True) 236 | ax = ax.ravel() 237 | 238 | fig2, ax2 = plt.subplots(1, 1, figsize=(8, 4), tight_layout=True) 239 | for f in ['study/study_coco_yolo%s.txt' % x for x in ['s', 'm', 'l', 'x']]: 240 | y = np.loadtxt(f, dtype=np.float32, usecols=[0, 1, 2, 3, 7, 8, 9], ndmin=2).T 241 | x = np.arange(y.shape[1]) if x is None else np.array(x) 242 | s = ['P', 'R', 'mAP@.5', 'mAP@.5:.95', 't_inference (ms/img)', 't_NMS (ms/img)', 't_total (ms/img)'] 243 | for i in range(7): 244 | ax[i].plot(x, y[i], '.-', linewidth=2, markersize=8) 245 | ax[i].set_title(s[i]) 246 | 247 | j = y[3].argmax() + 1 248 | ax2.plot(y[6, :j], y[3, :j] * 1E2, '.-', linewidth=2, markersize=8, 249 | label=Path(f).stem.replace('study_coco_', '').replace('yolo', 'YOLO')) 250 | 251 | ax2.plot(1E3 / np.array([209, 140, 97, 58, 35, 18]), [34.6, 40.5, 43.0, 47.5, 49.7, 51.5], 252 | 'k.-', linewidth=2, markersize=8, alpha=.25, label='EfficientDet') 253 | 254 | ax2.grid() 255 | ax2.set_xlim(0, 30) 256 | ax2.set_ylim(28, 50) 257 | ax2.set_yticks(np.arange(30, 55, 5)) 258 
| ax2.set_xlabel('GPU Speed (ms/img)') 259 | ax2.set_ylabel('COCO AP val') 260 | ax2.legend(loc='lower right') 261 | plt.savefig('study_mAP_latency.png', dpi=300) 262 | plt.savefig(f.replace('.txt', '.png'), dpi=300) 263 | 264 | 265 | def plot_labels(labels, save_dir=''): 266 | # plot dataset labels 267 | c, b = labels[:, 0], labels[:, 1:].transpose() # classes, boxes 268 | nc = int(c.max() + 1) # number of classes 269 | 270 | fig, ax = plt.subplots(2, 2, figsize=(8, 8), tight_layout=True) 271 | ax = ax.ravel() 272 | ax[0].hist(c, bins=np.linspace(0, nc, nc + 1) - 0.5, rwidth=0.8) 273 | ax[0].set_xlabel('classes') 274 | ax[1].scatter(b[0], b[1], c=hist2d(b[0], b[1], 90), cmap='jet') 275 | ax[1].set_xlabel('x') 276 | ax[1].set_ylabel('y') 277 | ax[2].scatter(b[2], b[3], c=hist2d(b[2], b[3], 90), cmap='jet') 278 | ax[2].set_xlabel('width') 279 | ax[2].set_ylabel('height') 280 | plt.savefig(Path(save_dir) / 'labels.png', dpi=200) 281 | plt.close() 282 | 283 | # seaborn correlogram 284 | try: 285 | import seaborn as sns 286 | import pandas as pd 287 | x = pd.DataFrame(b.transpose(), columns=['x', 'y', 'width', 'height']) 288 | sns.pairplot(x, corner=True, diag_kind='hist', kind='scatter', markers='o', 289 | plot_kws=dict(s=3, edgecolor=None, linewidth=1, alpha=0.02), 290 | diag_kws=dict(bins=50)) 291 | plt.savefig(Path(save_dir) / 'labels_correlogram.png', dpi=200) 292 | plt.close() 293 | except Exception as e: 294 | pass 295 | 296 | 297 | def plot_evolution(yaml_file='data/hyp.finetune.yaml'): # from utils.general import *; plot_evolution() 298 | # Plot hyperparameter evolution results in evolve.txt 299 | with open(yaml_file) as f: 300 | hyp = yaml.load(f, Loader=yaml.FullLoader) 301 | x = np.loadtxt('evolve.txt', ndmin=2) 302 | f = fitness(x) 303 | # weights = (f - f.min()) ** 2 # for weighted results 304 | plt.figure(figsize=(10, 12), tight_layout=True) 305 | matplotlib.rc('font', **{'size': 8}) 306 | for i, (k, v) in enumerate(hyp.items()): 307 | y = x[:, i + 7] 308 | # mu = (y * weights).sum() / weights.sum() # best weighted result 309 | mu = y[f.argmax()] # best single result 310 | plt.subplot(6, 5, i + 1) 311 | plt.scatter(y, f, c=hist2d(y, f, 20), cmap='viridis', alpha=.8, edgecolors='none') 312 | plt.plot(mu, f.max(), 'k+', markersize=15) 313 | plt.title('%s = %.3g' % (k, mu), fontdict={'size': 9}) # limit to 40 characters 314 | if i % 5 != 0: 315 | plt.yticks([]) 316 | print('%15s: %.3g' % (k, mu)) 317 | plt.savefig('evolve.png', dpi=200) 318 | print('\nPlot saved as evolve.png') 319 | 320 | 321 | def plot_results_overlay(start=0, stop=0): # from utils.general import *; plot_results_overlay() 322 | # Plot training 'results*.txt', overlaying train and val losses 323 | s = ['train', 'train', 'train', 'Precision', 'mAP@0.5', 'val', 'val', 'val', 'Recall', 'mAP@0.5:0.95'] # legends 324 | t = ['Box', 'Objectness', 'Classification', 'P-R', 'mAP-F1'] # titles 325 | for f in sorted(glob.glob('results*.txt') + glob.glob('../../Downloads/results*.txt')): 326 | results = np.loadtxt(f, usecols=[2, 3, 4, 8, 9, 12, 13, 14, 10, 11], ndmin=2).T 327 | n = results.shape[1] # number of rows 328 | x = range(start, min(stop, n) if stop else n) 329 | fig, ax = plt.subplots(1, 5, figsize=(14, 3.5), tight_layout=True) 330 | ax = ax.ravel() 331 | for i in range(5): 332 | for j in [i, i + 5]: 333 | y = results[j, x] 334 | ax[i].plot(x, y, marker='.', label=s[j]) 335 | # y_smooth = butter_lowpass_filtfilt(y) 336 | # ax[i].plot(x, np.gradient(y_smooth), marker='.', label=s[j]) 337 | 338 | ax[i].set_title(t[i]) 
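            # add a legend to every panel; panel 0 additionally carries the results filename as its y-label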
339 | ax[i].legend() 340 | ax[i].set_ylabel(f) if i == 0 else None # add filename 341 | fig.savefig(f.replace('.txt', '.png'), dpi=200) 342 | 343 | 344 | def plot_results(start=0, stop=0, bucket='', id=(), labels=(), save_dir=''): 345 | # from utils.general import *; plot_results(save_dir='runs/train/exp0') 346 | # Plot training 'results*.txt' 347 | fig, ax = plt.subplots(2, 5, figsize=(12, 6)) 348 | ax = ax.ravel() 349 | s = ['Box', 'Objectness', 'Classification', 'Precision', 'Recall', 350 | 'val Box', 'val Objectness', 'val Classification', 'mAP@0.5', 'mAP@0.5:0.95'] 351 | if bucket: 352 | # os.system('rm -rf storage.googleapis.com') 353 | # files = ['https://storage.googleapis.com/%s/results%g.txt' % (bucket, x) for x in id] 354 | files = ['%g.txt' % x for x in id] 355 | c = ('gsutil cp ' + '%s ' * len(files) + '.') % tuple('gs://%s/%g.txt' % (bucket, x) for x in id) 356 | os.system(c) 357 | else: 358 | files = glob.glob(str(Path(save_dir) / '*.txt')) + glob.glob('../../Downloads/results*.txt') 359 | assert len(files), 'No results.txt files found in %s, nothing to plot.' % os.path.abspath(save_dir) 360 | for fi, f in enumerate(files): 361 | try: 362 | results = np.loadtxt(f, usecols=[2, 3, 4, 8, 9, 12, 13, 14, 10, 11], ndmin=2).T 363 | n = results.shape[1] # number of rows 364 | x = range(start, min(stop, n) if stop else n) 365 | for i in range(10): 366 | y = results[i, x] 367 | if i in [0, 1, 2, 5, 6, 7]: 368 | y[y == 0] = np.nan # don't show zero loss values 369 | # y /= y[0] # normalize 370 | label = labels[fi] if len(labels) else Path(f).stem 371 | ax[i].plot(x, y, marker='.', label=label, linewidth=1, markersize=6) 372 | ax[i].set_title(s[i]) 373 | # if i in [5, 6, 7]: # share train and val loss y axes 374 | # ax[i].get_shared_y_axes().join(ax[i], ax[i - 5]) 375 | except Exception as e: 376 | print('Warning: Plotting error for %s; %s' % (f, e)) 377 | 378 | fig.tight_layout() 379 | ax[1].legend() 380 | fig.savefig(Path(save_dir) / 'results.png', dpi=200) 381 | -------------------------------------------------------------------------------- /utils/torch_utils.py: -------------------------------------------------------------------------------- 1 | # PyTorch utils 2 | 3 | import logging 4 | import math 5 | import os 6 | import time 7 | from contextlib import contextmanager 8 | from copy import deepcopy 9 | 10 | import torch 11 | import torch.backends.cudnn as cudnn 12 | import torch.nn as nn 13 | import torch.nn.functional as F 14 | import torchvision 15 | 16 | logger = logging.getLogger(__name__) 17 | 18 | 19 | @contextmanager 20 | def torch_distributed_zero_first(local_rank: int): 21 | """ 22 | Decorator to make all processes in distributed training wait for each local_master to do something. 
23 | """ 24 | if local_rank not in [-1, 0]: 25 | torch.distributed.barrier() 26 | yield 27 | if local_rank == 0: 28 | torch.distributed.barrier() 29 | 30 | 31 | def init_torch_seeds(seed=0): 32 | # Speed-reproducibility tradeoff https://pytorch.org/docs/stable/notes/randomness.html 33 | torch.manual_seed(seed) 34 | if seed == 0: # slower, more reproducible 35 | cudnn.deterministic = True 36 | cudnn.benchmark = False 37 | else: # faster, less reproducible 38 | cudnn.deterministic = False 39 | cudnn.benchmark = True 40 | 41 | 42 | def select_device(device='', batch_size=None): 43 | # device = 'cpu' or '0' or '0,1,2,3' 44 | cpu_request = device.lower() == 'cpu' 45 | if device and not cpu_request: # if device requested other than 'cpu' 46 | os.environ['CUDA_VISIBLE_DEVICES'] = device # set environment variable 47 | assert torch.cuda.is_available(), 'CUDA unavailable, invalid device %s requested' % device # check availablity 48 | 49 | cuda = False if cpu_request else torch.cuda.is_available() 50 | if cuda: 51 | c = 1024 ** 2 # bytes to MB 52 | ng = torch.cuda.device_count() 53 | if ng > 1 and batch_size: # check that batch_size is compatible with device_count 54 | assert batch_size % ng == 0, 'batch-size %g not multiple of GPU count %g' % (batch_size, ng) 55 | x = [torch.cuda.get_device_properties(i) for i in range(ng)] 56 | s = f'Using torch {torch.__version__} ' 57 | for i in range(0, ng): 58 | if i == 1: 59 | s = ' ' * len(s) 60 | logger.info("%sCUDA:%g (%s, %dMB)" % (s, i, x[i].name, x[i].total_memory / c)) 61 | else: 62 | logger.info(f'Using torch {torch.__version__} CPU') 63 | 64 | logger.info('') # skip a line 65 | return torch.device('cuda:0' if cuda else 'cpu') 66 | 67 | 68 | def time_synchronized(): 69 | torch.cuda.synchronize() if torch.cuda.is_available() else None 70 | return time.time() 71 | 72 | 73 | def is_parallel(model): 74 | return type(model) in (nn.parallel.DataParallel, nn.parallel.DistributedDataParallel) 75 | 76 | 77 | def intersect_dicts(da, db, exclude=()): 78 | # Dictionary intersection of matching keys and shapes, omitting 'exclude' keys, using da values 79 | return {k: v for k, v in da.items() if k in db and not any(x in k for x in exclude) and v.shape == db[k].shape} 80 | 81 | 82 | def initialize_weights(model): 83 | for m in model.modules(): 84 | t = type(m) 85 | if t is nn.Conv2d: 86 | pass # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 87 | elif t is nn.BatchNorm2d: 88 | m.eps = 1e-3 89 | m.momentum = 0.03 90 | elif t in [nn.Hardswish, nn.LeakyReLU, nn.ReLU, nn.ReLU6]: 91 | m.inplace = True 92 | 93 | 94 | def find_modules(model, mclass=nn.Conv2d): 95 | # Finds layer indices matching module class 'mclass' 96 | return [i for i, m in enumerate(model.module_list) if isinstance(m, mclass)] 97 | 98 | 99 | def sparsity(model): 100 | # Return global model sparsity 101 | a, b = 0., 0. 102 | for p in model.parameters(): 103 | a += p.numel() 104 | b += (p == 0).sum() 105 | return b / a 106 | 107 | 108 | def prune(model, amount=0.3): 109 | # Prune model to requested global sparsity 110 | import torch.nn.utils.prune as prune 111 | print('Pruning model... 
', end='') 112 | for name, m in model.named_modules(): 113 | if isinstance(m, nn.Conv2d): 114 | prune.l1_unstructured(m, name='weight', amount=amount) # prune 115 | prune.remove(m, 'weight') # make permanent 116 | print(' %.3g global sparsity' % sparsity(model)) 117 | 118 | 119 | def fuse_conv_and_bn(conv, bn): 120 | # Fuse convolution and batchnorm layers https://tehnokv.com/posts/fusing-batchnorm-and-conv/ 121 | fusedconv = nn.Conv2d(conv.in_channels, 122 | conv.out_channels, 123 | kernel_size=conv.kernel_size, 124 | stride=conv.stride, 125 | padding=conv.padding, 126 | groups=conv.groups, 127 | bias=True).requires_grad_(False).to(conv.weight.device) 128 | 129 | # prepare filters 130 | w_conv = conv.weight.clone().view(conv.out_channels, -1) 131 | w_bn = torch.diag(bn.weight.div(torch.sqrt(bn.eps + bn.running_var))) 132 | fusedconv.weight.copy_(torch.mm(w_bn, w_conv).view(fusedconv.weight.size())) 133 | 134 | # prepare spatial bias 135 | b_conv = torch.zeros(conv.weight.size(0), device=conv.weight.device) if conv.bias is None else conv.bias 136 | b_bn = bn.bias - bn.weight.mul(bn.running_mean).div(torch.sqrt(bn.running_var + bn.eps)) 137 | fusedconv.bias.copy_(torch.mm(w_bn, b_conv.reshape(-1, 1)).reshape(-1) + b_bn) 138 | 139 | return fusedconv 140 | 141 | 142 | def model_info(model, verbose=False, img_size=640): 143 | # Model information. img_size may be int or list, i.e. img_size=640 or img_size=[640, 320] 144 | n_p = sum(x.numel() for x in model.parameters()) # number parameters 145 | n_g = sum(x.numel() for x in model.parameters() if x.requires_grad) # number gradients 146 | if verbose: 147 | print('%5s %40s %9s %12s %20s %10s %10s' % ('layer', 'name', 'gradient', 'parameters', 'shape', 'mu', 'sigma')) 148 | for i, (name, p) in enumerate(model.named_parameters()): 149 | name = name.replace('module_list.', '') 150 | print('%5g %40s %9s %12g %20s %10.3g %10.3g' % 151 | (i, name, p.requires_grad, p.numel(), list(p.shape), p.mean(), p.std())) 152 | 153 | try: # FLOPS 154 | from thop import profile 155 | flops = profile(deepcopy(model), inputs=(torch.zeros(1, 3, img_size, img_size),), verbose=False)[0] / 1E9 * 2 156 | img_size = img_size if isinstance(img_size, list) else [img_size, img_size] # expand if int/float 157 | fs = ', %.9f GFLOPS' % (flops) # 640x640 FLOPS 158 | except (ImportError, Exception): 159 | fs = '' 160 | 161 | logger.info(f"Model Summary: {len(list(model.modules()))} layers, {n_p} parameters, {n_g} gradients{fs}") 162 | 163 | 164 | def load_classifier(name='resnet101', n=2): 165 | # Loads a pretrained model reshaped to n-class output 166 | model = torchvision.models.__dict__[name](pretrained=True) 167 | 168 | # ResNet model properties 169 | # input_size = [3, 224, 224] 170 | # input_space = 'RGB' 171 | # input_range = [0, 1] 172 | # mean = [0.485, 0.456, 0.406] 173 | # std = [0.229, 0.224, 0.225] 174 | 175 | # Reshape output to n classes 176 | filters = model.fc.weight.shape[1] 177 | model.fc.bias = nn.Parameter(torch.zeros(n), requires_grad=True) 178 | model.fc.weight = nn.Parameter(torch.zeros(n, filters), requires_grad=True) 179 | model.fc.out_features = n 180 | return model 181 | 182 | 183 | def scale_img(img, ratio=1.0, same_shape=False): # img(16,3,256,416), r=ratio 184 | # scales img(bs,3,y,x) by ratio 185 | if ratio == 1.0: 186 | return img 187 | else: 188 | h, w = img.shape[2:] 189 | s = (int(h * ratio), int(w * ratio)) # new size 190 | img = F.interpolate(img, size=s, mode='bilinear', align_corners=False) # resize 191 | if not same_shape: # pad/crop img 192 
| gs = 32 # (pixels) grid size 193 | h, w = [math.ceil(x * ratio / gs) * gs for x in (h, w)] 194 | return F.pad(img, [0, w - s[1], 0, h - s[0]], value=0.447) # value = imagenet mean 195 | 196 | 197 | def copy_attr(a, b, include=(), exclude=()): 198 | # Copy attributes from b to a, options to only include [...] and to exclude [...] 199 | for k, v in b.__dict__.items(): 200 | if (len(include) and k not in include) or k.startswith('_') or k in exclude: 201 | continue 202 | else: 203 | setattr(a, k, v) 204 | 205 | 206 | class ModelEMA: 207 | """ Model Exponential Moving Average from https://github.com/rwightman/pytorch-image-models 208 | Keep a moving average of everything in the model state_dict (parameters and buffers). 209 | This is intended to allow functionality like 210 | https://www.tensorflow.org/api_docs/python/tf/train/ExponentialMovingAverage 211 | A smoothed version of the weights is necessary for some training schemes to perform well. 212 | This class is sensitive where it is initialized in the sequence of model init, 213 | GPU assignment and distributed training wrappers. 214 | """ 215 | 216 | def __init__(self, model, decay=0.9999, updates=0): 217 | # Create EMA 218 | self.ema = deepcopy(model.module if is_parallel(model) else model).eval() # FP32 EMA 219 | # if next(model.parameters()).device.type != 'cpu': 220 | # self.ema.half() # FP16 EMA 221 | self.updates = updates # number of EMA updates 222 | self.decay = lambda x: decay * (1 - math.exp(-x / 2000)) # decay exponential ramp (to help early epochs) 223 | for p in self.ema.parameters(): 224 | p.requires_grad_(False) 225 | 226 | def update(self, model): 227 | # Update EMA parameters 228 | with torch.no_grad(): 229 | self.updates += 1 230 | d = self.decay(self.updates) 231 | 232 | msd = model.module.state_dict() if is_parallel(model) else model.state_dict() # model state_dict 233 | for k, v in self.ema.state_dict().items(): 234 | if v.dtype.is_floating_point: 235 | v *= d 236 | v += (1. - d) * msd[k].detach() 237 | 238 | def update_attr(self, model, include=(), exclude=('process_group', 'reducer')): 239 | # Update EMA attributes 240 | copy_attr(self.ema, model, include, exclude) 241 | -------------------------------------------------------------------------------- /weights/put your weights file here.txt: -------------------------------------------------------------------------------- 1 | yolov4-paspp.pt 2 | yolov4-pacsp-s.pt 3 | yolov4-pacsp.pt 4 | yolov4-pacsp-x.pt --------------------------------------------------------------------------------
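Usage note: the Ensemble helper defined earlier in this section (utils/google_utils.py) can average several such checkpoints at inference time. A hedged sketch only — `model_a` and `model_b` are placeholders for detection models already built and loaded from weight files like those listed above (their construction/loading is handled elsewhere in the repo), and both are assumed to return (inference_output, train_output) tuples with matching output shapes:

import torch
from utils.google_utils import Ensemble

# model_a, model_b: placeholders for two already-loaded detection models
ensemble = Ensemble()
ensemble.append(model_a.eval())
ensemble.append(model_b.eval())

img = torch.zeros(1, 3, 640, 640)   # dummy input at a typical inference resolution
with torch.no_grad():
    pred, _ = ensemble(img)         # element-wise mean of the two models' inference outputs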