├── README.md
├── cfg
    ├── ghost-yolov3-visdrone.cfg
    ├── yolov3-hand.cfg
    ├── yolov3-shufflenetv2-hand.cfg
    ├── yolov3-shufflenetv2-visdrone.cfg
    ├── yolov3-tiny-hand-cbam.cfg
    ├── yolov3-tiny-hand-eca.cfg
    ├── yolov3-tiny-hand-se.cfg
    └── yolov3-tiny-hand.cfg
├── data
    ├── oxfordhand.data
    ├── oxfordhand.names
    ├── visdrone.data
    └── visdrone.names
├── detect.py
├── models.py
├── normal_prune.py
├── output
    ├── airplane.png
    ├── car.png
    ├── most.png
    └── test.py
├── test.py
├── train.py
├── utils
    ├── __init__.py
    ├── adabound.py
    ├── datasets.py
    ├── evolve.sh
    ├── gcp.sh
    ├── google_utils.py
    ├── keepgit
    ├── layers.py
    ├── parse_config.py
    ├── prune_utils.py
    ├── quant_dorefa.py
    ├── torch_utils.py
    ├── util_wqaq.py
    └── utils.py
└── weights
    └── download_yolov3_weights.sh

/README.md:
--------------------------------------------------------------------------------
1 | # Introduction
2 | This repository includes YOLOv3 with several lightweight backbones (***ShuffleNetV2, GhostNet, VoVNet***), several computer-vision attention modules (***SE Block, CBAM Block, ECA Block***), and pruning, quantization, and distillation for GhostNet.
3 | # Important Updates
4 | ***2020.6.1***
5 | (1) Added the best lightweight backbone, Huawei GhostNet, as the YOLOv3 backbone. It outperforms ShuffleNetV2; results on the VisDrone dataset are shown below.
6 | (2) Added the DoReFa quantization method for arbitrary-bit quantization; results on the VisDrone dataset are shown below.
7 | (3) Removed the ShuffleNet and attention-mechanism variants.
8 | ***2020.6.24***
9 | (1) Added pruning based on Network Slimming.
10 | (2) Added distillation to achieve higher mAP after pruning.
11 | (3) Added an ImageNet-pretrained model for GhostNet.
12 | ***2020.9.26***
13 | (1) Added VoVNet as a backbone. The results are excellent.
14 | 
15 | | Model | Params | FPS | mAP |
16 | | ----- | ----- | ----- | ----- |
17 | | GhostNet+YOLOv3 | 23.49M | 62.5 | 35.1 |
18 | | Pruned Model+Distillation | 5.81M | 76.9 | 34.3 |
19 | | Pruned Model+INT8 | 5.81M | 75.1 | 34.0 |
20 | | YOLOv5s | 7.27M | - | 32.7 |
21 | | YOLOv5x | 88.5M | - | 41.8 |
22 | | VoVNet | 42.8M | 28.9 | 42.7 |
23 | 
24 | ***Note: training on a single GPU is recommended.***
25 | ***If you need the previous attention models or have any questions, you can add my WeChat: AutrefoisLethe***
26 | # Environment
27 | * python 3.7
28 | * pytorch >= 1.1.0
29 | * opencv-python
30 | # Datasets
31 | * Oxford Hand dataset (1 class: human hand)
32 | https://pan.baidu.com/s/1ZYKXMEvNef41MdG1NgWYiQ (extract code: 00rw)
33 | * VisDrone remote sensing dataset (10 classes, including pedestrian, car, bus, etc.)
34 | https://pan.baidu.com/s/1JzJ6APRym8K64taZgcDZfQ (extract code: xyil)
35 | * BDD100K dataset (10 classes, including motor, train, traffic light, etc.)
36 | https://pan.baidu.com/s/1dBrKEdy92Mxqg-JiyrVjkQ (extract code: lm4p)
37 | * DIOR dataset (20 classes, including airplane, airport, bridge, etc.)
38 | https://pan.baidu.com/s/1Fc-zJtHy-6iIewvsKWPDnA (extract code: k2js)
39 | 
40 | # Usage
41 | 1. Download the datasets and place them in the ***data*** directory.
42 | 2. Train a model with the following command (change the model structure by switching the cfg file):
43 | ```
44 | python3 train.py --data data/visdrone.data --batch-size 16 --cfg cfg/ghost-yolov3-visdrone.cfg --img-size 640
45 | ```
46 | 3. Detect objects with the trained model (place the pictures or videos in the ***samples*** directory):
47 | ```
48 | python3 detect.py --cfg cfg/ghost-yolov3-visdrone.cfg --weights weights/best.pt --data data/visdrone.data
49 | ```
50 | 4. Results:
51 | ![most](https://github.com/HaloTrouvaille/YOLO-Multi-Backbones-Attention/blob/master/output/most.png)
52 | ![car](https://github.com/HaloTrouvaille/YOLO-Multi-Backbones-Attention/blob/master/output/car.png)
53 | ![airplane](https://github.com/HaloTrouvaille/YOLO-Multi-Backbones-Attention/blob/master/output/airplane.png)
54 | # Pruning and Quantization
55 | ## Pruning
56 | First, run sparsity training:
57 | ```
58 | python3 train.py --data data/visdrone.data --batch-size 4 --cfg cfg/ghost-yolov3-visdrone.cfg --img-size 640 --epochs 300 --device 3 -sr --s 0.0001
59 | ```
60 | Then set the cfg and weights paths in normal_prune.py and run:
61 | ```
62 | python normal_prune.py
63 | ```
64 | After obtaining pruned.cfg and the corresponding weights file, fine-tune the pruned model with:
65 | ```
66 | python3 train.py --data data/visdrone.data --batch-size 4 --cfg pruned.cfg --img-size 640 --epochs 300 --device 3 --weights weights/xxx.weights
67 | ```
68 | 
69 | ## Quantization
70 | To quantize a convolutional layer, simply change its [convolutional] section to [quan_convolutional] in the cfg file (see the example below), then train as usual:
71 | ```
72 | python3 train.py --data data/visdrone.data --batch-size 16 --cfg cfg/ghost-yolov3-visdrone.cfg --img-size 640
73 | ```
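As a rough illustration (a minimal sketch based on the [convolutional] blocks used elsewhere in this repo): only the section name changes, and the DoReFa bit widths are assumed to be configured in utils/quant_dorefa.py / utils/util_wqaq.py rather than in the cfg.
```
# before: full-precision layer
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

# after: the same layer, trained with DoReFa quantization
[quan_convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
```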
74 | 
75 | # Experiment Results for Changing the YOLOv3 Backbone
76 | ## ShuffleNetV2 + Two-Scale Detection (YOLO Detector)
77 | ### Using the Oxford Hand dataset
78 | | Model | Params | Model Size | mAP |
79 | | ----- | ----- | ----- | ----- |
80 | | ShuffleNetV2 1x | 3.57M | 13.89MB | 51.2 |
81 | | ShuffleNetV2 1.5x | 5.07M | 19.55MB | 56.4 |
82 | | YOLOv3-tiny | 8.67M | 33.1MB | 60.3 |
83 | ### Using the VisDrone dataset (incomplete training)
84 | | Model | Params | Model Size | mAP |
85 | | ----- | ----- | ----- | ----- |
86 | | ShuffleNetV2 1x | 3.59M | 13.99MB | 10.2 |
87 | | ShuffleNetV2 1.5x | 5.09M | 19.63MB | 11.0 |
88 | | YOLOv3-tiny | 8.69M | 33.9MB | 3.3 |
89 | # Experiment Results for Attention Mechanisms
90 | ### Based on YOLOv3-tiny
91 | SE Block paper: https://arxiv.org/abs/1709.01507
92 | CBAM Block paper: https://arxiv.org/abs/1807.06521
93 | ECA Block paper: https://arxiv.org/abs/1910.03151
94 | | Model | Params | mAP |
95 | | ----- | ----- | ----- |
96 | | YOLOv3-tiny | 8.67M | 60.3 |
97 | | YOLOv3-tiny + SE | 8.933M | 62.3 |
98 | | YOLOv3-tiny + CBAM | 8.81M | 62.7 |
99 | | YOLOv3-tiny + ECA | 8.67M | 62.6 |
100 | 
101 | 
102 | # TODO
103 | - [x] ShuffleNetV2 backbone
104 | - [x] Huawei GhostNet backbone
105 | - [x] ImageNet pretraining
106 | - [x] COCO dataset training
107 | - [ ] Other detection strategies
108 | - [ ] Other pruning strategies
109 | 
110 | 
111 | 
--------------------------------------------------------------------------------
/cfg/ghost-yolov3-visdrone.cfg:
--------------------------------------------------------------------------------
1 | 
2 | [net]
3 | # Testing
4 | #batch=1
5 | #subdivisions=1
6 | # Training
7 | batch=16
8 | subdivisions=1
9 | width=416
10 | height=416
11 | channels=3
12 | momentum=0.9
13 | decay=0.0005
14 | angle=0
15 | saturation = 1.5
16 | exposure = 1.5
17 | hue=.1
18 | 
19 | learning_rate=0.001
20 | burn_in=1000
21 | max_batches = 500200
22 | policy=steps
23 | steps=400000,450000
24 | scales=.1,.1
25 | # 0
26 | [convolutional]
27 | batch_normalize=1
28 | filters=16
29 | size=3
30 | stride=2
31 | pad=1
32 | group=1
33 | activation=relu
34 | 
35 | # ghost bottleneck starts
36 | 
37 | #GB1-PConv #1
38 | [convolutional]
39 | batch_normalize=1
40 | filters=8 41 | size=1 42 | stride=1 43 | pad=0 44 | group=1 45 | activation=relu 46 | 47 | #GB1-Cheap #2 48 | [convolutional] 49 | batch_normalize=1 50 | filters=8 51 | size=3 52 | stride=1 53 | pad=1 54 | group=8 55 | activation=relu 56 | 57 | # 3 58 | [route] 59 | layers=-1, 1 60 | 61 | # 4 62 | [convolutional] 63 | batch_normalize=1 64 | filters=8 65 | size=1 66 | stride=1 67 | pad=0 68 | group=1 69 | activation=none 70 | 71 | # 5 72 | [convolutional] 73 | batch_normalize=1 74 | filters=8 75 | size=3 76 | stride=1 77 | pad=1 78 | group=8 79 | activation=none 80 | 81 | # 6 82 | [route] 83 | layers=-1,4 84 | 85 | # 7 86 | [shortcut] 87 | from=-7 88 | activation=none 89 | 90 | # GB2-PConv # 8 91 | [convolutional] 92 | batch_normalize=1 93 | filters=24 94 | size=1 95 | stride=1 96 | pad=0 97 | group=1 98 | activation=relu 99 | 100 | # GB2-Cheap # 9 101 | [convolutional] 102 | batch_normalize=1 103 | filters=24 104 | size=3 105 | stride=1 106 | pad=1 107 | group=24 108 | activation=relu 109 | 110 | #10 111 | [route] 112 | layers=-1,8 113 | 114 | #11 115 | [convolutional] 116 | batch_normalize=1 117 | filters=48 118 | size=3 119 | stride=2 120 | group=48 121 | pad=1 122 | activation=none 123 | 124 | #12 125 | [convolutional] 126 | batch_normalize=1 127 | filters=12 128 | size=1 129 | stride=1 130 | group=1 131 | pad=0 132 | activation=none 133 | 134 | #13 135 | [convolutional] 136 | batch_normalize=1 137 | filters=12 138 | size=3 139 | stride=1 140 | group=12 141 | pad=1 142 | activation=none 143 | 144 | #14 145 | [route] 146 | layers=-1,12 147 | 148 | #15 149 | [route] 150 | layers=7 151 | 152 | #16 153 | [convolutional] 154 | batch_normalize=1 155 | filters=16 156 | size=3 157 | stride=2 158 | group=16 159 | pad=1 160 | activation=none 161 | 162 | #17 163 | [convolutional] 164 | batch_normalize=1 165 | filters=24 166 | size=1 167 | stride=1 168 | group=1 169 | pad=0 170 | activation=none 171 | 172 | #18 173 | [shortcut] 174 | from=-4 175 | activation=none 176 | 177 | # GB3-PConv #19 178 | [convolutional] 179 | batch_normalize=1 180 | filters=36 181 | size=1 182 | stride=1 183 | group=1 184 | pad=0 185 | activation=relu 186 | 187 | # GB3-Cheap #20 188 | [convolutional] 189 | batch_normalize=1 190 | filters=36 191 | size=3 192 | stride=1 193 | group=36 194 | pad=1 195 | activation=relu 196 | 197 | #21 198 | [route] 199 | layers=-1,19 200 | 201 | #22 202 | [convolutional] 203 | batch_normalize=1 204 | filters=12 205 | size=1 206 | stride=1 207 | group=1 208 | pad=0 209 | activation=none 210 | 211 | #23 212 | [convolutional] 213 | batch_normalize=1 214 | filters=12 215 | size=3 216 | stride=1 217 | group=12 218 | pad=1 219 | activation=none 220 | 221 | #24 222 | [route] 223 | layers=-1,22 224 | 225 | #25 226 | [shortcut] 227 | from=-7 228 | activation=none 229 | 230 | #GB4-PConv #26 231 | [convolutional] 232 | batch_normalize=1 233 | filters=36 234 | size=1 235 | stride=1 236 | group=1 237 | pad=0 238 | activation=relu 239 | 240 | #GB4-Cheap #27 241 | [convolutional] 242 | batch_normalize=1 243 | filters=36 244 | size=3 245 | stride=1 246 | group=36 247 | pad=1 248 | activation=relu 249 | 250 | #28 251 | [route] 252 | layers=-1,26 253 | 254 | #29 255 | [convolutional] 256 | batch_normalize=1 257 | filters=72 258 | size=5 259 | stride=2 260 | group=72 261 | pad=2 262 | activation=none 263 | 264 | #30 265 | [se] 266 | reduction=4 267 | 268 | #31 269 | [convolutional] 270 | batch_normalize=1 271 | filters=20 272 | size=1 273 | stride=1 274 | group=1 275 | pad=0 276 | activation=none 277 | 278 
| #32 279 | [convolutional] 280 | batch_normalize=1 281 | filters=20 282 | size=3 283 | stride=1 284 | group=20 285 | pad=1 286 | activation=none 287 | 288 | #33 289 | [route] 290 | layers=-1,31 291 | 292 | #34 293 | [route] 294 | layers=25 295 | 296 | #35 297 | [convolutional] 298 | batch_normalize=1 299 | filters=24 300 | size=5 301 | stride=2 302 | group=24 303 | pad=2 304 | activation=none 305 | 306 | #36 307 | [convolutional] 308 | batch_normalize=1 309 | filters=40 310 | size=1 311 | stride=1 312 | group=1 313 | pad=0 314 | activation=none 315 | 316 | #37 317 | [shortcut] 318 | from=-4 319 | activation=none 320 | 321 | #GB5-PConv #38 322 | [convolutional] 323 | batch_normalize=1 324 | filters=60 325 | size=1 326 | stride=1 327 | group=1 328 | pad=0 329 | activation=relu 330 | 331 | #GB5-Cheap #39 332 | [convolutional] 333 | batch_normalize=1 334 | filters=60 335 | size=3 336 | stride=1 337 | group=60 338 | pad=1 339 | activation=relu 340 | 341 | #40 342 | [route] 343 | layers=-1,38 344 | 345 | #41 346 | [se] 347 | reduction=4 348 | 349 | #42 350 | [convolutional] 351 | batch_normalize=1 352 | filters=20 353 | size=1 354 | stride=1 355 | group=1 356 | pad=0 357 | activation=none 358 | 359 | #43 360 | [convolutional] 361 | batch_normalize=1 362 | filters=20 363 | size=3 364 | stride=1 365 | group=20 366 | pad=1 367 | activation=none 368 | 369 | #44 370 | [route] 371 | layers=-1,42 372 | 373 | #45 374 | [shortcut] 375 | from=-8 376 | 377 | #GB6-PConv #46 378 | [convolutional] 379 | batch_normalize=1 380 | filters=120 381 | size=1 382 | stride=1 383 | group=1 384 | pad=0 385 | activation=relu 386 | 387 | #GB6-Cheap #47 388 | [convolutional] 389 | batch_normalize=1 390 | filters=120 391 | size=3 392 | stride=1 393 | group=120 394 | pad=1 395 | activation=relu 396 | 397 | #48 398 | [route] 399 | layers=-1,46 400 | 401 | #49 402 | [convolutional] 403 | batch_normalize=1 404 | filters=240 405 | size=3 406 | stride=2 407 | group=240 408 | pad=1 409 | activation=none 410 | 411 | #50 412 | [convolutional] 413 | batch_normalize=1 414 | filters=40 415 | size=1 416 | stride=1 417 | group=1 418 | pad=0 419 | activation=none 420 | 421 | #51 422 | [convolutional] 423 | batch_normalize=1 424 | filters=40 425 | size=3 426 | stride=1 427 | group=40 428 | pad=1 429 | activation=none 430 | 431 | #52 432 | [route] 433 | layers=-1,50 434 | 435 | #53 436 | [route] 437 | layers=45 438 | 439 | #54 440 | [convolutional] 441 | batch_normalize=1 442 | filters=40 443 | size=3 444 | stride=2 445 | group=40 446 | pad=1 447 | activation=none 448 | 449 | #55 450 | [convolutional] 451 | batch_normalize=1 452 | filters=80 453 | size=1 454 | stride=1 455 | group=1 456 | pad=0 457 | activation=none 458 | 459 | #56 460 | [shortcut] 461 | from=-4 462 | activation=none 463 | 464 | #GB7-PConv #57 465 | [convolutional] 466 | batch_normalize=1 467 | filters=100 468 | size=1 469 | stride=1 470 | group=1 471 | pad=0 472 | activation=relu 473 | 474 | #GB7-Cheap #58 475 | [convolutional] 476 | batch_normalize=1 477 | filters=100 478 | size=3 479 | stride=1 480 | group=100 481 | pad=1 482 | activation=relu 483 | 484 | #59 485 | [route] 486 | layers=-1,57 487 | 488 | #60 489 | [convolutional] 490 | batch_normalize=1 491 | filters=40 492 | size=1 493 | stride=1 494 | group=1 495 | pad=0 496 | activation=none 497 | 498 | #61 499 | [convolutional] 500 | batch_normalize=1 501 | filters=40 502 | size=3 503 | stride=1 504 | group=40 505 | pad=1 506 | activation=none 507 | 508 | #62 509 | [route] 510 | layers=-1,60 511 | 512 | #63 513 | 
[shortcut] 514 | from=-7 515 | activation=none 516 | 517 | #GB8-PConv #64 518 | [convolutional] 519 | batch_normalize=1 520 | filters=92 521 | size=1 522 | stride=1 523 | group=1 524 | pad=0 525 | activation=relu 526 | 527 | #GB8-Cheap #65 528 | [convolutional] 529 | batch_normalize=1 530 | filters=92 531 | size=3 532 | stride=1 533 | group=92 534 | pad=1 535 | activation=relu 536 | 537 | #66 538 | [route] 539 | layers=-1,64 540 | 541 | #67 542 | [convolutional] 543 | batch_normalize=1 544 | filters=40 545 | size=1 546 | stride=1 547 | group=1 548 | pad=0 549 | activation=none 550 | 551 | #68 552 | [convolutional] 553 | batch_normalize=1 554 | filters=40 555 | size=3 556 | stride=1 557 | group=40 558 | pad=1 559 | activation=none 560 | 561 | #69 562 | [route] 563 | layers=-1,67 564 | 565 | #70 566 | [shortcut] 567 | from=-7 568 | activation=none 569 | 570 | #GB9-PConv #71 571 | [convolutional] 572 | batch_normalize=1 573 | filters=92 574 | size=1 575 | stride=1 576 | group=1 577 | pad=0 578 | activation=relu 579 | 580 | #GB9-Cheap #72 581 | [convolutional] 582 | batch_normalize=1 583 | filters=92 584 | size=3 585 | stride=1 586 | group=92 587 | pad=1 588 | activation=relu 589 | 590 | #73 591 | [route] 592 | layers=-1,71 593 | 594 | #74 595 | [convolutional] 596 | batch_normalize=1 597 | filters=40 598 | size=1 599 | stride=1 600 | group=1 601 | pad=0 602 | activation=none 603 | 604 | #75 605 | [convolutional] 606 | batch_normalize=1 607 | filters=40 608 | size=3 609 | stride=1 610 | group=40 611 | pad=1 612 | activation=none 613 | 614 | #76 615 | [route] 616 | layers=-1,74 617 | 618 | #77 619 | [shortcut] 620 | from=-7 621 | activation=none 622 | 623 | #GB10-PConv #78 624 | [convolutional] 625 | batch_normalize=1 626 | filters=240 627 | size=1 628 | stride=1 629 | group=1 630 | pad=0 631 | activation=relu 632 | 633 | 634 | #GB10-Cheap #79 635 | [convolutional] 636 | batch_normalize=1 637 | filters=240 638 | size=3 639 | stride=1 640 | group=240 641 | pad=1 642 | activation=relu 643 | 644 | #80 645 | [route] 646 | layers=-1,78 647 | 648 | #81 649 | [se] 650 | reduction=4 651 | 652 | #82 653 | [convolutional] 654 | batch_normalize=1 655 | filters=56 656 | size=1 657 | stride=1 658 | group=1 659 | pad=0 660 | activation=none 661 | 662 | #83 663 | [convolutional] 664 | batch_normalize=1 665 | filters=56 666 | size=3 667 | stride=1 668 | group=56 669 | pad=1 670 | activation=none 671 | 672 | #84 673 | [route] 674 | layers=-1,82 675 | 676 | #85 677 | [route] 678 | layers=77 679 | 680 | #86 681 | [convolutional] 682 | batch_normalize=1 683 | filters=80 684 | size=3 685 | stride=1 686 | group=80 687 | pad=1 688 | activation=none 689 | 690 | #87 691 | [convolutional] 692 | batch_normalize=1 693 | filters=112 694 | size=1 695 | stride=1 696 | group=1 697 | pad=0 698 | activation=none 699 | 700 | #88 701 | [shortcut] 702 | from=-4 703 | activation=none 704 | 705 | #GB11-PConv #89 706 | [convolutional] 707 | batch_normalize=1 708 | filters=336 709 | size=1 710 | stride=1 711 | group=1 712 | pad=0 713 | activation=relu 714 | 715 | #GB11-Cheap #90 716 | [convolutional] 717 | batch_normalize=1 718 | filters=336 719 | size=3 720 | stride=1 721 | group=336 722 | pad=1 723 | activation=relu 724 | 725 | #91 726 | [route] 727 | layers=-1,89 728 | 729 | #92 730 | [se] 731 | reduction=4 732 | 733 | #93 734 | [convolutional] 735 | batch_normalize=1 736 | filters=56 737 | size=1 738 | stride=1 739 | group=1 740 | pad=0 741 | activation=none 742 | 743 | #94 744 | [convolutional] 745 | batch_normalize=1 746 | 
filters=56 747 | size=3 748 | stride=1 749 | group=56 750 | pad=1 751 | activation=none 752 | 753 | #95 754 | [route] 755 | layers=-1,93 756 | 757 | #96 758 | [shortcut] 759 | from=-8 760 | activation=none 761 | 762 | #GB12-PConv #97 763 | [convolutional] 764 | batch_normalize=1 765 | filters=336 766 | size=1 767 | stride=1 768 | group=1 769 | pad=0 770 | activation=relu 771 | 772 | #GB12-Cheap #98 773 | [convolutional] 774 | batch_normalize=1 775 | filters=336 776 | size=3 777 | stride=1 778 | group=336 779 | pad=1 780 | activation=relu 781 | 782 | #99 783 | [route] 784 | layers=-1,97 785 | 786 | #100 787 | [convolutional] 788 | batch_normalize=1 789 | filters=672 790 | size=5 791 | stride=2 792 | group=672 793 | pad=2 794 | activation=none 795 | 796 | #101 797 | [se] 798 | reduction=4 799 | 800 | #102 801 | [convolutional] 802 | batch_normalize=1 803 | filters=80 804 | size=1 805 | stride=1 806 | group=1 807 | pad=0 808 | activation=none 809 | 810 | #103 811 | [convolutional] 812 | batch_normalize=1 813 | filters=80 814 | size=3 815 | stride=1 816 | group=80 817 | pad=1 818 | activation=none 819 | 820 | #104 821 | [route] 822 | layers=-1,102 823 | 824 | #105 825 | [route] 826 | layers=96 827 | 828 | #106 829 | [convolutional] 830 | batch_normalize=1 831 | filters=112 832 | size=5 833 | stride=2 834 | group=112 835 | pad=2 836 | activation=none 837 | 838 | #107 839 | [convolutional] 840 | batch_normalize=1 841 | filters=160 842 | size=1 843 | stride=1 844 | group=1 845 | pad=0 846 | activation=none 847 | 848 | #108 849 | [shortcut] 850 | from=-4 851 | activation=none 852 | 853 | #GB13-PConv #109 854 | [convolutional] 855 | batch_normalize=1 856 | filters=480 857 | size=1 858 | stride=1 859 | group=1 860 | pad=0 861 | activation=relu 862 | 863 | #GB13-Cheap #110 864 | [convolutional] 865 | batch_normalize=1 866 | filters=480 867 | size=3 868 | stride=1 869 | group=480 870 | pad=1 871 | activation=relu 872 | 873 | #111 874 | [route] 875 | layers=-1,109 876 | 877 | #112 878 | [convolutional] 879 | batch_normalize=1 880 | filters=80 881 | size=1 882 | stride=1 883 | group=1 884 | pad=0 885 | activation=none 886 | 887 | #113 888 | [convolutional] 889 | batch_normalize=1 890 | filters=80 891 | size=3 892 | stride=1 893 | group=80 894 | pad=1 895 | activation=none 896 | 897 | #114 898 | [route] 899 | layers=-1,112 900 | 901 | #115 902 | [shortcut] 903 | from=-7 904 | activation=none 905 | 906 | #GB14-PConv #116 907 | [convolutional] 908 | batch_normalize=1 909 | filters=480 910 | size=1 911 | stride=1 912 | group=1 913 | pad=0 914 | activation=relu 915 | 916 | #GB14-Cheap #117 917 | [convolutional] 918 | batch_normalize=1 919 | filters=480 920 | size=3 921 | stride=1 922 | group=480 923 | pad=1 924 | activation=relu 925 | 926 | #118 927 | [route] 928 | layers=-1,116 929 | 930 | #119 931 | [se] 932 | reduction=4 933 | 934 | #120 935 | [convolutional] 936 | batch_normalize=1 937 | filters=80 938 | size=1 939 | stride=1 940 | group=1 941 | pad=0 942 | activation=none 943 | 944 | #121 945 | [convolutional] 946 | batch_normalize=1 947 | filters=80 948 | size=3 949 | stride=1 950 | group=80 951 | pad=1 952 | activation=none 953 | 954 | #122 955 | [route] 956 | layers=-1,120 957 | 958 | #123 959 | [shortcut] 960 | from=-8 961 | activation=none 962 | 963 | #GB15-PConv #124 964 | [convolutional] 965 | batch_normalize=1 966 | filters=480 967 | size=1 968 | stride=1 969 | group=1 970 | pad=0 971 | activation=relu 972 | 973 | #GB15-Cheap #125 974 | [convolutional] 975 | batch_normalize=1 976 | filters=480 
977 | size=3 978 | stride=1 979 | group=480 980 | pad=1 981 | activation=relu 982 | 983 | #126 984 | [route] 985 | layers=-1,124 986 | 987 | #127 988 | [convolutional] 989 | batch_normalize=1 990 | filters=80 991 | size=1 992 | stride=1 993 | group=1 994 | pad=0 995 | activation=none 996 | 997 | #128 998 | [convolutional] 999 | batch_normalize=1 1000 | filters=80 1001 | size=3 1002 | stride=1 1003 | group=80 1004 | pad=1 1005 | activation=none 1006 | 1007 | #129 1008 | [route] 1009 | layers=-1,127 1010 | 1011 | #130 1012 | [shortcut] 1013 | from=-7 1014 | activation=none 1015 | 1016 | #GB16-PConv #131 1017 | [convolutional] 1018 | batch_normalize=1 1019 | filters=480 1020 | size=1 1021 | stride=1 1022 | group=1 1023 | pad=0 1024 | activation=relu 1025 | 1026 | #GB16-Cheap #132 1027 | [convolutional] 1028 | batch_normalize=1 1029 | filters=480 1030 | size=3 1031 | stride=1 1032 | group=480 1033 | pad=1 1034 | activation=relu 1035 | 1036 | #133 1037 | [route] 1038 | layers=-1,131 1039 | 1040 | #134 1041 | [se] 1042 | reduction=4 1043 | 1044 | #135 1045 | [convolutional] 1046 | batch_normalize=1 1047 | filters=80 1048 | size=1 1049 | stride=1 1050 | group=1 1051 | pad=0 1052 | activation=none 1053 | 1054 | #136 1055 | [convolutional] 1056 | batch_normalize=1 1057 | filters=80 1058 | size=3 1059 | stride=1 1060 | group=80 1061 | pad=1 1062 | activation=none 1063 | 1064 | #137 1065 | [route] 1066 | layers=-1,135 1067 | 1068 | #138 1069 | [shortcut] 1070 | from=-8 1071 | 1072 | #139 1073 | [convolutional] 1074 | batch_normalize=1 1075 | filters=960 1076 | size=1 1077 | stride=1 1078 | group=1 1079 | pad=0 1080 | activation=relu 1081 | 1082 | 1083 | 1084 | 1085 | #######Backbone结束 1086 | 1087 | #140 1088 | [convolutional] 1089 | batch_normalize=1 1090 | filters=512 1091 | size=1 1092 | stride=1 1093 | pad=0 1094 | group=1 1095 | activation=leaky 1096 | 1097 | #141 1098 | [convolutional] 1099 | batch_normalize=1 1100 | size=3 1101 | stride=1 1102 | pad=1 1103 | group=1 1104 | filters=1024 1105 | activation=leaky 1106 | 1107 | #142 1108 | [convolutional] 1109 | batch_normalize=1 1110 | group=1 1111 | filters=512 1112 | size=1 1113 | stride=1 1114 | pad=0 1115 | activation=leaky 1116 | 1117 | #143 1118 | [convolutional] 1119 | batch_normalize=1 1120 | size=3 1121 | stride=1 1122 | pad=1 1123 | group=1 1124 | filters=1024 1125 | activation=leaky 1126 | 1127 | #144 1128 | [convolutional] 1129 | batch_normalize=1 1130 | filters=512 1131 | size=1 1132 | stride=1 1133 | pad=0 1134 | group=1 1135 | activation=leaky 1136 | 1137 | #145 1138 | [convolutional] 1139 | batch_normalize=1 1140 | size=3 1141 | stride=1 1142 | pad=1 1143 | filters=1024 1144 | group=1 1145 | activation=leaky 1146 | 1147 | #146 1148 | [convolutional] 1149 | size=1 1150 | stride=1 1151 | pad=0 1152 | filters=45 1153 | group=1 1154 | activation=linear 1155 | 1156 | #147 1157 | [yolo] 1158 | mask = 6,7,8 1159 | anchors = 4,5, 6,10, 14,9, 11,18, 25,15, 21,30, 47,26, 37,53, 87,65 1160 | classes=10 1161 | num=9 1162 | jitter=.3 1163 | ignore_thresh = .7 1164 | truth_thresh = 1 1165 | random=1 1166 | 1167 | #148 1168 | [route] 1169 | layers = -4 1170 | 1171 | #149 1172 | [convolutional] 1173 | batch_normalize=1 1174 | filters=256 1175 | size=1 1176 | stride=1 1177 | group=1 1178 | pad=0 1179 | activation=leaky 1180 | 1181 | #150 1182 | [upsample] 1183 | stride=2 1184 | 1185 | #151 1186 | [route] 1187 | layers = -1, 96 1188 | 1189 | #152 1190 | [convolutional] 1191 | batch_normalize=1 1192 | filters=256 1193 | size=1 1194 | group=1 1195 | 
stride=1 1196 | pad=0 1197 | activation=leaky 1198 | 1199 | #153 1200 | [convolutional] 1201 | batch_normalize=1 1202 | size=3 1203 | stride=1 1204 | pad=1 1205 | group=1 1206 | filters=512 1207 | activation=leaky 1208 | 1209 | #154 1210 | [convolutional] 1211 | batch_normalize=1 1212 | filters=256 1213 | size=1 1214 | group=1 1215 | stride=1 1216 | pad=0 1217 | activation=leaky 1218 | 1219 | #155 1220 | [convolutional] 1221 | batch_normalize=1 1222 | size=3 1223 | group=1 1224 | stride=1 1225 | pad=1 1226 | filters=512 1227 | activation=leaky 1228 | 1229 | #156 1230 | [convolutional] 1231 | batch_normalize=1 1232 | filters=256 1233 | size=1 1234 | stride=1 1235 | pad=0 1236 | group=1 1237 | activation=leaky 1238 | 1239 | #157 1240 | [convolutional] 1241 | batch_normalize=1 1242 | size=3 1243 | stride=1 1244 | pad=1 1245 | filters=512 1246 | group=1 1247 | activation=leaky 1248 | 1249 | #158 1250 | [convolutional] 1251 | size=1 1252 | stride=1 1253 | pad=0 1254 | group=1 1255 | filters=45 1256 | activation=linear 1257 | 1258 | #159 1259 | [yolo] 1260 | mask = 3,4,5 1261 | anchors = 4,5, 6,10, 14,9, 11,18, 25,15, 21,30, 47,26, 37,53, 87,65 1262 | classes=10 1263 | num=9 1264 | jitter=.3 1265 | ignore_thresh = .7 1266 | truth_thresh = 1 1267 | random=1 1268 | 1269 | 1270 | #160 1271 | [route] 1272 | layers = -4 1273 | 1274 | #161 1275 | [convolutional] 1276 | batch_normalize=1 1277 | filters=128 1278 | size=1 1279 | stride=1 1280 | pad=0 1281 | group=1 1282 | activation=leaky 1283 | 1284 | #162 1285 | [upsample] 1286 | stride=2 1287 | 1288 | #163 1289 | [route] 1290 | layers = -1, 45 1291 | 1292 | #164 1293 | [convolutional] 1294 | batch_normalize=1 1295 | filters=128 1296 | size=1 1297 | group=1 1298 | stride=1 1299 | pad=0 1300 | activation=leaky 1301 | 1302 | #165 1303 | [convolutional] 1304 | batch_normalize=1 1305 | size=3 1306 | stride=1 1307 | pad=1 1308 | group=1 1309 | filters=256 1310 | activation=leaky 1311 | 1312 | #166 1313 | [convolutional] 1314 | batch_normalize=1 1315 | filters=128 1316 | size=1 1317 | group=1 1318 | stride=1 1319 | pad=0 1320 | activation=leaky 1321 | 1322 | #167 1323 | [convolutional] 1324 | batch_normalize=1 1325 | size=3 1326 | stride=1 1327 | pad=1 1328 | group=1 1329 | filters=256 1330 | activation=leaky 1331 | 1332 | #168 1333 | [convolutional] 1334 | batch_normalize=1 1335 | filters=128 1336 | size=1 1337 | group=1 1338 | stride=1 1339 | pad=0 1340 | activation=leaky 1341 | 1342 | #169 1343 | [convolutional] 1344 | batch_normalize=1 1345 | size=3 1346 | stride=1 1347 | pad=1 1348 | group=1 1349 | filters=256 1350 | activation=leaky 1351 | 1352 | #170 1353 | [convolutional] 1354 | size=1 1355 | stride=1 1356 | pad=0 1357 | group=1 1358 | filters=45 1359 | activation=linear 1360 | 1361 | #171 1362 | [yolo] 1363 | mask = 0,1,2 1364 | anchors = 4,5, 6,10, 14,9, 11,18, 25,15, 21,30, 47,26, 37,53, 87,65 1365 | classes=10 1366 | num=9 1367 | jitter=.3 1368 | ignore_thresh = .7 1369 | truth_thresh = 1 1370 | random=1 1371 | 1372 | -------------------------------------------------------------------------------- /cfg/yolov3-hand.cfg: -------------------------------------------------------------------------------- 1 | 2 | [net] 3 | # Testing 4 | #batch=1 5 | #subdivisions=1 6 | # Training 7 | batch=16 8 | subdivisions=1 9 | width=416 10 | height=416 11 | channels=3 12 | momentum=0.9 13 | decay=0.0005 14 | angle=0 15 | saturation = 1.5 16 | exposure = 1.5 17 | hue=.1 18 | 19 | learning_rate=0.001 20 | burn_in=1000 21 | max_batches = 500200 22 | policy=steps 23 | 
steps=400000,450000 24 | scales=.1,.1 25 | 26 | [convolutional] 27 | batch_normalize=1 28 | filters=32 29 | size=3 30 | stride=1 31 | pad=1 32 | activation=leaky 33 | 34 | # Downsample 35 | 36 | [convolutional] 37 | batch_normalize=1 38 | filters=64 39 | size=3 40 | stride=2 41 | pad=1 42 | activation=leaky 43 | 44 | [convolutional] 45 | batch_normalize=1 46 | filters=32 47 | size=1 48 | stride=1 49 | pad=1 50 | activation=leaky 51 | 52 | [convolutional] 53 | batch_normalize=1 54 | filters=64 55 | size=3 56 | stride=1 57 | pad=1 58 | activation=leaky 59 | 60 | [shortcut] 61 | from=-3 62 | activation=linear 63 | 64 | # Downsample 65 | 66 | [convolutional] 67 | batch_normalize=1 68 | filters=128 69 | size=3 70 | stride=2 71 | pad=1 72 | activation=leaky 73 | 74 | [convolutional] 75 | batch_normalize=1 76 | filters=64 77 | size=1 78 | stride=1 79 | pad=1 80 | activation=leaky 81 | 82 | [convolutional] 83 | batch_normalize=1 84 | filters=128 85 | size=3 86 | stride=1 87 | pad=1 88 | activation=leaky 89 | 90 | [shortcut] 91 | from=-3 92 | activation=linear 93 | 94 | [convolutional] 95 | batch_normalize=1 96 | filters=64 97 | size=1 98 | stride=1 99 | pad=1 100 | activation=leaky 101 | 102 | [convolutional] 103 | batch_normalize=1 104 | filters=128 105 | size=3 106 | stride=1 107 | pad=1 108 | activation=leaky 109 | 110 | [shortcut] 111 | from=-3 112 | activation=linear 113 | 114 | # Downsample 115 | 116 | [convolutional] 117 | batch_normalize=1 118 | filters=256 119 | size=3 120 | stride=2 121 | pad=1 122 | activation=leaky 123 | 124 | [convolutional] 125 | batch_normalize=1 126 | filters=128 127 | size=1 128 | stride=1 129 | pad=1 130 | activation=leaky 131 | 132 | [convolutional] 133 | batch_normalize=1 134 | filters=256 135 | size=3 136 | stride=1 137 | pad=1 138 | activation=leaky 139 | 140 | [shortcut] 141 | from=-3 142 | activation=linear 143 | 144 | [convolutional] 145 | batch_normalize=1 146 | filters=128 147 | size=1 148 | stride=1 149 | pad=1 150 | activation=leaky 151 | 152 | [convolutional] 153 | batch_normalize=1 154 | filters=256 155 | size=3 156 | stride=1 157 | pad=1 158 | activation=leaky 159 | 160 | [shortcut] 161 | from=-3 162 | activation=linear 163 | 164 | [convolutional] 165 | batch_normalize=1 166 | filters=128 167 | size=1 168 | stride=1 169 | pad=1 170 | activation=leaky 171 | 172 | [convolutional] 173 | batch_normalize=1 174 | filters=256 175 | size=3 176 | stride=1 177 | pad=1 178 | activation=leaky 179 | 180 | [shortcut] 181 | from=-3 182 | activation=linear 183 | 184 | [convolutional] 185 | batch_normalize=1 186 | filters=128 187 | size=1 188 | stride=1 189 | pad=1 190 | activation=leaky 191 | 192 | [convolutional] 193 | batch_normalize=1 194 | filters=256 195 | size=3 196 | stride=1 197 | pad=1 198 | activation=leaky 199 | 200 | [shortcut] 201 | from=-3 202 | activation=linear 203 | 204 | 205 | [convolutional] 206 | batch_normalize=1 207 | filters=128 208 | size=1 209 | stride=1 210 | pad=1 211 | activation=leaky 212 | 213 | [convolutional] 214 | batch_normalize=1 215 | filters=256 216 | size=3 217 | stride=1 218 | pad=1 219 | activation=leaky 220 | 221 | [shortcut] 222 | from=-3 223 | activation=linear 224 | 225 | [convolutional] 226 | batch_normalize=1 227 | filters=128 228 | size=1 229 | stride=1 230 | pad=1 231 | activation=leaky 232 | 233 | [convolutional] 234 | batch_normalize=1 235 | filters=256 236 | size=3 237 | stride=1 238 | pad=1 239 | activation=leaky 240 | 241 | [shortcut] 242 | from=-3 243 | activation=linear 244 | 245 | [convolutional] 246 | 
batch_normalize=1 247 | filters=128 248 | size=1 249 | stride=1 250 | pad=1 251 | activation=leaky 252 | 253 | [convolutional] 254 | batch_normalize=1 255 | filters=256 256 | size=3 257 | stride=1 258 | pad=1 259 | activation=leaky 260 | 261 | [shortcut] 262 | from=-3 263 | activation=linear 264 | 265 | [convolutional] 266 | batch_normalize=1 267 | filters=128 268 | size=1 269 | stride=1 270 | pad=1 271 | activation=leaky 272 | 273 | [convolutional] 274 | batch_normalize=1 275 | filters=256 276 | size=3 277 | stride=1 278 | pad=1 279 | activation=leaky 280 | 281 | [shortcut] 282 | from=-3 283 | activation=linear 284 | 285 | # Downsample 286 | 287 | [convolutional] 288 | batch_normalize=1 289 | filters=512 290 | size=3 291 | stride=2 292 | pad=1 293 | activation=leaky 294 | 295 | [convolutional] 296 | batch_normalize=1 297 | filters=256 298 | size=1 299 | stride=1 300 | pad=1 301 | activation=leaky 302 | 303 | [convolutional] 304 | batch_normalize=1 305 | filters=512 306 | size=3 307 | stride=1 308 | pad=1 309 | activation=leaky 310 | 311 | [shortcut] 312 | from=-3 313 | activation=linear 314 | 315 | 316 | [convolutional] 317 | batch_normalize=1 318 | filters=256 319 | size=1 320 | stride=1 321 | pad=1 322 | activation=leaky 323 | 324 | [convolutional] 325 | batch_normalize=1 326 | filters=512 327 | size=3 328 | stride=1 329 | pad=1 330 | activation=leaky 331 | 332 | [shortcut] 333 | from=-3 334 | activation=linear 335 | 336 | 337 | [convolutional] 338 | batch_normalize=1 339 | filters=256 340 | size=1 341 | stride=1 342 | pad=1 343 | activation=leaky 344 | 345 | [convolutional] 346 | batch_normalize=1 347 | filters=512 348 | size=3 349 | stride=1 350 | pad=1 351 | activation=leaky 352 | 353 | [shortcut] 354 | from=-3 355 | activation=linear 356 | 357 | 358 | [convolutional] 359 | batch_normalize=1 360 | filters=256 361 | size=1 362 | stride=1 363 | pad=1 364 | activation=leaky 365 | 366 | [convolutional] 367 | batch_normalize=1 368 | filters=512 369 | size=3 370 | stride=1 371 | pad=1 372 | activation=leaky 373 | 374 | [shortcut] 375 | from=-3 376 | activation=linear 377 | 378 | [convolutional] 379 | batch_normalize=1 380 | filters=256 381 | size=1 382 | stride=1 383 | pad=1 384 | activation=leaky 385 | 386 | [convolutional] 387 | batch_normalize=1 388 | filters=512 389 | size=3 390 | stride=1 391 | pad=1 392 | activation=leaky 393 | 394 | [shortcut] 395 | from=-3 396 | activation=linear 397 | 398 | 399 | [convolutional] 400 | batch_normalize=1 401 | filters=256 402 | size=1 403 | stride=1 404 | pad=1 405 | activation=leaky 406 | 407 | [convolutional] 408 | batch_normalize=1 409 | filters=512 410 | size=3 411 | stride=1 412 | pad=1 413 | activation=leaky 414 | 415 | [shortcut] 416 | from=-3 417 | activation=linear 418 | 419 | 420 | [convolutional] 421 | batch_normalize=1 422 | filters=256 423 | size=1 424 | stride=1 425 | pad=1 426 | activation=leaky 427 | 428 | [convolutional] 429 | batch_normalize=1 430 | filters=512 431 | size=3 432 | stride=1 433 | pad=1 434 | activation=leaky 435 | 436 | [shortcut] 437 | from=-3 438 | activation=linear 439 | 440 | [convolutional] 441 | batch_normalize=1 442 | filters=256 443 | size=1 444 | stride=1 445 | pad=1 446 | activation=leaky 447 | 448 | [convolutional] 449 | batch_normalize=1 450 | filters=512 451 | size=3 452 | stride=1 453 | pad=1 454 | activation=leaky 455 | 456 | [shortcut] 457 | from=-3 458 | activation=linear 459 | 460 | # Downsample 461 | 462 | [convolutional] 463 | batch_normalize=1 464 | filters=1024 465 | size=3 466 | stride=2 467 | 
pad=1 468 | activation=leaky 469 | 470 | [convolutional] 471 | batch_normalize=1 472 | filters=512 473 | size=1 474 | stride=1 475 | pad=1 476 | activation=leaky 477 | 478 | [convolutional] 479 | batch_normalize=1 480 | filters=1024 481 | size=3 482 | stride=1 483 | pad=1 484 | activation=leaky 485 | 486 | [shortcut] 487 | from=-3 488 | activation=linear 489 | 490 | [convolutional] 491 | batch_normalize=1 492 | filters=512 493 | size=1 494 | stride=1 495 | pad=1 496 | activation=leaky 497 | 498 | [convolutional] 499 | batch_normalize=1 500 | filters=1024 501 | size=3 502 | stride=1 503 | pad=1 504 | activation=leaky 505 | 506 | [shortcut] 507 | from=-3 508 | activation=linear 509 | 510 | [convolutional] 511 | batch_normalize=1 512 | filters=512 513 | size=1 514 | stride=1 515 | pad=1 516 | activation=leaky 517 | 518 | [convolutional] 519 | batch_normalize=1 520 | filters=1024 521 | size=3 522 | stride=1 523 | pad=1 524 | activation=leaky 525 | 526 | [shortcut] 527 | from=-3 528 | activation=linear 529 | 530 | [convolutional] 531 | batch_normalize=1 532 | filters=512 533 | size=1 534 | stride=1 535 | pad=1 536 | activation=leaky 537 | 538 | [convolutional] 539 | batch_normalize=1 540 | filters=1024 541 | size=3 542 | stride=1 543 | pad=1 544 | activation=leaky 545 | 546 | [shortcut] 547 | from=-3 548 | activation=linear 549 | 550 | ###################### 551 | 552 | [convolutional] 553 | batch_normalize=1 554 | filters=512 555 | size=1 556 | stride=1 557 | pad=1 558 | activation=leaky 559 | 560 | [convolutional] 561 | batch_normalize=1 562 | size=3 563 | stride=1 564 | pad=1 565 | filters=1024 566 | activation=leaky 567 | 568 | [convolutional] 569 | batch_normalize=1 570 | filters=512 571 | size=1 572 | stride=1 573 | pad=1 574 | activation=leaky 575 | 576 | [convolutional] 577 | batch_normalize=1 578 | size=3 579 | stride=1 580 | pad=1 581 | filters=1024 582 | activation=leaky 583 | 584 | [convolutional] 585 | batch_normalize=1 586 | filters=512 587 | size=1 588 | stride=1 589 | pad=1 590 | activation=leaky 591 | 592 | [convolutional] 593 | batch_normalize=1 594 | size=3 595 | stride=1 596 | pad=1 597 | filters=1024 598 | activation=leaky 599 | 600 | [convolutional] 601 | size=1 602 | stride=1 603 | pad=1 604 | filters=18 605 | activation=linear 606 | 607 | 608 | [yolo] 609 | mask = 6,7,8 610 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 611 | classes=1 612 | num=9 613 | jitter=.3 614 | ignore_thresh = .7 615 | truth_thresh = 1 616 | random=1 617 | 618 | 619 | [route] 620 | layers = -4 621 | 622 | [convolutional] 623 | batch_normalize=1 624 | filters=256 625 | size=1 626 | stride=1 627 | pad=1 628 | activation=leaky 629 | 630 | [upsample] 631 | stride=2 632 | 633 | [route] 634 | layers = -1, 61 635 | 636 | 637 | 638 | [convolutional] 639 | batch_normalize=1 640 | filters=256 641 | size=1 642 | stride=1 643 | pad=1 644 | activation=leaky 645 | 646 | [convolutional] 647 | batch_normalize=1 648 | size=3 649 | stride=1 650 | pad=1 651 | filters=512 652 | activation=leaky 653 | 654 | [convolutional] 655 | batch_normalize=1 656 | filters=256 657 | size=1 658 | stride=1 659 | pad=1 660 | activation=leaky 661 | 662 | [convolutional] 663 | batch_normalize=1 664 | size=3 665 | stride=1 666 | pad=1 667 | filters=512 668 | activation=leaky 669 | 670 | [convolutional] 671 | batch_normalize=1 672 | filters=256 673 | size=1 674 | stride=1 675 | pad=1 676 | activation=leaky 677 | 678 | [convolutional] 679 | batch_normalize=1 680 | size=3 681 | stride=1 682 | pad=1 683 | 
filters=512 684 | activation=leaky 685 | 686 | [convolutional] 687 | size=1 688 | stride=1 689 | pad=1 690 | filters=18 691 | activation=linear 692 | 693 | 694 | [yolo] 695 | mask = 3,4,5 696 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 697 | classes=1 698 | num=9 699 | jitter=.3 700 | ignore_thresh = .7 701 | truth_thresh = 1 702 | random=1 703 | 704 | 705 | 706 | [route] 707 | layers = -4 708 | 709 | [convolutional] 710 | batch_normalize=1 711 | filters=128 712 | size=1 713 | stride=1 714 | pad=1 715 | activation=leaky 716 | 717 | [upsample] 718 | stride=2 719 | 720 | [route] 721 | layers = -1, 36 722 | 723 | 724 | 725 | [convolutional] 726 | batch_normalize=1 727 | filters=128 728 | size=1 729 | stride=1 730 | pad=1 731 | activation=leaky 732 | 733 | [convolutional] 734 | batch_normalize=1 735 | size=3 736 | stride=1 737 | pad=1 738 | filters=256 739 | activation=leaky 740 | 741 | [convolutional] 742 | batch_normalize=1 743 | filters=128 744 | size=1 745 | stride=1 746 | pad=1 747 | activation=leaky 748 | 749 | [convolutional] 750 | batch_normalize=1 751 | size=3 752 | stride=1 753 | pad=1 754 | filters=256 755 | activation=leaky 756 | 757 | [convolutional] 758 | batch_normalize=1 759 | filters=128 760 | size=1 761 | stride=1 762 | pad=1 763 | activation=leaky 764 | 765 | [convolutional] 766 | batch_normalize=1 767 | size=3 768 | stride=1 769 | pad=1 770 | filters=256 771 | activation=leaky 772 | 773 | [convolutional] 774 | size=1 775 | stride=1 776 | pad=1 777 | filters=18 778 | activation=linear 779 | 780 | 781 | [yolo] 782 | mask = 0,1,2 783 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 784 | classes=1 785 | num=9 786 | jitter=.3 787 | ignore_thresh = .7 788 | truth_thresh = 1 789 | random=1 790 | 791 | -------------------------------------------------------------------------------- /cfg/yolov3-shufflenetv2-hand.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=15,25,60,99,150,160,180 23 | scales=0.5,0.5,0.1,0.5,0.5,0.1,0.1 24 | 25 | # 0 26 | [convolutional] 27 | batch_normalize=1 28 | filters=24 29 | size=3 30 | stride=2 31 | pad=1 32 | activation=HardSwish 33 | 34 | #1 35 | [maxpool] 36 | size=3 37 | stride=2 38 | 39 | # 2 40 | [stage2] 41 | out_channels=176 42 | 43 | [stage3] 44 | out_channels=352 45 | 46 | [eca] 47 | kernel_size=16 48 | 49 | [stage4] 50 | out_channels=704 51 | 52 | [eca] 53 | kernel_size=16 54 | 55 | [convolutional] 56 | batch_normalize=1 57 | filters=1024 58 | size=1 59 | stride=1 60 | pad=1 61 | activation=HardSwish 62 | 63 | [eca] 64 | kernel_size=16 65 | 66 | 67 | 68 | ########### 69 | 70 | 71 | [convolutional] 72 | batch_normalize=1 73 | filters=256 74 | size=1 75 | stride=1 76 | pad=1 77 | activation=leaky 78 | 79 | [convolutional] 80 | batch_normalize=1 81 | filters=512 82 | size=3 83 | stride=1 84 | pad=1 85 | activation=leaky 86 | 87 | 88 | [convolutional] 89 | size=1 90 | stride=1 91 | pad=1 92 | filters=18 93 | activation=linear 94 | 95 | 96 | [yolo] 97 | mask = 3,4,5 98 | anchors = 16,19, 28,30, 40,42, 58,57, 85,85, 154,152 99 | classes=1 100 | num=6 101 | jitter=.3 102 | ignore_thresh = .7 103 | 
truth_thresh = 1 104 | random=1 105 | 106 | [route] 107 | layers = -4 108 | 109 | # 18 110 | [convolutional] 111 | batch_normalize=1 112 | filters=128 113 | size=1 114 | stride=1 115 | pad=1 116 | activation=leaky 117 | 118 | # 19 119 | [upsample] 120 | stride=2 121 | 122 | # 20 123 | [route] 124 | layers = -1, 3 125 | # 21 126 | [convolutional] 127 | batch_normalize=1 128 | filters=256 129 | size=3 130 | stride=1 131 | pad=1 132 | activation=leaky 133 | 134 | # 22 135 | [convolutional] 136 | size=1 137 | stride=1 138 | pad=1 139 | filters=18 140 | activation=linear 141 | 142 | # 23 143 | [yolo] 144 | mask = 0,1,2 145 | anchors = 16,19, 28,30, 40,42, 58,57, 85,85, 154,152 146 | classes=1 147 | num=6 148 | jitter=.3 149 | ignore_thresh = .7 150 | truth_thresh = 1 151 | random=1 152 | 153 | 154 | 155 | 156 | 157 | 158 | -------------------------------------------------------------------------------- /cfg/yolov3-shufflenetv2-visdrone.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=15,25,60,99,150,160,180 23 | scales=0.5,0.5,0.1,0.5,0.5,0.1,0.1 24 | 25 | # 0 26 | [convolutional] 27 | batch_normalize=1 28 | filters=24 29 | size=3 30 | stride=2 31 | pad=1 32 | activation=HardSwish 33 | 34 | #1 35 | [maxpool] 36 | size=3 37 | stride=2 38 | 39 | # 2 40 | [stage2] 41 | out_channels=176 42 | 43 | [stage3] 44 | out_channels=352 45 | 46 | [eca] 47 | kernel_size=16 48 | 49 | [stage4] 50 | out_channels=704 51 | 52 | [eca] 53 | kernel_size=16 54 | 55 | [convolutional] 56 | batch_normalize=1 57 | filters=1024 58 | size=1 59 | stride=1 60 | pad=1 61 | activation=HardSwish 62 | 63 | [eca] 64 | kernel_size=16 65 | 66 | 67 | 68 | ########### 69 | 70 | 71 | [convolutional] 72 | batch_normalize=1 73 | filters=256 74 | size=1 75 | stride=1 76 | pad=1 77 | activation=leaky 78 | 79 | [convolutional] 80 | batch_normalize=1 81 | filters=512 82 | size=3 83 | stride=1 84 | pad=1 85 | activation=leaky 86 | 87 | 88 | [convolutional] 89 | size=1 90 | stride=1 91 | pad=1 92 | filters=45 93 | activation=linear 94 | 95 | 96 | [yolo] 97 | mask = 3,4,5 98 | anchors = 5,6, 9,15, 18,11, 21,32, 38,23, 67,58 99 | classes=10 100 | num=6 101 | jitter=.3 102 | ignore_thresh = .7 103 | truth_thresh = 1 104 | random=1 105 | 106 | [route] 107 | layers = -4 108 | 109 | # 18 110 | [convolutional] 111 | batch_normalize=1 112 | filters=128 113 | size=1 114 | stride=1 115 | pad=1 116 | activation=leaky 117 | 118 | # 19 119 | [upsample] 120 | stride=2 121 | 122 | # 20 123 | [route] 124 | layers = -1, 3 125 | # 21 126 | [convolutional] 127 | batch_normalize=1 128 | filters=256 129 | size=3 130 | stride=1 131 | pad=1 132 | activation=leaky 133 | 134 | # 22 135 | [convolutional] 136 | size=1 137 | stride=1 138 | pad=1 139 | filters=45 140 | activation=linear 141 | 142 | # 23 143 | [yolo] 144 | mask = 0,1,2 145 | anchors = 5,6, 9,15, 18,11, 21,32, 38,23, 67,58 146 | classes=10 147 | num=6 148 | jitter=.3 149 | ignore_thresh = .7 150 | truth_thresh = 1 151 | random=1 152 | 153 | 154 | 155 | 156 | 157 | 158 | -------------------------------------------------------------------------------- /cfg/yolov3-tiny-hand-cbam.cfg: 
-------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=15,25,60,99,150,160,180 23 | scales=0.5,0.5,0.1,0.5,0.5,0.1,0.1 24 | 25 | # 0 26 | [convolutional] 27 | batch_normalize=1 28 | filters=16 29 | size=3 30 | stride=1 31 | pad=1 32 | activation=leaky 33 | 34 | # 1 35 | [maxpool] 36 | size=2 37 | stride=2 38 | 39 | # 2 40 | [convolutional] 41 | batch_normalize=1 42 | filters=32 43 | size=3 44 | stride=1 45 | pad=1 46 | activation=leaky 47 | 48 | # 3 49 | [maxpool] 50 | size=2 51 | stride=2 52 | 53 | # 4 54 | [convolutional] 55 | batch_normalize=1 56 | filters=64 57 | size=3 58 | stride=1 59 | pad=1 60 | activation=leaky 61 | 62 | # 5 63 | [maxpool] 64 | size=2 65 | stride=2 66 | 67 | # 6 68 | [convolutional] 69 | batch_normalize=1 70 | filters=128 71 | size=3 72 | stride=1 73 | pad=1 74 | activation=leaky 75 | 76 | # 7 77 | [maxpool] 78 | size=2 79 | stride=2 80 | 81 | # 8 82 | [convolutional] 83 | batch_normalize=1 84 | filters=256 85 | size=3 86 | stride=1 87 | pad=1 88 | activation=leaky 89 | 90 | # 9 91 | [maxpool] 92 | size=2 93 | stride=2 94 | 95 | # 10 96 | [convolutional] 97 | batch_normalize=1 98 | filters=512 99 | size=3 100 | stride=1 101 | pad=1 102 | activation=leaky 103 | 104 | # 11 105 | [maxpool] 106 | size=2 107 | stride=1 108 | 109 | # 12 110 | [convolutional] 111 | batch_normalize=1 112 | filters=1024 113 | size=3 114 | stride=1 115 | pad=1 116 | activation=leaky 117 | 118 | [ca] 119 | ratio=16 120 | 121 | [sa] 122 | kernelsize=7 123 | 124 | ########### 125 | 126 | # 13 127 | [convolutional] 128 | batch_normalize=1 129 | filters=256 130 | size=1 131 | stride=1 132 | pad=1 133 | activation=leaky 134 | 135 | # 14 136 | [convolutional] 137 | batch_normalize=1 138 | filters=512 139 | size=3 140 | stride=1 141 | pad=1 142 | activation=leaky 143 | 144 | # 15 145 | [convolutional] 146 | size=1 147 | stride=1 148 | pad=1 149 | filters=18 150 | activation=linear 151 | 152 | 153 | 154 | # 16 155 | [yolo] 156 | mask = 3,4,5 157 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 158 | classes=1 159 | num=6 160 | jitter=.3 161 | ignore_thresh = .7 162 | truth_thresh = 1 163 | random=1 164 | 165 | # 17 166 | [route] 167 | layers = -4 168 | 169 | # 18 170 | [convolutional] 171 | batch_normalize=1 172 | filters=128 173 | size=1 174 | stride=1 175 | pad=1 176 | activation=leaky 177 | 178 | # 19 179 | [upsample] 180 | stride=2 181 | 182 | # 20 183 | [route] 184 | layers = -1, 8 185 | 186 | # 21 187 | [convolutional] 188 | batch_normalize=1 189 | filters=256 190 | size=3 191 | stride=1 192 | pad=1 193 | activation=leaky 194 | 195 | # 22 196 | [convolutional] 197 | size=1 198 | stride=1 199 | pad=1 200 | filters=18 201 | activation=linear 202 | 203 | # 23 204 | [yolo] 205 | mask = 1,2,3 206 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 207 | classes=1 208 | num=6 209 | jitter=.3 210 | ignore_thresh = .7 211 | truth_thresh = 1 212 | random=1 213 | -------------------------------------------------------------------------------- /cfg/yolov3-tiny-hand-eca.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # 
Training 6 | batch=16 7 | subdivisions=1 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=15,25,60,99,150,160,180 23 | scales=0.5,0.5,0.1,0.5,0.5,0.1,0.1 24 | 25 | # 0 26 | [convolutional] 27 | batch_normalize=1 28 | filters=16 29 | size=3 30 | stride=1 31 | pad=1 32 | activation=leaky 33 | 34 | # 1 35 | [maxpool] 36 | size=2 37 | stride=2 38 | 39 | # 2 40 | [convolutional] 41 | batch_normalize=1 42 | filters=32 43 | size=3 44 | stride=1 45 | pad=1 46 | activation=leaky 47 | 48 | # 3 49 | [maxpool] 50 | size=2 51 | stride=2 52 | 53 | # 4 54 | [convolutional] 55 | batch_normalize=1 56 | filters=64 57 | size=3 58 | stride=1 59 | pad=1 60 | activation=leaky 61 | 62 | # 5 63 | [maxpool] 64 | size=2 65 | stride=2 66 | 67 | # 6 68 | [convolutional] 69 | batch_normalize=1 70 | filters=128 71 | size=3 72 | stride=1 73 | pad=1 74 | activation=leaky 75 | 76 | # 7 77 | [maxpool] 78 | size=2 79 | stride=2 80 | 81 | # 8 82 | [convolutional] 83 | batch_normalize=1 84 | filters=256 85 | size=3 86 | stride=1 87 | pad=1 88 | activation=leaky 89 | 90 | # 9 91 | [maxpool] 92 | size=2 93 | stride=2 94 | 95 | # 10 96 | [convolutional] 97 | batch_normalize=1 98 | filters=512 99 | size=3 100 | stride=1 101 | pad=1 102 | activation=leaky 103 | 104 | # 11 105 | [maxpool] 106 | size=2 107 | stride=1 108 | 109 | # 12 110 | [convolutional] 111 | batch_normalize=1 112 | filters=1024 113 | size=3 114 | stride=1 115 | pad=1 116 | activation=leaky 117 | 118 | [eca] 119 | kernel_size=16 120 | 121 | 122 | ########### 123 | 124 | # 13 125 | [convolutional] 126 | batch_normalize=1 127 | filters=256 128 | size=1 129 | stride=1 130 | pad=1 131 | activation=leaky 132 | 133 | # 14 134 | [convolutional] 135 | batch_normalize=1 136 | filters=512 137 | size=3 138 | stride=1 139 | pad=1 140 | activation=leaky 141 | 142 | # 15 143 | [convolutional] 144 | size=1 145 | stride=1 146 | pad=1 147 | filters=18 148 | activation=linear 149 | 150 | 151 | 152 | # 16 153 | [yolo] 154 | mask = 3,4,5 155 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 156 | classes=1 157 | num=6 158 | jitter=.3 159 | ignore_thresh = .7 160 | truth_thresh = 1 161 | random=1 162 | 163 | # 17 164 | [route] 165 | layers = -4 166 | 167 | # 18 168 | [convolutional] 169 | batch_normalize=1 170 | filters=128 171 | size=1 172 | stride=1 173 | pad=1 174 | activation=leaky 175 | 176 | # 19 177 | [upsample] 178 | stride=2 179 | 180 | # 20 181 | [route] 182 | layers = -1, 8 183 | 184 | # 21 185 | [convolutional] 186 | batch_normalize=1 187 | filters=256 188 | size=3 189 | stride=1 190 | pad=1 191 | activation=leaky 192 | 193 | # 22 194 | [convolutional] 195 | size=1 196 | stride=1 197 | pad=1 198 | filters=18 199 | activation=linear 200 | 201 | # 23 202 | [yolo] 203 | mask = 1,2,3 204 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 205 | classes=1 206 | num=6 207 | jitter=.3 208 | ignore_thresh = .7 209 | truth_thresh = 1 210 | random=1 211 | -------------------------------------------------------------------------------- /cfg/yolov3-tiny-hand-se.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 
1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=15,25,60,99,150,160,180 23 | scales=0.5,0.5,0.1,0.5,0.5,0.1,0.1 24 | 25 | # 0 26 | [convolutional] 27 | batch_normalize=1 28 | filters=16 29 | size=3 30 | stride=1 31 | pad=1 32 | activation=leaky 33 | 34 | # 1 35 | [maxpool] 36 | size=2 37 | stride=2 38 | 39 | # 2 40 | [convolutional] 41 | batch_normalize=1 42 | filters=32 43 | size=3 44 | stride=1 45 | pad=1 46 | activation=leaky 47 | 48 | # 3 49 | [maxpool] 50 | size=2 51 | stride=2 52 | 53 | # 4 54 | [convolutional] 55 | batch_normalize=1 56 | filters=64 57 | size=3 58 | stride=1 59 | pad=1 60 | activation=leaky 61 | 62 | # 5 63 | [maxpool] 64 | size=2 65 | stride=2 66 | 67 | # 6 68 | [convolutional] 69 | batch_normalize=1 70 | filters=128 71 | size=3 72 | stride=1 73 | pad=1 74 | activation=leaky 75 | 76 | # 7 77 | [maxpool] 78 | size=2 79 | stride=2 80 | 81 | # 8 82 | [convolutional] 83 | batch_normalize=1 84 | filters=256 85 | size=3 86 | stride=1 87 | pad=1 88 | activation=leaky 89 | 90 | # 9 91 | [maxpool] 92 | size=2 93 | stride=2 94 | 95 | # 10 96 | [convolutional] 97 | batch_normalize=1 98 | filters=512 99 | size=3 100 | stride=1 101 | pad=1 102 | activation=leaky 103 | 104 | # 11 105 | [maxpool] 106 | size=2 107 | stride=1 108 | 109 | # 12 110 | [convolutional] 111 | batch_normalize=1 112 | filters=1024 113 | size=3 114 | stride=1 115 | pad=1 116 | activation=leaky 117 | 118 | [se] 119 | reduction=16 120 | 121 | ########### 122 | 123 | # 13 124 | [convolutional] 125 | batch_normalize=1 126 | filters=256 127 | size=1 128 | stride=1 129 | pad=1 130 | activation=leaky 131 | 132 | # 14 133 | [convolutional] 134 | batch_normalize=1 135 | filters=512 136 | size=3 137 | stride=1 138 | pad=1 139 | activation=leaky 140 | 141 | # 15 142 | [convolutional] 143 | size=1 144 | stride=1 145 | pad=1 146 | filters=18 147 | activation=linear 148 | 149 | 150 | 151 | # 16 152 | [yolo] 153 | mask = 3,4,5 154 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 155 | classes=1 156 | num=6 157 | jitter=.3 158 | ignore_thresh = .7 159 | truth_thresh = 1 160 | random=1 161 | 162 | # 17 163 | [route] 164 | layers = -4 165 | 166 | # 18 167 | [convolutional] 168 | batch_normalize=1 169 | filters=128 170 | size=1 171 | stride=1 172 | pad=1 173 | activation=leaky 174 | 175 | # 19 176 | [upsample] 177 | stride=2 178 | 179 | # 20 180 | [route] 181 | layers = -1, 8 182 | 183 | # 21 184 | [convolutional] 185 | batch_normalize=1 186 | filters=256 187 | size=3 188 | stride=1 189 | pad=1 190 | activation=leaky 191 | 192 | # 22 193 | [convolutional] 194 | size=1 195 | stride=1 196 | pad=1 197 | filters=18 198 | activation=linear 199 | 200 | # 23 201 | [yolo] 202 | mask = 1,2,3 203 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 204 | classes=1 205 | num=6 206 | jitter=.3 207 | ignore_thresh = .7 208 | truth_thresh = 1 209 | random=1 210 | -------------------------------------------------------------------------------- /cfg/yolov3-tiny-hand.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=15,25,60,99,150,160,180 23 | scales=0.5,0.5,0.1,0.5,0.5,0.1,0.1 
24 | 25 | # 0 26 | [convolutional] 27 | batch_normalize=1 28 | filters=16 29 | size=3 30 | stride=1 31 | pad=1 32 | activation=leaky 33 | 34 | # 1 35 | [maxpool] 36 | size=2 37 | stride=2 38 | 39 | # 2 40 | [convolutional] 41 | batch_normalize=1 42 | filters=32 43 | size=3 44 | stride=1 45 | pad=1 46 | activation=leaky 47 | 48 | # 3 49 | [maxpool] 50 | size=2 51 | stride=2 52 | 53 | # 4 54 | [convolutional] 55 | batch_normalize=1 56 | filters=64 57 | size=3 58 | stride=1 59 | pad=1 60 | activation=leaky 61 | 62 | # 5 63 | [maxpool] 64 | size=2 65 | stride=2 66 | 67 | # 6 68 | [convolutional] 69 | batch_normalize=1 70 | filters=128 71 | size=3 72 | stride=1 73 | pad=1 74 | activation=leaky 75 | 76 | # 7 77 | [maxpool] 78 | size=2 79 | stride=2 80 | 81 | # 8 82 | [convolutional] 83 | batch_normalize=1 84 | filters=256 85 | size=3 86 | stride=1 87 | pad=1 88 | activation=leaky 89 | 90 | # 9 91 | [maxpool] 92 | size=2 93 | stride=2 94 | 95 | # 10 96 | [convolutional] 97 | batch_normalize=1 98 | filters=512 99 | size=3 100 | stride=1 101 | pad=1 102 | activation=leaky 103 | 104 | # 11 105 | [maxpool] 106 | size=2 107 | stride=1 108 | 109 | # 12 110 | [convolutional] 111 | batch_normalize=1 112 | filters=1024 113 | size=3 114 | stride=1 115 | pad=1 116 | activation=leaky 117 | 118 | ########### 119 | 120 | # 13 121 | [convolutional] 122 | batch_normalize=1 123 | filters=256 124 | size=1 125 | stride=1 126 | pad=1 127 | activation=leaky 128 | 129 | # 14 130 | [convolutional] 131 | batch_normalize=1 132 | filters=512 133 | size=3 134 | stride=1 135 | pad=1 136 | activation=leaky 137 | 138 | # 15 139 | [convolutional] 140 | size=1 141 | stride=1 142 | pad=1 143 | filters=18 144 | activation=linear 145 | 146 | 147 | 148 | # 16 149 | [yolo] 150 | mask = 3,4,5 151 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 152 | classes=1 153 | num=6 154 | jitter=.3 155 | ignore_thresh = .7 156 | truth_thresh = 1 157 | random=1 158 | 159 | # 17 160 | [route] 161 | layers = -4 162 | 163 | # 18 164 | [convolutional] 165 | batch_normalize=1 166 | filters=128 167 | size=1 168 | stride=1 169 | pad=1 170 | activation=leaky 171 | 172 | # 19 173 | [upsample] 174 | stride=2 175 | 176 | # 20 177 | [route] 178 | layers = -1, 8 179 | 180 | # 21 181 | [convolutional] 182 | batch_normalize=1 183 | filters=256 184 | size=3 185 | stride=1 186 | pad=1 187 | activation=leaky 188 | 189 | # 22 190 | [convolutional] 191 | size=1 192 | stride=1 193 | pad=1 194 | filters=18 195 | activation=linear 196 | 197 | # 23 198 | [yolo] 199 | mask = 1,2,3 200 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 201 | classes=1 202 | num=6 203 | jitter=.3 204 | ignore_thresh = .7 205 | truth_thresh = 1 206 | random=1 207 | -------------------------------------------------------------------------------- /data/oxfordhand.data: -------------------------------------------------------------------------------- 1 | classes= 1 2 | train=data/train.txt 3 | valid=data/valid.txt 4 | names=data/oxfordhand.names 5 | -------------------------------------------------------------------------------- /data/oxfordhand.names: -------------------------------------------------------------------------------- 1 | hand 2 | 3 | -------------------------------------------------------------------------------- /data/visdrone.data: -------------------------------------------------------------------------------- 1 | classes= 10 2 | train=data/visdrone/train.txt 3 | valid=data/visdrone/test.txt 4 | names=data/visdrone.names 5 | 
-------------------------------------------------------------------------------- /data/visdrone.names: -------------------------------------------------------------------------------- 1 | pedestrian 2 | people 3 | bicycle 4 | car 5 | van 6 | truck 7 | tricycle 8 | awning-tricycle 9 | bus 10 | motor 11 | -------------------------------------------------------------------------------- /detect.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | 3 | from models import * # set ONNX_EXPORT in models.py 4 | from utils.datasets import * 5 | from utils.utils import * 6 | 7 | import os 8 | os.environ["CUDA_VISIBLE_DEVICES"] = '-1' 9 | def detect(save_img=False): 10 | imgsz = (320, 192) if ONNX_EXPORT else opt.img_size # (320, 192) or (416, 256) or (608, 352) for (height, width) 11 | out, source, weights, half, view_img, save_txt = opt.output, opt.source, opt.weights, opt.half, opt.view_img, opt.save_txt 12 | webcam = source == '0' or source.startswith('rtsp') or source.startswith('http') or source.endswith('.txt') 13 | 14 | # Initialize 15 | device = torch_utils.select_device(device='cpu' if ONNX_EXPORT else opt.device) 16 | if os.path.exists(out): 17 | shutil.rmtree(out) # delete output folder 18 | os.makedirs(out) # make new output folder 19 | 20 | # Initialize model 21 | model = Darknet(opt.cfg, imgsz) 22 | 23 | # Load weights 24 | attempt_download(weights) 25 | if weights.endswith('.pt'): # pytorch format 26 | model.load_state_dict(torch.load(weights, map_location=device)['model']) 27 | else: # darknet format 28 | load_darknet_weights(model, weights) 29 | 30 | # Second-stage classifier 31 | classify = False 32 | if classify: 33 | modelc = torch_utils.load_classifier(name='resnet101', n=2) # initialize 34 | modelc.load_state_dict(torch.load('weights/resnet101.pt', map_location=device)['model']) # load weights 35 | modelc.to(device).eval() 36 | 37 | # Eval mode 38 | model.to(device).eval() 39 | 40 | # Fuse Conv2d + BatchNorm2d layers 41 | # model.fuse() 42 | 43 | # Export mode 44 | if ONNX_EXPORT: 45 | model.fuse() 46 | img = torch.zeros((1, 3) + imgsz) # (1, 3, 320, 192) 47 | f = opt.weights.replace(opt.weights.split('.')[-1], 'onnx') # *.onnx filename 48 | torch.onnx.export(model, img, f, verbose=False, opset_version=11, 49 | input_names=['images'], output_names=['classes', 'boxes']) 50 | 51 | # Validate exported model 52 | import onnx 53 | model = onnx.load(f) # Load the ONNX model 54 | onnx.checker.check_model(model) # Check that the IR is well formed 55 | print(onnx.helper.printable_graph(model.graph)) # Print a human readable representation of the graph 56 | return 57 | 58 | # Half precision 59 | half = half and device.type != 'cpu' # half precision only supported on CUDA 60 | if half: 61 | model.half() 62 | 63 | # Set Dataloader 64 | vid_path, vid_writer = None, None 65 | if webcam: 66 | view_img = True 67 | torch.backends.cudnn.benchmark = True # set True to speed up constant image size inference 68 | dataset = LoadStreams(source, img_size=imgsz) 69 | else: 70 | save_img = True 71 | dataset = LoadImages(source, img_size=imgsz) 72 | 73 | # Get names and colors 74 | names = load_classes(opt.names) 75 | colors = [[random.randint(0, 255) for _ in range(3)] for _ in range(len(names))] 76 | 77 | # Run inference 78 | t0 = time.time() 79 | img = torch.zeros((1, 3, imgsz, imgsz), device=device) # init img 80 | _ = model(img.half() if half else img.float()) if device.type != 'cpu' else None # run once 81 | for path, img, im0s, vid_cap in 
dataset: 82 | img = torch.from_numpy(img).to(device) 83 | img = img.half() if half else img.float() # uint8 to fp16/32 84 | img /= 255.0 # 0 - 255 to 0.0 - 1.0 85 | if img.ndimension() == 3: 86 | img = img.unsqueeze(0) 87 | 88 | # Inference 89 | t1 = torch_utils.time_synchronized() 90 | pred = model(img, augment=opt.augment)[0] 91 | t2 = torch_utils.time_synchronized() 92 | 93 | # to float 94 | if half: 95 | pred = pred.float() 96 | 97 | # Apply NMS 98 | pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres, 99 | multi_label=False, classes=opt.classes, agnostic=opt.agnostic_nms) 100 | 101 | # Apply Classifier 102 | if classify: 103 | pred = apply_classifier(pred, modelc, img, im0s) 104 | 105 | # Process detections 106 | for i, det in enumerate(pred): # detections for image i 107 | if webcam: # batch_size >= 1 108 | p, s, im0 = path[i], '%g: ' % i, im0s[i] 109 | else: 110 | p, s, im0 = path, '', im0s 111 | 112 | save_path = str(Path(out) / Path(p).name) 113 | s += '%gx%g ' % img.shape[2:] # print string 114 | gn = torch.tensor(im0.shape)[[1, 0, 1, 0]] #  normalization gain whwh 115 | if det is not None and len(det): 116 | # Rescale boxes from imgsz to im0 size 117 | det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round() 118 | 119 | # Print results 120 | for c in det[:, -1].unique(): 121 | n = (det[:, -1] == c).sum() # detections per class 122 | s += '%g %ss, ' % (n, names[int(c)]) # add to string 123 | 124 | # Write results 125 | for *xyxy, conf, cls in det: 126 | if save_txt: # Write to file 127 | xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist() # normalized xywh 128 | with open(save_path[:save_path.rfind('.')] + '.txt', 'a') as file: 129 | file.write(('%g ' * 5 + '\n') % (cls, *xywh)) # label format 130 | 131 | if save_img or view_img: # Add bbox to image 132 | label = '%s %.2f' % (names[int(cls)], conf) 133 | plot_one_box(xyxy, im0, label=label, color=colors[int(cls)]) 134 | 135 | # Print time (inference + NMS) 136 | print('%sDone. (%.3fs)' % (s, t2 - t1)) 137 | 138 | # Stream results 139 | if view_img: 140 | cv2.imshow(p, im0) 141 | if cv2.waitKey(1) == ord('q'): # q to quit 142 | raise StopIteration 143 | 144 | # Save results (image with detections) 145 | if save_img: 146 | if dataset.mode == 'images': 147 | cv2.imwrite(save_path, im0) 148 | else: 149 | if vid_path != save_path: # new video 150 | vid_path = save_path 151 | if isinstance(vid_writer, cv2.VideoWriter): 152 | vid_writer.release() # release previous video writer 153 | 154 | fps = vid_cap.get(cv2.CAP_PROP_FPS) 155 | w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH)) 156 | h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) 157 | vid_writer = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*opt.fourcc), fps, (w, h)) 158 | vid_writer.write(im0) 159 | 160 | if save_txt or save_img: 161 | print('Results saved to %s' % os.getcwd() + os.sep + out) 162 | if platform == 'darwin': # MacOS 163 | os.system('open ' + save_path) 164 | 165 | print('Done. 
(%.3fs)' % (time.time() - t0)) 166 | 167 | 168 | if __name__ == '__main__': 169 | parser = argparse.ArgumentParser() 170 | parser.add_argument('--cfg', type=str, default='cfg/yolov3-spp.cfg', help='*.cfg path') 171 | parser.add_argument('--names', type=str, default='data/coco.names', help='*.names path') 172 | parser.add_argument('--weights', type=str, default='weights/yolov3-spp-ultralytics.pt', help='weights path') 173 | parser.add_argument('--source', type=str, default='data/samples', help='source') # input file/folder, 0 for webcam 174 | parser.add_argument('--output', type=str, default='output', help='output folder') # output folder 175 | parser.add_argument('--img-size', type=int, default=768, help='inference size (pixels)') 176 | parser.add_argument('--conf-thres', type=float, default=0.2, help='object confidence threshold') 177 | parser.add_argument('--iou-thres', type=float, default=0.6, help='IOU threshold for NMS') 178 | parser.add_argument('--fourcc', type=str, default='mp4v', help='output video codec (verify ffmpeg support)') 179 | parser.add_argument('--half', action='store_true', help='half precision FP16 inference') 180 | parser.add_argument('--device', default='', help='device id (i.e. 0 or 0,1) or cpu') 181 | parser.add_argument('--view-img', action='store_true', help='display results') 182 | parser.add_argument('--save-txt', action='store_true', help='save results to *.txt') 183 | parser.add_argument('--classes', nargs='+', type=int, help='filter by class') 184 | parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS') 185 | parser.add_argument('--augment', action='store_true', help='augmented inference') 186 | opt = parser.parse_args() 187 | opt.cfg = list(glob.iglob('./**/' + opt.cfg, recursive=True))[0] # find file 188 | opt.names = list(glob.iglob('./**/' + opt.names, recursive=True))[0] # find file 189 | print(opt) 190 | 191 | with torch.no_grad(): 192 | detect() 193 | -------------------------------------------------------------------------------- /models.py: -------------------------------------------------------------------------------- 1 | from utils.google_utils import * 2 | from utils.layers import * 3 | from utils.parse_config import * 4 | from utils.quant_dorefa import QuanConv as Conv_q 5 | from utils.util_wqaq import Conv2d_Q 6 | import copy 7 | ONNX_EXPORT = False 8 | 9 | 10 | w_bit = 8 11 | a_bit = 8 12 | 13 | 14 | 15 | def create_modules(module_defs, img_size, cfg): 16 | # Constructs module list of layer blocks from module configuration in module_defs 17 | 18 | img_size = [img_size] * 2 if isinstance(img_size, int) else img_size # expand if necessary 19 | hyper = module_defs.pop(0) # cfg training hyperparams (unused) 20 | output_filters = [3] # input channels 21 | module_list = nn.ModuleList() 22 | routs = [] # list of layers which rout to deeper layers 23 | yolo_index = -1 24 | 25 | for i, mdef in enumerate(module_defs): 26 | modules = nn.Sequential() 27 | if mdef['type'] == 'IAO_convolutional': 28 | bn = int(mdef['batch_normalize']) 29 | filters = int(mdef['filters']) 30 | kernel_size = int(mdef['size']) 31 | pad = int(mdef['pad']) 32 | #first = int(mdef['first']) 33 | modules.add_module('Conv2d', Conv2d_Q(in_channels=output_filters[-1], 34 | out_channels=filters, 35 | kernel_size=kernel_size, 36 | stride=int(mdef['stride']), 37 | padding=pad, 38 | bias=not bn, 39 | groups=int(mdef['group']), 40 | a_bits=16, 41 | w_bits=16, 42 | q_type=1, 43 | first_layer=0)) 44 | 45 | if bn: 46 | 
modules.add_module('BatchNorm2d',nn.BatchNorm2d(filters, momentum=0.01)) 47 | if mdef['activation'] == 'relu': 48 | modules.add_module('activation',nn.ReLU(inplace=True)) 49 | elif mdef['activation'] == 'leaky': 50 | modules.add_module('activation', nn.LeakyReLU(0.1, inplace=True)) 51 | 52 | elif mdef['type'] == 'quan_convolutional': 53 | bn = int(mdef['batch_normalize']) 54 | filters = int(mdef['filters']) 55 | kernel_size = int(mdef['size']) 56 | pad = int(mdef['pad']) 57 | modules.add_module('Conv2d', Conv_q(in_channels=output_filters[-1], 58 | out_channels=filters, 59 | kernel_size=kernel_size, 60 | stride=int(mdef['stride']), 61 | padding=pad, 62 | bias=not bn, 63 | groups=int(mdef['group']), 64 | nbit_w=w_bit, 65 | nbit_a=a_bit 66 | )) 67 | 68 | if bn: 69 | modules.add_module('BatchNorm2d',nn.BatchNorm2d(filters, momentum=0.1)) 70 | if mdef['activation'] == 'relu': 71 | modules.add_module('activation',nn.ReLU(inplace=True)) 72 | elif mdef['activation'] == 'leaky': 73 | modules.add_module('activation', nn.LeakyReLU(0.1, inplace=True)) 74 | 75 | elif mdef['type'] == 'convolutional': 76 | bn = mdef['batch_normalize'] 77 | filters = mdef['filters'] 78 | k = mdef['size'] # kernel size 79 | stride = mdef['stride'] if 'stride' in mdef else (mdef['stride_y'], mdef['stride_x']) 80 | if isinstance(k, int): # single-size conv 81 | modules.add_module('Conv2d', nn.Conv2d(in_channels=output_filters[-1], 82 | out_channels=filters, 83 | kernel_size=k, 84 | stride=stride, 85 | #padding=k // 2 if mdef['pad'] else 0, 86 | padding=mdef['pad'], 87 | groups=mdef['group'] if 'group' in mdef else 1, 88 | #groups=mdef['group'], 89 | bias=not bn)) 90 | else: # multiple-size conv 91 | modules.add_module('MixConv2d', MixConv2d(in_ch=output_filters[-1], 92 | out_ch=filters, 93 | k=k, 94 | stride=stride, 95 | bias=not bn)) 96 | 97 | if bn: 98 | modules.add_module('BatchNorm2d', nn.BatchNorm2d(filters, momentum=0.1, eps=1E-5)) 99 | else: 100 | routs.append(i) # detection output (goes into yolo layer) 101 | 102 | if mdef['activation'] == 'leaky': # activation study https://github.com/ultralytics/yolov3/issues/441 103 | modules.add_module('activation', nn.LeakyReLU(0.1, inplace=True)) 104 | elif mdef['activation'] == 'swish': 105 | modules.add_module('activation', Swish()) 106 | elif mdef['activation'] == 'mish': 107 | modules.add_module('activation', Mish()) 108 | elif mdef['activation'] == 'relu': 109 | modules.add_module('activation', nn.ReLU(inplace=True)) 110 | 111 | elif mdef['type'] == 'BatchNorm2d': 112 | filters = output_filters[-1] 113 | modules = nn.BatchNorm2d(filters, momentum=0.03, eps=1E-4) 114 | if i == 0 and filters == 3: # normalize RGB image 115 | # imagenet mean and var https://pytorch.org/docs/stable/torchvision/models.html#classification 116 | modules.running_mean = torch.tensor([0.485, 0.456, 0.406]) 117 | modules.running_var = torch.tensor([0.0524, 0.0502, 0.0506]) 118 | 119 | elif mdef['type'] == 'maxpool': 120 | k = mdef['size'] # kernel size 121 | stride = mdef['stride'] 122 | maxpool = nn.MaxPool2d(kernel_size=k, stride=stride, padding=(k - 1) // 2) 123 | if k == 2 and stride == 1: # yolov3-tiny 124 | modules.add_module('ZeroPad2d', nn.ZeroPad2d((0, 1, 0, 1))) 125 | modules.add_module('MaxPool2d', maxpool) 126 | else: 127 | modules = maxpool 128 | 129 | elif mdef['type'] == 'upsample': 130 | if ONNX_EXPORT: # explicitly state size, avoid scale_factor 131 | g = (yolo_index + 1) * 2 / 32 # gain 132 | modules = nn.Upsample(size=tuple(int(x * g) for x in img_size)) # img_size = (320, 192) 133 
| else: 134 | modules = nn.Upsample(scale_factor=mdef['stride']) 135 | 136 | elif mdef['type'] == 'route': # nn.Sequential() placeholder for 'route' layer 137 | layers = mdef['layers'] 138 | filters = sum([output_filters[l + 1 if l > 0 else l] for l in layers]) 139 | routs.extend([i + l if l < 0 else l for l in layers]) 140 | modules = FeatureConcat(layers=layers) 141 | 142 | elif mdef['type'] == 'shortcut': # nn.Sequential() placeholder for 'shortcut' layer 143 | layers = mdef['from'] 144 | filters = output_filters[-1] 145 | routs.extend([i + l if l < 0 else l for l in layers]) 146 | modules = WeightedFeatureFusion(layers=layers, weight='weights_type' in mdef) 147 | 148 | elif mdef['type'] == 'reorg3d': # yolov3-spp-pan-scale 149 | pass 150 | 151 | elif mdef['type'] == 'se': 152 | modules.add_module('se',SELayer(output_filters[-1],reduction=int(mdef['reduction']))) 153 | 154 | elif mdef['type'] == 'yolo': 155 | yolo_index += 1 156 | stride = [32, 16, 8] # P5, P4, P3 strides 157 | if any(x in cfg for x in ['panet', 'yolov4', 'cd53']): # stride order reversed 158 | stride = list(reversed(stride)) 159 | layers = mdef['from'] if 'from' in mdef else [] 160 | modules = YOLOLayer(anchors=mdef['anchors'][mdef['mask']], # anchor list 161 | nc=mdef['classes'], # number of classes 162 | img_size=img_size, # (416, 416) 163 | yolo_index=yolo_index, # 0, 1, 2... 164 | layers=layers, # output layers 165 | stride=stride[yolo_index]) 166 | 167 | # Initialize preceding Conv2d() bias (https://arxiv.org/pdf/1708.02002.pdf section 3.3) 168 | try: 169 | j = layers[yolo_index] if 'from' in mdef else -1 170 | bias_ = module_list[j][0].bias # shape(255,) 171 | bias = bias_[:modules.no * modules.na].view(modules.na, -1) # shape(3,85) 172 | bias[:, 4] += -4.5 # obj 173 | bias[:, 5:] += math.log(0.6 / (modules.nc - 0.99)) # cls (sigmoid(p) = 1/nc) 174 | module_list[j][0].bias = torch.nn.Parameter(bias_, requires_grad=bias_.requires_grad) 175 | except: 176 | print('WARNING: smart bias initialization failure.') 177 | 178 | else: 179 | print('Warning: Unrecognized Layer Type: ' + mdef['type']) 180 | 181 | # Register module list and number of output filters 182 | module_list.append(modules) 183 | output_filters.append(filters) 184 | 185 | routs_binary = [False] * (i + 1) 186 | for i in routs: 187 | routs_binary[i] = True 188 | return module_list, routs_binary 189 | 190 | 191 | class YOLOLayer(nn.Module): 192 | def __init__(self, anchors, nc, img_size, yolo_index, layers, stride): 193 | super(YOLOLayer, self).__init__() 194 | self.anchors = torch.Tensor(anchors) 195 | self.index = yolo_index # index of this layer in layers 196 | self.layers = layers # model output layer indices 197 | self.stride = stride # layer stride 198 | self.nl = len(layers) # number of output layers (3) 199 | self.na = len(anchors) # number of anchors (3) 200 | self.nc = nc # number of classes (80) 201 | self.no = nc + 5 # number of outputs (85) 202 | self.nx, self.ny, self.ng = 0, 0, 0 # initialize number of x, y gridpoints 203 | self.anchor_vec = self.anchors / self.stride 204 | self.anchor_wh = self.anchor_vec.view(1, self.na, 1, 1, 2) 205 | 206 | if ONNX_EXPORT: 207 | self.training = False 208 | self.create_grids((img_size[1] // stride, img_size[0] // stride)) # number x, y grid points 209 | 210 | def create_grids(self, ng=(13, 13), device='cpu'): 211 | self.nx, self.ny = ng # x and y grid size 212 | self.ng = torch.tensor(ng, dtype=torch.float) 213 | 214 | # build xy offsets 215 | if not self.training: 216 | yv, xv = 
torch.meshgrid([torch.arange(self.ny, device=device), torch.arange(self.nx, device=device)]) 217 | self.grid = torch.stack((xv, yv), 2).view((1, 1, self.ny, self.nx, 2)).float() 218 | 219 | if self.anchor_vec.device != device: 220 | self.anchor_vec = self.anchor_vec.to(device) 221 | self.anchor_wh = self.anchor_wh.to(device) 222 | 223 | def forward(self, p, out): 224 | ASFF = False # https://arxiv.org/abs/1911.09516 225 | if ASFF: 226 | i, n = self.index, self.nl # index in layers, number of layers 227 | p = out[self.layers[i]] 228 | bs, _, ny, nx = p.shape # bs, 255, 13, 13 229 | if (self.nx, self.ny) != (nx, ny): 230 | self.create_grids((nx, ny), p.device) 231 | 232 | # outputs and weights 233 | # w = F.softmax(p[:, -n:], 1) # normalized weights 234 | w = torch.sigmoid(p[:, -n:]) * (2 / n) # sigmoid weights (faster) 235 | # w = w / w.sum(1).unsqueeze(1) # normalize across layer dimension 236 | 237 | # weighted ASFF sum 238 | p = out[self.layers[i]][:, :-n] * w[:, i:i + 1] 239 | for j in range(n): 240 | if j != i: 241 | p += w[:, j:j + 1] * \ 242 | F.interpolate(out[self.layers[j]][:, :-n], size=[ny, nx], mode='bilinear', align_corners=False) 243 | 244 | elif ONNX_EXPORT: 245 | bs = 1 # batch size 246 | else: 247 | bs, _, ny, nx = p.shape # bs, 255, 13, 13 248 | if (self.nx, self.ny) != (nx, ny): 249 | self.create_grids((nx, ny), p.device) 250 | 251 | # p.view(bs, 255, 13, 13) -- > (bs, 3, 13, 13, 85) # (bs, anchors, grid, grid, classes + xywh) 252 | p = p.view(bs, self.na, self.no, self.ny, self.nx).permute(0, 1, 3, 4, 2).contiguous() # prediction 253 | 254 | if self.training: 255 | return p 256 | 257 | elif ONNX_EXPORT: 258 | # Avoid broadcasting for ANE operations 259 | m = self.na * self.nx * self.ny 260 | ng = 1. / self.ng.repeat(m, 1) 261 | grid = self.grid.repeat(1, self.na, 1, 1, 1).view(m, 2) 262 | anchor_wh = self.anchor_wh.repeat(1, 1, self.nx, self.ny, 1).view(m, 2) * ng 263 | 264 | p = p.view(m, self.no) 265 | xy = torch.sigmoid(p[:, 0:2]) + grid # x, y 266 | wh = torch.exp(p[:, 2:4]) * anchor_wh # width, height 267 | p_cls = torch.sigmoid(p[:, 4:5]) if self.nc == 1 else \ 268 | torch.sigmoid(p[:, 5:self.no]) * torch.sigmoid(p[:, 4:5]) # conf 269 | return p_cls, xy * ng, wh 270 | 271 | else: # inference 272 | io = p.clone() # inference output 273 | io[..., :2] = torch.sigmoid(io[..., :2]) + self.grid # xy 274 | io[..., 2:4] = torch.exp(io[..., 2:4]) * self.anchor_wh # wh yolo method 275 | io[..., :4] *= self.stride 276 | torch.sigmoid_(io[..., 4:]) 277 | return io.view(bs, -1, self.no), p # view [1, 3, 13, 13, 85] as [1, 507, 85] 278 | 279 | 280 | class Darknet(nn.Module): 281 | # YOLOv3 object detection model 282 | 283 | def __init__(self, cfg, img_size=(416, 416), verbose=False): 284 | super(Darknet, self).__init__() 285 | 286 | self.module_defs = parse_model_cfg(cfg) 287 | self.hyper = copy.deepcopy(self.module_defs[0]) 288 | self.module_list, self.routs = create_modules(self.module_defs, img_size, cfg) 289 | self.yolo_layers = get_yolo_layers(self) 290 | # torch_utils.initialize_weights(self) 291 | 292 | # Darknet Header https://github.com/AlexeyAB/darknet/issues/2914#issuecomment-496675346 293 | self.version = np.array([0, 2, 5], dtype=np.int32) # (int32) version info: major, minor, revision 294 | self.seen = np.array([0], dtype=np.int64) # (int64) number of images seen during training 295 | self.info(verbose) if not ONNX_EXPORT else None # print model description 296 | 297 | def forward(self, x, augment=False, verbose=False): 298 | 299 | if not augment: 300 | return 
self.forward_once(x) 301 | else: # Augment images (inference and test only) https://github.com/ultralytics/yolov3/issues/931 302 | img_size = x.shape[-2:] # height, width 303 | s = [0.83, 0.67] # scales 304 | y = [] 305 | for i, xi in enumerate((x, 306 | torch_utils.scale_img(x.flip(3), s[0], same_shape=False), # flip-lr and scale 307 | torch_utils.scale_img(x, s[1], same_shape=False), # scale 308 | )): 309 | # cv2.imwrite('img%g.jpg' % i, 255 * xi[0].numpy().transpose((1, 2, 0))[:, :, ::-1]) 310 | y.append(self.forward_once(xi)[0]) 311 | 312 | y[1][..., :4] /= s[0] # scale 313 | y[1][..., 0] = img_size[1] - y[1][..., 0] # flip lr 314 | y[2][..., :4] /= s[1] # scale 315 | 316 | # for i, yi in enumerate(y): # coco small, medium, large = < 32**2 < 96**2 < 317 | # area = yi[..., 2:4].prod(2)[:, :, None] 318 | # if i == 1: 319 | # yi *= (area < 96. ** 2).float() 320 | # elif i == 2: 321 | # yi *= (area > 32. ** 2).float() 322 | # y[i] = yi 323 | 324 | y = torch.cat(y, 1) 325 | return y, None 326 | 327 | def forward_once(self, x, augment=False, verbose=False): 328 | img_size = x.shape[-2:] # height, width 329 | yolo_out, out = [], [] 330 | if verbose: 331 | print('0', x.shape) 332 | str = '' 333 | 334 | # Augment images (inference and test only) 335 | if augment: # https://github.com/ultralytics/yolov3/issues/931 336 | nb = x.shape[0] # batch size 337 | s = [0.83, 0.67] # scales 338 | x = torch.cat((x, 339 | torch_utils.scale_img(x.flip(3), s[0]), # flip-lr and scale 340 | torch_utils.scale_img(x, s[1]), # scale 341 | ), 0) 342 | 343 | for i, module in enumerate(self.module_list): 344 | name = module.__class__.__name__ 345 | if name in ['WeightedFeatureFusion', 'FeatureConcat']: # sum, concat 346 | if verbose: 347 | l = [i - 1] + module.layers # layers 348 | sh = [list(x.shape)] + [list(out[i].shape) for i in module.layers] # shapes 349 | str = ' >> ' + ' + '.join(['layer %g %s' % x for x in zip(l, sh)]) 350 | x = module(x, out) # WeightedFeatureFusion(), FeatureConcat() 351 | elif name == 'YOLOLayer': 352 | yolo_out.append(module(x, out)) 353 | else: # run module directly, i.e. mtype = 'convolutional', 'upsample', 'maxpool', 'batchnorm2d' etc. 
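# Plain layers (convolutional, upsample, maxpool, se, BatchNorm2d) just transform x in
# sequence; only indices flagged in self.routs get their output cached in `out` below,
# where FeatureConcat, WeightedFeatureFusion and YOLOLayer modules can pick them up later.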
354 | x = module(x) 355 | 356 | out.append(x if self.routs[i] else []) 357 | if verbose: 358 | print('%g/%g %s -' % (i, len(self.module_list), name), list(x.shape), str) 359 | str = '' 360 | 361 | if self.training: # train 362 | return yolo_out 363 | elif ONNX_EXPORT: # export 364 | x = [torch.cat(x, 0) for x in zip(*yolo_out)] 365 | return x[0], torch.cat(x[1:3], 1) # scores, boxes: 3780x80, 3780x4 366 | else: # inference or test 367 | x, p = zip(*yolo_out) # inference output, training output 368 | x = torch.cat(x, 1) # cat yolo outputs 369 | if augment: # de-augment results 370 | x = torch.split(x, nb, dim=0) 371 | x[1][..., :4] /= s[0] # scale 372 | x[1][..., 0] = img_size[1] - x[1][..., 0] # flip lr 373 | x[2][..., :4] /= s[1] # scale 374 | x = torch.cat(x, 1) 375 | return x, p 376 | 377 | def fuse(self): 378 | # Fuse Conv2d + BatchNorm2d layers throughout model 379 | print('Fusing layers...') 380 | fused_list = nn.ModuleList() 381 | for a in list(self.children())[0]: 382 | if isinstance(a, nn.Sequential): 383 | for i, b in enumerate(a): 384 | if isinstance(b, nn.modules.batchnorm.BatchNorm2d): 385 | # fuse this bn layer with the previous conv2d layer 386 | conv = a[i - 1] 387 | fused = torch_utils.fuse_conv_and_bn(conv, b) 388 | a = nn.Sequential(fused, *list(a.children())[i + 1:]) 389 | break 390 | fused_list.append(a) 391 | self.module_list = fused_list 392 | self.info() if not ONNX_EXPORT else None # yolov3-spp reduced from 225 to 152 layers 393 | 394 | def info(self, verbose=False): 395 | torch_utils.model_info(self, verbose) 396 | 397 | 398 | def get_yolo_layers(model): 399 | return [i for i, m in enumerate(model.module_list) if m.__class__.__name__ == 'YOLOLayer'] # [89, 101, 113] 400 | 401 | 402 | def load_darknet_weights(self, weights, cutoff=0): 403 | # Parses and loads the weights stored in 'weights' 404 | 405 | # Establish cutoffs (load layers between 0 and cutoff. if cutoff = -1 all are loaded) 406 | file = Path(weights).name 407 | #print(file) 408 | if file == 'darknet53.conv.74': 409 | cutoff = 75 410 | elif file == 'yolov3-tiny.conv.15': 411 | cutoff = 15 412 | #elif file == 'best.weights': 413 | #print('load coco.weights') 414 | #cutoff = -1 415 | # Read weights file 416 | with open(weights, 'rb') as f: 417 | # Read Header https://github.com/AlexeyAB/darknet/issues/2914#issuecomment-496675346 418 | self.version = np.fromfile(f, dtype=np.int32, count=3) # (int32) version info: major, minor, revision 419 | self.seen = np.fromfile(f, dtype=np.int64, count=1) # (int64) number of images seen during training 420 | 421 | weights = np.fromfile(f, dtype=np.float32) # the rest are weights 422 | 423 | ptr = 0 424 | for i, (mdef, module) in enumerate(zip(self.module_defs[:cutoff], self.module_list[:cutoff])): 425 | 426 | if mdef['type'] == 'convolutional': 427 | conv = module[0] 428 | if mdef['batch_normalize']: 429 | # Load BN bias, weights, running mean and running variance 430 | bn = module[1] 431 | nb = bn.bias.numel() # number of biases 432 | # Bias 433 | bn.bias.data.copy_(torch.from_numpy(weights[ptr:ptr + nb]).view_as(bn.bias)) 434 | ptr += nb 435 | # Weight 436 | bn.weight.data.copy_(torch.from_numpy(weights[ptr:ptr + nb]).view_as(bn.weight)) 437 | ptr += nb 438 | # Running Mean 439 | bn.running_mean.data.copy_(torch.from_numpy(weights[ptr:ptr + nb]).view_as(bn.running_mean)) 440 | ptr += nb 441 | # Running Var 442 | bn.running_var.data.copy_(torch.from_numpy(weights[ptr:ptr + nb]).view_as(bn.running_var)) 443 | ptr += nb 444 | else: 445 | # Load conv. 
bias
446 | nb = conv.bias.numel()
447 | conv_b = torch.from_numpy(weights[ptr:ptr + nb]).view_as(conv.bias)
448 | conv.bias.data.copy_(conv_b)
449 | ptr += nb
450 | # Load conv. weights
451 | nw = conv.weight.numel() # number of weights
452 | conv.weight.data.copy_(torch.from_numpy(weights[ptr:ptr + nw]).view_as(conv.weight))
453 | ptr += nw # advance the pointer by the number of conv weights just copied
454 |
455 | #elif mdef['type'] == 'se':
456 | #se = module[0]
457 | #fc = se.fc
458 | #fc1 = fc[0]
459 | #fc1_num = fc1.weight.numel()
460 | #print(fc1_num)
461 | #fc1_w = torch.from_numpy(weights[ptr:ptr + fc1_num]).view_as(fc1.weight)
462 | #fc1.weight.data.copy_(fc1_w)
463 | #ptr += fc1_num
464 | #fc2 = fc[2]
465 | #fc2_num = fc2.weight.numel()
466 | #fc2_w = torch.from_numpy(weights[ptr:ptr + fc2_num]).view_as(fc2.weight)
467 | #fc2.weight.data.copy_(fc2_w)
468 | #ptr += fc2_num
469 |
470 | #assert ptr == len(weights)
471 |
472 | def save_weights(self, path='model.weights', cutoff=-1):
473 | # Converts a PyTorch model to Darknet format (*.pt to *.weights)
474 | # Note: Does not work if model.fuse() is applied
475 | with open(path, 'wb') as f:
476 | # Write Header https://github.com/AlexeyAB/darknet/issues/2914#issuecomment-496675346
477 | self.version.tofile(f) # (int32) version info: major, minor, revision
478 | self.seen.tofile(f) # (int64) number of images seen during training
479 |
480 | # Iterate through layers
481 | for i, (mdef, module) in enumerate(zip(self.module_defs[:cutoff], self.module_list[:cutoff])):
482 | if mdef['type'] == 'convolutional':
483 | conv_layer = module[0]
484 | # If batch norm, write bn first
485 | if mdef['batch_normalize']:
486 | bn_layer = module[1]
487 | bn_layer.bias.data.cpu().numpy().tofile(f)
488 | bn_layer.weight.data.cpu().numpy().tofile(f)
489 | bn_layer.running_mean.data.cpu().numpy().tofile(f)
490 | bn_layer.running_var.data.cpu().numpy().tofile(f)
491 | # Write conv bias
492 | else:
493 | conv_layer.bias.data.cpu().numpy().tofile(f)
494 | # Write conv weights
495 | conv_layer.weight.data.cpu().numpy().tofile(f)
496 |
497 | elif mdef['type'] == 'se':
498 | se = module[0]
499 | fc = se.fc
500 | fc1 = fc[0]
501 | fc2 = fc[2]
502 | fc1.weight.data.cpu().numpy().tofile(f)
503 | fc2.weight.data.cpu().numpy().tofile(f)
504 |
505 |
506 |
507 |
508 | def convert(cfg='cfg/yolov3-spp.cfg', weights='weights/yolov3-spp.weights'):
509 | # Converts between PyTorch and Darknet format per extension (i.e.
*.weights convert to *.pt and vice versa) 510 | # from models import *; convert('cfg/yolov3-spp.cfg', 'weights/yolov3-spp.weights') 511 | 512 | # Initialize model 513 | model = Darknet(cfg) 514 | 515 | # Load weights and save 516 | if weights.endswith('.pt'): # if PyTorch format 517 | model.load_state_dict(torch.load(weights, map_location='cpu')['model']) 518 | target = weights.rsplit('.', 1)[0] + '.weights' 519 | save_weights(model, path=target, cutoff=-1) 520 | print("Success: converted '%s' to '%s'" % (weights, target)) 521 | 522 | elif weights.endswith('.weights'): # darknet format 523 | _ = load_darknet_weights(model, weights) 524 | 525 | chkpt = {'epoch': -1, 526 | 'best_fitness': None, 527 | 'training_results': None, 528 | 'model': model.state_dict(), 529 | 'optimizer': None} 530 | 531 | target = weights.rsplit('.', 1)[0] + '.pt' 532 | torch.save(chkpt, target) 533 | print("Success: converted '%s' to 's%'" % (weights, target)) 534 | 535 | else: 536 | print('Error: extension not supported.') 537 | 538 | 539 | def attempt_download(weights): 540 | # Attempt to download pretrained weights if not found locally 541 | weights = weights.strip() 542 | msg = weights + ' missing, try downloading from https://drive.google.com/open?id=1LezFG5g3BCW6iYaV89B2i64cqEUZD7e0' 543 | 544 | if len(weights) > 0 and not os.path.isfile(weights): 545 | d = {'yolov3-spp.weights': '16lYS4bcIdM2HdmyJBVDOvt3Trx6N3W2R', 546 | 'yolov3.weights': '1uTlyDWlnaqXcsKOktP5aH_zRDbfcDp-y', 547 | 'yolov3-tiny.weights': '1CCF-iNIIkYesIDzaPvdwlcf7H9zSsKZQ', 548 | 'yolov3-spp.pt': '1f6Ovy3BSq2wYq4UfvFUpxJFNDFfrIDcR', 549 | 'yolov3.pt': '1SHNFyoe5Ni8DajDNEqgB2oVKBb_NoEad', 550 | 'yolov3-tiny.pt': '10m_3MlpQwRtZetQxtksm9jqHrPTHZ6vo', 551 | 'darknet53.conv.74': '1WUVBid-XuoUBmvzBVUCBl_ELrzqwA8dJ', 552 | 'yolov3-tiny.conv.15': '1Bw0kCpplxUqyRYAJr9RY9SGnOJbo9nEj', 553 | 'yolov3-spp-ultralytics.pt': '1UcR-zVoMs7DH5dj3N1bswkiQTA4dmKF4'} 554 | 555 | file = Path(weights).name 556 | if file in d: 557 | r = gdrive_download(id=d[file], name=weights) 558 | else: # download from pjreddie.com 559 | url = 'https://pjreddie.com/media/files/' + file 560 | print('Downloading ' + url) 561 | r = os.system('curl -f ' + url + ' -o ' + weights) 562 | 563 | # Error check 564 | if not (r == 0 and os.path.exists(weights) and os.path.getsize(weights) > 1E6): # weights exist and > 1MB 565 | os.system('rm ' + weights) # remove partial downloads 566 | raise Exception(msg) 567 | 568 | class SELayer(nn.Module): 569 | def __init__(self, channel, reduction=4): 570 | super(SELayer, self).__init__() 571 | self.avg_pool = nn.AdaptiveAvgPool2d(1) 572 | self.fc = nn.Sequential( 573 | nn.Linear(channel, channel // reduction), 574 | nn.ReLU(inplace=True), 575 | nn.Linear(channel // reduction, channel), 576 | nn.Sigmoid()) 577 | 578 | def forward(self, x): 579 | b, c, _, _ = x.size() 580 | y = self.avg_pool(x).view(b, c) 581 | y = self.fc(y).view(b, c, 1, 1) 582 | y = torch.clamp(y, 0, 1) 583 | return x * y 584 | -------------------------------------------------------------------------------- /normal_prune.py: -------------------------------------------------------------------------------- 1 | from models import * 2 | from utils.utils import * 3 | import torch 4 | import numpy as np 5 | from copy import deepcopy 6 | from test import test 7 | from terminaltables import AsciiTable 8 | import time 9 | from utils.utils import * 10 | from utils.prune_utils import * 11 | import os 12 | 13 | os.environ["CUDA_VISIBLE_DEVICES"] = '3' 14 | 15 | class opt(): 16 | model_def = 
"cfg/ghost32-yolov3-visdrone.cfg" 17 | data_config = "data/visdrone.data" 18 | model = 'weights/best/best.pt' 19 | 20 | 21 | #指定GPU 22 | #torch.cuda.set_device(2) 23 | percent = 0.6 24 | 25 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 26 | model = Darknet(opt.model_def).to(device) 27 | 28 | if opt.model: 29 | if opt.model.endswith(".pt"): 30 | model.load_state_dict(torch.load(opt.model, map_location=device)['model']) 31 | else: 32 | _ = load_darknet_weights(model, opt.model) 33 | 34 | 35 | data_config = parse_data_cfg(opt.data_config) 36 | 37 | valid_path = data_config["valid"] 38 | class_names = load_classes(data_config["names"]) 39 | 40 | 41 | eval_model = lambda model:test(model=model, imgsz=640, cfg='cfg/ghost-yolov3-visdrone.cfg', data='data/visdrone.data') 42 | 43 | 44 | 45 | obtain_num_parameters = lambda model:sum([param.nelement() for param in model.parameters()]) 46 | 47 | #这个不应该注释掉,等会要恢复 48 | with torch.no_grad(): 49 | origin_model_metric = eval_model(model) 50 | #results, maps = test.test(cfg=opt.model_def, 51 | #data=opt.data_config, 52 | #batch_size=4, 53 | #img_sz=608, 54 | #model=ema.ema, 55 | #conf_thres=0.001 56 | #save_json=False, 57 | #single_cls=False, 58 | 59 | 60 | origin_nparameters = obtain_num_parameters(model) 61 | 62 | CBL_idx, Conv_idx, prune_idx= parse_module_defs(model.module_defs) 63 | 64 | 65 | #将所有要剪枝的BN层的α参数,拷贝到bn_weights列表 66 | bn_weights = gather_bn_weights(model.module_list, prune_idx) 67 | 68 | #torch.sort返回二维列表,第一维是排序后的值列表,第二维是排序后的值列表对应的索引 69 | sorted_bn = torch.sort(bn_weights)[0] 70 | 71 | 72 | #避免剪掉所有channel的最高阈值(每个BN层的gamma的最大值的最小值即为阈值上限) 73 | highest_thre = [] 74 | for idx in prune_idx: 75 | #.item()可以得到张量里的元素值 76 | highest_thre.append(model.module_list[idx][1].weight.data.abs().max().item()) 77 | highest_thre = min(highest_thre) 78 | 79 | # 找到highest_thre对应的下标对应的百分比 80 | percent_limit = (sorted_bn==highest_thre).nonzero().item()/len(bn_weights) 81 | 82 | print(f'Threshold should be less than {highest_thre:.4f}.') 83 | print(f'The corresponding prune ratio is {percent_limit:.3f}.') 84 | 85 | 86 | # 该函数有很重要的意义: 87 | # ①先用深拷贝将原始模型拷贝下来,得到model_copy 88 | # ②将model_copy中,BN层中低于阈值的α参数赋值为0 89 | # ③在BN层中,输出y=α*x+β,由于α参数的值被赋值为0,因此输入仅加了一个偏置β 90 | # ④很神奇的是,network slimming中是将α参数和β参数都置0,该处只将α参数置0,但效果却很好:其实在另外一篇论文中,已经提到,可以先将β参数的效果移到 91 | # 下一层卷积层,再去剪掉本层的α参数 92 | 93 | # 该函数用最简单的方法,让我们看到了,如何快速看到剪枝后的效果 94 | 95 | 96 | 97 | def prune_and_eval(model, sorted_bn, percent=.0): 98 | model_copy = deepcopy(model) 99 | thre_index = int(len(sorted_bn) * percent) 100 | #获得α参数的阈值,小于该值的α参数对应的通道,全部裁剪掉 101 | thre = sorted_bn[thre_index] 102 | 103 | print(f'Channels with Gamma value less than {thre:.4f} are pruned!') 104 | 105 | remain_num = 0 106 | for idx in prune_idx: 107 | 108 | bn_module = model_copy.module_list[idx][1] 109 | 110 | mask = obtain_bn_mask(bn_module, thre) 111 | 112 | remain_num += int(mask.sum()) 113 | bn_module.weight.data.mul_(mask) 114 | with torch.no_grad(): 115 | mAP = eval_model(model_copy)[1].mean() 116 | 117 | print(f'Number of channels has been reduced from {len(sorted_bn)} to {remain_num}') 118 | print(f'Prune ratio: {1-remain_num/len(sorted_bn):.3f}') 119 | print(f'mAP of the pruned model is {mAP:.4f}') 120 | 121 | return thre 122 | 123 | 124 | threshold = prune_and_eval(model, sorted_bn, percent) 125 | 126 | 127 | 128 | #**************************************************************** 129 | #虽然上面已经能看到剪枝后的效果,但是没有生成剪枝后的模型结构,因此下面的代码是为了生成新的模型结构并拷贝旧模型参数到新模型 130 | 131 | 132 | #%% 133 | def obtain_filters_mask(model, thre, 
CBL_idx, prune_idx): 134 | 135 | pruned = 0 136 | total = 0 137 | num_filters = [] 138 | filters_mask = [] 139 | #CBL_idx存储的是所有带BN的卷积层(YOLO层的前一层卷积层是不带BN的) 140 | for idx in CBL_idx: 141 | bn_module = model.module_list[idx][1] 142 | if idx in prune_idx: 143 | 144 | mask = obtain_bn_mask(bn_module, thre).cpu().numpy() 145 | remain = int(mask.sum()) 146 | pruned = pruned + mask.shape[0] - remain 147 | 148 | if remain == 0: 149 | print("Channels would be all pruned!") 150 | raise Exception 151 | 152 | print(f'layer index: {idx:>3d} \t total channel: {mask.shape[0]:>4d} \t ' 153 | f'remaining channel: {remain:>4d}') 154 | else: 155 | mask = np.ones(bn_module.weight.data.shape) 156 | remain = mask.shape[0] 157 | 158 | total += mask.shape[0] 159 | num_filters.append(remain) 160 | filters_mask.append(mask.copy()) 161 | 162 | #因此,这里求出的prune_ratio,需要裁剪的α参数/cbl_idx中所有的α参数 163 | prune_ratio = pruned / total 164 | print(f'Prune channels: {pruned}\tPrune ratio: {prune_ratio:.3f}') 165 | 166 | return num_filters, filters_mask 167 | 168 | num_filters, filters_mask = obtain_filters_mask(model, threshold, CBL_idx, prune_idx) 169 | 170 | 171 | #CBLidx2mask存储CBL_idx中,每一层BN层对应的mask 172 | CBLidx2mask = {idx: mask for idx, mask in zip(CBL_idx, filters_mask)} 173 | 174 | pruned_model = prune_model_keep_size(model, prune_idx, CBL_idx, CBLidx2mask) 175 | 176 | 177 | 178 | 179 | with torch.no_grad(): 180 | mAP = eval_model(pruned_model)[1].mean() 181 | print('after prune_model_keep_size map is {}'.format(mAP)) 182 | 183 | 184 | #获得原始模型的module_defs,并修改该defs中的卷积核数量 185 | compact_module_defs = deepcopy(model.module_defs) 186 | for idx, num in zip(CBL_idx, num_filters): 187 | assert compact_module_defs[idx]['type'] == 'convolutional' 188 | compact_module_defs[idx]['filters'] = str(num) 189 | 190 | 191 | 192 | #compact_model = Darknet([model.hyp.copy()] + compact_module_defs).to(device) 193 | for i, mdef in enumerate(compact_module_defs): 194 | if mdef['type'] == 'shortcut': 195 | mdef['from'] = str(mdef['from'][0]) 196 | if mdef['type'] == 'route': 197 | if len(mdef['layers']) == 2 : 198 | mdef['layers'] = str(mdef['layers'][0]) + ',' + str(mdef['layers'][1]) 199 | else: 200 | mdef['layers'] = str(mdef['layers'][0]) 201 | if mdef['type'] == 'yolo': 202 | mdef['mask'] = str(mdef['mask'][0]) + ',' + str(mdef['mask'][1]) + ',' + str(mdef['mask'][2]) 203 | mdef['anchors'] = '4,5, 6,10, 14,9, 11,18, 25,15, 21,30, 47,26, 37,53, 87,65' 204 | 205 | pruned_cfg_file = write_cfg('pruned.cfg', [model.hyper.copy()] + compact_module_defs) 206 | 207 | compact_model = Darknet('pruned.cfg').to(device) 208 | print(compact_model) 209 | compact_nparameters = obtain_num_parameters(compact_model) 210 | 211 | init_weights_from_loose_model(compact_model, pruned_model, CBL_idx, Conv_idx, CBLidx2mask) 212 | 213 | random_input = torch.rand((16, 3, 416, 416)).to(device) 214 | 215 | def obtain_avg_forward_time(input, model, repeat=200): 216 | 217 | model.eval() 218 | start = time.time() 219 | with torch.no_grad(): 220 | for i in range(repeat): 221 | output = model(input) 222 | avg_infer_time = (time.time() - start) / repeat 223 | 224 | return avg_infer_time, output 225 | 226 | pruned_forward_time, pruned_output = obtain_avg_forward_time(random_input, pruned_model) 227 | compact_forward_time, compact_output = obtain_avg_forward_time(random_input, compact_model) 228 | 229 | 230 | 231 | # 在测试集上测试剪枝后的模型, 并统计模型的参数数量 232 | with torch.no_grad(): 233 | compact_model_metric = eval_model(compact_model) 234 | 235 | 236 | # 比较剪枝前后参数数量的变化、指标性能的变化 237 | 
metric_table = [ 238 | ["Metric", "Before", "After"], 239 | ["mAP", f'{origin_model_metric[1].mean():.6f}', f'{compact_model_metric[1].mean():.6f}'], 240 | ["Parameters", f"{origin_nparameters}", f"{compact_nparameters}"], 241 | ["Inference", f'{pruned_forward_time:.4f}', f'{compact_forward_time:.4f}'] 242 | ] 243 | print(AsciiTable(metric_table).table) 244 | 245 | 246 | 247 | # 生成剪枝后的cfg文件并保存模型 248 | pruned_cfg_name = opt.model_def.replace('/', f'/prune_{percent}_') 249 | 250 | #由于原始的compact_module_defs将anchor从字符串变为了数组,因此这里将anchors重新变为字符串 251 | 252 | 253 | #compact_model_name = opt.model.replace('/', f'/prune_{percent}_') 254 | compact_model_name = 'weights/yolov3_visdrone_normal_pruning_'+str(percent)+'percent.weights' 255 | 256 | save_weights(compact_model, path=compact_model_name) 257 | print(f'Compact model has been saved: {compact_model_name}') 258 | 259 | 260 | 261 | -------------------------------------------------------------------------------- /output/airplane.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HaloTrouvaille/YOLO-Multi-Backbones-Attention/c867b9b67eca97b1b89b2e5c0a1ed7e75f4f8747/output/airplane.png -------------------------------------------------------------------------------- /output/car.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HaloTrouvaille/YOLO-Multi-Backbones-Attention/c867b9b67eca97b1b89b2e5c0a1ed7e75f4f8747/output/car.png -------------------------------------------------------------------------------- /output/most.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HaloTrouvaille/YOLO-Multi-Backbones-Attention/c867b9b67eca97b1b89b2e5c0a1ed7e75f4f8747/output/most.png -------------------------------------------------------------------------------- /output/test.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /test.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import json 3 | 4 | from torch.utils.data import DataLoader 5 | 6 | from models import * 7 | from utils.datasets import * 8 | from utils.utils import * 9 | import os 10 | os.environ["CUDA_VISIBLE_DEVICES"] = '0' 11 | 12 | hyp = {'degrees' : 0,'translate': 0.05 * 0,'scale': 0.05 * 0,'shear': 0.641 * 0, 'hsv_h': 0.0138,'hsv_s': 0.678,'hsv_v': 0.36} 13 | 14 | def test(cfg, 15 | data, 16 | weights=None, 17 | batch_size=16, 18 | imgsz=416, 19 | conf_thres=0.001, 20 | iou_thres=0.6, # for nms 21 | save_json=False, 22 | single_cls=False, 23 | augment=False, 24 | model=None, 25 | dataloader=None, 26 | multi_label=True): 27 | # Initialize/load model and set device 28 | if model is None: 29 | device = torch_utils.select_device(opt.device, batch_size=batch_size) 30 | verbose = opt.task == 'test' 31 | 32 | # Remove previous 33 | for f in glob.glob('test_batch*.jpg'): 34 | os.remove(f) 35 | 36 | # Initialize model 37 | model = Darknet(cfg, imgsz) 38 | 39 | # Load weights 40 | attempt_download(weights) 41 | if weights.endswith('.pt'): # pytorch format 42 | model.load_state_dict(torch.load(weights, map_location=device)['model']) 43 | else: # darknet format 44 | load_darknet_weights(model, weights) 45 | 46 | # Fuse 47 | #model.fuse() 48 | model.to(device) 49 | 50 | if device.type != 'cpu' and torch.cuda.device_count() > 
1: 51 | model = nn.DataParallel(model) 52 | else: # called by train.py 53 | device = next(model.parameters()).device # get model device 54 | verbose = False 55 | 56 | # Configure run 57 | data = parse_data_cfg(data) 58 | nc = 1 if single_cls else int(data['classes']) # number of classes 59 | path = data['valid'] # path to test images 60 | names = load_classes(data['names']) # class names 61 | iouv = torch.linspace(0.5, 0.95, 10).to(device) # iou vector for mAP@0.5:0.95 62 | iouv = iouv[0].view(1) # comment for mAP@0.5:0.95 63 | niou = iouv.numel() 64 | 65 | # Dataloader 66 | if dataloader is None: 67 | dataset = LoadImagesAndLabels(path, imgsz, batch_size, rect=True, hyp=hyp,single_cls=False) 68 | batch_size = min(batch_size, len(dataset)) 69 | dataloader = DataLoader(dataset, 70 | batch_size=batch_size, 71 | num_workers=min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8]), 72 | pin_memory=True, 73 | collate_fn=dataset.collate_fn) 74 | 75 | seen = 0 76 | model.eval() 77 | _ = model(torch.zeros((1, 3, imgsz, imgsz), device=device)) if device.type != 'cpu' else None # run once 78 | coco91class = coco80_to_coco91_class() 79 | s = ('%20s' + '%10s' * 6) % ('Class', 'Images', 'Targets', 'P', 'R', 'mAP@0.5', 'F1') 80 | p, r, f1, mp, mr, map, mf1, t0, t1 = 0., 0., 0., 0., 0., 0., 0., 0., 0. 81 | loss = torch.zeros(3, device=device) 82 | jdict, stats, ap, ap_class = [], [], [], [] 83 | for batch_i, (imgs, targets, paths, shapes) in enumerate(tqdm(dataloader, desc=s)): 84 | imgs = imgs.to(device).float() / 255.0 # uint8 to float32, 0 - 255 to 0.0 - 1.0 85 | targets = targets.to(device) 86 | nb, _, height, width = imgs.shape # batch size, channels, height, width 87 | whwh = torch.Tensor([width, height, width, height]).to(device) 88 | 89 | # Disable gradients 90 | with torch.no_grad(): 91 | # Run model 92 | t = torch_utils.time_synchronized() 93 | inf_out, train_out = model(imgs, augment=augment) # inference and training outputs 94 | t0 += torch_utils.time_synchronized() - t 95 | 96 | # Compute loss 97 | if hasattr(model, 'hyp'): # if model has loss hyperparameters 98 | loss += compute_loss(train_out, targets, model)[1][:3] # GIoU, obj, cls 99 | 100 | # Run NMS 101 | t = torch_utils.time_synchronized() 102 | output = non_max_suppression(inf_out, conf_thres=conf_thres, iou_thres=iou_thres, multi_label=multi_label) 103 | t1 += torch_utils.time_synchronized() - t 104 | 105 | # Statistics per image 106 | for si, pred in enumerate(output): 107 | labels = targets[targets[:, 0] == si, 1:] 108 | nl = len(labels) 109 | tcls = labels[:, 0].tolist() if nl else [] # target class 110 | seen += 1 111 | 112 | if pred is None: 113 | if nl: 114 | stats.append((torch.zeros(0, niou, dtype=torch.bool), torch.Tensor(), torch.Tensor(), tcls)) 115 | continue 116 | 117 | # Append to text file 118 | # with open('test.txt', 'a') as file: 119 | # [file.write('%11.5g' * 7 % tuple(x) + '\n') for x in pred] 120 | 121 | # Clip boxes to image bounds 122 | clip_coords(pred, (height, width)) 123 | 124 | # Append to pycocotools JSON dictionary 125 | if save_json: 126 | # [{"image_id": 42, "category_id": 18, "bbox": [258.15, 41.29, 348.26, 243.78], "score": 0.236}, ... 
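# Note: the line below assumes COCO-style file names whose stem ends in the numeric
# image id; for the custom datasets used in this repo (e.g. visdrone) save_json is
# forced to False in __main__, so this branch is normally skipped.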
127 | image_id = int(Path(paths[si]).stem.split('_')[-1]) 128 | box = pred[:, :4].clone() # xyxy 129 | scale_coords(imgs[si].shape[1:], box, shapes[si][0], shapes[si][1]) # to original shape 130 | box = xyxy2xywh(box) # xywh 131 | box[:, :2] -= box[:, 2:] / 2 # xy center to top-left corner 132 | for p, b in zip(pred.tolist(), box.tolist()): 133 | jdict.append({'image_id': image_id, 134 | 'category_id': coco91class[int(p[5])], 135 | 'bbox': [round(x, 3) for x in b], 136 | 'score': round(p[4], 5)}) 137 | 138 | # Assign all predictions as incorrect 139 | correct = torch.zeros(pred.shape[0], niou, dtype=torch.bool, device=device) 140 | if nl: 141 | detected = [] # target indices 142 | tcls_tensor = labels[:, 0] 143 | 144 | # target boxes 145 | tbox = xywh2xyxy(labels[:, 1:5]) * whwh 146 | 147 | # Per target class 148 | for cls in torch.unique(tcls_tensor): 149 | ti = (cls == tcls_tensor).nonzero().view(-1) # prediction indices 150 | pi = (cls == pred[:, 5]).nonzero().view(-1) # target indices 151 | 152 | # Search for detections 153 | if pi.shape[0]: 154 | # Prediction to target ious 155 | ious, i = box_iou(pred[pi, :4], tbox[ti]).max(1) # best ious, indices 156 | 157 | # Append detections 158 | for j in (ious > iouv[0]).nonzero(): 159 | d = ti[i[j]] # detected target 160 | if d not in detected: 161 | detected.append(d) 162 | correct[pi[j]] = ious[j] > iouv # iou_thres is 1xn 163 | if len(detected) == nl: # all targets already located in image 164 | break 165 | 166 | # Append statistics (correct, conf, pcls, tcls) 167 | stats.append((correct.cpu(), pred[:, 4].cpu(), pred[:, 5].cpu(), tcls)) 168 | 169 | # Plot images 170 | if batch_i < 1: 171 | f = 'test_batch%g_gt.jpg' % batch_i # filename 172 | plot_images(imgs, targets, paths=paths, names=names, fname=f) # ground truth 173 | f = 'test_batch%g_pred.jpg' % batch_i 174 | plot_images(imgs, output_to_target(output, width, height), paths=paths, names=names, fname=f) # predictions 175 | 176 | # Compute statistics 177 | stats = [np.concatenate(x, 0) for x in zip(*stats)] # to numpy 178 | if len(stats): 179 | p, r, ap, f1, ap_class = ap_per_class(*stats) 180 | if niou > 1: 181 | p, r, ap, f1 = p[:, 0], r[:, 0], ap.mean(1), ap[:, 0] # [P, R, AP@0.5:0.95, AP@0.5] 182 | mp, mr, map, mf1 = p.mean(), r.mean(), ap.mean(), f1.mean() 183 | nt = np.bincount(stats[3].astype(np.int64), minlength=nc) # number of targets per class 184 | else: 185 | nt = torch.zeros(1) 186 | 187 | # Print results 188 | pf = '%20s' + '%10.3g' * 6 # print format 189 | print(pf % ('all', seen, nt.sum(), mp, mr, map, mf1)) 190 | 191 | # Print results per class 192 | if verbose and nc > 1 and len(stats): 193 | for i, c in enumerate(ap_class): 194 | print(pf % (names[c], seen, nt[c], p[i], r[i], ap[i], f1[i])) 195 | 196 | # Print speeds 197 | if verbose or save_json: 198 | t = tuple(x / seen * 1E3 for x in (t0, t1, t0 + t1)) + (imgsz, imgsz, batch_size) # tuple 199 | print('Speed: %.1f/%.1f/%.1f ms inference/NMS/total per %gx%g image at batch-size %g' % t) 200 | 201 | # Save JSON 202 | if save_json and map and len(jdict): 203 | print('\nCOCO mAP with pycocotools...') 204 | imgIds = [int(Path(x).stem.split('_')[-1]) for x in dataloader.dataset.img_files] 205 | with open('results.json', 'w') as file: 206 | json.dump(jdict, file) 207 | 208 | try: 209 | from pycocotools.coco import COCO 210 | from pycocotools.cocoeval import COCOeval 211 | 212 | # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb 213 | cocoGt = 
COCO(glob.glob('../coco/annotations/instances_val*.json')[0]) # initialize COCO ground truth api 214 | cocoDt = cocoGt.loadRes('results.json') # initialize COCO pred api 215 | 216 | cocoEval = COCOeval(cocoGt, cocoDt, 'bbox') 217 | cocoEval.params.imgIds = imgIds # [:32] # only evaluate these images 218 | cocoEval.evaluate() 219 | cocoEval.accumulate() 220 | cocoEval.summarize() 221 | # mf1, map = cocoEval.stats[:2] # update to pycocotools results (mAP@0.5:0.95, mAP@0.5) 222 | except: 223 | print('WARNING: pycocotools must be installed with numpy==1.17 to run correctly. ' 224 | 'See https://github.com/cocodataset/cocoapi/issues/356') 225 | 226 | # Return results 227 | maps = np.zeros(nc) + map 228 | for i, c in enumerate(ap_class): 229 | maps[c] = ap[i] 230 | return (mp, mr, map, mf1, *(loss.cpu() / len(dataloader)).tolist()), maps 231 | 232 | 233 | if __name__ == '__main__': 234 | parser = argparse.ArgumentParser(prog='test.py') 235 | parser.add_argument('--cfg', type=str, default='cfg/yolov3-spp.cfg', help='*.cfg path') 236 | parser.add_argument('--data', type=str, default='data/coco2014.data', help='*.data path') 237 | parser.add_argument('--weights', type=str, default='weights/yolov3-spp-ultralytics.pt', help='weights path') 238 | parser.add_argument('--batch-size', type=int, default=8, help='size of each image batch') 239 | parser.add_argument('--img-size', type=int, default=512, help='inference size (pixels)') 240 | parser.add_argument('--conf-thres', type=float, default=0.001, help='object confidence threshold') 241 | parser.add_argument('--iou-thres', type=float, default=0.5, help='IOU threshold for NMS') 242 | parser.add_argument('--save-json', action='store_true', help='save a cocoapi-compatible JSON results file') 243 | parser.add_argument('--task', default='test', help="'test', 'study', 'benchmark'") 244 | parser.add_argument('--device', default='', help='device id (i.e. 
0 or 0,1) or cpu') 245 | parser.add_argument('--single-cls', action='store_true', help='train as single-class dataset') 246 | parser.add_argument('--augment', action='store_true', help='augmented inference') 247 | opt = parser.parse_args() 248 | #opt.save_json = opt.save_json or any([x in opt.data for x in ['coco.data', 'coco2014.data', 'coco2017.data']]) 249 | opt.save_json = False 250 | opt.cfg = list(glob.iglob('./**/' + opt.cfg, recursive=True))[0] # find file 251 | opt.data = list(glob.iglob('./**/' + opt.data, recursive=True))[0] # find file 252 | print(opt) 253 | 254 | # task = 'test', 'study', 'benchmark' 255 | if opt.task == 'test': # (default) test normally 256 | test(opt.cfg, 257 | opt.data, 258 | opt.weights, 259 | opt.batch_size, 260 | opt.img_size, 261 | opt.conf_thres, 262 | opt.iou_thres, 263 | opt.save_json, 264 | opt.single_cls, 265 | opt.augment) 266 | 267 | elif opt.task == 'benchmark': # mAPs at 256-640 at conf 0.5 and 0.7 268 | y = [] 269 | for i in list(range(256, 640, 128)): # img-size 270 | for j in [0.6, 0.7]: # iou-thres 271 | t = time.time() 272 | r = test(opt.cfg, opt.data, opt.weights, opt.batch_size, i, opt.conf_thres, j, opt.save_json)[0] 273 | y.append(r + (time.time() - t,)) 274 | np.savetxt('benchmark.txt', y, fmt='%10.4g') # y = np.loadtxt('study.txt') 275 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | 3 | import torch.distributed as dist 4 | import torch.optim as optim 5 | import torch.optim.lr_scheduler as lr_scheduler 6 | from torch.utils.tensorboard import SummaryWriter 7 | 8 | import test # import test.py to get mAP after each epoch 9 | from models import * 10 | from utils.datasets import * 11 | from utils.utils import * 12 | from utils.prune_utils import * 13 | import os 14 | 15 | mixed_precision = True 16 | try: # Mixed precision training https://github.com/NVIDIA/apex 17 | from apex import amp 18 | except: 19 | print('Apex recommended for faster mixed precision training: https://github.com/NVIDIA/apex') 20 | mixed_precision = False # not installed 21 | 22 | wdir = 'weights' + os.sep # weights dir 23 | last = wdir + 'last.pt' 24 | best = wdir + 'best.pt' 25 | results_file = 'results.txt' 26 | 27 | # Hyperparameters 28 | hyp = {'giou': 3.54, # giou loss gain 29 | 'cls': 37.4, # cls loss gain 30 | 'cls_pw': 1.0, # cls BCELoss positive_weight 31 | 'obj': 64.3, # obj loss gain (*=img_size/320 if img_size != 320) 32 | 'obj_pw': 1.0, # obj BCELoss positive_weight 33 | 'iou_t': 0.20, # iou training threshold 34 | 'lr0': 0.01, # initial learning rate (SGD=5E-3, Adam=5E-4) 35 | 'lrf': 0.0005, # final learning rate (with cos scheduler) 36 | 'momentum': 0.937, # SGD momentum 37 | 'weight_decay': 0.000484, # optimizer weight decay 38 | 'fl_gamma': 0.0, # focal loss gamma (efficientDet default is gamma=1.5) 39 | 'hsv_h': 0.0138, # image HSV-Hue augmentation (fraction) 40 | 'hsv_s': 0.678, # image HSV-Saturation augmentation (fraction) 41 | 'hsv_v': 0.36, # image HSV-Value augmentation (fraction) 42 | 'degrees': 1.98 * 0, # image rotation (+/- deg) 43 | 'translate': 0.05 * 0, # image translation (+/- fraction) 44 | 'scale': 0.05 * 0, # image scale (+/- gain) 45 | 'shear': 0.641 * 0} # image shear (+/- deg) 46 | 47 | # Overwrite hyp with hyp*.txt (optional) 48 | f = glob.glob('hyp*.txt') 49 | if f: 50 | print('Using %s' % f[0]) 51 | for k, v in zip(hyp.keys(), np.loadtxt(f[0])): 52 | hyp[k] = v 53 | 54 | # Print 
focal loss if gamma > 0 55 | if hyp['fl_gamma']: 56 | print('Using FocalLoss(gamma=%g)' % hyp['fl_gamma']) 57 | 58 | 59 | def train(hyp): 60 | cfg = opt.cfg 61 | t_cfg = opt.t_cfg 62 | data = opt.data 63 | epochs = opt.epochs # 500200 batches at bs 64, 117263 images = 273 epochs 64 | batch_size = opt.batch_size 65 | accumulate = max(round(64 / batch_size), 1) # accumulate n times before optimizer update (bs 64) 66 | weights = opt.weights # initial training weights 67 | t_weights = opt.t_weights # 老师模型权重 68 | imgsz_min, imgsz_max, imgsz_test = opt.img_size # img sizes (min, max, test) 69 | 70 | # Image Sizes 71 | gs = 64 # (pixels) grid size 72 | assert math.fmod(imgsz_min, gs) == 0, '--img-size %g must be a %g-multiple' % (imgsz_min, gs) 73 | opt.multi_scale |= imgsz_min != imgsz_max # multi if different (min, max) 74 | if opt.multi_scale: 75 | if imgsz_min == imgsz_max: 76 | imgsz_min //= 1.5 77 | imgsz_max //= 0.667 78 | grid_min, grid_max = imgsz_min // gs, imgsz_max // gs 79 | imgsz_min, imgsz_max = int(grid_min * gs), int(grid_max * gs) 80 | img_size = imgsz_max # initialize with max size 81 | 82 | # Configure run 83 | init_seeds() 84 | data_dict = parse_data_cfg(data) 85 | train_path = data_dict['train'] 86 | test_path = data_dict['valid'] 87 | nc = 1 if opt.single_cls else int(data_dict['classes']) # number of classes 88 | hyp['cls'] *= nc / 80 # update coco-tuned hyp['cls'] to current dataset 89 | 90 | # Remove previous results 91 | for f in glob.glob('*_batch*.jpg') + glob.glob(results_file): 92 | os.remove(f) 93 | 94 | # Initialize model 95 | model = Darknet(cfg).to(device) 96 | #print(model) 97 | if t_cfg: 98 | t_model = Darknet(t_cfg).to(device) 99 | # Optimizer 100 | pg0, pg1, pg2 = [], [], [] # optimizer parameter groups 101 | for k, v in dict(model.named_parameters()).items(): 102 | if '.bias' in k: 103 | pg2 += [v] # biases 104 | elif 'Conv2d.weight' in k: 105 | pg1 += [v] # apply weight_decay 106 | else: 107 | pg0 += [v] # all else 108 | 109 | if opt.adam: 110 | # hyp['lr0'] *= 0.1 # reduce lr (i.e. SGD=5E-3, Adam=5E-4) 111 | optimizer = optim.Adam(pg0, lr=hyp['lr0']) 112 | # optimizer = AdaBound(pg0, lr=hyp['lr0'], final_lr=0.1) 113 | else: 114 | optimizer = optim.SGD(pg0, lr=hyp['lr0'], momentum=hyp['momentum'], nesterov=True) 115 | optimizer.add_param_group({'params': pg1, 'weight_decay': hyp['weight_decay']}) # add pg1 with weight_decay 116 | optimizer.add_param_group({'params': pg2}) # add pg2 (biases) 117 | print('Optimizer groups: %g .bias, %g Conv2d.weight, %g other' % (len(pg2), len(pg1), len(pg0))) 118 | del pg0, pg1, pg2 119 | 120 | start_epoch = 0 121 | best_fitness = 0.0 122 | 123 | #attempt_download(weights) 124 | # 待修改 125 | pretrain = False 126 | imagenetpre = False 127 | if imagenetpre: 128 | model_dict = model.state_dict() 129 | pretrained_dict = torch.load('weights/checkpoint.t7',map_location='cuda:3') 130 | pretrained_dict = {k: v for k, v in model_dict.items() if k in pretrained_dict} 131 | model_dict.update(pretrained_dict) 132 | model.load_state_dict(model_dict) 133 | print("Load Imagenet pretrain successfully!") 134 | if pretrain: 135 | attempt_download(weights) 136 | if weights.endswith('.pt'): # pytorch format 137 | # possible weights are '*.pt', 'yolov3-spp.pt', 'yolov3-tiny.pt' etc. 
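# A *.pt checkpoint is expected to be a dict with the keys 'epoch', 'best_fitness',
# 'training_results', 'model' (a state_dict) and 'optimizer', matching what convert()
# in models.py produces; the blocks below restore each of these in turn.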
138 | chkpt = torch.load(weights, map_location=device) 139 | 140 | # load model 141 | try: 142 | chkpt['model'] = {k: v for k, v in chkpt['model'].items() if model.state_dict()[k].numel() == v.numel()} 143 | model.load_state_dict(chkpt['model'], strict=False) 144 | except KeyError as e: 145 | s = "%s is not compatible with %s. Specify --weights '' or specify a --cfg compatible with %s. " \ 146 | "See https://github.com/ultralytics/yolov3/issues/657" % (opt.weights, opt.cfg, opt.weights) 147 | raise KeyError(s) from e 148 | 149 | # load optimizer 150 | if chkpt['optimizer'] is not None: 151 | optimizer.load_state_dict(chkpt['optimizer']) 152 | best_fitness = chkpt['best_fitness'] 153 | 154 | # load results 155 | if chkpt.get('training_results') is not None: 156 | with open(results_file, 'w') as file: 157 | file.write(chkpt['training_results']) # write results.txt 158 | 159 | start_epoch = chkpt['epoch'] + 1 160 | del chkpt 161 | 162 | elif len(weights) > 0: # darknet format 163 | # possible weights are '*.weights', 'yolov3-tiny.conv.15', 'darknet53.conv.74' etc. 164 | load_darknet_weights(model, weights) 165 | if t_cfg: 166 | if t_weights.endswith('.pt'): 167 | t_model.load_state_dict(torch.load(t_weights, map_location=device)['model']) 168 | 169 | elif t_weights.endwith('.weights'): 170 | load_darknet_weights(t_model, t_weights) 171 | 172 | else: 173 | raise Exception('Unsupported weight format! Please use .pt or .weights') 174 | 175 | if not mixed_precision: 176 | t_model.eval() 177 | print("<-----------Using Knowledge Distillation---------->") 178 | 179 | if hasattr(model, 'module'): 180 | _,_,prune_idx= parse_module_defs(model.module.module_defs) 181 | else: 182 | _,_,prune_idx= parse_module_defs(model.module_defs) 183 | # Mixed precision training https://github.com/NVIDIA/apex 184 | if mixed_precision: 185 | model, optimizer = amp.initialize(model, optimizer, opt_level='O1', verbosity=0) 186 | 187 | # Scheduler https://arxiv.org/pdf/1812.01187.pdf 188 | lf = lambda x: (((1 + math.cos(x * math.pi / epochs)) / 2) ** 1.0) * 0.95 + 0.05 # cosine 189 | scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf) 190 | scheduler.last_epoch = start_epoch - 1 # see link below 191 | # https://discuss.pytorch.org/t/a-problem-occured-when-resuming-an-optimizer/28822 192 | 193 | # Plot lr schedule 194 | # y = [] 195 | # for _ in range(epochs): 196 | # scheduler.step() 197 | # y.append(optimizer.param_groups[0]['lr']) 198 | # plt.plot(y, '.-', label='LambdaLR') 199 | # plt.xlabel('epoch') 200 | # plt.ylabel('LR') 201 | # plt.tight_layout() 202 | # plt.savefig('LR.png', dpi=300) 203 | 204 | # Initialize distributed training 205 | if device.type != 'cpu' and torch.cuda.device_count() > 1 and torch.distributed.is_available(): 206 | dist.init_process_group(backend='nccl', # 'distributed backend' 207 | init_method='tcp://127.0.0.1:9999', # distributed training init method 208 | world_size=1, # number of nodes for distributed training 209 | rank=0) # distributed training node rank 210 | model = torch.nn.parallel.DistributedDataParallel(model, find_unused_parameters=True) 211 | model.yolo_layers = model.module.yolo_layers # move yolo layer indices to top level 212 | 213 | # Dataset 214 | dataset = LoadImagesAndLabels(train_path, img_size, batch_size, 215 | augment=True, 216 | hyp=hyp, # augmentation hyperparameters 217 | rect=opt.rect, # rectangular training 218 | cache_images=opt.cache_images, 219 | single_cls=opt.single_cls) 220 | 221 | # Dataloader 222 | batch_size = min(batch_size, len(dataset)) 223 | 
nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8]) # number of workers 224 | dataloader = torch.utils.data.DataLoader(dataset, 225 | batch_size=batch_size, 226 | num_workers=nw, 227 | shuffle=not opt.rect, # Shuffle=True unless rectangular training is used 228 | pin_memory=True, 229 | collate_fn=dataset.collate_fn) 230 | 231 | # Testloader 232 | testloader = torch.utils.data.DataLoader(LoadImagesAndLabels(test_path, imgsz_test, batch_size, 233 | hyp=hyp, 234 | rect=True, 235 | cache_images=opt.cache_images, 236 | single_cls=opt.single_cls), 237 | batch_size=batch_size, 238 | num_workers=nw, 239 | pin_memory=True, 240 | collate_fn=dataset.collate_fn) 241 | 242 | # Model parameters 243 | model.nc = nc # attach number of classes to model 244 | model.hyp = hyp # attach hyperparameters to model 245 | model.gr = 1.0 # giou loss ratio (obj_loss = 1.0 or giou) 246 | model.class_weights = labels_to_class_weights(dataset.labels, nc).to(device) # attach class weights 247 | 248 | # Model EMA 249 | ema = torch_utils.ModelEMA(model) 250 | 251 | # Start training 252 | nb = len(dataloader) # number of batches 253 | n_burn = max(3 * nb, 500) # burn-in iterations, max(3 epochs, 500 iterations) 254 | maps = np.zeros(nc) # mAP per class 255 | # torch.autograd.set_detect_anomaly(True) 256 | results = (0, 0, 0, 0, 0, 0, 0) # 'P', 'R', 'mAP', 'F1', 'val GIoU', 'val Objectness', 'val Classification' 257 | t0 = time.time() 258 | print('Image sizes %g - %g train, %g test' % (imgsz_min, imgsz_max, imgsz_test)) 259 | print('Using %g dataloader workers' % nw) 260 | print('Starting training for %g epochs...' % epochs) 261 | for epoch in range(start_epoch, epochs): # epoch ------------------------------------------------------------------ 262 | model.train() 263 | sr_flag = get_sr_flag(epoch, opt.sr) 264 | # Update image weights (optional) 265 | if dataset.image_weights: 266 | w = model.class_weights.cpu().numpy() * (1 - maps) ** 2 # class weights 267 | image_weights = labels_to_image_weights(dataset.labels, nc=nc, class_weights=w) 268 | dataset.indices = random.choices(range(dataset.n), weights=image_weights, k=dataset.n) # rand weighted idx 269 | 270 | mloss = torch.zeros(4).to(device) # mean losses 271 | msoft_target = torch.zeros(1).to(device) 272 | 273 | print(('\n' + '%10s' * 9) % ('Epoch', 'gpu_mem', 'GIoU', 'obj', 'cls', 'total', 'soft', 'targets', 'img_size')) 274 | 275 | pbar = tqdm(enumerate(dataloader), total=nb) # progress bar 276 | for i, (imgs, targets, paths, _) in pbar: # batch ------------------------------------------------------------- 277 | ni = i + nb * epoch # number integrated batches (since train start) 278 | imgs = imgs.to(device).float() / 255.0 # uint8 to float32, 0 - 255 to 0.0 - 1.0 279 | targets = targets.to(device) 280 | 281 | # Burn-in 282 | if ni <= n_burn: 283 | xi = [0, n_burn] # x interp 284 | model.gr = np.interp(ni, xi, [0.0, 1.0]) # giou loss ratio (obj_loss = 1.0 or giou) 285 | accumulate = max(1, np.interp(ni, xi, [1, 64 / batch_size]).round()) 286 | for j, x in enumerate(optimizer.param_groups): 287 | # bias lr falls from 0.1 to lr0, all other lrs rise from 0.0 to lr0 288 | x['lr'] = np.interp(ni, xi, [0.1 if j == 2 else 0.0, x['initial_lr'] * lf(epoch)]) 289 | x['weight_decay'] = np.interp(ni, xi, [0.0, hyp['weight_decay'] if j == 1 else 0.0]) 290 | if 'momentum' in x: 291 | x['momentum'] = np.interp(ni, xi, [0.9, hyp['momentum']]) 292 | 293 | # Multi-Scale 294 | if opt.multi_scale: 295 | if ni / accumulate % 1 == 0: #  adjust img_size (67% - 150%) every 1 
batch 296 | img_size = random.randrange(grid_min, grid_max + 1) * gs 297 | sf = img_size / max(imgs.shape[2:]) # scale factor 298 | if sf != 1: 299 | ns = [math.ceil(x * sf / gs) * gs for x in imgs.shape[2:]] # new shape (stretched to 32-multiple) 300 | imgs = F.interpolate(imgs, size=ns, mode='bilinear', align_corners=False) 301 | 302 | # Forward 303 | pred = model(imgs) # 输出3个YOLO层的输出 304 | 305 | 306 | 307 | # Loss 308 | loss, loss_items = compute_loss(pred, targets, model) 309 | if not torch.isfinite(loss): 310 | print('WARNING: non-finite loss, ending training ', loss_items) 311 | return results 312 | 313 | soft_target = 0 314 | reg_ratio = 0 #表示对于目标回归效果没有老师模型好,针对这部分此时再与ground truth学习 315 | 316 | if t_cfg: 317 | if mixed_precision: 318 | with torch.no_grad(): 319 | t_output = t_model(imgs) 320 | else: 321 | _, t_output = t_model(imgs) 322 | soft_target = distillation_loss1(pred, t_output, nc, imgs.size(0)) 323 | #soft_target, reg_ratio = distillation_loss2(model, targets, pred, t_output) 324 | loss += soft_target 325 | 326 | # Backward 327 | loss *= batch_size / 64 # scale loss 328 | if mixed_precision: 329 | with amp.scale_loss(loss, optimizer) as scaled_loss: 330 | scaled_loss.backward() 331 | else: 332 | loss.backward() 333 | 334 | if hasattr(model, 'module'): 335 | BNOptimizer.updateBN(sr_flag, model.module.module_list, opt.s, prune_idx) 336 | else: 337 | BNOptimizer.updateBN(sr_flag, model.module_list, opt.s, prune_idx) 338 | 339 | 340 | # Optimize 341 | if ni % accumulate == 0: 342 | optimizer.step() 343 | optimizer.zero_grad() 344 | ema.update(model) 345 | 346 | # Print 347 | mloss = (mloss * i + loss_items) / (i + 1) # update mean losses 348 | msoft_target = (msoft_target * i + soft_target) / (i + 1) 349 | 350 | mem = '%.3gG' % (torch.cuda.memory_cached() / 1E9 if torch.cuda.is_available() else 0) # (GB) 351 | s = ('%10s' * 2 + '%10.3g' * 7) % ('%g/%g' % (epoch, epochs - 1), mem, *mloss, msoft_target, len(targets), img_size) 352 | pbar.set_description(s) 353 | 354 | # Plot 355 | if ni < 1: 356 | f = 'train_batch%g.jpg' % i # filename 357 | res = plot_images(images=imgs, targets=targets, paths=paths, fname=f) 358 | if tb_writer: 359 | tb_writer.add_image(f, res, dataformats='HWC', global_step=epoch) 360 | # tb_writer.add_graph(model, imgs) # add model to tensorboard 361 | 362 | # end batch ------------------------------------------------------------------------------------------------ 363 | 364 | # Update scheduler 365 | scheduler.step() 366 | 367 | # Process epoch results 368 | ema.update_attr(model) 369 | final_epoch = epoch + 1 == epochs 370 | if not opt.notest or final_epoch: # Calculate mAP 371 | is_coco = any([x in data for x in ['coco.data', 'coco2014.data', 'coco2017.data']]) and model.nc == 80 372 | results, maps = test.test(cfg, 373 | data, 374 | batch_size=batch_size, 375 | imgsz=imgsz_test, 376 | model=ema.ema, 377 | conf_thres=0.001, 378 | save_json=final_epoch and is_coco, 379 | single_cls=opt.single_cls, 380 | dataloader=testloader, 381 | multi_label=ni > n_burn) 382 | 383 | # Write 384 | with open(results_file, 'a') as f: 385 | f.write(s + '%10.3g' * 7 % results + '\n') # P, R, mAP, F1, test_losses=(GIoU, obj, cls) 386 | if len(opt.name) and opt.bucket: 387 | os.system('gsutil cp results.txt gs://%s/results/results%s.txt' % (opt.bucket, opt.name)) 388 | 389 | # Tensorboard 390 | if tb_writer: 391 | tags = ['train/giou_loss', 'train/obj_loss', 'train/cls_loss', 392 | 'metrics/precision', 'metrics/recall', 'metrics/mAP_0.5', 'metrics/F1', 393 | 'val/giou_loss', 
'val/obj_loss', 'val/cls_loss'] 394 | for x, tag in zip(list(mloss[:-1]) + list(results), tags): 395 | tb_writer.add_scalar(tag, x, epoch) 396 | 397 | # Update best mAP 398 | fi = fitness(np.array(results).reshape(1, -1)) # fitness_i = weighted combination of [P, R, mAP, F1] 399 | if fi > best_fitness: 400 | best_fitness = fi 401 | 402 | # Save model 403 | save = (not opt.nosave) or (final_epoch and not opt.evolve) 404 | if save: 405 | with open(results_file, 'r') as f: # create checkpoint 406 | chkpt = {'epoch': epoch, 407 | 'best_fitness': best_fitness, 408 | 'training_results': f.read(), 409 | 'model': ema.ema.module.state_dict() if hasattr(model, 'module') else ema.ema.state_dict(), 410 | 'optimizer': None if final_epoch else optimizer.state_dict()} 411 | 412 | # Save last, best and delete 413 | torch.save(chkpt, last) 414 | if (best_fitness == fi) and not final_epoch: 415 | torch.save(chkpt, best) 416 | del chkpt 417 | 418 | # end epoch ---------------------------------------------------------------------------------------------------- 419 | # end training 420 | 421 | n = opt.name 422 | if len(n): 423 | n = '_' + n if not n.isnumeric() else n 424 | fresults, flast, fbest = 'results%s.txt' % n, wdir + 'last%s.pt' % n, wdir + 'best%s.pt' % n 425 | for f1, f2 in zip([wdir + 'last.pt', wdir + 'best.pt', 'results.txt'], [flast, fbest, fresults]): 426 | if os.path.exists(f1): 427 | os.rename(f1, f2) # rename 428 | ispt = f2.endswith('.pt') # is *.pt 429 | strip_optimizer(f2) if ispt else None # strip optimizer 430 | os.system('gsutil cp %s gs://%s/weights' % (f2, opt.bucket)) if opt.bucket and ispt else None # upload 431 | 432 | if not opt.evolve: 433 | plot_results() # save as results.png 434 | print('%g epochs completed in %.3f hours.\n' % (epoch - start_epoch + 1, (time.time() - t0) / 3600)) 435 | dist.destroy_process_group() if torch.cuda.device_count() > 1 else None 436 | torch.cuda.empty_cache() 437 | return results 438 | 439 | 440 | if __name__ == '__main__': 441 | parser = argparse.ArgumentParser() 442 | parser.add_argument('--epochs', type=int, default=300) # 500200 batches at bs 16, 117263 COCO images = 273 epochs 443 | parser.add_argument('--batch-size', type=int, default=8) # effective bs = batch_size * accumulate = 16 * 4 = 64 444 | parser.add_argument('--cfg', type=str, default='cfg/yolov3-spp.cfg', help='*.cfg path') 445 | parser.add_argument('--data', type=str, default='data/coco2017.data', help='*.data path') 446 | parser.add_argument('--multi-scale', action='store_true', help='adjust (67%% - 150%%) img_size every 10 batches') 447 | parser.add_argument('--img-size', nargs='+', type=int, default=[320, 640], help='[min_train, max-train, test]') 448 | parser.add_argument('--rect', action='store_true', help='rectangular training') 449 | parser.add_argument('--resume', action='store_true', help='resume training from last.pt') 450 | parser.add_argument('--nosave', action='store_true', help='only save final checkpoint') 451 | parser.add_argument('--notest', action='store_true', help='only test final epoch') 452 | parser.add_argument('--evolve', action='store_true', help='evolve hyperparameters') 453 | parser.add_argument('--bucket', type=str, default='', help='gsutil bucket') 454 | parser.add_argument('--cache-images', action='store_true', help='cache images for faster training') 455 | parser.add_argument('--weights', type=str, default='weights/last.pt', help='initial weights path') 456 | parser.add_argument('--name', default='', help='renames results.txt to results_name.txt if 
supplied') 457 | parser.add_argument('--device', default='', help='device id (i.e. 0 or 0,1 or cpu)') 458 | parser.add_argument('--adam', action='store_true', help='use adam optimizer') 459 | parser.add_argument('--single-cls', action='store_true', help='train as single-class dataset') 460 | parser.add_argument('--sparsity-regularization', '-sr', dest='sr', action='store_true',help='train with channel sparsity regularization') 461 | parser.add_argument('--s', type=float, default=0.001, help='scale sparse rate') 462 | parser.add_argument('--t_cfg', type=str, default='', help='teacher model cfg file path for knowledge distillation') 463 | parser.add_argument('--t_weights', type=str, default='', help='teacher model weights') 464 | opt = parser.parse_args() 465 | opt.weights = last if opt.resume else opt.weights 466 | check_git_status() 467 | opt.cfg = list(glob.iglob('./**/' + opt.cfg, recursive=True))[0] # find file 468 | # opt.data = list(glob.iglob('./**/' + opt.data, recursive=True))[0] # find file 469 | print(opt) 470 | opt.img_size.extend([opt.img_size[-1]] * (3 - len(opt.img_size))) # extend to 3 sizes (min, max, test) 471 | device = torch_utils.select_device(opt.device, apex=mixed_precision, batch_size=opt.batch_size) 472 | if device.type == 'cpu': 473 | mixed_precision = False 474 | 475 | # scale hyp['obj'] by img_size (evolved at 320) 476 | # hyp['obj'] *= opt.img_size[0] / 320. 477 | 478 | tb_writer = None 479 | if not opt.evolve: # Train normally 480 | print('Start Tensorboard with "tensorboard --logdir=runs", view at http://localhost:6006/') 481 | tb_writer = SummaryWriter(comment=opt.name) 482 | train(hyp) # train normally 483 | 484 | else: # Evolve hyperparameters (optional) 485 | opt.notest, opt.nosave = True, True # only test/save final epoch 486 | if opt.bucket: 487 | os.system('gsutil cp gs://%s/evolve.txt .' 
% opt.bucket) # download evolve.txt if exists 488 | 489 | for _ in range(1): # generations to evolve 490 | if os.path.exists('evolve.txt'): # if evolve.txt exists: select best hyps and mutate 491 | # Select parent(s) 492 | parent = 'single' # parent selection method: 'single' or 'weighted' 493 | x = np.loadtxt('evolve.txt', ndmin=2) 494 | n = min(5, len(x)) # number of previous results to consider 495 | x = x[np.argsort(-fitness(x))][:n] # top n mutations 496 | w = fitness(x) - fitness(x).min() # weights 497 | if parent == 'single' or len(x) == 1: 498 | # x = x[random.randint(0, n - 1)] # random selection 499 | x = x[random.choices(range(n), weights=w)[0]] # weighted selection 500 | elif parent == 'weighted': 501 | x = (x * w.reshape(n, 1)).sum(0) / w.sum() # weighted combination 502 | 503 | # Mutate 504 | method, mp, s = 3, 0.9, 0.2 # method, mutation probability, sigma 505 | npr = np.random 506 | npr.seed(int(time.time())) 507 | g = np.array([1, 1, 1, 1, 1, 1, 1, 0, .1, 1, 0, 1, 1, 1, 1, 1, 1, 1]) # gains 508 | ng = len(g) 509 | if method == 1: 510 | v = (npr.randn(ng) * npr.random() * g * s + 1) ** 2.0 511 | elif method == 2: 512 | v = (npr.randn(ng) * npr.random(ng) * g * s + 1) ** 2.0 513 | elif method == 3: 514 | v = np.ones(ng) 515 | while all(v == 1): # mutate until a change occurs (prevent duplicates) 516 | # v = (g * (npr.random(ng) < mp) * npr.randn(ng) * s + 1) ** 2.0 517 | v = (g * (npr.random(ng) < mp) * npr.randn(ng) * npr.random() * s + 1).clip(0.3, 3.0) 518 | for i, k in enumerate(hyp.keys()): # plt.hist(v.ravel(), 300) 519 | hyp[k] = x[i + 7] * v[i] # mutate 520 | 521 | # Clip to limits 522 | keys = ['lr0', 'iou_t', 'momentum', 'weight_decay', 'hsv_s', 'hsv_v', 'translate', 'scale', 'fl_gamma'] 523 | limits = [(1e-5, 1e-2), (0.00, 0.70), (0.60, 0.98), (0, 0.001), (0, .9), (0, .9), (0, .9), (0, .9), (0, 3)] 524 | for k, v in zip(keys, limits): 525 | hyp[k] = np.clip(hyp[k], v[0], v[1]) 526 | 527 | # Train mutation 528 | results = train(hyp.copy()) 529 | 530 | # Write mutation results 531 | print_mutation(hyp, results, opt.bucket) 532 | 533 | # Plot results 534 | # plot_evolution_results(hyp) 535 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /utils/adabound.py: -------------------------------------------------------------------------------- 1 | import math 2 | 3 | import torch 4 | from torch.optim.optimizer import Optimizer 5 | 6 | 7 | class AdaBound(Optimizer): 8 | """Implements AdaBound algorithm. 9 | It has been proposed in `Adaptive Gradient Methods with Dynamic Bound of Learning Rate`_. 10 | Arguments: 11 | params (iterable): iterable of parameters to optimize or dicts defining 12 | parameter groups 13 | lr (float, optional): Adam learning rate (default: 1e-3) 14 | betas (Tuple[float, float], optional): coefficients used for computing 15 | running averages of gradient and its square (default: (0.9, 0.999)) 16 | final_lr (float, optional): final (SGD) learning rate (default: 0.1) 17 | gamma (float, optional): convergence speed of the bound functions (default: 1e-3) 18 | eps (float, optional): term added to the denominator to improve 19 | numerical stability (default: 1e-8) 20 | weight_decay (float, optional): weight decay (L2 penalty) (default: 0) 21 | amsbound (boolean, optional): whether to use the AMSBound variant of this algorithm 22 | .. 
Adaptive Gradient Methods with Dynamic Bound of Learning Rate: 23 | https://openreview.net/forum?id=Bkg3g2R9FX 24 | """ 25 | 26 | def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), final_lr=0.1, gamma=1e-3, 27 | eps=1e-8, weight_decay=0, amsbound=False): 28 | if not 0.0 <= lr: 29 | raise ValueError("Invalid learning rate: {}".format(lr)) 30 | if not 0.0 <= eps: 31 | raise ValueError("Invalid epsilon value: {}".format(eps)) 32 | if not 0.0 <= betas[0] < 1.0: 33 | raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0])) 34 | if not 0.0 <= betas[1] < 1.0: 35 | raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1])) 36 | if not 0.0 <= final_lr: 37 | raise ValueError("Invalid final learning rate: {}".format(final_lr)) 38 | if not 0.0 <= gamma < 1.0: 39 | raise ValueError("Invalid gamma parameter: {}".format(gamma)) 40 | defaults = dict(lr=lr, betas=betas, final_lr=final_lr, gamma=gamma, eps=eps, 41 | weight_decay=weight_decay, amsbound=amsbound) 42 | super(AdaBound, self).__init__(params, defaults) 43 | 44 | self.base_lrs = list(map(lambda group: group['lr'], self.param_groups)) 45 | 46 | def __setstate__(self, state): 47 | super(AdaBound, self).__setstate__(state) 48 | for group in self.param_groups: 49 | group.setdefault('amsbound', False) 50 | 51 | def step(self, closure=None): 52 | """Performs a single optimization step. 53 | Arguments: 54 | closure (callable, optional): A closure that reevaluates the model 55 | and returns the loss. 56 | """ 57 | loss = None 58 | if closure is not None: 59 | loss = closure() 60 | 61 | for group, base_lr in zip(self.param_groups, self.base_lrs): 62 | for p in group['params']: 63 | if p.grad is None: 64 | continue 65 | grad = p.grad.data 66 | if grad.is_sparse: 67 | raise RuntimeError( 68 | 'Adam does not support sparse gradients, please consider SparseAdam instead') 69 | amsbound = group['amsbound'] 70 | 71 | state = self.state[p] 72 | 73 | # State initialization 74 | if len(state) == 0: 75 | state['step'] = 0 76 | # Exponential moving average of gradient values 77 | state['exp_avg'] = torch.zeros_like(p.data) 78 | # Exponential moving average of squared gradient values 79 | state['exp_avg_sq'] = torch.zeros_like(p.data) 80 | if amsbound: 81 | # Maintains max of all exp. moving avg. of sq. grad. values 82 | state['max_exp_avg_sq'] = torch.zeros_like(p.data) 83 | 84 | exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq'] 85 | if amsbound: 86 | max_exp_avg_sq = state['max_exp_avg_sq'] 87 | beta1, beta2 = group['betas'] 88 | 89 | state['step'] += 1 90 | 91 | if group['weight_decay'] != 0: 92 | grad = grad.add(group['weight_decay'], p.data) 93 | 94 | # Decay the first and second moment running average coefficient 95 | exp_avg.mul_(beta1).add_(1 - beta1, grad) 96 | exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad) 97 | if amsbound: 98 | # Maintains the maximum of all 2nd moment running avg. till now 99 | torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq) 100 | # Use the max. for normalizing running avg. 
of gradient 101 | denom = max_exp_avg_sq.sqrt().add_(group['eps']) 102 | else: 103 | denom = exp_avg_sq.sqrt().add_(group['eps']) 104 | 105 | bias_correction1 = 1 - beta1 ** state['step'] 106 | bias_correction2 = 1 - beta2 ** state['step'] 107 | step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1 108 | 109 | # Applies bounds on actual learning rate 110 | # lr_scheduler cannot affect final_lr, this is a workaround to apply lr decay 111 | final_lr = group['final_lr'] * group['lr'] / base_lr 112 | lower_bound = final_lr * (1 - 1 / (group['gamma'] * state['step'] + 1)) 113 | upper_bound = final_lr * (1 + 1 / (group['gamma'] * state['step'])) 114 | step_size = torch.full_like(denom, step_size) 115 | step_size.div_(denom).clamp_(lower_bound, upper_bound).mul_(exp_avg) 116 | 117 | p.data.add_(-step_size) 118 | 119 | return loss 120 | 121 | 122 | class AdaBoundW(Optimizer): 123 | """Implements AdaBound algorithm with Decoupled Weight Decay (arxiv.org/abs/1711.05101) 124 | It has been proposed in `Adaptive Gradient Methods with Dynamic Bound of Learning Rate`_. 125 | Arguments: 126 | params (iterable): iterable of parameters to optimize or dicts defining 127 | parameter groups 128 | lr (float, optional): Adam learning rate (default: 1e-3) 129 | betas (Tuple[float, float], optional): coefficients used for computing 130 | running averages of gradient and its square (default: (0.9, 0.999)) 131 | final_lr (float, optional): final (SGD) learning rate (default: 0.1) 132 | gamma (float, optional): convergence speed of the bound functions (default: 1e-3) 133 | eps (float, optional): term added to the denominator to improve 134 | numerical stability (default: 1e-8) 135 | weight_decay (float, optional): weight decay (L2 penalty) (default: 0) 136 | amsbound (boolean, optional): whether to use the AMSBound variant of this algorithm 137 | .. Adaptive Gradient Methods with Dynamic Bound of Learning Rate: 138 | https://openreview.net/forum?id=Bkg3g2R9FX 139 | """ 140 | 141 | def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), final_lr=0.1, gamma=1e-3, 142 | eps=1e-8, weight_decay=0, amsbound=False): 143 | if not 0.0 <= lr: 144 | raise ValueError("Invalid learning rate: {}".format(lr)) 145 | if not 0.0 <= eps: 146 | raise ValueError("Invalid epsilon value: {}".format(eps)) 147 | if not 0.0 <= betas[0] < 1.0: 148 | raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0])) 149 | if not 0.0 <= betas[1] < 1.0: 150 | raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1])) 151 | if not 0.0 <= final_lr: 152 | raise ValueError("Invalid final learning rate: {}".format(final_lr)) 153 | if not 0.0 <= gamma < 1.0: 154 | raise ValueError("Invalid gamma parameter: {}".format(gamma)) 155 | defaults = dict(lr=lr, betas=betas, final_lr=final_lr, gamma=gamma, eps=eps, 156 | weight_decay=weight_decay, amsbound=amsbound) 157 | super(AdaBoundW, self).__init__(params, defaults) 158 | 159 | self.base_lrs = list(map(lambda group: group['lr'], self.param_groups)) 160 | 161 | def __setstate__(self, state): 162 | super(AdaBoundW, self).__setstate__(state) 163 | for group in self.param_groups: 164 | group.setdefault('amsbound', False) 165 | 166 | def step(self, closure=None): 167 | """Performs a single optimization step. 168 | Arguments: 169 | closure (callable, optional): A closure that reevaluates the model 170 | and returns the loss. 
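        Example (illustrative sketch only; the linear model and random batch below are
        placeholders for demonstration, not part of this repository):
            >>> model = torch.nn.Linear(10, 2)
            >>> optimizer = AdaBoundW(model.parameters(), lr=1e-3, final_lr=0.1, weight_decay=5e-4)
            >>> loss = model(torch.randn(4, 10)).sum()
            >>> optimizer.zero_grad(); loss.backward(); optimizer.step()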
171 | """ 172 | loss = None 173 | if closure is not None: 174 | loss = closure() 175 | 176 | for group, base_lr in zip(self.param_groups, self.base_lrs): 177 | for p in group['params']: 178 | if p.grad is None: 179 | continue 180 | grad = p.grad.data 181 | if grad.is_sparse: 182 | raise RuntimeError( 183 | 'Adam does not support sparse gradients, please consider SparseAdam instead') 184 | amsbound = group['amsbound'] 185 | 186 | state = self.state[p] 187 | 188 | # State initialization 189 | if len(state) == 0: 190 | state['step'] = 0 191 | # Exponential moving average of gradient values 192 | state['exp_avg'] = torch.zeros_like(p.data) 193 | # Exponential moving average of squared gradient values 194 | state['exp_avg_sq'] = torch.zeros_like(p.data) 195 | if amsbound: 196 | # Maintains max of all exp. moving avg. of sq. grad. values 197 | state['max_exp_avg_sq'] = torch.zeros_like(p.data) 198 | 199 | exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq'] 200 | if amsbound: 201 | max_exp_avg_sq = state['max_exp_avg_sq'] 202 | beta1, beta2 = group['betas'] 203 | 204 | state['step'] += 1 205 | 206 | # Decay the first and second moment running average coefficient 207 | exp_avg.mul_(beta1).add_(1 - beta1, grad) 208 | exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad) 209 | if amsbound: 210 | # Maintains the maximum of all 2nd moment running avg. till now 211 | torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq) 212 | # Use the max. for normalizing running avg. of gradient 213 | denom = max_exp_avg_sq.sqrt().add_(group['eps']) 214 | else: 215 | denom = exp_avg_sq.sqrt().add_(group['eps']) 216 | 217 | bias_correction1 = 1 - beta1 ** state['step'] 218 | bias_correction2 = 1 - beta2 ** state['step'] 219 | step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1 220 | 221 | # Applies bounds on actual learning rate 222 | # lr_scheduler cannot affect final_lr, this is a workaround to apply lr decay 223 | final_lr = group['final_lr'] * group['lr'] / base_lr 224 | lower_bound = final_lr * (1 - 1 / (group['gamma'] * state['step'] + 1)) 225 | upper_bound = final_lr * (1 + 1 / (group['gamma'] * state['step'])) 226 | step_size = torch.full_like(denom, step_size) 227 | step_size.div_(denom).clamp_(lower_bound, upper_bound).mul_(exp_avg) 228 | 229 | if group['weight_decay'] != 0: 230 | decayed_weights = torch.mul(p.data, group['weight_decay']) 231 | p.data.add_(-step_size) 232 | p.data.sub_(decayed_weights) 233 | else: 234 | p.data.add_(-step_size) 235 | 236 | return loss 237 | -------------------------------------------------------------------------------- /utils/evolve.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | #for i in 0 1 2 3 3 | #do 4 | # t=ultralytics/yolov3:v139 && sudo docker pull $t && sudo nvidia-docker run -d --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t utils/evolve.sh $i 5 | # sleep 30 6 | #done 7 | 8 | while true; do 9 | # python3 train.py --data ../data/sm4/out.data --img-size 320 --epochs 100 --batch 64 --accum 1 --weights yolov3-tiny.conv.15 --multi --bucket ult/wer --evolve --cache --device $1 --cfg yolov3-tiny3-1cls.cfg --single --adam 10 | # python3 train.py --data ../out/data.data --img-size 608 --epochs 10 --batch 8 --accum 8 --weights ultralytics68.pt --multi --bucket ult/athena --evolve --device $1 --cfg yolov3-spp-1cls.cfg 11 | 12 | python3 train.py --data coco2014.data --img-size 512 608 --epochs 27 --batch 8 --accum 8 --evolve --weights '' --bucket ult/coco/sppa_512 --device $1 --cfg 
yolov3-sppa.cfg --multi 13 | done 14 | 15 | 16 | # coco epoch times --img-size 416 608 --epochs 27 --batch 16 --accum 4 17 | # 36:34 2080ti 18 | # 21:58 V100 19 | # 63:00 T4 -------------------------------------------------------------------------------- /utils/gcp.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # New VM 4 | rm -rf sample_data yolov3 5 | git clone https://github.com/ultralytics/yolov3 6 | # git clone -b test --depth 1 https://github.com/ultralytics/yolov3 test # branch 7 | # sudo apt-get install zip 8 | #git clone https://github.com/NVIDIA/apex && cd apex && pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" . --user && cd .. && rm -rf apex 9 | sudo conda install -yc conda-forge scikit-image pycocotools 10 | # python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('193Zp_ye-3qXMonR1nZj3YyxMtQkMy50k','coco2014.zip')" 11 | python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('1WQT6SOktSe8Uw6r10-2JhbEhMY5DJaph','coco2017.zip')" 12 | python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('1C3HewOG9akA3y456SZLBJZfNDPkBwAto','knife.zip')" 13 | python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('13g3LqdpkNE8sPosVJT6KFXlfoMypzRP4','sm4.zip')" 14 | sudo shutdown 15 | 16 | # Mount local SSD 17 | lsblk 18 | sudo mkfs.ext4 -F /dev/nvme0n1 19 | sudo mkdir -p /mnt/disks/nvme0n1 20 | sudo mount /dev/nvme0n1 /mnt/disks/nvme0n1 21 | sudo chmod a+w /mnt/disks/nvme0n1 22 | cp -r coco /mnt/disks/nvme0n1 23 | 24 | # Kill All 25 | t=ultralytics/yolov3:v1 26 | docker kill $(docker ps -a -q --filter ancestor=$t) 27 | 28 | # Evolve coco 29 | sudo -s 30 | t=ultralytics/yolov3:evolve 31 | # docker kill $(docker ps -a -q --filter ancestor=$t) 32 | for i in 0 1 6 7 33 | do 34 | docker pull $t && docker run --gpus all -d --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t bash utils/evolve.sh $i 35 | sleep 30 36 | done 37 | 38 | #COCO training 39 | n=131 && t=ultralytics/coco:v131 && sudo docker pull $t && sudo docker run -it --gpus all --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t python3 train.py --data coco2014.data --img-size 320 640 --epochs 300 --batch 16 --weights '' --device 0 --cfg yolov3-spp.cfg --bucket ult/coco --name $n && sudo shutdown 40 | n=132 && t=ultralytics/coco:v131 && sudo docker pull $t && sudo docker run -it --gpus all --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t python3 train.py --data coco2014.data --img-size 320 640 --epochs 300 --batch 64 --weights '' --device 0 --cfg yolov3-tiny.cfg --bucket ult/coco --name $n && sudo shutdown 41 | -------------------------------------------------------------------------------- /utils/google_utils.py: -------------------------------------------------------------------------------- 1 | # This file contains google utils: https://cloud.google.com/storage/docs/reference/libraries 2 | # pip install --upgrade google-cloud-storage 3 | 4 | import os 5 | import time 6 | 7 | 8 | # from google.cloud import storage 9 | 10 | 11 | def gdrive_download(id='1HaXkef9z6y5l4vUnCYgdmEAj61c6bfWO', name='coco.zip'): 12 | # https://gist.github.com/tanaikech/f0f2d122e05bf5f971611258c22c110f 13 | # Downloads a file from Google Drive, accepting presented query 14 | # from utils.google_utils import *; gdrive_download() 15 | t = time.time() 16 | 17 | print('Downloading https://drive.google.com/uc?export=download&id=%s as %s... 
' % (id, name), end='') 18 | os.remove(name) if os.path.exists(name) else None # remove existing 19 | os.remove('cookie') if os.path.exists('cookie') else None 20 | 21 | # Attempt file download 22 | os.system("curl -c ./cookie -s -L \"https://drive.google.com/uc?export=download&id=%s\" > /dev/null" % id) 23 | if os.path.exists('cookie'): # large file 24 | s = "curl -Lb ./cookie \"https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=%s\" -o %s" % ( 25 | id, name) 26 | else: # small file 27 | s = "curl -s -L -o %s 'https://drive.google.com/uc?export=download&id=%s'" % (name, id) 28 | r = os.system(s) # execute, capture return values 29 | os.remove('cookie') if os.path.exists('cookie') else None 30 | 31 | # Error check 32 | if r != 0: 33 | os.remove(name) if os.path.exists(name) else None # remove partial 34 | print('Download error ') # raise Exception('Download error') 35 | return r 36 | 37 | # Unzip if archive 38 | if name.endswith('.zip'): 39 | print('unzipping... ', end='') 40 | os.system('unzip -q %s' % name) # unzip 41 | os.remove(name) # remove zip to free space 42 | 43 | print('Done (%.1fs)' % (time.time() - t)) 44 | return r 45 | 46 | 47 | def upload_blob(bucket_name, source_file_name, destination_blob_name): 48 | # Uploads a file to a bucket 49 | # https://cloud.google.com/storage/docs/uploading-objects#storage-upload-object-python 50 | 51 | storage_client = storage.Client() 52 | bucket = storage_client.get_bucket(bucket_name) 53 | blob = bucket.blob(destination_blob_name) 54 | 55 | blob.upload_from_filename(source_file_name) 56 | 57 | print('File {} uploaded to {}.'.format( 58 | source_file_name, 59 | destination_blob_name)) 60 | 61 | 62 | def download_blob(bucket_name, source_blob_name, destination_file_name): 63 | # Uploads a blob from a bucket 64 | storage_client = storage.Client() 65 | bucket = storage_client.get_bucket(bucket_name) 66 | blob = bucket.blob(source_blob_name) 67 | 68 | blob.download_to_filename(destination_file_name) 69 | 70 | print('Blob {} downloaded to {}.'.format( 71 | source_blob_name, 72 | destination_file_name)) 73 | -------------------------------------------------------------------------------- /utils/keepgit: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /utils/layers.py: -------------------------------------------------------------------------------- 1 | import torch.nn.functional as F 2 | 3 | from utils.utils import * 4 | 5 | 6 | def make_divisible(v, divisor): 7 | # Function ensures all layers have a channel number that is divisible by 8 8 | # https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py 9 | return math.ceil(v / divisor) * divisor 10 | 11 | 12 | class Flatten(nn.Module): 13 | # Use after nn.AdaptiveAvgPool2d(1) to remove last 2 dimensions 14 | def forward(self, x): 15 | return x.view(x.size(0), -1) 16 | 17 | 18 | class Concat(nn.Module): 19 | # Concatenate a list of tensors along dimension 20 | def __init__(self, dimension=1): 21 | super(Concat, self).__init__() 22 | self.d = dimension 23 | 24 | def forward(self, x): 25 | return torch.cat(x, self.d) 26 | 27 | 28 | class FeatureConcat(nn.Module): 29 | def __init__(self, layers): 30 | super(FeatureConcat, self).__init__() 31 | self.layers = layers # layer indices 32 | self.multiple = len(layers) > 1 # multiple layers flag 33 | 34 | def forward(self, x, outputs): 35 | return 
torch.cat([outputs[i] for i in self.layers], 1) if self.multiple else outputs[self.layers[0]] 36 | 37 | 38 | class WeightedFeatureFusion(nn.Module): # weighted sum of 2 or more layers https://arxiv.org/abs/1911.09070 39 | def __init__(self, layers, weight=False): 40 | super(WeightedFeatureFusion, self).__init__() 41 | self.layers = layers # layer indices 42 | self.weight = weight # apply weights boolean 43 | self.n = len(layers) + 1 # number of layers 44 | if weight: 45 | self.w = nn.Parameter(torch.zeros(self.n), requires_grad=True) # layer weights 46 | 47 | def forward(self, x, outputs): 48 | # Weights 49 | if self.weight: 50 | w = torch.sigmoid(self.w) * (2 / self.n) # sigmoid weights (0-1) 51 | x = x * w[0] 52 | 53 | # Fusion 54 | nx = x.shape[1] # input channels 55 | for i in range(self.n - 1): 56 | a = outputs[self.layers[i]] * w[i + 1] if self.weight else outputs[self.layers[i]] # feature to add 57 | na = a.shape[1] # feature channels 58 | 59 | # Adjust channels 60 | if nx == na: # same shape 61 | x = x + a 62 | elif nx > na: # slice input 63 | x[:, :na] = x[:, :na] + a # or a = nn.ZeroPad2d((0, 0, 0, 0, 0, dc))(a); x = x + a 64 | else: # slice feature 65 | x = x + a[:, :nx] 66 | 67 | return x 68 | 69 | 70 | class MixConv2d(nn.Module): # MixConv: Mixed Depthwise Convolutional Kernels https://arxiv.org/abs/1907.09595 71 | def __init__(self, in_ch, out_ch, k=(3, 5, 7), stride=1, dilation=1, bias=True, method='equal_params'): 72 | super(MixConv2d, self).__init__() 73 | 74 | groups = len(k) 75 | if method == 'equal_ch': # equal channels per group 76 | i = torch.linspace(0, groups - 1E-6, out_ch).floor() # out_ch indices 77 | ch = [(i == g).sum() for g in range(groups)] 78 | else: # 'equal_params': equal parameter count per group 79 | b = [out_ch] + [0] * groups 80 | a = np.eye(groups + 1, groups, k=-1) 81 | a -= np.roll(a, 1, axis=1) 82 | a *= np.array(k) ** 2 83 | a[0] = 1 84 | ch = np.linalg.lstsq(a, b, rcond=None)[0].round().astype(int) # solve for equal weight indices, ax = b 85 | 86 | self.m = nn.ModuleList([nn.Conv2d(in_channels=in_ch, 87 | out_channels=ch[g], 88 | kernel_size=k[g], 89 | stride=stride, 90 | padding=k[g] // 2, # 'same' pad 91 | dilation=dilation, 92 | bias=bias) for g in range(groups)]) 93 | 94 | def forward(self, x): 95 | return torch.cat([m(x) for m in self.m], 1) 96 | 97 | 98 | # Activation functions below ------------------------------------------------------------------------------------------- 99 | class SwishImplementation(torch.autograd.Function): 100 | @staticmethod 101 | def forward(ctx, x): 102 | ctx.save_for_backward(x) 103 | return x * torch.sigmoid(x) 104 | 105 | @staticmethod 106 | def backward(ctx, grad_output): 107 | x = ctx.saved_tensors[0] 108 | sx = torch.sigmoid(x) # sigmoid(ctx) 109 | return grad_output * (sx * (1 + x * (1 - sx))) 110 | 111 | 112 | class MishImplementation(torch.autograd.Function): 113 | @staticmethod 114 | def forward(ctx, x): 115 | ctx.save_for_backward(x) 116 | return x.mul(torch.tanh(F.softplus(x))) # x * tanh(ln(1 + exp(x))) 117 | 118 | @staticmethod 119 | def backward(ctx, grad_output): 120 | x = ctx.saved_tensors[0] 121 | sx = torch.sigmoid(x) 122 | fx = F.softplus(x).tanh() 123 | return grad_output * (fx + x * sx * (1 - fx * fx)) 124 | 125 | 126 | class MemoryEfficientSwish(nn.Module): 127 | def forward(self, x): 128 | return SwishImplementation.apply(x) 129 | 130 | 131 | class MemoryEfficientMish(nn.Module): 132 | def forward(self, x): 133 | return MishImplementation.apply(x) 134 | 135 | 136 | class Swish(nn.Module): 
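    """Swish activation, x * sigmoid(x) (see MemoryEfficientSwish above for a lower-memory autograd variant)."""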
137 | def forward(self, x): 138 | return x * torch.sigmoid(x) 139 | 140 | 141 | class HardSwish(nn.Module): # https://arxiv.org/pdf/1905.02244.pdf 142 | def forward(self, x): 143 | return x * F.hardtanh(x + 3, 0., 6., True) / 6. 144 | 145 | 146 | class Mish(nn.Module): # https://github.com/digantamisra98/Mish 147 | def forward(self, x): 148 | return x * F.softplus(x).tanh() 149 | -------------------------------------------------------------------------------- /utils/parse_config.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | import numpy as np 4 | 5 | 6 | def parse_model_cfg(path): 7 | # Parse the yolo *.cfg file and return module definitions path may be 'cfg/yolov3.cfg', 'yolov3.cfg', or 'yolov3' 8 | if not path.endswith('.cfg'): # add .cfg suffix if omitted 9 | path += '.cfg' 10 | if not os.path.exists(path) and os.path.exists('cfg' + os.sep + path): # add cfg/ prefix if omitted 11 | path = 'cfg' + os.sep + path 12 | 13 | with open(path, 'r') as f: 14 | lines = f.read().split('\n') 15 | lines = [x for x in lines if x and not x.startswith('#')] 16 | lines = [x.rstrip().lstrip() for x in lines] # get rid of fringe whitespaces 17 | mdefs = [] # module definitions 18 | for line in lines: 19 | if line.startswith('['): # This marks the start of a new block 20 | mdefs.append({}) 21 | mdefs[-1]['type'] = line[1:-1].rstrip() 22 | if mdefs[-1]['type'] == 'convolutional' or mdefs[-1]['type'] == 'quan_convolutional': 23 | mdefs[-1]['batch_normalize'] = 0 # pre-populate with zeros (may be overwritten later) 24 | else: 25 | key, val = line.split("=") 26 | key = key.rstrip() 27 | 28 | if key == 'anchors': # return nparray 29 | mdefs[-1][key] = np.array([float(x) for x in val.split(',')]).reshape((-1, 2)) # np anchors 30 | elif (key in ['from', 'layers', 'mask']) or (key == 'size' and ',' in val): # return array 31 | mdefs[-1][key] = [int(x) for x in val.split(',')] 32 | else: 33 | val = val.strip() 34 | if val.isnumeric(): # return int or float 35 | mdefs[-1][key] = int(val) if (int(val) - float(val)) == 0 else float(val) 36 | else: 37 | mdefs[-1][key] = val # return string 38 | 39 | # Check all fields are supported 40 | supported = ['type', 'batch_normalize', 'filters', 'size', 'stride', 'pad', 'activation', 'layers', 'groups', 41 | 'from', 'mask', 'anchors', 'classes', 'num', 'jitter', 'ignore_thresh', 'truth_thresh', 'random', 42 | 'stride_x', 'stride_y', 'weights_type', 'weights_normalization', 'scale_x_y', 'beta_nms', 'nms_kind', 43 | 'iou_loss', 'iou_normalizer', 'cls_normalizer', 'iou_thresh', 'group', 'reduction','first'] 44 | 45 | f = [] # fields 46 | for x in mdefs[1:]: 47 | [f.append(k) for k in x if k not in f] 48 | u = [x for x in f if x not in supported] # unsupported fields 49 | assert not any(u), "Unsupported fields %s in %s. 
See https://github.com/ultralytics/yolov3/issues/631" % (u, path) 50 | 51 | return mdefs 52 | 53 | 54 | def parse_data_cfg(path): 55 | # Parses the data configuration file 56 | if not os.path.exists(path) and os.path.exists('data' + os.sep + path): # add data/ prefix if omitted 57 | path = 'data' + os.sep + path 58 | 59 | with open(path, 'r') as f: 60 | lines = f.readlines() 61 | 62 | options = dict() 63 | for line in lines: 64 | line = line.strip() 65 | if line == '' or line.startswith('#'): 66 | continue 67 | key, val = line.split('=') 68 | options[key.strip()] = val.strip() 69 | 70 | return options 71 | -------------------------------------------------------------------------------- /utils/prune_utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from terminaltables import AsciiTable 3 | from copy import deepcopy 4 | import numpy as np 5 | import torch.nn.functional as F 6 | 7 | 8 | def get_sr_flag(epoch, sr): 9 | # return epoch >= 5 and sr 10 | return sr 11 | 12 | def parse_module_defs3(module_defs): 13 | 14 | CBL_idx = [] 15 | Conv_idx = [] 16 | for i, module_def in enumerate(module_defs): 17 | if module_def['type'] == 'convolutional': 18 | if module_def['batch_normalize'] == '1': 19 | CBL_idx.append(i) 20 | else: 21 | Conv_idx.append(i) 22 | 23 | ignore_idx = set() 24 | 25 | ignore_idx.add(18) 26 | 27 | 28 | prune_idx = [idx for idx in CBL_idx if idx not in ignore_idx] 29 | 30 | return CBL_idx, Conv_idx, prune_idx 31 | 32 | def parse_module_defs2(module_defs): 33 | 34 | CBL_idx = [] 35 | Conv_idx = [] 36 | shortcut_idx=dict() 37 | shortcut_all=set() 38 | for i, module_def in enumerate(module_defs): 39 | if module_def['type'] == 'convolutional': 40 | if module_def['batch_normalize'] == '1': 41 | CBL_idx.append(i) 42 | else: 43 | Conv_idx.append(i) 44 | 45 | ignore_idx = set() 46 | for i, module_def in enumerate(module_defs): 47 | if module_def['type'] == 'shortcut': 48 | identity_idx = (i + int(module_def['from'])) 49 | if module_defs[identity_idx]['type'] == 'convolutional': 50 | 51 | #ignore_idx.add(identity_idx) 52 | shortcut_idx[i-1]=identity_idx 53 | shortcut_all.add(identity_idx) 54 | elif module_defs[identity_idx]['type'] == 'shortcut': 55 | 56 | #ignore_idx.add(identity_idx - 1) 57 | shortcut_idx[i-1]=identity_idx-1 58 | shortcut_all.add(identity_idx-1) 59 | shortcut_all.add(i-1) 60 | #上采样层前的卷积层不裁剪 61 | ignore_idx.add(84) 62 | ignore_idx.add(96) 63 | 64 | prune_idx = [idx for idx in CBL_idx if idx not in ignore_idx] 65 | 66 | return CBL_idx, Conv_idx, prune_idx,shortcut_idx,shortcut_all 67 | 68 | def parse_module_defs(module_defs): 69 | 70 | CBL_idx = [] 71 | Conv_idx = [] 72 | for i, module_def in enumerate(module_defs): 73 | if i > 139: 74 | if module_def['type'] == 'convolutional': 75 | if module_def['batch_normalize'] == 1: 76 | CBL_idx.append(i) 77 | else: 78 | Conv_idx.append(i) 79 | ignore_idx = set() 80 | #for i, module_def in enumerate(module_defs): 81 | #if module_def['type'] == 'shortcut': 82 | #ignore_idx.add(i-1) 83 | #identity_idx = (i + int(module_def['from'])) 84 | #if module_defs[identity_idx]['type'] == 'convolutional': 85 | #ignore_idx.add(identity_idx) 86 | #elif module_defs[identity_idx]['type'] == 'shortcut': 87 | #ignore_idx.add(identity_idx - 1) 88 | #上采样层前的卷积层不裁剪 89 | ignore_idx.add(149) 90 | ignore_idx.add(161) 91 | 92 | prune_idx = [idx for idx in CBL_idx if idx not in ignore_idx] 93 | 94 | return CBL_idx, Conv_idx, prune_idx 95 | 96 | 97 | def gather_bn_weights(module_list, prune_idx): 98 | 
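    # Concatenates the absolute BatchNorm scale factors (gamma) of every prunable CBL layer
    # (module_list[idx] is Sequential(Conv2d, BatchNorm2d, activation)) into one flat tensor;
    # sorting this vector is what yields the global channel-pruning threshold
    # (cf. obtain_quantiles / obtain_bn_mask below).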
99 | size_list = [module_list[idx][1].weight.data.shape[0] for idx in prune_idx] 100 | 101 | bn_weights = torch.zeros(sum(size_list)) 102 | index = 0 103 | for idx, size in zip(prune_idx, size_list): 104 | bn_weights[index:(index + size)] = module_list[idx][1].weight.data.abs().clone() 105 | index += size 106 | 107 | return bn_weights 108 | 109 | 110 | def write_cfg(cfg_file, module_defs): 111 | 112 | with open(cfg_file, 'w') as f: 113 | for module_def in module_defs: 114 | f.write(f"[{module_def['type']}]\n") 115 | for key, value in module_def.items(): 116 | if key != 'type': 117 | f.write(f"{key}={value}\n") 118 | f.write("\n") 119 | return cfg_file 120 | 121 | 122 | class BNOptimizer(): 123 | 124 | @staticmethod 125 | def updateBN(sr_flag, module_list, s, prune_idx): 126 | if sr_flag: 127 | for idx in prune_idx: 128 | # Squential(Conv, BN, Lrelu) 129 | bn_module = module_list[idx][1] 130 | bn_module.weight.grad.data.add_(s * torch.sign(bn_module.weight.data)) # L1 131 | 132 | 133 | def obtain_quantiles(bn_weights, num_quantile=5): 134 | 135 | sorted_bn_weights, i = torch.sort(bn_weights) 136 | total = sorted_bn_weights.shape[0] 137 | quantiles = sorted_bn_weights.tolist()[-1::-total//num_quantile][::-1] 138 | print("\nBN weights quantile:") 139 | quantile_table = [ 140 | [f'{i}/{num_quantile}' for i in range(1, num_quantile+1)], 141 | ["%.3f" % quantile for quantile in quantiles] 142 | ] 143 | print(AsciiTable(quantile_table).table) 144 | 145 | return quantiles 146 | 147 | 148 | def get_input_mask(module_defs, idx, CBLidx2mask): 149 | 150 | if idx == 140: 151 | return np.ones(960) 152 | 153 | elif module_defs[idx - 1]['type'] == 'convolutional': 154 | return CBLidx2mask[idx - 1] 155 | elif module_defs[idx - 1]['type'] == 'shortcut': 156 | return CBLidx2mask[idx - 2] 157 | elif module_defs[idx - 1]['type'] == 'route': 158 | route_in_idxs = [] 159 | for layer_i in module_defs[idx - 1]['layers']: 160 | if int(layer_i) < 0: 161 | route_in_idxs.append(idx - 1 + int(layer_i)) 162 | else: 163 | route_in_idxs.append(int(layer_i)) 164 | print(route_in_idxs) 165 | if len(route_in_idxs) == 1: 166 | return CBLidx2mask[route_in_idxs[0]] 167 | elif len(route_in_idxs) == 2: 168 | if 96 in route_in_idxs: 169 | #return np.concatenate([CBLidx2mask[in_idx - 1] for in_idx in route_in_idxs]) 170 | return np.concatenate([np.ones(112), CBLidx2mask[149]]) 171 | elif 45 in route_in_idxs: 172 | return np.concatenate([np.ones(40), CBLidx2mask[161]]) 173 | 174 | else: 175 | print("Something wrong with route module!") 176 | raise Exception 177 | 178 | 179 | def init_weights_from_loose_model(compact_model, loose_model, CBL_idx, Conv_idx, CBLidx2mask): 180 | 181 | for idx in CBL_idx: 182 | compact_CBL = compact_model.module_list[idx] 183 | loose_CBL = loose_model.module_list[idx] 184 | out_channel_idx = np.argwhere(CBLidx2mask[idx])[:, 0].tolist() 185 | 186 | compact_bn, loose_bn = compact_CBL[1], loose_CBL[1] 187 | compact_bn.weight.data = loose_bn.weight.data[out_channel_idx].clone() 188 | compact_bn.bias.data = loose_bn.bias.data[out_channel_idx].clone() 189 | compact_bn.running_mean.data = loose_bn.running_mean.data[out_channel_idx].clone() 190 | compact_bn.running_var.data = loose_bn.running_var.data[out_channel_idx].clone() 191 | 192 | input_mask = get_input_mask(loose_model.module_defs, idx, CBLidx2mask) 193 | in_channel_idx = np.argwhere(input_mask)[:, 0].tolist() 194 | compact_conv, loose_conv = compact_CBL[0], loose_CBL[0] 195 | tmp = loose_conv.weight.data[:, in_channel_idx, :, :].clone() 196 | 
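        # Two-step copy: `tmp` keeps only the surviving input channels (dim 1, taken from the
        # previous layer's mask), and the line below keeps only the surviving output channels
        # (dim 0), so the compact conv weight matches the pruned channel layout on both sides.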
compact_conv.weight.data = tmp[out_channel_idx, :, :, :].clone() 197 | 198 | for idx in Conv_idx: 199 | compact_conv = compact_model.module_list[idx][0] 200 | loose_conv = loose_model.module_list[idx][0] 201 | 202 | input_mask = get_input_mask(loose_model.module_defs, idx, CBLidx2mask) 203 | in_channel_idx = np.argwhere(input_mask)[:, 0].tolist() 204 | compact_conv.weight.data = loose_conv.weight.data[:, in_channel_idx, :, :].clone() 205 | compact_conv.bias.data = loose_conv.bias.data.clone() 206 | 207 | 208 | def prune_model_keep_size(model, prune_idx, CBL_idx, CBLidx2mask): 209 | 210 | pruned_model = deepcopy(model) 211 | for idx in prune_idx: 212 | mask = torch.from_numpy(CBLidx2mask[idx]).cuda() 213 | bn_module = pruned_model.module_list[idx][1] 214 | 215 | bn_module.weight.data.mul_(mask) 216 | 217 | activation = F.leaky_relu((1 - mask) * bn_module.bias.data, 0.1) 218 | 219 | # 两个上采样层前的卷积层 220 | next_idx_list = [idx + 1] 221 | if idx == 79: 222 | next_idx_list.append(84) 223 | elif idx == 91: 224 | next_idx_list.append(96) 225 | 226 | for next_idx in next_idx_list: 227 | next_conv = pruned_model.module_list[next_idx][0] 228 | conv_sum = next_conv.weight.data.sum(dim=(2, 3)) 229 | offset = conv_sum.matmul(activation.reshape(-1, 1)).reshape(-1) 230 | if next_idx in CBL_idx: 231 | next_bn = pruned_model.module_list[next_idx][1] 232 | next_bn.running_mean.data.sub_(offset) 233 | else: 234 | #这里需要注意的是,对于convolutionnal,如果有BN,则该层卷积层不使用bias,如果无BN,则使用bias 235 | next_conv.bias.data.add_(offset) 236 | 237 | bn_module.bias.data.mul_(mask) 238 | 239 | return pruned_model 240 | 241 | 242 | def obtain_bn_mask(bn_module, thre): 243 | 244 | thre = thre.cuda() 245 | mask = bn_module.weight.data.abs().ge(thre).float() 246 | 247 | return mask 248 | -------------------------------------------------------------------------------- /utils/quant_dorefa.py: -------------------------------------------------------------------------------- 1 | import math 2 | import time 3 | import torch 4 | import torch.nn as nn 5 | import numpy as np 6 | from torch.autograd import Function 7 | import torch.nn.functional as F 8 | 9 | 10 | 11 | 12 | class ScaleSigner(Function): 13 | """take a real value x, output sign(x)*E(|x|)""" 14 | @staticmethod 15 | def forward(ctx, input): 16 | return torch.sign(input) * torch.mean(torch.abs(input)) 17 | 18 | @staticmethod 19 | def backward(ctx, grad_output): 20 | return grad_output 21 | 22 | 23 | def scale_sign(input): 24 | return ScaleSigner.apply(input) 25 | 26 | 27 | #真正起作用的量化函数 28 | class Quantizer(Function): 29 | @staticmethod 30 | def forward(ctx, input, nbit): 31 | scale = 2 ** nbit - 1 32 | return torch.round(input * scale) / scale 33 | 34 | @staticmethod 35 | def backward(ctx, grad_output): 36 | return grad_output, None 37 | 38 | 39 | def quantize(input, nbit): 40 | return Quantizer.apply(input, nbit) 41 | 42 | 43 | def dorefa_w(w, nbit_w): 44 | if nbit_w == 1: 45 | w = scale_sign(w) 46 | else: 47 | w = torch.tanh(w) 48 | #将权重限制在[0,1]之间 49 | w = w / (2 * torch.max(torch.abs(w))) + 0.5 50 | #权重量化 51 | w = 2 * quantize(w, nbit_w) - 1 52 | 53 | return w 54 | 55 | 56 | def dorefa_a(input, nbit_a): 57 | return quantize(torch.clamp(0.1 * input, 0, 1), nbit_a) 58 | 59 | 60 | class QuanConv(nn.Conv2d): 61 | """docstring for QuanConv""" 62 | def __init__(self, in_channels, out_channels, kernel_size, quan_name_w='dorefa', quan_name_a='dorefa', nbit_w=32, 63 | nbit_a=32, stride=1, 64 | padding=0, dilation=1, groups=1, 65 | bias=True): 66 | super(QuanConv, self).__init__( 67 | in_channels, 
out_channels, kernel_size, stride, padding, dilation, 68 | groups, bias) 69 | self.nbit_w = nbit_w 70 | self.nbit_a = nbit_a 71 | name_w_dict = {'dorefa': dorefa_w} 72 | name_a_dict = {'dorefa': dorefa_a} 73 | self.quan_w = name_w_dict[quan_name_w] 74 | self.quan_a = name_a_dict[quan_name_a] 75 | 76 | # @weak_script_method 77 | def forward(self, input): 78 | if self.nbit_w <=32: 79 | #量化卷积 80 | w = self.quan_w(self.weight, self.nbit_w) 81 | else: 82 | #卷积保持不变 83 | w = self.weight 84 | 85 | if self.nbit_a <=32: 86 | #量化激活 87 | x = self.quan_a(input, self.nbit_a) 88 | else: 89 | #激活保持不变 90 | x = input 91 | # print('x unique',np.unique(x.detach().numpy()).shape) 92 | # print('w unique',np.unique(w.detach().numpy()).shape) 93 | 94 | #做真正的卷积运算 95 | 96 | output = F.conv2d(x, w, self.bias, self.stride, self.padding, self.dilation, self.groups) 97 | 98 | return output 99 | 100 | class Linear_Q(nn.Linear): 101 | def __init__(self, in_features, out_features, bias=True, quan_name_w='dorefa', quan_name_a='dorefa', nbit_w=32, nbit_a=32): 102 | super(Linear_Q, self).__init__(in_features, out_features, bias) 103 | self.nbit_w = nbit_w 104 | self.nbit_a = nbit_a 105 | name_w_dict = {'dorefa': dorefa_w} 106 | name_a_dict = {'dorefa': dorefa_a} 107 | self.quan_w = name_w_dict[quan_name_w] 108 | self.quan_a = name_a_dict[quan_name_a] 109 | 110 | # @weak_script_method 111 | def forward(self, input): 112 | if self.nbit_w < 32: 113 | w = self.quan_w(self.weight, self.nbit_w) 114 | else: 115 | w = self.weight 116 | 117 | if self.nbit_a < 32: 118 | x = self.quan_a(input, self.nbit_a) 119 | else: 120 | x = input 121 | 122 | # print('x unique',np.unique(x.detach().numpy())) 123 | # print('w unique',np.unique(w.detach().numpy())) 124 | 125 | output = F.linear(x, w, self.bias) 126 | 127 | return output 128 | 129 | 130 | -------------------------------------------------------------------------------- /utils/torch_utils.py: -------------------------------------------------------------------------------- 1 | import math 2 | import os 3 | import time 4 | from copy import deepcopy 5 | 6 | import torch 7 | import torch.backends.cudnn as cudnn 8 | import torch.nn as nn 9 | import torch.nn.functional as F 10 | 11 | 12 | def init_seeds(seed=0): 13 | torch.manual_seed(seed) 14 | 15 | # Reduce randomness (may be slower on Tesla GPUs) # https://pytorch.org/docs/stable/notes/randomness.html 16 | if seed == 0: 17 | cudnn.deterministic = False 18 | cudnn.benchmark = True 19 | 20 | 21 | def select_device(device='', apex=False, batch_size=None): 22 | # device = 'cpu' or '0' or '0,1,2,3' 23 | cpu_request = device.lower() == 'cpu' 24 | if device and not cpu_request: # if device requested other than 'cpu' 25 | os.environ['CUDA_VISIBLE_DEVICES'] = device # set environment variable 26 | assert torch.cuda.is_available(), 'CUDA unavailable, invalid device %s requested' % device # check availablity 27 | 28 | cuda = False if cpu_request else torch.cuda.is_available() 29 | if cuda: 30 | c = 1024 ** 2 # bytes to MB 31 | ng = torch.cuda.device_count() 32 | if ng > 1 and batch_size: # check that batch_size is compatible with device_count 33 | assert batch_size % ng == 0, 'batch-size %g not multiple of GPU count %g' % (batch_size, ng) 34 | x = [torch.cuda.get_device_properties(i) for i in range(ng)] 35 | s = 'Using CUDA ' + ('Apex ' if apex else '') # apex for mixed precision https://github.com/NVIDIA/apex 36 | for i in range(0, ng): 37 | if i == 1: 38 | s = ' ' * len(s) 39 | print("%sdevice%g _CudaDeviceProperties(name='%s', total_memory=%dMB)" % 
40 | (s, i, x[i].name, x[i].total_memory / c)) 41 | else: 42 | print('Using CPU') 43 | 44 | print('') # skip a line 45 | return torch.device('cuda:0' if cuda else 'cpu') 46 | 47 | 48 | def time_synchronized(): 49 | torch.cuda.synchronize() if torch.cuda.is_available() else None 50 | return time.time() 51 | 52 | 53 | def initialize_weights(model): 54 | for m in model.modules(): 55 | t = type(m) 56 | if t is nn.Conv2d: 57 | pass # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 58 | elif t is nn.BatchNorm2d: 59 | m.eps = 1e-4 60 | m.momentum = 0.03 61 | elif t in [nn.LeakyReLU, nn.ReLU, nn.ReLU6]: 62 | m.inplace = True 63 | 64 | 65 | def find_modules(model, mclass=nn.Conv2d): 66 | # finds layer indices matching module class 'mclass' 67 | return [i for i, m in enumerate(model.module_list) if isinstance(m, mclass)] 68 | 69 | 70 | def fuse_conv_and_bn(conv, bn): 71 | # https://tehnokv.com/posts/fusing-batchnorm-and-conv/ 72 | with torch.no_grad(): 73 | # init 74 | fusedconv = torch.nn.Conv2d(conv.in_channels, 75 | conv.out_channels, 76 | kernel_size=conv.kernel_size, 77 | stride=conv.stride, 78 | padding=conv.padding, 79 | bias=True) 80 | 81 | # prepare filters 82 | w_conv = conv.weight.clone().view(conv.out_channels, -1) 83 | w_bn = torch.diag(bn.weight.div(torch.sqrt(bn.eps + bn.running_var))) 84 | fusedconv.weight.copy_(torch.mm(w_bn, w_conv).view(fusedconv.weight.size())) 85 | 86 | # prepare spatial bias 87 | if conv.bias is not None: 88 | b_conv = conv.bias 89 | else: 90 | b_conv = torch.zeros(conv.weight.size(0)) 91 | b_bn = bn.bias - bn.weight.mul(bn.running_mean).div(torch.sqrt(bn.running_var + bn.eps)) 92 | fusedconv.bias.copy_(torch.mm(w_bn, b_conv.reshape(-1, 1)).reshape(-1) + b_bn) 93 | 94 | return fusedconv 95 | 96 | 97 | def model_info(model, verbose=False): 98 | # Plots a line-by-line description of a PyTorch model 99 | n_p = sum(x.numel() for x in model.parameters()) # number parameters 100 | n_g = sum(x.numel() for x in model.parameters() if x.requires_grad) # number gradients 101 | if verbose: 102 | print('%5s %40s %9s %12s %20s %10s %10s' % ('layer', 'name', 'gradient', 'parameters', 'shape', 'mu', 'sigma')) 103 | for i, (name, p) in enumerate(model.named_parameters()): 104 | name = name.replace('module_list.', '') 105 | print('%5g %40s %9s %12g %20s %10.3g %10.3g' % 106 | (i, name, p.requires_grad, p.numel(), list(p.shape), p.mean(), p.std())) 107 | 108 | try: # FLOPS 109 | from thop import profile 110 | macs, _ = profile(model, inputs=(torch.zeros(1, 3, 480, 640),), verbose=False) 111 | fs = ', %.1f GFLOPS' % (macs / 1E9 * 2) 112 | except: 113 | fs = '' 114 | 115 | print('Model Summary: %g layers, %g parameters, %g gradients%s' % (len(list(model.parameters())), n_p, n_g, fs)) 116 | 117 | 118 | def load_classifier(name='resnet101', n=2): 119 | # Loads a pretrained model reshaped to n-class output 120 | import pretrainedmodels # https://github.com/Cadene/pretrained-models.pytorch#torchvision 121 | model = pretrainedmodels.__dict__[name](num_classes=1000, pretrained='imagenet') 122 | 123 | # Display model properties 124 | for x in ['model.input_size', 'model.input_space', 'model.input_range', 'model.mean', 'model.std']: 125 | print(x + ' =', eval(x)) 126 | 127 | # Reshape output to n classes 128 | filters = model.last_linear.weight.shape[1] 129 | model.last_linear.bias = torch.nn.Parameter(torch.zeros(n)) 130 | model.last_linear.weight = torch.nn.Parameter(torch.zeros(n, filters)) 131 | model.last_linear.out_features = n 132 | return model 133 | 134 | 
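# Minimal sanity-check sketch for fuse_conv_and_bn(): in eval mode the fused layer should
# reproduce conv followed by bn. It assumes a plain convolution (groups=1, dilation=1), which is
# what the helper above constructs; the layer sizes below are illustrative only and this
# hypothetical helper is not called anywhere else.
def _fuse_conv_and_bn_sanity_check():
    conv = nn.Conv2d(8, 16, kernel_size=3, padding=1, bias=False).eval()
    bn = nn.BatchNorm2d(16).eval()
    bn.running_mean.uniform_(-1, 1)   # give the BN layer non-trivial running statistics
    bn.running_var.uniform_(0.5, 1.5)
    fused = fuse_conv_and_bn(conv, bn)
    x = torch.randn(1, 8, 32, 32)
    assert torch.allclose(bn(conv(x)), fused(x), atol=1e-4)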
135 | def scale_img(img, ratio=1.0, same_shape=True): # img(16,3,256,416), r=ratio 136 | # scales img(bs,3,y,x) by ratio 137 | h, w = img.shape[2:] 138 | s = (int(h * ratio), int(w * ratio)) # new size 139 | img = F.interpolate(img, size=s, mode='bilinear', align_corners=False) # resize 140 | if not same_shape: # pad/crop img 141 | gs = 64 # (pixels) grid size 142 | h, w = [math.ceil(x * ratio / gs) * gs for x in (h, w)] 143 | return F.pad(img, [0, w - s[1], 0, h - s[0]], value=0.447) # value = imagenet mean 144 | 145 | 146 | class ModelEMA: 147 | """ Model Exponential Moving Average from https://github.com/rwightman/pytorch-image-models 148 | Keep a moving average of everything in the model state_dict (parameters and buffers). 149 | This is intended to allow functionality like 150 | https://www.tensorflow.org/api_docs/python/tf/train/ExponentialMovingAverage 151 | A smoothed version of the weights is necessary for some training schemes to perform well. 152 | E.g. Google's hyper-params for training MNASNet, MobileNet-V3, EfficientNet, etc that use 153 | RMSprop with a short 2.4-3 epoch decay period and slow LR decay rate of .96-.99 requires EMA 154 | smoothing of weights to match results. Pay attention to the decay constant you are using 155 | relative to your update count per epoch. 156 | To keep EMA from using GPU resources, set device='cpu'. This will save a bit of memory but 157 | disable validation of the EMA weights. Validation will have to be done manually in a separate 158 | process, or after the training stops converging. 159 | This class is sensitive where it is initialized in the sequence of model init, 160 | GPU assignment and distributed training wrappers. 161 | I've tested with the sequence in my own train.py for torch.DataParallel, apex.DDP, and single-GPU. 162 | """ 163 | 164 | def __init__(self, model, decay=0.9999, device=''): 165 | # make a copy of the model for accumulating moving average of weights 166 | self.ema = deepcopy(model) 167 | self.ema.eval() 168 | self.updates = 0 # number of EMA updates 169 | self.decay = lambda x: decay * (1 - math.exp(-x / 2000)) # decay exponential ramp (to help early epochs) 170 | self.device = device # perform ema on different device from model if set 171 | if device: 172 | self.ema.to(device=device) 173 | for p in self.ema.parameters(): 174 | p.requires_grad_(False) 175 | 176 | def update(self, model): 177 | self.updates += 1 178 | d = self.decay(self.updates) 179 | with torch.no_grad(): 180 | if type(model) in (nn.parallel.DataParallel, nn.parallel.DistributedDataParallel): 181 | msd, esd = model.module.state_dict(), self.ema.module.state_dict() 182 | else: 183 | msd, esd = model.state_dict(), self.ema.state_dict() 184 | 185 | for k, v in esd.items(): 186 | if v.dtype.is_floating_point: 187 | v *= d 188 | v += (1. 
- d) * msd[k].detach() 189 | 190 | def update_attr(self, model): 191 | # Assign attributes (which may change during training) 192 | for k in model.__dict__.keys(): 193 | if not k.startswith('_'): 194 | setattr(self.ema, k, getattr(model, k)) 195 | -------------------------------------------------------------------------------- /utils/util_wqaq.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | from torch import distributed 5 | from torch.nn import init 6 | from torch.nn.parameter import Parameter 7 | from torch.autograd import Function 8 | 9 | # ********************* range trackers (record the value range before quantization) ********************* 10 | class RangeTracker(nn.Module): 11 | def __init__(self, q_level): 12 | super().__init__() 13 | self.q_level = q_level 14 | 15 | def update_range(self, min_val, max_val): 16 | raise NotImplementedError 17 | 18 | @torch.no_grad() 19 | def forward(self, input): 20 | if self.q_level == 'L': # A, min_max_shape=(1, 1, 1, 1), layer-level 21 | min_val = torch.min(input) 22 | max_val = torch.max(input) 23 | elif self.q_level == 'C': # W, min_max_shape=(N, 1, 1, 1), channel-level 24 | min_val = torch.min(torch.min(torch.min(input, 3, keepdim=True)[0], 2, keepdim=True)[0], 1, keepdim=True)[0] 25 | max_val = torch.max(torch.max(torch.max(input, 3, keepdim=True)[0], 2, keepdim=True)[0], 1, keepdim=True)[0] 26 | 27 | self.update_range(min_val, max_val) 28 | class GlobalRangeTracker(RangeTracker): # W, min_max_shape=(N, 1, 1, 1), channel-level, keeps the min/max over the current and all previous updates —— (N, C, W, H) 29 | def __init__(self, q_level, out_channels): 30 | super().__init__(q_level) 31 | self.register_buffer('min_val', torch.zeros(out_channels, 1, 1, 1)) 32 | self.register_buffer('max_val', torch.zeros(out_channels, 1, 1, 1)) 33 | self.register_buffer('first_w', torch.zeros(1)) 34 | 35 | def update_range(self, min_val, max_val): 36 | temp_minval = self.min_val.clone() # clone so the in-place updates below do not alias the stored buffers 37 | temp_maxval = self.max_val.clone() 38 | if self.first_w == 0: 39 | self.first_w.add_(1) 40 | self.min_val.add_(min_val) 41 | self.max_val.add_(max_val) 42 | else: 43 | self.min_val.add_(-temp_minval).add_(torch.min(temp_minval, min_val)) 44 | self.max_val.add_(-temp_maxval).add_(torch.max(temp_maxval, max_val)) 45 | class AveragedRangeTracker(RangeTracker): # A, min_max_shape=(1, 1, 1, 1), layer-level, keeps a running (momentum-averaged) min/max —— (N, C, W, H) 46 | def __init__(self, q_level, momentum=0.1): 47 | super().__init__(q_level) 48 | self.momentum = momentum 49 | self.register_buffer('min_val', torch.zeros(1)) 50 | self.register_buffer('max_val', torch.zeros(1)) 51 | self.register_buffer('first_a', torch.zeros(1)) 52 | 53 | def update_range(self, min_val, max_val): 54 | if self.first_a == 0: 55 | self.first_a.add_(1) 56 | self.min_val.add_(min_val) 57 | self.max_val.add_(max_val) 58 | else: 59 | self.min_val.mul_(1 - self.momentum).add_(min_val * self.momentum) 60 | self.max_val.mul_(1 - self.momentum).add_(max_val * self.momentum) 61 | 62 | # ********************* quantizers ********************* 63 | class Round(Function): 64 | 65 | @staticmethod 66 | def forward(self, input): 67 | output = torch.round(input) 68 | return output 69 | 70 | @staticmethod 71 | def backward(self, grad_output): 72 | grad_input = grad_output.clone() 73 | return grad_input 74 | class Quantizer(nn.Module): 75 | def __init__(self, bits, range_tracker): 76 | super().__init__() 77 | self.bits = bits 78 | self.range_tracker = range_tracker 79 | self.register_buffer('scale', None) # quantization scale factor 80 | 
self.register_buffer('zero_point', None) # quantization zero point 81 | 82 | def update_params(self): 83 | raise NotImplementedError 84 | 85 | # quantize 86 | def quantize(self, input): 87 | output = input * self.scale - self.zero_point 88 | return output 89 | 90 | def round(self, input): 91 | output = Round.apply(input) 92 | return output 93 | 94 | # clamp to the integer range 95 | def clamp(self, input): 96 | output = torch.clamp(input, self.min_val, self.max_val) 97 | return output 98 | 99 | # dequantize 100 | def dequantize(self, input): 101 | output = (input + self.zero_point) / self.scale 102 | return output 103 | 104 | def forward(self, input): 105 | if self.bits == 32: 106 | output = input 107 | elif self.bits == 1: 108 | # binary (1-bit) quantization is not supported 109 | assert self.bits != 1, 'Binary quantization is not supported!' 110 | else: 111 | self.range_tracker(input) 112 | self.update_params() 113 | output = self.quantize(input) # quantize 114 | output = self.round(output) 115 | output = self.clamp(output) # clamp 116 | output = self.dequantize(output)# dequantize 117 | return output 118 | class SignedQuantizer(Quantizer): 119 | def __init__(self, *args, **kwargs): 120 | super().__init__(*args, **kwargs) 121 | self.register_buffer('min_val', torch.tensor(-(1 << (self.bits - 1)))) 122 | self.register_buffer('max_val', torch.tensor((1 << (self.bits - 1)) - 1)) 123 | class UnsignedQuantizer(Quantizer): 124 | def __init__(self, *args, **kwargs): 125 | super().__init__(*args, **kwargs) 126 | self.register_buffer('min_val', torch.tensor(0)) 127 | self.register_buffer('max_val', torch.tensor((1 << self.bits) - 1)) 128 | # symmetric quantization 129 | class SymmetricQuantizer(SignedQuantizer): 130 | 131 | def update_params(self): 132 | quantized_range = torch.min(torch.abs(self.min_val), torch.abs(self.max_val)) # quantized (integer) range 133 | float_range = torch.max(torch.abs(self.range_tracker.min_val), torch.abs(self.range_tracker.max_val)) # float range before quantization 134 | self.scale = quantized_range / float_range # quantization scale factor 135 | self.zero_point = torch.zeros_like(self.scale) # quantization zero point 136 | # asymmetric quantization 137 | class AsymmetricQuantizer(UnsignedQuantizer): 138 | 139 | def update_params(self): 140 | quantized_range = self.max_val - self.min_val # quantized (integer) range 141 | float_range = self.range_tracker.max_val - self.range_tracker.min_val # float range before quantization 142 | self.scale = quantized_range / float_range # quantization scale factor 143 | self.zero_point = torch.round(self.range_tracker.min_val * self.scale) # quantization zero point 144 | 145 | # ********************* quantized convolution (quantize both A and W, then convolve) ********************* 146 | class Conv2d_Q(nn.Conv2d): 147 | def __init__( 148 | self, 149 | in_channels, 150 | out_channels, 151 | kernel_size, 152 | stride=1, 153 | padding=0, 154 | dilation=1, 155 | groups=1, 156 | bias=True, 157 | a_bits=8, 158 | w_bits=8, 159 | q_type=1, 160 | first_layer=0, 161 | ): 162 | super().__init__( 163 | in_channels=in_channels, 164 | out_channels=out_channels, 165 | kernel_size=kernel_size, 166 | stride=stride, 167 | padding=padding, 168 | dilation=dilation, 169 | groups=groups, 170 | bias=bias 171 | ) 172 | # instantiate the quantizers (A: layer-level, W: channel-level) 173 | if q_type == 0: 174 | self.activation_quantizer = SymmetricQuantizer(bits=a_bits, range_tracker=AveragedRangeTracker(q_level='L')) 175 | self.weight_quantizer = SymmetricQuantizer(bits=w_bits, range_tracker=GlobalRangeTracker(q_level='C', out_channels=out_channels)) 176 | else: 177 | self.activation_quantizer = AsymmetricQuantizer(bits=a_bits, range_tracker=AveragedRangeTracker(q_level='L')) 178 | self.weight_quantizer = AsymmetricQuantizer(bits=w_bits, range_tracker=GlobalRangeTracker(q_level='C', out_channels=out_channels)) 179 | self.first_layer = 
first_layer 180 | 181 | def forward(self, input): 182 | # quantize A and W 183 | if not self.first_layer: 184 | input = self.activation_quantizer(input) 185 | q_input = input 186 | q_weight = self.weight_quantizer(self.weight) 187 | # quantized convolution 188 | output = F.conv2d( 189 | input=q_input, 190 | weight=q_weight, 191 | bias=self.bias, 192 | stride=self.stride, 193 | padding=self.padding, 194 | dilation=self.dilation, 195 | groups=self.groups 196 | ) 197 | return output 198 | 199 | def reshape_to_activation(input): 200 | return input.reshape(1, -1, 1, 1) 201 | def reshape_to_weight(input): 202 | return input.reshape(-1, 1, 1, 1) 203 | def reshape_to_bias(input): 204 | return input.reshape(-1) 205 | # ********************* BN-folded quantized convolution (fold BN into the conv, then quantize A/W and convolve) ********************* 206 | class BNFold_Conv2d_Q(Conv2d_Q): 207 | def __init__( 208 | self, 209 | in_channels, 210 | out_channels, 211 | kernel_size, 212 | stride=1, 213 | padding=0, 214 | dilation=1, 215 | groups=1, 216 | bias=False, 217 | eps=1e-5, 218 | momentum=0.01, # momentum is lowered (0.1 -> 0.01) to reduce the weight of the per-batch statistics and damp the jitter introduced by quantization; experiments show better quantization-aware training, with roughly a 1% accuracy gain 219 | a_bits=8, 220 | w_bits=8, 221 | q_type=1, 222 | first_layer=0, 223 | ): 224 | super().__init__( 225 | in_channels=in_channels, 226 | out_channels=out_channels, 227 | kernel_size=kernel_size, 228 | stride=stride, 229 | padding=padding, 230 | dilation=dilation, 231 | groups=groups, 232 | bias=bias 233 | ) 234 | self.eps = eps 235 | self.momentum = momentum 236 | self.gamma = Parameter(torch.Tensor(out_channels)) 237 | self.beta = Parameter(torch.Tensor(out_channels)) 238 | self.register_buffer('running_mean', torch.zeros(out_channels)) 239 | self.register_buffer('running_var', torch.ones(out_channels)) 240 | self.register_buffer('first_bn', torch.zeros(1)) 241 | init.uniform_(self.gamma) 242 | init.zeros_(self.beta) 243 | 244 | # instantiate the quantizers (A: layer-level, W: channel-level) 245 | if q_type == 0: 246 | self.activation_quantizer = SymmetricQuantizer(bits=a_bits, range_tracker=AveragedRangeTracker(q_level='L')) 247 | self.weight_quantizer = SymmetricQuantizer(bits=w_bits, range_tracker=GlobalRangeTracker(q_level='C', out_channels=out_channels)) 248 | else: 249 | self.activation_quantizer = AsymmetricQuantizer(bits=a_bits, range_tracker=AveragedRangeTracker(q_level='L')) 250 | self.weight_quantizer = AsymmetricQuantizer(bits=w_bits, range_tracker=GlobalRangeTracker(q_level='C', out_channels=out_channels)) 251 | self.first_layer = first_layer 252 | 253 | def forward(self, input): 254 | # training mode 255 | if self.training: 256 | # first run an ordinary (float) convolution to obtain the activations needed for the BN statistics 257 | output = F.conv2d( 258 | input=input, 259 | weight=self.weight, 260 | bias=self.bias, 261 | stride=self.stride, 262 | padding=self.padding, 263 | dilation=self.dilation, 264 | groups=self.groups 265 | ) 266 | # update the BN statistics (batch and running) 267 | dims = [dim for dim in range(4) if dim != 1] 268 | batch_mean = torch.mean(output, dim=dims) 269 | batch_var = torch.var(output, dim=dims) 270 | with torch.no_grad(): 271 | if self.first_bn == 0: 272 | self.first_bn.add_(1) 273 | self.running_mean.add_(batch_mean) 274 | self.running_var.add_(batch_var) 275 | else: 276 | self.running_mean.mul_(1 - self.momentum).add_(batch_mean * self.momentum) 277 | self.running_var.mul_(1 - self.momentum).add_(batch_var * self.momentum) 278 | # BN folding 279 | if self.bias is not None: 280 | bias = reshape_to_bias(self.beta + (self.bias - batch_mean) * (self.gamma / torch.sqrt(batch_var + self.eps))) 281 | else: 282 | bias = reshape_to_bias(self.beta - batch_mean * (self.gamma / torch.sqrt(batch_var + self.eps)))# 
bias folded with the batch statistics 283 | weight = self.weight * reshape_to_weight(self.gamma / torch.sqrt(self.running_var + self.eps)) # weight folded with the running statistics 284 | # eval / inference mode 285 | else: 286 | #print(self.running_mean, self.running_var) 287 | # BN folding 288 | if self.bias is not None: 289 | bias = reshape_to_bias(self.beta + (self.bias - self.running_mean) * (self.gamma / torch.sqrt(self.running_var + self.eps))) 290 | else: 291 | bias = reshape_to_bias(self.beta - self.running_mean * (self.gamma / torch.sqrt(self.running_var + self.eps))) # bias folded with the running statistics 292 | weight = self.weight * reshape_to_weight(self.gamma / torch.sqrt(self.running_var + self.eps)) # weight folded with the running statistics 293 | 294 | # quantize A and the BN-folded W 295 | if not self.first_layer: 296 | input = self.activation_quantizer(input) 297 | q_input = input 298 | q_weight = self.weight_quantizer(weight) 299 | # quantized convolution 300 | if self.training: # training mode 301 | output = F.conv2d( 302 | input=q_input, 303 | weight=q_weight, 304 | bias=self.bias, # note: no bias is added here (self.bias is None) 305 | stride=self.stride, 306 | padding=self.padding, 307 | dilation=self.dilation, 308 | groups=self.groups 309 | ) 310 | # (in training mode, convert the effect of folding W with the running statistics into folding with the batch statistics) running -> batch 311 | output *= reshape_to_activation(torch.sqrt(self.running_var + self.eps) / torch.sqrt(batch_var + self.eps)) 312 | output += reshape_to_activation(bias) 313 | else: # eval / inference mode 314 | output = F.conv2d( 315 | input=q_input, 316 | weight=q_weight, 317 | bias=bias, # note: the bias is added here, giving the complete conv+bn 318 | stride=self.stride, 319 | padding=self.padding, 320 | dilation=self.dilation, 321 | groups=self.groups 322 | ) 323 | return output 324 | -------------------------------------------------------------------------------- /weights/download_yolov3_weights.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # make '/weights' directory if it does not exist and cd into it 4 | # mkdir -p weights && cd weights 5 | 6 | # copy darknet weight files, continue '-c' if partially downloaded 7 | # wget -c https://pjreddie.com/media/files/yolov3.weights 8 | # wget -c https://pjreddie.com/media/files/yolov3-tiny.weights 9 | # wget -c https://pjreddie.com/media/files/yolov3-spp.weights 10 | 11 | # yolov3 pytorch weights 12 | # download from Google Drive: https://drive.google.com/drive/folders/1uxgUBemJVw9wZsdpboYbzUN4bcRhsuAI 13 | 14 | # darknet53 weights (first 75 layers only) 15 | # wget -c https://pjreddie.com/media/files/darknet53.conv.74 16 | 17 | # yolov3-tiny weights from darknet (first 16 layers only) 18 | # ./darknet partial cfg/yolov3-tiny.cfg yolov3-tiny.weights yolov3-tiny.conv.15 15 19 | # mv yolov3-tiny.conv.15 ../ 20 | 21 | # new method 22 | python3 -c "from models import *; 23 | attempt_download('weights/yolov3.pt'); 24 | attempt_download('weights/yolov3-spp.pt')" 25 | --------------------------------------------------------------------------------
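As a usage reference, the quantization modules defined in utils/util_wqaq.py can be exercised on their own. The snippet below is only a sketch: the module names come from the file above, but the shapes and settings are arbitrary and it is not the repository's training entry point. Conv2d_Q tracks the activation range per layer and the weight range per channel, then quantizes both before convolving; BNFold_Conv2d_Q additionally folds its BN statistics into the weight and bias before quantization.
```
import torch
from utils.util_wqaq import Conv2d_Q, BNFold_Conv2d_Q

x = torch.randn(2, 16, 64, 64)  # dummy feature map (illustrative shape)

# 8-bit asymmetric quantization (q_type=1) of activations and weights
qconv = Conv2d_Q(16, 32, kernel_size=3, padding=1, a_bits=8, w_bits=8, q_type=1)
y = qconv(x)  # range trackers update, then quantize -> round -> clamp -> dequantize

# BN-folded variant: keeps its own BN statistics and folds them into weight and bias
qconv_bn = BNFold_Conv2d_Q(16, 32, kernel_size=3, padding=1, bias=False, a_bits=8, w_bits=8, q_type=1)
qconv_bn.train()
y = qconv_bn(x)  # training path: fold with batch statistics
qconv_bn.eval()
y = qconv_bn(x)  # inference path: fold with running statistics
print(y.shape)   # torch.Size([2, 32, 64, 64])
```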