├── README.md
├── cfg
    ├── ghost-yolov3-visdrone.cfg
    ├── yolov3-hand.cfg
    ├── yolov3-shufflenetv2-hand.cfg
    ├── yolov3-shufflenetv2-visdrone.cfg
    ├── yolov3-tiny-hand-cbam.cfg
    ├── yolov3-tiny-hand-eca.cfg
    ├── yolov3-tiny-hand-se.cfg
    └── yolov3-tiny-hand.cfg
├── data
    ├── oxfordhand.data
    ├── oxfordhand.names
    ├── visdrone.data
    └── visdrone.names
├── detect.py
├── models.py
├── normal_prune.py
├── output
    ├── airplane.png
    ├── car.png
    ├── most.png
    └── test.py
├── test.py
├── train.py
├── utils
    ├── __init__.py
    ├── adabound.py
    ├── datasets.py
    ├── evolve.sh
    ├── gcp.sh
    ├── google_utils.py
    ├── keepgit
    ├── layers.py
    ├── parse_config.py
    ├── prune_utils.py
    ├── quant_dorefa.py
    ├── torch_utils.py
    ├── util_wqaq.py
    └── utils.py
└── weights
    └── download_yolov3_weights.sh

/README.md:
--------------------------------------------------------------------------------
1 | # Introduction
2 | This repository includes YOLOv3 with several lightweight backbones (***ShuffleNetV2, GhostNet, VoVNet***), several computer-vision attention modules (***SE Block, CBAM Block, ECA Block***), and pruning, quantization, and distillation for GhostNet.
3 | # Important Updates
4 | ***2020.6.1***
5 | (1) Added the best lightweight backbone, Huawei GhostNet, as the YOLOv3 backbone. It outperforms ShuffleNetV2; results on the VisDrone dataset are shown below.
6 | (2) Added the DoReFa quantization method for arbitrary-bit quantization; results on the VisDrone dataset are shown below.
7 | (3) Removed the ShuffleNet and attention-mechanism variants.
8 | ***2020.6.24***
9 | (1) Added pruning based on Network Slimming.
10 | (2) Added distillation to achieve higher mAP after pruning.
11 | (3) Added an ImageNet-pretrained model for GhostNet.
12 | ***2020.9.26***
13 | (1) Added VoVNet as a backbone. The results are excellent.
14 | 
15 | | Model | Params | FPS | mAP |
16 | | ----- | ----- | ----- | ----- |
17 | | GhostNet+YOLOv3 | 23.49M | 62.5 | 35.1 |
18 | | Pruned Model+Distillation | 5.81M | 76.9 | 34.3 |
19 | | Pruned Model+INT8 | 5.81M | 75.1 | 34.0 |
20 | | YOLOv5s | 7.27M | - | 32.7 |
21 | | YOLOv5x | 88.5M | - | 41.8 |
22 | | VoVNet | 42.8M | 28.9 | 42.7 |
23 | 
24 | ***Note: training on a single GPU is recommended.***
25 | ***If you need the previous attention models or have any questions, you can add my WeChat: AutrefoisLethe***
26 | # Environment
27 | * python 3.7
28 | * pytorch >= 1.1.0
29 | * opencv-python
30 | # Datasets
31 | * Oxford Hand dataset (1 class: human hand)
32 | https://pan.baidu.com/s/1ZYKXMEvNef41MdG1NgWYiQ (extract code: 00rw)
33 | * VisDrone remote sensing dataset (10 classes, including pedestrian, car, bus, etc.)
34 | https://pan.baidu.com/s/1JzJ6APRym8K64taZgcDZfQ (extract code: xyil)
35 | * BDD100K dataset (10 classes, including motor, train, traffic light, etc.)
36 | https://pan.baidu.com/s/1dBrKEdy92Mxqg-JiyrVjkQ (extract code: lm4p)
37 | * DIOR dataset (20 classes, including airplane, airport, bridge, etc.)
38 | https://pan.baidu.com/s/1Fc-zJtHy-6iIewvsKWPDnA (extract code: k2js)
39 | 
40 | # Usage
41 | 1. Download the datasets and place them in the ***data*** directory.
42 | 2. Train a model with the following command (change the model structure by switching the cfg file):
43 | ```
44 | python3 train.py --data data/visdrone.data --batch-size 16 --cfg cfg/ghost-yolov3-visdrone.cfg --img-size 640
45 | ```
46 | 3. Detect objects with the trained model (place the pictures or videos in the ***samples*** directory):
47 | ```
48 | python3 detect.py --cfg cfg/ghost-yolov3-visdrone.cfg --weights weights/best.pt --data data/visdrone.data
49 | ```
50 | 4. Results:
51 | ![most](https://github.com/HaloTrouvaille/YOLO-Multi-Backbones-Attention/blob/master/output/most.png)
52 | ![car](https://github.com/HaloTrouvaille/YOLO-Multi-Backbones-Attention/blob/master/output/car.png)
53 | ![airplane](https://github.com/HaloTrouvaille/YOLO-Multi-Backbones-Attention/blob/master/output/airplane.png)
54 | # Pruning and Quantization
55 | ## Pruning
56 | First, run sparsity training:
57 | ```
58 | python3 train.py --data data/visdrone.data --batch-size 4 --cfg cfg/ghost-yolov3-visdrone.cfg --img-size 640 --epochs 300 --device 3 -sr --s 0.0001
59 | ```
60 | Then set the cfg and weights paths in normal_prune.py and run:
61 | ```
62 | python normal_prune.py
63 | ```
64 | After obtaining pruned.cfg and the corresponding weights file, fine-tune the pruned model with:
65 | ```
66 | python3 train.py --data data/visdrone.data --batch-size 4 --cfg pruned.cfg --img-size 640 --epochs 300 --device 3 --weights weights/xxx.weights
67 | ```
68 | 
69 | ## Quantization
70 | To quantize a convolutional layer, simply change its [convolutional] section to [quan_convolutional] in the cfg file (see the example below), then train as usual:
71 | ```
72 | python3 train.py --data data/visdrone.data --batch-size 16 --cfg cfg/ghost-yolov3-visdrone.cfg --img-size 640
73 | ```
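As a rough illustration (a minimal sketch based on the [convolutional] blocks used elsewhere in this repo): only the section name changes, and the DoReFa bit widths are assumed to be configured in utils/quant_dorefa.py / utils/util_wqaq.py rather than in the cfg.
```
# before: full-precision layer
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

# after: the same layer, trained with DoReFa quantization
[quan_convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
```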
74 | 
75 | # Experiment Results for Changing the YOLOv3 Backbone
76 | ## ShuffleNetV2 + Two-Scale Detection (YOLO Detector)
77 | ### Using the Oxford Hand dataset
78 | | Model | Params | Model Size | mAP |
79 | | ----- | ----- | ----- | ----- |
80 | | ShuffleNetV2 1x | 3.57M | 13.89MB | 51.2 |
81 | | ShuffleNetV2 1.5x | 5.07M | 19.55MB | 56.4 |
82 | | YOLOv3-tiny | 8.67M | 33.1MB | 60.3 |
83 | ### Using the VisDrone dataset (incomplete training)
84 | | Model | Params | Model Size | mAP |
85 | | ----- | ----- | ----- | ----- |
86 | | ShuffleNetV2 1x | 3.59M | 13.99MB | 10.2 |
87 | | ShuffleNetV2 1.5x | 5.09M | 19.63MB | 11.0 |
88 | | YOLOv3-tiny | 8.69M | 33.9MB | 3.3 |
89 | # Experiment Results for Attention Mechanisms
90 | ### Based on YOLOv3-tiny
91 | SE Block paper: https://arxiv.org/abs/1709.01507
92 | CBAM Block paper: https://arxiv.org/abs/1807.06521
93 | ECA Block paper: https://arxiv.org/abs/1910.03151
94 | | Model | Params | mAP |
95 | | ----- | ----- | ----- |
96 | | YOLOv3-tiny | 8.67M | 60.3 |
97 | | YOLOv3-tiny + SE | 8.933M | 62.3 |
98 | | YOLOv3-tiny + CBAM | 8.81M | 62.7 |
99 | | YOLOv3-tiny + ECA | 8.67M | 62.6 |
100 | 
101 | 
102 | # TODO
103 | - [x] ShuffleNetV2 backbone
104 | - [x] Huawei GhostNet backbone
105 | - [x] ImageNet pretraining
106 | - [x] COCO dataset training
107 | - [ ] Other detection strategies
108 | - [ ] Other pruning strategies
109 | 
110 | 
111 | 
--------------------------------------------------------------------------------
/cfg/ghost-yolov3-visdrone.cfg:
--------------------------------------------------------------------------------
1 | 
2 | [net]
3 | # Testing
4 | #batch=1
5 | #subdivisions=1
6 | # Training
7 | batch=16
8 | subdivisions=1
9 | width=416
10 | height=416
11 | channels=3
12 | momentum=0.9
13 | decay=0.0005
14 | angle=0
15 | saturation = 1.5
16 | exposure = 1.5
17 | hue=.1
18 | 
19 | learning_rate=0.001
20 | burn_in=1000
21 | max_batches = 500200
22 | policy=steps
23 | steps=400000,450000
24 | scales=.1,.1
25 | # 0
26 | [convolutional]
27 | batch_normalize=1
28 | filters=16
29 | size=3
30 | stride=2
31 | pad=1
32 | group=1
33 | activation=relu
34 | 
35 | # ghost bottleneck starts
36 | 
37 | #GB1-PConv #1
38 | [convolutional]
39 | batch_normalize=1
40 | filters=8 41 | size=1 42 | stride=1 43 | pad=0 44 | group=1 45 | activation=relu 46 | 47 | #GB1-Cheap #2 48 | [convolutional] 49 | batch_normalize=1 50 | filters=8 51 | size=3 52 | stride=1 53 | pad=1 54 | group=8 55 | activation=relu 56 | 57 | # 3 58 | [route] 59 | layers=-1, 1 60 | 61 | # 4 62 | [convolutional] 63 | batch_normalize=1 64 | filters=8 65 | size=1 66 | stride=1 67 | pad=0 68 | group=1 69 | activation=none 70 | 71 | # 5 72 | [convolutional] 73 | batch_normalize=1 74 | filters=8 75 | size=3 76 | stride=1 77 | pad=1 78 | group=8 79 | activation=none 80 | 81 | # 6 82 | [route] 83 | layers=-1,4 84 | 85 | # 7 86 | [shortcut] 87 | from=-7 88 | activation=none 89 | 90 | # GB2-PConv # 8 91 | [convolutional] 92 | batch_normalize=1 93 | filters=24 94 | size=1 95 | stride=1 96 | pad=0 97 | group=1 98 | activation=relu 99 | 100 | # GB2-Cheap # 9 101 | [convolutional] 102 | batch_normalize=1 103 | filters=24 104 | size=3 105 | stride=1 106 | pad=1 107 | group=24 108 | activation=relu 109 | 110 | #10 111 | [route] 112 | layers=-1,8 113 | 114 | #11 115 | [convolutional] 116 | batch_normalize=1 117 | filters=48 118 | size=3 119 | stride=2 120 | group=48 121 | pad=1 122 | activation=none 123 | 124 | #12 125 | [convolutional] 126 | batch_normalize=1 127 | filters=12 128 | size=1 129 | stride=1 130 | group=1 131 | pad=0 132 | activation=none 133 | 134 | #13 135 | [convolutional] 136 | batch_normalize=1 137 | filters=12 138 | size=3 139 | stride=1 140 | group=12 141 | pad=1 142 | activation=none 143 | 144 | #14 145 | [route] 146 | layers=-1,12 147 | 148 | #15 149 | [route] 150 | layers=7 151 | 152 | #16 153 | [convolutional] 154 | batch_normalize=1 155 | filters=16 156 | size=3 157 | stride=2 158 | group=16 159 | pad=1 160 | activation=none 161 | 162 | #17 163 | [convolutional] 164 | batch_normalize=1 165 | filters=24 166 | size=1 167 | stride=1 168 | group=1 169 | pad=0 170 | activation=none 171 | 172 | #18 173 | [shortcut] 174 | from=-4 175 | activation=none 176 | 177 | # GB3-PConv #19 178 | [convolutional] 179 | batch_normalize=1 180 | filters=36 181 | size=1 182 | stride=1 183 | group=1 184 | pad=0 185 | activation=relu 186 | 187 | # GB3-Cheap #20 188 | [convolutional] 189 | batch_normalize=1 190 | filters=36 191 | size=3 192 | stride=1 193 | group=36 194 | pad=1 195 | activation=relu 196 | 197 | #21 198 | [route] 199 | layers=-1,19 200 | 201 | #22 202 | [convolutional] 203 | batch_normalize=1 204 | filters=12 205 | size=1 206 | stride=1 207 | group=1 208 | pad=0 209 | activation=none 210 | 211 | #23 212 | [convolutional] 213 | batch_normalize=1 214 | filters=12 215 | size=3 216 | stride=1 217 | group=12 218 | pad=1 219 | activation=none 220 | 221 | #24 222 | [route] 223 | layers=-1,22 224 | 225 | #25 226 | [shortcut] 227 | from=-7 228 | activation=none 229 | 230 | #GB4-PConv #26 231 | [convolutional] 232 | batch_normalize=1 233 | filters=36 234 | size=1 235 | stride=1 236 | group=1 237 | pad=0 238 | activation=relu 239 | 240 | #GB4-Cheap #27 241 | [convolutional] 242 | batch_normalize=1 243 | filters=36 244 | size=3 245 | stride=1 246 | group=36 247 | pad=1 248 | activation=relu 249 | 250 | #28 251 | [route] 252 | layers=-1,26 253 | 254 | #29 255 | [convolutional] 256 | batch_normalize=1 257 | filters=72 258 | size=5 259 | stride=2 260 | group=72 261 | pad=2 262 | activation=none 263 | 264 | #30 265 | [se] 266 | reduction=4 267 | 268 | #31 269 | [convolutional] 270 | batch_normalize=1 271 | filters=20 272 | size=1 273 | stride=1 274 | group=1 275 | pad=0 276 | activation=none 277 | 278 
| #32 279 | [convolutional] 280 | batch_normalize=1 281 | filters=20 282 | size=3 283 | stride=1 284 | group=20 285 | pad=1 286 | activation=none 287 | 288 | #33 289 | [route] 290 | layers=-1,31 291 | 292 | #34 293 | [route] 294 | layers=25 295 | 296 | #35 297 | [convolutional] 298 | batch_normalize=1 299 | filters=24 300 | size=5 301 | stride=2 302 | group=24 303 | pad=2 304 | activation=none 305 | 306 | #36 307 | [convolutional] 308 | batch_normalize=1 309 | filters=40 310 | size=1 311 | stride=1 312 | group=1 313 | pad=0 314 | activation=none 315 | 316 | #37 317 | [shortcut] 318 | from=-4 319 | activation=none 320 | 321 | #GB5-PConv #38 322 | [convolutional] 323 | batch_normalize=1 324 | filters=60 325 | size=1 326 | stride=1 327 | group=1 328 | pad=0 329 | activation=relu 330 | 331 | #GB5-Cheap #39 332 | [convolutional] 333 | batch_normalize=1 334 | filters=60 335 | size=3 336 | stride=1 337 | group=60 338 | pad=1 339 | activation=relu 340 | 341 | #40 342 | [route] 343 | layers=-1,38 344 | 345 | #41 346 | [se] 347 | reduction=4 348 | 349 | #42 350 | [convolutional] 351 | batch_normalize=1 352 | filters=20 353 | size=1 354 | stride=1 355 | group=1 356 | pad=0 357 | activation=none 358 | 359 | #43 360 | [convolutional] 361 | batch_normalize=1 362 | filters=20 363 | size=3 364 | stride=1 365 | group=20 366 | pad=1 367 | activation=none 368 | 369 | #44 370 | [route] 371 | layers=-1,42 372 | 373 | #45 374 | [shortcut] 375 | from=-8 376 | 377 | #GB6-PConv #46 378 | [convolutional] 379 | batch_normalize=1 380 | filters=120 381 | size=1 382 | stride=1 383 | group=1 384 | pad=0 385 | activation=relu 386 | 387 | #GB6-Cheap #47 388 | [convolutional] 389 | batch_normalize=1 390 | filters=120 391 | size=3 392 | stride=1 393 | group=120 394 | pad=1 395 | activation=relu 396 | 397 | #48 398 | [route] 399 | layers=-1,46 400 | 401 | #49 402 | [convolutional] 403 | batch_normalize=1 404 | filters=240 405 | size=3 406 | stride=2 407 | group=240 408 | pad=1 409 | activation=none 410 | 411 | #50 412 | [convolutional] 413 | batch_normalize=1 414 | filters=40 415 | size=1 416 | stride=1 417 | group=1 418 | pad=0 419 | activation=none 420 | 421 | #51 422 | [convolutional] 423 | batch_normalize=1 424 | filters=40 425 | size=3 426 | stride=1 427 | group=40 428 | pad=1 429 | activation=none 430 | 431 | #52 432 | [route] 433 | layers=-1,50 434 | 435 | #53 436 | [route] 437 | layers=45 438 | 439 | #54 440 | [convolutional] 441 | batch_normalize=1 442 | filters=40 443 | size=3 444 | stride=2 445 | group=40 446 | pad=1 447 | activation=none 448 | 449 | #55 450 | [convolutional] 451 | batch_normalize=1 452 | filters=80 453 | size=1 454 | stride=1 455 | group=1 456 | pad=0 457 | activation=none 458 | 459 | #56 460 | [shortcut] 461 | from=-4 462 | activation=none 463 | 464 | #GB7-PConv #57 465 | [convolutional] 466 | batch_normalize=1 467 | filters=100 468 | size=1 469 | stride=1 470 | group=1 471 | pad=0 472 | activation=relu 473 | 474 | #GB7-Cheap #58 475 | [convolutional] 476 | batch_normalize=1 477 | filters=100 478 | size=3 479 | stride=1 480 | group=100 481 | pad=1 482 | activation=relu 483 | 484 | #59 485 | [route] 486 | layers=-1,57 487 | 488 | #60 489 | [convolutional] 490 | batch_normalize=1 491 | filters=40 492 | size=1 493 | stride=1 494 | group=1 495 | pad=0 496 | activation=none 497 | 498 | #61 499 | [convolutional] 500 | batch_normalize=1 501 | filters=40 502 | size=3 503 | stride=1 504 | group=40 505 | pad=1 506 | activation=none 507 | 508 | #62 509 | [route] 510 | layers=-1,60 511 | 512 | #63 513 | 
[shortcut] 514 | from=-7 515 | activation=none 516 | 517 | #GB8-PConv #64 518 | [convolutional] 519 | batch_normalize=1 520 | filters=92 521 | size=1 522 | stride=1 523 | group=1 524 | pad=0 525 | activation=relu 526 | 527 | #GB8-Cheap #65 528 | [convolutional] 529 | batch_normalize=1 530 | filters=92 531 | size=3 532 | stride=1 533 | group=92 534 | pad=1 535 | activation=relu 536 | 537 | #66 538 | [route] 539 | layers=-1,64 540 | 541 | #67 542 | [convolutional] 543 | batch_normalize=1 544 | filters=40 545 | size=1 546 | stride=1 547 | group=1 548 | pad=0 549 | activation=none 550 | 551 | #68 552 | [convolutional] 553 | batch_normalize=1 554 | filters=40 555 | size=3 556 | stride=1 557 | group=40 558 | pad=1 559 | activation=none 560 | 561 | #69 562 | [route] 563 | layers=-1,67 564 | 565 | #70 566 | [shortcut] 567 | from=-7 568 | activation=none 569 | 570 | #GB9-PConv #71 571 | [convolutional] 572 | batch_normalize=1 573 | filters=92 574 | size=1 575 | stride=1 576 | group=1 577 | pad=0 578 | activation=relu 579 | 580 | #GB9-Cheap #72 581 | [convolutional] 582 | batch_normalize=1 583 | filters=92 584 | size=3 585 | stride=1 586 | group=92 587 | pad=1 588 | activation=relu 589 | 590 | #73 591 | [route] 592 | layers=-1,71 593 | 594 | #74 595 | [convolutional] 596 | batch_normalize=1 597 | filters=40 598 | size=1 599 | stride=1 600 | group=1 601 | pad=0 602 | activation=none 603 | 604 | #75 605 | [convolutional] 606 | batch_normalize=1 607 | filters=40 608 | size=3 609 | stride=1 610 | group=40 611 | pad=1 612 | activation=none 613 | 614 | #76 615 | [route] 616 | layers=-1,74 617 | 618 | #77 619 | [shortcut] 620 | from=-7 621 | activation=none 622 | 623 | #GB10-PConv #78 624 | [convolutional] 625 | batch_normalize=1 626 | filters=240 627 | size=1 628 | stride=1 629 | group=1 630 | pad=0 631 | activation=relu 632 | 633 | 634 | #GB10-Cheap #79 635 | [convolutional] 636 | batch_normalize=1 637 | filters=240 638 | size=3 639 | stride=1 640 | group=240 641 | pad=1 642 | activation=relu 643 | 644 | #80 645 | [route] 646 | layers=-1,78 647 | 648 | #81 649 | [se] 650 | reduction=4 651 | 652 | #82 653 | [convolutional] 654 | batch_normalize=1 655 | filters=56 656 | size=1 657 | stride=1 658 | group=1 659 | pad=0 660 | activation=none 661 | 662 | #83 663 | [convolutional] 664 | batch_normalize=1 665 | filters=56 666 | size=3 667 | stride=1 668 | group=56 669 | pad=1 670 | activation=none 671 | 672 | #84 673 | [route] 674 | layers=-1,82 675 | 676 | #85 677 | [route] 678 | layers=77 679 | 680 | #86 681 | [convolutional] 682 | batch_normalize=1 683 | filters=80 684 | size=3 685 | stride=1 686 | group=80 687 | pad=1 688 | activation=none 689 | 690 | #87 691 | [convolutional] 692 | batch_normalize=1 693 | filters=112 694 | size=1 695 | stride=1 696 | group=1 697 | pad=0 698 | activation=none 699 | 700 | #88 701 | [shortcut] 702 | from=-4 703 | activation=none 704 | 705 | #GB11-PConv #89 706 | [convolutional] 707 | batch_normalize=1 708 | filters=336 709 | size=1 710 | stride=1 711 | group=1 712 | pad=0 713 | activation=relu 714 | 715 | #GB11-Cheap #90 716 | [convolutional] 717 | batch_normalize=1 718 | filters=336 719 | size=3 720 | stride=1 721 | group=336 722 | pad=1 723 | activation=relu 724 | 725 | #91 726 | [route] 727 | layers=-1,89 728 | 729 | #92 730 | [se] 731 | reduction=4 732 | 733 | #93 734 | [convolutional] 735 | batch_normalize=1 736 | filters=56 737 | size=1 738 | stride=1 739 | group=1 740 | pad=0 741 | activation=none 742 | 743 | #94 744 | [convolutional] 745 | batch_normalize=1 746 | 
filters=56 747 | size=3 748 | stride=1 749 | group=56 750 | pad=1 751 | activation=none 752 | 753 | #95 754 | [route] 755 | layers=-1,93 756 | 757 | #96 758 | [shortcut] 759 | from=-8 760 | activation=none 761 | 762 | #GB12-PConv #97 763 | [convolutional] 764 | batch_normalize=1 765 | filters=336 766 | size=1 767 | stride=1 768 | group=1 769 | pad=0 770 | activation=relu 771 | 772 | #GB12-Cheap #98 773 | [convolutional] 774 | batch_normalize=1 775 | filters=336 776 | size=3 777 | stride=1 778 | group=336 779 | pad=1 780 | activation=relu 781 | 782 | #99 783 | [route] 784 | layers=-1,97 785 | 786 | #100 787 | [convolutional] 788 | batch_normalize=1 789 | filters=672 790 | size=5 791 | stride=2 792 | group=672 793 | pad=2 794 | activation=none 795 | 796 | #101 797 | [se] 798 | reduction=4 799 | 800 | #102 801 | [convolutional] 802 | batch_normalize=1 803 | filters=80 804 | size=1 805 | stride=1 806 | group=1 807 | pad=0 808 | activation=none 809 | 810 | #103 811 | [convolutional] 812 | batch_normalize=1 813 | filters=80 814 | size=3 815 | stride=1 816 | group=80 817 | pad=1 818 | activation=none 819 | 820 | #104 821 | [route] 822 | layers=-1,102 823 | 824 | #105 825 | [route] 826 | layers=96 827 | 828 | #106 829 | [convolutional] 830 | batch_normalize=1 831 | filters=112 832 | size=5 833 | stride=2 834 | group=112 835 | pad=2 836 | activation=none 837 | 838 | #107 839 | [convolutional] 840 | batch_normalize=1 841 | filters=160 842 | size=1 843 | stride=1 844 | group=1 845 | pad=0 846 | activation=none 847 | 848 | #108 849 | [shortcut] 850 | from=-4 851 | activation=none 852 | 853 | #GB13-PConv #109 854 | [convolutional] 855 | batch_normalize=1 856 | filters=480 857 | size=1 858 | stride=1 859 | group=1 860 | pad=0 861 | activation=relu 862 | 863 | #GB13-Cheap #110 864 | [convolutional] 865 | batch_normalize=1 866 | filters=480 867 | size=3 868 | stride=1 869 | group=480 870 | pad=1 871 | activation=relu 872 | 873 | #111 874 | [route] 875 | layers=-1,109 876 | 877 | #112 878 | [convolutional] 879 | batch_normalize=1 880 | filters=80 881 | size=1 882 | stride=1 883 | group=1 884 | pad=0 885 | activation=none 886 | 887 | #113 888 | [convolutional] 889 | batch_normalize=1 890 | filters=80 891 | size=3 892 | stride=1 893 | group=80 894 | pad=1 895 | activation=none 896 | 897 | #114 898 | [route] 899 | layers=-1,112 900 | 901 | #115 902 | [shortcut] 903 | from=-7 904 | activation=none 905 | 906 | #GB14-PConv #116 907 | [convolutional] 908 | batch_normalize=1 909 | filters=480 910 | size=1 911 | stride=1 912 | group=1 913 | pad=0 914 | activation=relu 915 | 916 | #GB14-Cheap #117 917 | [convolutional] 918 | batch_normalize=1 919 | filters=480 920 | size=3 921 | stride=1 922 | group=480 923 | pad=1 924 | activation=relu 925 | 926 | #118 927 | [route] 928 | layers=-1,116 929 | 930 | #119 931 | [se] 932 | reduction=4 933 | 934 | #120 935 | [convolutional] 936 | batch_normalize=1 937 | filters=80 938 | size=1 939 | stride=1 940 | group=1 941 | pad=0 942 | activation=none 943 | 944 | #121 945 | [convolutional] 946 | batch_normalize=1 947 | filters=80 948 | size=3 949 | stride=1 950 | group=80 951 | pad=1 952 | activation=none 953 | 954 | #122 955 | [route] 956 | layers=-1,120 957 | 958 | #123 959 | [shortcut] 960 | from=-8 961 | activation=none 962 | 963 | #GB15-PConv #124 964 | [convolutional] 965 | batch_normalize=1 966 | filters=480 967 | size=1 968 | stride=1 969 | group=1 970 | pad=0 971 | activation=relu 972 | 973 | #GB15-Cheap #125 974 | [convolutional] 975 | batch_normalize=1 976 | filters=480 
977 | size=3 978 | stride=1 979 | group=480 980 | pad=1 981 | activation=relu 982 | 983 | #126 984 | [route] 985 | layers=-1,124 986 | 987 | #127 988 | [convolutional] 989 | batch_normalize=1 990 | filters=80 991 | size=1 992 | stride=1 993 | group=1 994 | pad=0 995 | activation=none 996 | 997 | #128 998 | [convolutional] 999 | batch_normalize=1 1000 | filters=80 1001 | size=3 1002 | stride=1 1003 | group=80 1004 | pad=1 1005 | activation=none 1006 | 1007 | #129 1008 | [route] 1009 | layers=-1,127 1010 | 1011 | #130 1012 | [shortcut] 1013 | from=-7 1014 | activation=none 1015 | 1016 | #GB16-PConv #131 1017 | [convolutional] 1018 | batch_normalize=1 1019 | filters=480 1020 | size=1 1021 | stride=1 1022 | group=1 1023 | pad=0 1024 | activation=relu 1025 | 1026 | #GB16-Cheap #132 1027 | [convolutional] 1028 | batch_normalize=1 1029 | filters=480 1030 | size=3 1031 | stride=1 1032 | group=480 1033 | pad=1 1034 | activation=relu 1035 | 1036 | #133 1037 | [route] 1038 | layers=-1,131 1039 | 1040 | #134 1041 | [se] 1042 | reduction=4 1043 | 1044 | #135 1045 | [convolutional] 1046 | batch_normalize=1 1047 | filters=80 1048 | size=1 1049 | stride=1 1050 | group=1 1051 | pad=0 1052 | activation=none 1053 | 1054 | #136 1055 | [convolutional] 1056 | batch_normalize=1 1057 | filters=80 1058 | size=3 1059 | stride=1 1060 | group=80 1061 | pad=1 1062 | activation=none 1063 | 1064 | #137 1065 | [route] 1066 | layers=-1,135 1067 | 1068 | #138 1069 | [shortcut] 1070 | from=-8 1071 | 1072 | #139 1073 | [convolutional] 1074 | batch_normalize=1 1075 | filters=960 1076 | size=1 1077 | stride=1 1078 | group=1 1079 | pad=0 1080 | activation=relu 1081 | 1082 | 1083 | 1084 | 1085 | #######Backbone结束 1086 | 1087 | #140 1088 | [convolutional] 1089 | batch_normalize=1 1090 | filters=512 1091 | size=1 1092 | stride=1 1093 | pad=0 1094 | group=1 1095 | activation=leaky 1096 | 1097 | #141 1098 | [convolutional] 1099 | batch_normalize=1 1100 | size=3 1101 | stride=1 1102 | pad=1 1103 | group=1 1104 | filters=1024 1105 | activation=leaky 1106 | 1107 | #142 1108 | [convolutional] 1109 | batch_normalize=1 1110 | group=1 1111 | filters=512 1112 | size=1 1113 | stride=1 1114 | pad=0 1115 | activation=leaky 1116 | 1117 | #143 1118 | [convolutional] 1119 | batch_normalize=1 1120 | size=3 1121 | stride=1 1122 | pad=1 1123 | group=1 1124 | filters=1024 1125 | activation=leaky 1126 | 1127 | #144 1128 | [convolutional] 1129 | batch_normalize=1 1130 | filters=512 1131 | size=1 1132 | stride=1 1133 | pad=0 1134 | group=1 1135 | activation=leaky 1136 | 1137 | #145 1138 | [convolutional] 1139 | batch_normalize=1 1140 | size=3 1141 | stride=1 1142 | pad=1 1143 | filters=1024 1144 | group=1 1145 | activation=leaky 1146 | 1147 | #146 1148 | [convolutional] 1149 | size=1 1150 | stride=1 1151 | pad=0 1152 | filters=45 1153 | group=1 1154 | activation=linear 1155 | 1156 | #147 1157 | [yolo] 1158 | mask = 6,7,8 1159 | anchors = 4,5, 6,10, 14,9, 11,18, 25,15, 21,30, 47,26, 37,53, 87,65 1160 | classes=10 1161 | num=9 1162 | jitter=.3 1163 | ignore_thresh = .7 1164 | truth_thresh = 1 1165 | random=1 1166 | 1167 | #148 1168 | [route] 1169 | layers = -4 1170 | 1171 | #149 1172 | [convolutional] 1173 | batch_normalize=1 1174 | filters=256 1175 | size=1 1176 | stride=1 1177 | group=1 1178 | pad=0 1179 | activation=leaky 1180 | 1181 | #150 1182 | [upsample] 1183 | stride=2 1184 | 1185 | #151 1186 | [route] 1187 | layers = -1, 96 1188 | 1189 | #152 1190 | [convolutional] 1191 | batch_normalize=1 1192 | filters=256 1193 | size=1 1194 | group=1 1195 | 
stride=1 1196 | pad=0 1197 | activation=leaky 1198 | 1199 | #153 1200 | [convolutional] 1201 | batch_normalize=1 1202 | size=3 1203 | stride=1 1204 | pad=1 1205 | group=1 1206 | filters=512 1207 | activation=leaky 1208 | 1209 | #154 1210 | [convolutional] 1211 | batch_normalize=1 1212 | filters=256 1213 | size=1 1214 | group=1 1215 | stride=1 1216 | pad=0 1217 | activation=leaky 1218 | 1219 | #155 1220 | [convolutional] 1221 | batch_normalize=1 1222 | size=3 1223 | group=1 1224 | stride=1 1225 | pad=1 1226 | filters=512 1227 | activation=leaky 1228 | 1229 | #156 1230 | [convolutional] 1231 | batch_normalize=1 1232 | filters=256 1233 | size=1 1234 | stride=1 1235 | pad=0 1236 | group=1 1237 | activation=leaky 1238 | 1239 | #157 1240 | [convolutional] 1241 | batch_normalize=1 1242 | size=3 1243 | stride=1 1244 | pad=1 1245 | filters=512 1246 | group=1 1247 | activation=leaky 1248 | 1249 | #158 1250 | [convolutional] 1251 | size=1 1252 | stride=1 1253 | pad=0 1254 | group=1 1255 | filters=45 1256 | activation=linear 1257 | 1258 | #159 1259 | [yolo] 1260 | mask = 3,4,5 1261 | anchors = 4,5, 6,10, 14,9, 11,18, 25,15, 21,30, 47,26, 37,53, 87,65 1262 | classes=10 1263 | num=9 1264 | jitter=.3 1265 | ignore_thresh = .7 1266 | truth_thresh = 1 1267 | random=1 1268 | 1269 | 1270 | #160 1271 | [route] 1272 | layers = -4 1273 | 1274 | #161 1275 | [convolutional] 1276 | batch_normalize=1 1277 | filters=128 1278 | size=1 1279 | stride=1 1280 | pad=0 1281 | group=1 1282 | activation=leaky 1283 | 1284 | #162 1285 | [upsample] 1286 | stride=2 1287 | 1288 | #163 1289 | [route] 1290 | layers = -1, 45 1291 | 1292 | #164 1293 | [convolutional] 1294 | batch_normalize=1 1295 | filters=128 1296 | size=1 1297 | group=1 1298 | stride=1 1299 | pad=0 1300 | activation=leaky 1301 | 1302 | #165 1303 | [convolutional] 1304 | batch_normalize=1 1305 | size=3 1306 | stride=1 1307 | pad=1 1308 | group=1 1309 | filters=256 1310 | activation=leaky 1311 | 1312 | #166 1313 | [convolutional] 1314 | batch_normalize=1 1315 | filters=128 1316 | size=1 1317 | group=1 1318 | stride=1 1319 | pad=0 1320 | activation=leaky 1321 | 1322 | #167 1323 | [convolutional] 1324 | batch_normalize=1 1325 | size=3 1326 | stride=1 1327 | pad=1 1328 | group=1 1329 | filters=256 1330 | activation=leaky 1331 | 1332 | #168 1333 | [convolutional] 1334 | batch_normalize=1 1335 | filters=128 1336 | size=1 1337 | group=1 1338 | stride=1 1339 | pad=0 1340 | activation=leaky 1341 | 1342 | #169 1343 | [convolutional] 1344 | batch_normalize=1 1345 | size=3 1346 | stride=1 1347 | pad=1 1348 | group=1 1349 | filters=256 1350 | activation=leaky 1351 | 1352 | #170 1353 | [convolutional] 1354 | size=1 1355 | stride=1 1356 | pad=0 1357 | group=1 1358 | filters=45 1359 | activation=linear 1360 | 1361 | #171 1362 | [yolo] 1363 | mask = 0,1,2 1364 | anchors = 4,5, 6,10, 14,9, 11,18, 25,15, 21,30, 47,26, 37,53, 87,65 1365 | classes=10 1366 | num=9 1367 | jitter=.3 1368 | ignore_thresh = .7 1369 | truth_thresh = 1 1370 | random=1 1371 | 1372 | -------------------------------------------------------------------------------- /cfg/yolov3-hand.cfg: -------------------------------------------------------------------------------- 1 | 2 | [net] 3 | # Testing 4 | #batch=1 5 | #subdivisions=1 6 | # Training 7 | batch=16 8 | subdivisions=1 9 | width=416 10 | height=416 11 | channels=3 12 | momentum=0.9 13 | decay=0.0005 14 | angle=0 15 | saturation = 1.5 16 | exposure = 1.5 17 | hue=.1 18 | 19 | learning_rate=0.001 20 | burn_in=1000 21 | max_batches = 500200 22 | policy=steps 23 | 
steps=400000,450000 24 | scales=.1,.1 25 | 26 | [convolutional] 27 | batch_normalize=1 28 | filters=32 29 | size=3 30 | stride=1 31 | pad=1 32 | activation=leaky 33 | 34 | # Downsample 35 | 36 | [convolutional] 37 | batch_normalize=1 38 | filters=64 39 | size=3 40 | stride=2 41 | pad=1 42 | activation=leaky 43 | 44 | [convolutional] 45 | batch_normalize=1 46 | filters=32 47 | size=1 48 | stride=1 49 | pad=1 50 | activation=leaky 51 | 52 | [convolutional] 53 | batch_normalize=1 54 | filters=64 55 | size=3 56 | stride=1 57 | pad=1 58 | activation=leaky 59 | 60 | [shortcut] 61 | from=-3 62 | activation=linear 63 | 64 | # Downsample 65 | 66 | [convolutional] 67 | batch_normalize=1 68 | filters=128 69 | size=3 70 | stride=2 71 | pad=1 72 | activation=leaky 73 | 74 | [convolutional] 75 | batch_normalize=1 76 | filters=64 77 | size=1 78 | stride=1 79 | pad=1 80 | activation=leaky 81 | 82 | [convolutional] 83 | batch_normalize=1 84 | filters=128 85 | size=3 86 | stride=1 87 | pad=1 88 | activation=leaky 89 | 90 | [shortcut] 91 | from=-3 92 | activation=linear 93 | 94 | [convolutional] 95 | batch_normalize=1 96 | filters=64 97 | size=1 98 | stride=1 99 | pad=1 100 | activation=leaky 101 | 102 | [convolutional] 103 | batch_normalize=1 104 | filters=128 105 | size=3 106 | stride=1 107 | pad=1 108 | activation=leaky 109 | 110 | [shortcut] 111 | from=-3 112 | activation=linear 113 | 114 | # Downsample 115 | 116 | [convolutional] 117 | batch_normalize=1 118 | filters=256 119 | size=3 120 | stride=2 121 | pad=1 122 | activation=leaky 123 | 124 | [convolutional] 125 | batch_normalize=1 126 | filters=128 127 | size=1 128 | stride=1 129 | pad=1 130 | activation=leaky 131 | 132 | [convolutional] 133 | batch_normalize=1 134 | filters=256 135 | size=3 136 | stride=1 137 | pad=1 138 | activation=leaky 139 | 140 | [shortcut] 141 | from=-3 142 | activation=linear 143 | 144 | [convolutional] 145 | batch_normalize=1 146 | filters=128 147 | size=1 148 | stride=1 149 | pad=1 150 | activation=leaky 151 | 152 | [convolutional] 153 | batch_normalize=1 154 | filters=256 155 | size=3 156 | stride=1 157 | pad=1 158 | activation=leaky 159 | 160 | [shortcut] 161 | from=-3 162 | activation=linear 163 | 164 | [convolutional] 165 | batch_normalize=1 166 | filters=128 167 | size=1 168 | stride=1 169 | pad=1 170 | activation=leaky 171 | 172 | [convolutional] 173 | batch_normalize=1 174 | filters=256 175 | size=3 176 | stride=1 177 | pad=1 178 | activation=leaky 179 | 180 | [shortcut] 181 | from=-3 182 | activation=linear 183 | 184 | [convolutional] 185 | batch_normalize=1 186 | filters=128 187 | size=1 188 | stride=1 189 | pad=1 190 | activation=leaky 191 | 192 | [convolutional] 193 | batch_normalize=1 194 | filters=256 195 | size=3 196 | stride=1 197 | pad=1 198 | activation=leaky 199 | 200 | [shortcut] 201 | from=-3 202 | activation=linear 203 | 204 | 205 | [convolutional] 206 | batch_normalize=1 207 | filters=128 208 | size=1 209 | stride=1 210 | pad=1 211 | activation=leaky 212 | 213 | [convolutional] 214 | batch_normalize=1 215 | filters=256 216 | size=3 217 | stride=1 218 | pad=1 219 | activation=leaky 220 | 221 | [shortcut] 222 | from=-3 223 | activation=linear 224 | 225 | [convolutional] 226 | batch_normalize=1 227 | filters=128 228 | size=1 229 | stride=1 230 | pad=1 231 | activation=leaky 232 | 233 | [convolutional] 234 | batch_normalize=1 235 | filters=256 236 | size=3 237 | stride=1 238 | pad=1 239 | activation=leaky 240 | 241 | [shortcut] 242 | from=-3 243 | activation=linear 244 | 245 | [convolutional] 246 | 
batch_normalize=1 247 | filters=128 248 | size=1 249 | stride=1 250 | pad=1 251 | activation=leaky 252 | 253 | [convolutional] 254 | batch_normalize=1 255 | filters=256 256 | size=3 257 | stride=1 258 | pad=1 259 | activation=leaky 260 | 261 | [shortcut] 262 | from=-3 263 | activation=linear 264 | 265 | [convolutional] 266 | batch_normalize=1 267 | filters=128 268 | size=1 269 | stride=1 270 | pad=1 271 | activation=leaky 272 | 273 | [convolutional] 274 | batch_normalize=1 275 | filters=256 276 | size=3 277 | stride=1 278 | pad=1 279 | activation=leaky 280 | 281 | [shortcut] 282 | from=-3 283 | activation=linear 284 | 285 | # Downsample 286 | 287 | [convolutional] 288 | batch_normalize=1 289 | filters=512 290 | size=3 291 | stride=2 292 | pad=1 293 | activation=leaky 294 | 295 | [convolutional] 296 | batch_normalize=1 297 | filters=256 298 | size=1 299 | stride=1 300 | pad=1 301 | activation=leaky 302 | 303 | [convolutional] 304 | batch_normalize=1 305 | filters=512 306 | size=3 307 | stride=1 308 | pad=1 309 | activation=leaky 310 | 311 | [shortcut] 312 | from=-3 313 | activation=linear 314 | 315 | 316 | [convolutional] 317 | batch_normalize=1 318 | filters=256 319 | size=1 320 | stride=1 321 | pad=1 322 | activation=leaky 323 | 324 | [convolutional] 325 | batch_normalize=1 326 | filters=512 327 | size=3 328 | stride=1 329 | pad=1 330 | activation=leaky 331 | 332 | [shortcut] 333 | from=-3 334 | activation=linear 335 | 336 | 337 | [convolutional] 338 | batch_normalize=1 339 | filters=256 340 | size=1 341 | stride=1 342 | pad=1 343 | activation=leaky 344 | 345 | [convolutional] 346 | batch_normalize=1 347 | filters=512 348 | size=3 349 | stride=1 350 | pad=1 351 | activation=leaky 352 | 353 | [shortcut] 354 | from=-3 355 | activation=linear 356 | 357 | 358 | [convolutional] 359 | batch_normalize=1 360 | filters=256 361 | size=1 362 | stride=1 363 | pad=1 364 | activation=leaky 365 | 366 | [convolutional] 367 | batch_normalize=1 368 | filters=512 369 | size=3 370 | stride=1 371 | pad=1 372 | activation=leaky 373 | 374 | [shortcut] 375 | from=-3 376 | activation=linear 377 | 378 | [convolutional] 379 | batch_normalize=1 380 | filters=256 381 | size=1 382 | stride=1 383 | pad=1 384 | activation=leaky 385 | 386 | [convolutional] 387 | batch_normalize=1 388 | filters=512 389 | size=3 390 | stride=1 391 | pad=1 392 | activation=leaky 393 | 394 | [shortcut] 395 | from=-3 396 | activation=linear 397 | 398 | 399 | [convolutional] 400 | batch_normalize=1 401 | filters=256 402 | size=1 403 | stride=1 404 | pad=1 405 | activation=leaky 406 | 407 | [convolutional] 408 | batch_normalize=1 409 | filters=512 410 | size=3 411 | stride=1 412 | pad=1 413 | activation=leaky 414 | 415 | [shortcut] 416 | from=-3 417 | activation=linear 418 | 419 | 420 | [convolutional] 421 | batch_normalize=1 422 | filters=256 423 | size=1 424 | stride=1 425 | pad=1 426 | activation=leaky 427 | 428 | [convolutional] 429 | batch_normalize=1 430 | filters=512 431 | size=3 432 | stride=1 433 | pad=1 434 | activation=leaky 435 | 436 | [shortcut] 437 | from=-3 438 | activation=linear 439 | 440 | [convolutional] 441 | batch_normalize=1 442 | filters=256 443 | size=1 444 | stride=1 445 | pad=1 446 | activation=leaky 447 | 448 | [convolutional] 449 | batch_normalize=1 450 | filters=512 451 | size=3 452 | stride=1 453 | pad=1 454 | activation=leaky 455 | 456 | [shortcut] 457 | from=-3 458 | activation=linear 459 | 460 | # Downsample 461 | 462 | [convolutional] 463 | batch_normalize=1 464 | filters=1024 465 | size=3 466 | stride=2 467 | 
pad=1 468 | activation=leaky 469 | 470 | [convolutional] 471 | batch_normalize=1 472 | filters=512 473 | size=1 474 | stride=1 475 | pad=1 476 | activation=leaky 477 | 478 | [convolutional] 479 | batch_normalize=1 480 | filters=1024 481 | size=3 482 | stride=1 483 | pad=1 484 | activation=leaky 485 | 486 | [shortcut] 487 | from=-3 488 | activation=linear 489 | 490 | [convolutional] 491 | batch_normalize=1 492 | filters=512 493 | size=1 494 | stride=1 495 | pad=1 496 | activation=leaky 497 | 498 | [convolutional] 499 | batch_normalize=1 500 | filters=1024 501 | size=3 502 | stride=1 503 | pad=1 504 | activation=leaky 505 | 506 | [shortcut] 507 | from=-3 508 | activation=linear 509 | 510 | [convolutional] 511 | batch_normalize=1 512 | filters=512 513 | size=1 514 | stride=1 515 | pad=1 516 | activation=leaky 517 | 518 | [convolutional] 519 | batch_normalize=1 520 | filters=1024 521 | size=3 522 | stride=1 523 | pad=1 524 | activation=leaky 525 | 526 | [shortcut] 527 | from=-3 528 | activation=linear 529 | 530 | [convolutional] 531 | batch_normalize=1 532 | filters=512 533 | size=1 534 | stride=1 535 | pad=1 536 | activation=leaky 537 | 538 | [convolutional] 539 | batch_normalize=1 540 | filters=1024 541 | size=3 542 | stride=1 543 | pad=1 544 | activation=leaky 545 | 546 | [shortcut] 547 | from=-3 548 | activation=linear 549 | 550 | ###################### 551 | 552 | [convolutional] 553 | batch_normalize=1 554 | filters=512 555 | size=1 556 | stride=1 557 | pad=1 558 | activation=leaky 559 | 560 | [convolutional] 561 | batch_normalize=1 562 | size=3 563 | stride=1 564 | pad=1 565 | filters=1024 566 | activation=leaky 567 | 568 | [convolutional] 569 | batch_normalize=1 570 | filters=512 571 | size=1 572 | stride=1 573 | pad=1 574 | activation=leaky 575 | 576 | [convolutional] 577 | batch_normalize=1 578 | size=3 579 | stride=1 580 | pad=1 581 | filters=1024 582 | activation=leaky 583 | 584 | [convolutional] 585 | batch_normalize=1 586 | filters=512 587 | size=1 588 | stride=1 589 | pad=1 590 | activation=leaky 591 | 592 | [convolutional] 593 | batch_normalize=1 594 | size=3 595 | stride=1 596 | pad=1 597 | filters=1024 598 | activation=leaky 599 | 600 | [convolutional] 601 | size=1 602 | stride=1 603 | pad=1 604 | filters=18 605 | activation=linear 606 | 607 | 608 | [yolo] 609 | mask = 6,7,8 610 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 611 | classes=1 612 | num=9 613 | jitter=.3 614 | ignore_thresh = .7 615 | truth_thresh = 1 616 | random=1 617 | 618 | 619 | [route] 620 | layers = -4 621 | 622 | [convolutional] 623 | batch_normalize=1 624 | filters=256 625 | size=1 626 | stride=1 627 | pad=1 628 | activation=leaky 629 | 630 | [upsample] 631 | stride=2 632 | 633 | [route] 634 | layers = -1, 61 635 | 636 | 637 | 638 | [convolutional] 639 | batch_normalize=1 640 | filters=256 641 | size=1 642 | stride=1 643 | pad=1 644 | activation=leaky 645 | 646 | [convolutional] 647 | batch_normalize=1 648 | size=3 649 | stride=1 650 | pad=1 651 | filters=512 652 | activation=leaky 653 | 654 | [convolutional] 655 | batch_normalize=1 656 | filters=256 657 | size=1 658 | stride=1 659 | pad=1 660 | activation=leaky 661 | 662 | [convolutional] 663 | batch_normalize=1 664 | size=3 665 | stride=1 666 | pad=1 667 | filters=512 668 | activation=leaky 669 | 670 | [convolutional] 671 | batch_normalize=1 672 | filters=256 673 | size=1 674 | stride=1 675 | pad=1 676 | activation=leaky 677 | 678 | [convolutional] 679 | batch_normalize=1 680 | size=3 681 | stride=1 682 | pad=1 683 | 
filters=512 684 | activation=leaky 685 | 686 | [convolutional] 687 | size=1 688 | stride=1 689 | pad=1 690 | filters=18 691 | activation=linear 692 | 693 | 694 | [yolo] 695 | mask = 3,4,5 696 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 697 | classes=1 698 | num=9 699 | jitter=.3 700 | ignore_thresh = .7 701 | truth_thresh = 1 702 | random=1 703 | 704 | 705 | 706 | [route] 707 | layers = -4 708 | 709 | [convolutional] 710 | batch_normalize=1 711 | filters=128 712 | size=1 713 | stride=1 714 | pad=1 715 | activation=leaky 716 | 717 | [upsample] 718 | stride=2 719 | 720 | [route] 721 | layers = -1, 36 722 | 723 | 724 | 725 | [convolutional] 726 | batch_normalize=1 727 | filters=128 728 | size=1 729 | stride=1 730 | pad=1 731 | activation=leaky 732 | 733 | [convolutional] 734 | batch_normalize=1 735 | size=3 736 | stride=1 737 | pad=1 738 | filters=256 739 | activation=leaky 740 | 741 | [convolutional] 742 | batch_normalize=1 743 | filters=128 744 | size=1 745 | stride=1 746 | pad=1 747 | activation=leaky 748 | 749 | [convolutional] 750 | batch_normalize=1 751 | size=3 752 | stride=1 753 | pad=1 754 | filters=256 755 | activation=leaky 756 | 757 | [convolutional] 758 | batch_normalize=1 759 | filters=128 760 | size=1 761 | stride=1 762 | pad=1 763 | activation=leaky 764 | 765 | [convolutional] 766 | batch_normalize=1 767 | size=3 768 | stride=1 769 | pad=1 770 | filters=256 771 | activation=leaky 772 | 773 | [convolutional] 774 | size=1 775 | stride=1 776 | pad=1 777 | filters=18 778 | activation=linear 779 | 780 | 781 | [yolo] 782 | mask = 0,1,2 783 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 784 | classes=1 785 | num=9 786 | jitter=.3 787 | ignore_thresh = .7 788 | truth_thresh = 1 789 | random=1 790 | 791 | -------------------------------------------------------------------------------- /cfg/yolov3-shufflenetv2-hand.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=15,25,60,99,150,160,180 23 | scales=0.5,0.5,0.1,0.5,0.5,0.1,0.1 24 | 25 | # 0 26 | [convolutional] 27 | batch_normalize=1 28 | filters=24 29 | size=3 30 | stride=2 31 | pad=1 32 | activation=HardSwish 33 | 34 | #1 35 | [maxpool] 36 | size=3 37 | stride=2 38 | 39 | # 2 40 | [stage2] 41 | out_channels=176 42 | 43 | [stage3] 44 | out_channels=352 45 | 46 | [eca] 47 | kernel_size=16 48 | 49 | [stage4] 50 | out_channels=704 51 | 52 | [eca] 53 | kernel_size=16 54 | 55 | [convolutional] 56 | batch_normalize=1 57 | filters=1024 58 | size=1 59 | stride=1 60 | pad=1 61 | activation=HardSwish 62 | 63 | [eca] 64 | kernel_size=16 65 | 66 | 67 | 68 | ########### 69 | 70 | 71 | [convolutional] 72 | batch_normalize=1 73 | filters=256 74 | size=1 75 | stride=1 76 | pad=1 77 | activation=leaky 78 | 79 | [convolutional] 80 | batch_normalize=1 81 | filters=512 82 | size=3 83 | stride=1 84 | pad=1 85 | activation=leaky 86 | 87 | 88 | [convolutional] 89 | size=1 90 | stride=1 91 | pad=1 92 | filters=18 93 | activation=linear 94 | 95 | 96 | [yolo] 97 | mask = 3,4,5 98 | anchors = 16,19, 28,30, 40,42, 58,57, 85,85, 154,152 99 | classes=1 100 | num=6 101 | jitter=.3 102 | ignore_thresh = .7 103 | 
truth_thresh = 1 104 | random=1 105 | 106 | [route] 107 | layers = -4 108 | 109 | # 18 110 | [convolutional] 111 | batch_normalize=1 112 | filters=128 113 | size=1 114 | stride=1 115 | pad=1 116 | activation=leaky 117 | 118 | # 19 119 | [upsample] 120 | stride=2 121 | 122 | # 20 123 | [route] 124 | layers = -1, 3 125 | # 21 126 | [convolutional] 127 | batch_normalize=1 128 | filters=256 129 | size=3 130 | stride=1 131 | pad=1 132 | activation=leaky 133 | 134 | # 22 135 | [convolutional] 136 | size=1 137 | stride=1 138 | pad=1 139 | filters=18 140 | activation=linear 141 | 142 | # 23 143 | [yolo] 144 | mask = 0,1,2 145 | anchors = 16,19, 28,30, 40,42, 58,57, 85,85, 154,152 146 | classes=1 147 | num=6 148 | jitter=.3 149 | ignore_thresh = .7 150 | truth_thresh = 1 151 | random=1 152 | 153 | 154 | 155 | 156 | 157 | 158 | -------------------------------------------------------------------------------- /cfg/yolov3-shufflenetv2-visdrone.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=15,25,60,99,150,160,180 23 | scales=0.5,0.5,0.1,0.5,0.5,0.1,0.1 24 | 25 | # 0 26 | [convolutional] 27 | batch_normalize=1 28 | filters=24 29 | size=3 30 | stride=2 31 | pad=1 32 | activation=HardSwish 33 | 34 | #1 35 | [maxpool] 36 | size=3 37 | stride=2 38 | 39 | # 2 40 | [stage2] 41 | out_channels=176 42 | 43 | [stage3] 44 | out_channels=352 45 | 46 | [eca] 47 | kernel_size=16 48 | 49 | [stage4] 50 | out_channels=704 51 | 52 | [eca] 53 | kernel_size=16 54 | 55 | [convolutional] 56 | batch_normalize=1 57 | filters=1024 58 | size=1 59 | stride=1 60 | pad=1 61 | activation=HardSwish 62 | 63 | [eca] 64 | kernel_size=16 65 | 66 | 67 | 68 | ########### 69 | 70 | 71 | [convolutional] 72 | batch_normalize=1 73 | filters=256 74 | size=1 75 | stride=1 76 | pad=1 77 | activation=leaky 78 | 79 | [convolutional] 80 | batch_normalize=1 81 | filters=512 82 | size=3 83 | stride=1 84 | pad=1 85 | activation=leaky 86 | 87 | 88 | [convolutional] 89 | size=1 90 | stride=1 91 | pad=1 92 | filters=45 93 | activation=linear 94 | 95 | 96 | [yolo] 97 | mask = 3,4,5 98 | anchors = 5,6, 9,15, 18,11, 21,32, 38,23, 67,58 99 | classes=10 100 | num=6 101 | jitter=.3 102 | ignore_thresh = .7 103 | truth_thresh = 1 104 | random=1 105 | 106 | [route] 107 | layers = -4 108 | 109 | # 18 110 | [convolutional] 111 | batch_normalize=1 112 | filters=128 113 | size=1 114 | stride=1 115 | pad=1 116 | activation=leaky 117 | 118 | # 19 119 | [upsample] 120 | stride=2 121 | 122 | # 20 123 | [route] 124 | layers = -1, 3 125 | # 21 126 | [convolutional] 127 | batch_normalize=1 128 | filters=256 129 | size=3 130 | stride=1 131 | pad=1 132 | activation=leaky 133 | 134 | # 22 135 | [convolutional] 136 | size=1 137 | stride=1 138 | pad=1 139 | filters=45 140 | activation=linear 141 | 142 | # 23 143 | [yolo] 144 | mask = 0,1,2 145 | anchors = 5,6, 9,15, 18,11, 21,32, 38,23, 67,58 146 | classes=10 147 | num=6 148 | jitter=.3 149 | ignore_thresh = .7 150 | truth_thresh = 1 151 | random=1 152 | 153 | 154 | 155 | 156 | 157 | 158 | -------------------------------------------------------------------------------- /cfg/yolov3-tiny-hand-cbam.cfg: 
-------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=15,25,60,99,150,160,180 23 | scales=0.5,0.5,0.1,0.5,0.5,0.1,0.1 24 | 25 | # 0 26 | [convolutional] 27 | batch_normalize=1 28 | filters=16 29 | size=3 30 | stride=1 31 | pad=1 32 | activation=leaky 33 | 34 | # 1 35 | [maxpool] 36 | size=2 37 | stride=2 38 | 39 | # 2 40 | [convolutional] 41 | batch_normalize=1 42 | filters=32 43 | size=3 44 | stride=1 45 | pad=1 46 | activation=leaky 47 | 48 | # 3 49 | [maxpool] 50 | size=2 51 | stride=2 52 | 53 | # 4 54 | [convolutional] 55 | batch_normalize=1 56 | filters=64 57 | size=3 58 | stride=1 59 | pad=1 60 | activation=leaky 61 | 62 | # 5 63 | [maxpool] 64 | size=2 65 | stride=2 66 | 67 | # 6 68 | [convolutional] 69 | batch_normalize=1 70 | filters=128 71 | size=3 72 | stride=1 73 | pad=1 74 | activation=leaky 75 | 76 | # 7 77 | [maxpool] 78 | size=2 79 | stride=2 80 | 81 | # 8 82 | [convolutional] 83 | batch_normalize=1 84 | filters=256 85 | size=3 86 | stride=1 87 | pad=1 88 | activation=leaky 89 | 90 | # 9 91 | [maxpool] 92 | size=2 93 | stride=2 94 | 95 | # 10 96 | [convolutional] 97 | batch_normalize=1 98 | filters=512 99 | size=3 100 | stride=1 101 | pad=1 102 | activation=leaky 103 | 104 | # 11 105 | [maxpool] 106 | size=2 107 | stride=1 108 | 109 | # 12 110 | [convolutional] 111 | batch_normalize=1 112 | filters=1024 113 | size=3 114 | stride=1 115 | pad=1 116 | activation=leaky 117 | 118 | [ca] 119 | ratio=16 120 | 121 | [sa] 122 | kernelsize=7 123 | 124 | ########### 125 | 126 | # 13 127 | [convolutional] 128 | batch_normalize=1 129 | filters=256 130 | size=1 131 | stride=1 132 | pad=1 133 | activation=leaky 134 | 135 | # 14 136 | [convolutional] 137 | batch_normalize=1 138 | filters=512 139 | size=3 140 | stride=1 141 | pad=1 142 | activation=leaky 143 | 144 | # 15 145 | [convolutional] 146 | size=1 147 | stride=1 148 | pad=1 149 | filters=18 150 | activation=linear 151 | 152 | 153 | 154 | # 16 155 | [yolo] 156 | mask = 3,4,5 157 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 158 | classes=1 159 | num=6 160 | jitter=.3 161 | ignore_thresh = .7 162 | truth_thresh = 1 163 | random=1 164 | 165 | # 17 166 | [route] 167 | layers = -4 168 | 169 | # 18 170 | [convolutional] 171 | batch_normalize=1 172 | filters=128 173 | size=1 174 | stride=1 175 | pad=1 176 | activation=leaky 177 | 178 | # 19 179 | [upsample] 180 | stride=2 181 | 182 | # 20 183 | [route] 184 | layers = -1, 8 185 | 186 | # 21 187 | [convolutional] 188 | batch_normalize=1 189 | filters=256 190 | size=3 191 | stride=1 192 | pad=1 193 | activation=leaky 194 | 195 | # 22 196 | [convolutional] 197 | size=1 198 | stride=1 199 | pad=1 200 | filters=18 201 | activation=linear 202 | 203 | # 23 204 | [yolo] 205 | mask = 1,2,3 206 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 207 | classes=1 208 | num=6 209 | jitter=.3 210 | ignore_thresh = .7 211 | truth_thresh = 1 212 | random=1 213 | -------------------------------------------------------------------------------- /cfg/yolov3-tiny-hand-eca.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # 
Training 6 | batch=16 7 | subdivisions=1 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=15,25,60,99,150,160,180 23 | scales=0.5,0.5,0.1,0.5,0.5,0.1,0.1 24 | 25 | # 0 26 | [convolutional] 27 | batch_normalize=1 28 | filters=16 29 | size=3 30 | stride=1 31 | pad=1 32 | activation=leaky 33 | 34 | # 1 35 | [maxpool] 36 | size=2 37 | stride=2 38 | 39 | # 2 40 | [convolutional] 41 | batch_normalize=1 42 | filters=32 43 | size=3 44 | stride=1 45 | pad=1 46 | activation=leaky 47 | 48 | # 3 49 | [maxpool] 50 | size=2 51 | stride=2 52 | 53 | # 4 54 | [convolutional] 55 | batch_normalize=1 56 | filters=64 57 | size=3 58 | stride=1 59 | pad=1 60 | activation=leaky 61 | 62 | # 5 63 | [maxpool] 64 | size=2 65 | stride=2 66 | 67 | # 6 68 | [convolutional] 69 | batch_normalize=1 70 | filters=128 71 | size=3 72 | stride=1 73 | pad=1 74 | activation=leaky 75 | 76 | # 7 77 | [maxpool] 78 | size=2 79 | stride=2 80 | 81 | # 8 82 | [convolutional] 83 | batch_normalize=1 84 | filters=256 85 | size=3 86 | stride=1 87 | pad=1 88 | activation=leaky 89 | 90 | # 9 91 | [maxpool] 92 | size=2 93 | stride=2 94 | 95 | # 10 96 | [convolutional] 97 | batch_normalize=1 98 | filters=512 99 | size=3 100 | stride=1 101 | pad=1 102 | activation=leaky 103 | 104 | # 11 105 | [maxpool] 106 | size=2 107 | stride=1 108 | 109 | # 12 110 | [convolutional] 111 | batch_normalize=1 112 | filters=1024 113 | size=3 114 | stride=1 115 | pad=1 116 | activation=leaky 117 | 118 | [eca] 119 | kernel_size=16 120 | 121 | 122 | ########### 123 | 124 | # 13 125 | [convolutional] 126 | batch_normalize=1 127 | filters=256 128 | size=1 129 | stride=1 130 | pad=1 131 | activation=leaky 132 | 133 | # 14 134 | [convolutional] 135 | batch_normalize=1 136 | filters=512 137 | size=3 138 | stride=1 139 | pad=1 140 | activation=leaky 141 | 142 | # 15 143 | [convolutional] 144 | size=1 145 | stride=1 146 | pad=1 147 | filters=18 148 | activation=linear 149 | 150 | 151 | 152 | # 16 153 | [yolo] 154 | mask = 3,4,5 155 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 156 | classes=1 157 | num=6 158 | jitter=.3 159 | ignore_thresh = .7 160 | truth_thresh = 1 161 | random=1 162 | 163 | # 17 164 | [route] 165 | layers = -4 166 | 167 | # 18 168 | [convolutional] 169 | batch_normalize=1 170 | filters=128 171 | size=1 172 | stride=1 173 | pad=1 174 | activation=leaky 175 | 176 | # 19 177 | [upsample] 178 | stride=2 179 | 180 | # 20 181 | [route] 182 | layers = -1, 8 183 | 184 | # 21 185 | [convolutional] 186 | batch_normalize=1 187 | filters=256 188 | size=3 189 | stride=1 190 | pad=1 191 | activation=leaky 192 | 193 | # 22 194 | [convolutional] 195 | size=1 196 | stride=1 197 | pad=1 198 | filters=18 199 | activation=linear 200 | 201 | # 23 202 | [yolo] 203 | mask = 1,2,3 204 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 205 | classes=1 206 | num=6 207 | jitter=.3 208 | ignore_thresh = .7 209 | truth_thresh = 1 210 | random=1 211 | -------------------------------------------------------------------------------- /cfg/yolov3-tiny-hand-se.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 
1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=15,25,60,99,150,160,180 23 | scales=0.5,0.5,0.1,0.5,0.5,0.1,0.1 24 | 25 | # 0 26 | [convolutional] 27 | batch_normalize=1 28 | filters=16 29 | size=3 30 | stride=1 31 | pad=1 32 | activation=leaky 33 | 34 | # 1 35 | [maxpool] 36 | size=2 37 | stride=2 38 | 39 | # 2 40 | [convolutional] 41 | batch_normalize=1 42 | filters=32 43 | size=3 44 | stride=1 45 | pad=1 46 | activation=leaky 47 | 48 | # 3 49 | [maxpool] 50 | size=2 51 | stride=2 52 | 53 | # 4 54 | [convolutional] 55 | batch_normalize=1 56 | filters=64 57 | size=3 58 | stride=1 59 | pad=1 60 | activation=leaky 61 | 62 | # 5 63 | [maxpool] 64 | size=2 65 | stride=2 66 | 67 | # 6 68 | [convolutional] 69 | batch_normalize=1 70 | filters=128 71 | size=3 72 | stride=1 73 | pad=1 74 | activation=leaky 75 | 76 | # 7 77 | [maxpool] 78 | size=2 79 | stride=2 80 | 81 | # 8 82 | [convolutional] 83 | batch_normalize=1 84 | filters=256 85 | size=3 86 | stride=1 87 | pad=1 88 | activation=leaky 89 | 90 | # 9 91 | [maxpool] 92 | size=2 93 | stride=2 94 | 95 | # 10 96 | [convolutional] 97 | batch_normalize=1 98 | filters=512 99 | size=3 100 | stride=1 101 | pad=1 102 | activation=leaky 103 | 104 | # 11 105 | [maxpool] 106 | size=2 107 | stride=1 108 | 109 | # 12 110 | [convolutional] 111 | batch_normalize=1 112 | filters=1024 113 | size=3 114 | stride=1 115 | pad=1 116 | activation=leaky 117 | 118 | [se] 119 | reduction=16 120 | 121 | ########### 122 | 123 | # 13 124 | [convolutional] 125 | batch_normalize=1 126 | filters=256 127 | size=1 128 | stride=1 129 | pad=1 130 | activation=leaky 131 | 132 | # 14 133 | [convolutional] 134 | batch_normalize=1 135 | filters=512 136 | size=3 137 | stride=1 138 | pad=1 139 | activation=leaky 140 | 141 | # 15 142 | [convolutional] 143 | size=1 144 | stride=1 145 | pad=1 146 | filters=18 147 | activation=linear 148 | 149 | 150 | 151 | # 16 152 | [yolo] 153 | mask = 3,4,5 154 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 155 | classes=1 156 | num=6 157 | jitter=.3 158 | ignore_thresh = .7 159 | truth_thresh = 1 160 | random=1 161 | 162 | # 17 163 | [route] 164 | layers = -4 165 | 166 | # 18 167 | [convolutional] 168 | batch_normalize=1 169 | filters=128 170 | size=1 171 | stride=1 172 | pad=1 173 | activation=leaky 174 | 175 | # 19 176 | [upsample] 177 | stride=2 178 | 179 | # 20 180 | [route] 181 | layers = -1, 8 182 | 183 | # 21 184 | [convolutional] 185 | batch_normalize=1 186 | filters=256 187 | size=3 188 | stride=1 189 | pad=1 190 | activation=leaky 191 | 192 | # 22 193 | [convolutional] 194 | size=1 195 | stride=1 196 | pad=1 197 | filters=18 198 | activation=linear 199 | 200 | # 23 201 | [yolo] 202 | mask = 1,2,3 203 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 204 | classes=1 205 | num=6 206 | jitter=.3 207 | ignore_thresh = .7 208 | truth_thresh = 1 209 | random=1 210 | -------------------------------------------------------------------------------- /cfg/yolov3-tiny-hand.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=15,25,60,99,150,160,180 23 | scales=0.5,0.5,0.1,0.5,0.5,0.1,0.1 
24 | 25 | # 0 26 | [convolutional] 27 | batch_normalize=1 28 | filters=16 29 | size=3 30 | stride=1 31 | pad=1 32 | activation=leaky 33 | 34 | # 1 35 | [maxpool] 36 | size=2 37 | stride=2 38 | 39 | # 2 40 | [convolutional] 41 | batch_normalize=1 42 | filters=32 43 | size=3 44 | stride=1 45 | pad=1 46 | activation=leaky 47 | 48 | # 3 49 | [maxpool] 50 | size=2 51 | stride=2 52 | 53 | # 4 54 | [convolutional] 55 | batch_normalize=1 56 | filters=64 57 | size=3 58 | stride=1 59 | pad=1 60 | activation=leaky 61 | 62 | # 5 63 | [maxpool] 64 | size=2 65 | stride=2 66 | 67 | # 6 68 | [convolutional] 69 | batch_normalize=1 70 | filters=128 71 | size=3 72 | stride=1 73 | pad=1 74 | activation=leaky 75 | 76 | # 7 77 | [maxpool] 78 | size=2 79 | stride=2 80 | 81 | # 8 82 | [convolutional] 83 | batch_normalize=1 84 | filters=256 85 | size=3 86 | stride=1 87 | pad=1 88 | activation=leaky 89 | 90 | # 9 91 | [maxpool] 92 | size=2 93 | stride=2 94 | 95 | # 10 96 | [convolutional] 97 | batch_normalize=1 98 | filters=512 99 | size=3 100 | stride=1 101 | pad=1 102 | activation=leaky 103 | 104 | # 11 105 | [maxpool] 106 | size=2 107 | stride=1 108 | 109 | # 12 110 | [convolutional] 111 | batch_normalize=1 112 | filters=1024 113 | size=3 114 | stride=1 115 | pad=1 116 | activation=leaky 117 | 118 | ########### 119 | 120 | # 13 121 | [convolutional] 122 | batch_normalize=1 123 | filters=256 124 | size=1 125 | stride=1 126 | pad=1 127 | activation=leaky 128 | 129 | # 14 130 | [convolutional] 131 | batch_normalize=1 132 | filters=512 133 | size=3 134 | stride=1 135 | pad=1 136 | activation=leaky 137 | 138 | # 15 139 | [convolutional] 140 | size=1 141 | stride=1 142 | pad=1 143 | filters=18 144 | activation=linear 145 | 146 | 147 | 148 | # 16 149 | [yolo] 150 | mask = 3,4,5 151 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 152 | classes=1 153 | num=6 154 | jitter=.3 155 | ignore_thresh = .7 156 | truth_thresh = 1 157 | random=1 158 | 159 | # 17 160 | [route] 161 | layers = -4 162 | 163 | # 18 164 | [convolutional] 165 | batch_normalize=1 166 | filters=128 167 | size=1 168 | stride=1 169 | pad=1 170 | activation=leaky 171 | 172 | # 19 173 | [upsample] 174 | stride=2 175 | 176 | # 20 177 | [route] 178 | layers = -1, 8 179 | 180 | # 21 181 | [convolutional] 182 | batch_normalize=1 183 | filters=256 184 | size=3 185 | stride=1 186 | pad=1 187 | activation=leaky 188 | 189 | # 22 190 | [convolutional] 191 | size=1 192 | stride=1 193 | pad=1 194 | filters=18 195 | activation=linear 196 | 197 | # 23 198 | [yolo] 199 | mask = 1,2,3 200 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 201 | classes=1 202 | num=6 203 | jitter=.3 204 | ignore_thresh = .7 205 | truth_thresh = 1 206 | random=1 207 | -------------------------------------------------------------------------------- /data/oxfordhand.data: -------------------------------------------------------------------------------- 1 | classes= 1 2 | train=data/train.txt 3 | valid=data/valid.txt 4 | names=data/oxfordhand.names 5 | -------------------------------------------------------------------------------- /data/oxfordhand.names: -------------------------------------------------------------------------------- 1 | hand 2 | 3 | -------------------------------------------------------------------------------- /data/visdrone.data: -------------------------------------------------------------------------------- 1 | classes= 10 2 | train=data/visdrone/train.txt 3 | valid=data/visdrone/test.txt 4 | names=data/visdrone.names 5 | 
-------------------------------------------------------------------------------- /data/visdrone.names: -------------------------------------------------------------------------------- 1 | pedestrian 2 | people 3 | bicycle 4 | car 5 | van 6 | truck 7 | tricycle 8 | awning-tricycle 9 | bus 10 | motor 11 | -------------------------------------------------------------------------------- /detect.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | 3 | from models import * # set ONNX_EXPORT in models.py 4 | from utils.datasets import * 5 | from utils.utils import * 6 | 7 | import os 8 | os.environ["CUDA_VISIBLE_DEVICES"] = '-1' 9 | def detect(save_img=False): 10 | imgsz = (320, 192) if ONNX_EXPORT else opt.img_size # (320, 192) or (416, 256) or (608, 352) for (height, width) 11 | out, source, weights, half, view_img, save_txt = opt.output, opt.source, opt.weights, opt.half, opt.view_img, opt.save_txt 12 | webcam = source == '0' or source.startswith('rtsp') or source.startswith('http') or source.endswith('.txt') 13 | 14 | # Initialize 15 | device = torch_utils.select_device(device='cpu' if ONNX_EXPORT else opt.device) 16 | if os.path.exists(out): 17 | shutil.rmtree(out) # delete output folder 18 | os.makedirs(out) # make new output folder 19 | 20 | # Initialize model 21 | model = Darknet(opt.cfg, imgsz) 22 | 23 | # Load weights 24 | attempt_download(weights) 25 | if weights.endswith('.pt'): # pytorch format 26 | model.load_state_dict(torch.load(weights, map_location=device)['model']) 27 | else: # darknet format 28 | load_darknet_weights(model, weights) 29 | 30 | # Second-stage classifier 31 | classify = False 32 | if classify: 33 | modelc = torch_utils.load_classifier(name='resnet101', n=2) # initialize 34 | modelc.load_state_dict(torch.load('weights/resnet101.pt', map_location=device)['model']) # load weights 35 | modelc.to(device).eval() 36 | 37 | # Eval mode 38 | model.to(device).eval() 39 | 40 | # Fuse Conv2d + BatchNorm2d layers 41 | # model.fuse() 42 | 43 | # Export mode 44 | if ONNX_EXPORT: 45 | model.fuse() 46 | img = torch.zeros((1, 3) + imgsz) # (1, 3, 320, 192) 47 | f = opt.weights.replace(opt.weights.split('.')[-1], 'onnx') # *.onnx filename 48 | torch.onnx.export(model, img, f, verbose=False, opset_version=11, 49 | input_names=['images'], output_names=['classes', 'boxes']) 50 | 51 | # Validate exported model 52 | import onnx 53 | model = onnx.load(f) # Load the ONNX model 54 | onnx.checker.check_model(model) # Check that the IR is well formed 55 | print(onnx.helper.printable_graph(model.graph)) # Print a human readable representation of the graph 56 | return 57 | 58 | # Half precision 59 | half = half and device.type != 'cpu' # half precision only supported on CUDA 60 | if half: 61 | model.half() 62 | 63 | # Set Dataloader 64 | vid_path, vid_writer = None, None 65 | if webcam: 66 | view_img = True 67 | torch.backends.cudnn.benchmark = True # set True to speed up constant image size inference 68 | dataset = LoadStreams(source, img_size=imgsz) 69 | else: 70 | save_img = True 71 | dataset = LoadImages(source, img_size=imgsz) 72 | 73 | # Get names and colors 74 | names = load_classes(opt.names) 75 | colors = [[random.randint(0, 255) for _ in range(3)] for _ in range(len(names))] 76 | 77 | # Run inference 78 | t0 = time.time() 79 | img = torch.zeros((1, 3, imgsz, imgsz), device=device) # init img 80 | _ = model(img.half() if half else img.float()) if device.type != 'cpu' else None # run once 81 | for path, img, im0s, vid_cap in 
dataset: 82 | img = torch.from_numpy(img).to(device) 83 | img = img.half() if half else img.float() # uint8 to fp16/32 84 | img /= 255.0 # 0 - 255 to 0.0 - 1.0 85 | if img.ndimension() == 3: 86 | img = img.unsqueeze(0) 87 | 88 | # Inference 89 | t1 = torch_utils.time_synchronized() 90 | pred = model(img, augment=opt.augment)[0] 91 | t2 = torch_utils.time_synchronized() 92 | 93 | # to float 94 | if half: 95 | pred = pred.float() 96 | 97 | # Apply NMS 98 | pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres, 99 | multi_label=False, classes=opt.classes, agnostic=opt.agnostic_nms) 100 | 101 | # Apply Classifier 102 | if classify: 103 | pred = apply_classifier(pred, modelc, img, im0s) 104 | 105 | # Process detections 106 | for i, det in enumerate(pred): # detections for image i 107 | if webcam: # batch_size >= 1 108 | p, s, im0 = path[i], '%g: ' % i, im0s[i] 109 | else: 110 | p, s, im0 = path, '', im0s 111 | 112 | save_path = str(Path(out) / Path(p).name) 113 | s += '%gx%g ' % img.shape[2:] # print string 114 | gn = torch.tensor(im0.shape)[[1, 0, 1, 0]] #  normalization gain whwh 115 | if det is not None and len(det): 116 | # Rescale boxes from imgsz to im0 size 117 | det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round() 118 | 119 | # Print results 120 | for c in det[:, -1].unique(): 121 | n = (det[:, -1] == c).sum() # detections per class 122 | s += '%g %ss, ' % (n, names[int(c)]) # add to string 123 | 124 | # Write results 125 | for *xyxy, conf, cls in det: 126 | if save_txt: # Write to file 127 | xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist() # normalized xywh 128 | with open(save_path[:save_path.rfind('.')] + '.txt', 'a') as file: 129 | file.write(('%g ' * 5 + '\n') % (cls, *xywh)) # label format 130 | 131 | if save_img or view_img: # Add bbox to image 132 | label = '%s %.2f' % (names[int(cls)], conf) 133 | plot_one_box(xyxy, im0, label=label, color=colors[int(cls)]) 134 | 135 | # Print time (inference + NMS) 136 | print('%sDone. (%.3fs)' % (s, t2 - t1)) 137 | 138 | # Stream results 139 | if view_img: 140 | cv2.imshow(p, im0) 141 | if cv2.waitKey(1) == ord('q'): # q to quit 142 | raise StopIteration 143 | 144 | # Save results (image with detections) 145 | if save_img: 146 | if dataset.mode == 'images': 147 | cv2.imwrite(save_path, im0) 148 | else: 149 | if vid_path != save_path: # new video 150 | vid_path = save_path 151 | if isinstance(vid_writer, cv2.VideoWriter): 152 | vid_writer.release() # release previous video writer 153 | 154 | fps = vid_cap.get(cv2.CAP_PROP_FPS) 155 | w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH)) 156 | h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) 157 | vid_writer = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*opt.fourcc), fps, (w, h)) 158 | vid_writer.write(im0) 159 | 160 | if save_txt or save_img: 161 | print('Results saved to %s' % os.getcwd() + os.sep + out) 162 | if platform == 'darwin': # MacOS 163 | os.system('open ' + save_path) 164 | 165 | print('Done. 
(%.3fs)' % (time.time() - t0)) 166 | 167 | 168 | if __name__ == '__main__': 169 | parser = argparse.ArgumentParser() 170 | parser.add_argument('--cfg', type=str, default='cfg/yolov3-spp.cfg', help='*.cfg path') 171 | parser.add_argument('--names', type=str, default='data/coco.names', help='*.names path') 172 | parser.add_argument('--weights', type=str, default='weights/yolov3-spp-ultralytics.pt', help='weights path') 173 | parser.add_argument('--source', type=str, default='data/samples', help='source') # input file/folder, 0 for webcam 174 | parser.add_argument('--output', type=str, default='output', help='output folder') # output folder 175 | parser.add_argument('--img-size', type=int, default=768, help='inference size (pixels)') 176 | parser.add_argument('--conf-thres', type=float, default=0.2, help='object confidence threshold') 177 | parser.add_argument('--iou-thres', type=float, default=0.6, help='IOU threshold for NMS') 178 | parser.add_argument('--fourcc', type=str, default='mp4v', help='output video codec (verify ffmpeg support)') 179 | parser.add_argument('--half', action='store_true', help='half precision FP16 inference') 180 | parser.add_argument('--device', default='', help='device id (i.e. 0 or 0,1) or cpu') 181 | parser.add_argument('--view-img', action='store_true', help='display results') 182 | parser.add_argument('--save-txt', action='store_true', help='save results to *.txt') 183 | parser.add_argument('--classes', nargs='+', type=int, help='filter by class') 184 | parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS') 185 | parser.add_argument('--augment', action='store_true', help='augmented inference') 186 | opt = parser.parse_args() 187 | opt.cfg = list(glob.iglob('./**/' + opt.cfg, recursive=True))[0] # find file 188 | opt.names = list(glob.iglob('./**/' + opt.names, recursive=True))[0] # find file 189 | print(opt) 190 | 191 | with torch.no_grad(): 192 | detect() 193 | -------------------------------------------------------------------------------- /models.py: -------------------------------------------------------------------------------- 1 | from utils.google_utils import * 2 | from utils.layers import * 3 | from utils.parse_config import * 4 | from utils.quant_dorefa import QuanConv as Conv_q 5 | from utils.util_wqaq import Conv2d_Q 6 | import copy 7 | ONNX_EXPORT = False 8 | 9 | 10 | w_bit = 8 11 | a_bit = 8 12 | 13 | 14 | 15 | def create_modules(module_defs, img_size, cfg): 16 | # Constructs module list of layer blocks from module configuration in module_defs 17 | 18 | img_size = [img_size] * 2 if isinstance(img_size, int) else img_size # expand if necessary 19 | hyper = module_defs.pop(0) # cfg training hyperparams (unused) 20 | output_filters = [3] # input channels 21 | module_list = nn.ModuleList() 22 | routs = [] # list of layers which rout to deeper layers 23 | yolo_index = -1 24 | 25 | for i, mdef in enumerate(module_defs): 26 | modules = nn.Sequential() 27 | if mdef['type'] == 'IAO_convolutional': 28 | bn = int(mdef['batch_normalize']) 29 | filters = int(mdef['filters']) 30 | kernel_size = int(mdef['size']) 31 | pad = int(mdef['pad']) 32 | #first = int(mdef['first']) 33 | modules.add_module('Conv2d', Conv2d_Q(in_channels=output_filters[-1], 34 | out_channels=filters, 35 | kernel_size=kernel_size, 36 | stride=int(mdef['stride']), 37 | padding=pad, 38 | bias=not bn, 39 | groups=int(mdef['group']), 40 | a_bits=16, 41 | w_bits=16, 42 | q_type=1, 43 | first_layer=0)) 44 | 45 | if bn: 46 | 
modules.add_module('BatchNorm2d',nn.BatchNorm2d(filters, momentum=0.01)) 47 | if mdef['activation'] == 'relu': 48 | modules.add_module('activation',nn.ReLU(inplace=True)) 49 | elif mdef['activation'] == 'leaky': 50 | modules.add_module('activation', nn.LeakyReLU(0.1, inplace=True)) 51 | 52 | elif mdef['type'] == 'quan_convolutional': 53 | bn = int(mdef['batch_normalize']) 54 | filters = int(mdef['filters']) 55 | kernel_size = int(mdef['size']) 56 | pad = int(mdef['pad']) 57 | modules.add_module('Conv2d', Conv_q(in_channels=output_filters[-1], 58 | out_channels=filters, 59 | kernel_size=kernel_size, 60 | stride=int(mdef['stride']), 61 | padding=pad, 62 | bias=not bn, 63 | groups=int(mdef['group']), 64 | nbit_w=w_bit, 65 | nbit_a=a_bit 66 | )) 67 | 68 | if bn: 69 | modules.add_module('BatchNorm2d',nn.BatchNorm2d(filters, momentum=0.1)) 70 | if mdef['activation'] == 'relu': 71 | modules.add_module('activation',nn.ReLU(inplace=True)) 72 | elif mdef['activation'] == 'leaky': 73 | modules.add_module('activation', nn.LeakyReLU(0.1, inplace=True)) 74 | 75 | elif mdef['type'] == 'convolutional': 76 | bn = mdef['batch_normalize'] 77 | filters = mdef['filters'] 78 | k = mdef['size'] # kernel size 79 | stride = mdef['stride'] if 'stride' in mdef else (mdef['stride_y'], mdef['stride_x']) 80 | if isinstance(k, int): # single-size conv 81 | modules.add_module('Conv2d', nn.Conv2d(in_channels=output_filters[-1], 82 | out_channels=filters, 83 | kernel_size=k, 84 | stride=stride, 85 | #padding=k // 2 if mdef['pad'] else 0, 86 | padding=mdef['pad'], 87 | groups=mdef['group'] if 'group' in mdef else 1, 88 | #groups=mdef['group'], 89 | bias=not bn)) 90 | else: # multiple-size conv 91 | modules.add_module('MixConv2d', MixConv2d(in_ch=output_filters[-1], 92 | out_ch=filters, 93 | k=k, 94 | stride=stride, 95 | bias=not bn)) 96 | 97 | if bn: 98 | modules.add_module('BatchNorm2d', nn.BatchNorm2d(filters, momentum=0.1, eps=1E-5)) 99 | else: 100 | routs.append(i) # detection output (goes into yolo layer) 101 | 102 | if mdef['activation'] == 'leaky': # activation study https://github.com/ultralytics/yolov3/issues/441 103 | modules.add_module('activation', nn.LeakyReLU(0.1, inplace=True)) 104 | elif mdef['activation'] == 'swish': 105 | modules.add_module('activation', Swish()) 106 | elif mdef['activation'] == 'mish': 107 | modules.add_module('activation', Mish()) 108 | elif mdef['activation'] == 'relu': 109 | modules.add_module('activation', nn.ReLU(inplace=True)) 110 | 111 | elif mdef['type'] == 'BatchNorm2d': 112 | filters = output_filters[-1] 113 | modules = nn.BatchNorm2d(filters, momentum=0.03, eps=1E-4) 114 | if i == 0 and filters == 3: # normalize RGB image 115 | # imagenet mean and var https://pytorch.org/docs/stable/torchvision/models.html#classification 116 | modules.running_mean = torch.tensor([0.485, 0.456, 0.406]) 117 | modules.running_var = torch.tensor([0.0524, 0.0502, 0.0506]) 118 | 119 | elif mdef['type'] == 'maxpool': 120 | k = mdef['size'] # kernel size 121 | stride = mdef['stride'] 122 | maxpool = nn.MaxPool2d(kernel_size=k, stride=stride, padding=(k - 1) // 2) 123 | if k == 2 and stride == 1: # yolov3-tiny 124 | modules.add_module('ZeroPad2d', nn.ZeroPad2d((0, 1, 0, 1))) 125 | modules.add_module('MaxPool2d', maxpool) 126 | else: 127 | modules = maxpool 128 | 129 | elif mdef['type'] == 'upsample': 130 | if ONNX_EXPORT: # explicitly state size, avoid scale_factor 131 | g = (yolo_index + 1) * 2 / 32 # gain 132 | modules = nn.Upsample(size=tuple(int(x * g) for x in img_size)) # img_size = (320, 192) 133 
| else: 134 | modules = nn.Upsample(scale_factor=mdef['stride']) 135 | 136 | elif mdef['type'] == 'route': # nn.Sequential() placeholder for 'route' layer 137 | layers = mdef['layers'] 138 | filters = sum([output_filters[l + 1 if l > 0 else l] for l in layers]) 139 | routs.extend([i + l if l < 0 else l for l in layers]) 140 | modules = FeatureConcat(layers=layers) 141 | 142 | elif mdef['type'] == 'shortcut': # nn.Sequential() placeholder for 'shortcut' layer 143 | layers = mdef['from'] 144 | filters = output_filters[-1] 145 | routs.extend([i + l if l < 0 else l for l in layers]) 146 | modules = WeightedFeatureFusion(layers=layers, weight='weights_type' in mdef) 147 | 148 | elif mdef['type'] == 'reorg3d': # yolov3-spp-pan-scale 149 | pass 150 | 151 | elif mdef['type'] == 'se': 152 | modules.add_module('se',SELayer(output_filters[-1],reduction=int(mdef['reduction']))) 153 | 154 | elif mdef['type'] == 'yolo': 155 | yolo_index += 1 156 | stride = [32, 16, 8] # P5, P4, P3 strides 157 | if any(x in cfg for x in ['panet', 'yolov4', 'cd53']): # stride order reversed 158 | stride = list(reversed(stride)) 159 | layers = mdef['from'] if 'from' in mdef else [] 160 | modules = YOLOLayer(anchors=mdef['anchors'][mdef['mask']], # anchor list 161 | nc=mdef['classes'], # number of classes 162 | img_size=img_size, # (416, 416) 163 | yolo_index=yolo_index, # 0, 1, 2... 164 | layers=layers, # output layers 165 | stride=stride[yolo_index]) 166 | 167 | # Initialize preceding Conv2d() bias (https://arxiv.org/pdf/1708.02002.pdf section 3.3) 168 | try: 169 | j = layers[yolo_index] if 'from' in mdef else -1 170 | bias_ = module_list[j][0].bias # shape(255,) 171 | bias = bias_[:modules.no * modules.na].view(modules.na, -1) # shape(3,85) 172 | bias[:, 4] += -4.5 # obj 173 | bias[:, 5:] += math.log(0.6 / (modules.nc - 0.99)) # cls (sigmoid(p) = 1/nc) 174 | module_list[j][0].bias = torch.nn.Parameter(bias_, requires_grad=bias_.requires_grad) 175 | except: 176 | print('WARNING: smart bias initialization failure.') 177 | 178 | else: 179 | print('Warning: Unrecognized Layer Type: ' + mdef['type']) 180 | 181 | # Register module list and number of output filters 182 | module_list.append(modules) 183 | output_filters.append(filters) 184 | 185 | routs_binary = [False] * (i + 1) 186 | for i in routs: 187 | routs_binary[i] = True 188 | return module_list, routs_binary 189 | 190 | 191 | class YOLOLayer(nn.Module): 192 | def __init__(self, anchors, nc, img_size, yolo_index, layers, stride): 193 | super(YOLOLayer, self).__init__() 194 | self.anchors = torch.Tensor(anchors) 195 | self.index = yolo_index # index of this layer in layers 196 | self.layers = layers # model output layer indices 197 | self.stride = stride # layer stride 198 | self.nl = len(layers) # number of output layers (3) 199 | self.na = len(anchors) # number of anchors (3) 200 | self.nc = nc # number of classes (80) 201 | self.no = nc + 5 # number of outputs (85) 202 | self.nx, self.ny, self.ng = 0, 0, 0 # initialize number of x, y gridpoints 203 | self.anchor_vec = self.anchors / self.stride 204 | self.anchor_wh = self.anchor_vec.view(1, self.na, 1, 1, 2) 205 | 206 | if ONNX_EXPORT: 207 | self.training = False 208 | self.create_grids((img_size[1] // stride, img_size[0] // stride)) # number x, y grid points 209 | 210 | def create_grids(self, ng=(13, 13), device='cpu'): 211 | self.nx, self.ny = ng # x and y grid size 212 | self.ng = torch.tensor(ng, dtype=torch.float) 213 | 214 | # build xy offsets 215 | if not self.training: 216 | yv, xv = 
torch.meshgrid([torch.arange(self.ny, device=device), torch.arange(self.nx, device=device)]) 217 | self.grid = torch.stack((xv, yv), 2).view((1, 1, self.ny, self.nx, 2)).float() 218 | 219 | if self.anchor_vec.device != device: 220 | self.anchor_vec = self.anchor_vec.to(device) 221 | self.anchor_wh = self.anchor_wh.to(device) 222 | 223 | def forward(self, p, out): 224 | ASFF = False # https://arxiv.org/abs/1911.09516 225 | if ASFF: 226 | i, n = self.index, self.nl # index in layers, number of layers 227 | p = out[self.layers[i]] 228 | bs, _, ny, nx = p.shape # bs, 255, 13, 13 229 | if (self.nx, self.ny) != (nx, ny): 230 | self.create_grids((nx, ny), p.device) 231 | 232 | # outputs and weights 233 | # w = F.softmax(p[:, -n:], 1) # normalized weights 234 | w = torch.sigmoid(p[:, -n:]) * (2 / n) # sigmoid weights (faster) 235 | # w = w / w.sum(1).unsqueeze(1) # normalize across layer dimension 236 | 237 | # weighted ASFF sum 238 | p = out[self.layers[i]][:, :-n] * w[:, i:i + 1] 239 | for j in range(n): 240 | if j != i: 241 | p += w[:, j:j + 1] * \ 242 | F.interpolate(out[self.layers[j]][:, :-n], size=[ny, nx], mode='bilinear', align_corners=False) 243 | 244 | elif ONNX_EXPORT: 245 | bs = 1 # batch size 246 | else: 247 | bs, _, ny, nx = p.shape # bs, 255, 13, 13 248 | if (self.nx, self.ny) != (nx, ny): 249 | self.create_grids((nx, ny), p.device) 250 | 251 | # p.view(bs, 255, 13, 13) -- > (bs, 3, 13, 13, 85) # (bs, anchors, grid, grid, classes + xywh) 252 | p = p.view(bs, self.na, self.no, self.ny, self.nx).permute(0, 1, 3, 4, 2).contiguous() # prediction 253 | 254 | if self.training: 255 | return p 256 | 257 | elif ONNX_EXPORT: 258 | # Avoid broadcasting for ANE operations 259 | m = self.na * self.nx * self.ny 260 | ng = 1. / self.ng.repeat(m, 1) 261 | grid = self.grid.repeat(1, self.na, 1, 1, 1).view(m, 2) 262 | anchor_wh = self.anchor_wh.repeat(1, 1, self.nx, self.ny, 1).view(m, 2) * ng 263 | 264 | p = p.view(m, self.no) 265 | xy = torch.sigmoid(p[:, 0:2]) + grid # x, y 266 | wh = torch.exp(p[:, 2:4]) * anchor_wh # width, height 267 | p_cls = torch.sigmoid(p[:, 4:5]) if self.nc == 1 else \ 268 | torch.sigmoid(p[:, 5:self.no]) * torch.sigmoid(p[:, 4:5]) # conf 269 | return p_cls, xy * ng, wh 270 | 271 | else: # inference 272 | io = p.clone() # inference output 273 | io[..., :2] = torch.sigmoid(io[..., :2]) + self.grid # xy 274 | io[..., 2:4] = torch.exp(io[..., 2:4]) * self.anchor_wh # wh yolo method 275 | io[..., :4] *= self.stride 276 | torch.sigmoid_(io[..., 4:]) 277 | return io.view(bs, -1, self.no), p # view [1, 3, 13, 13, 85] as [1, 507, 85] 278 | 279 | 280 | class Darknet(nn.Module): 281 | # YOLOv3 object detection model 282 | 283 | def __init__(self, cfg, img_size=(416, 416), verbose=False): 284 | super(Darknet, self).__init__() 285 | 286 | self.module_defs = parse_model_cfg(cfg) 287 | self.hyper = copy.deepcopy(self.module_defs[0]) 288 | self.module_list, self.routs = create_modules(self.module_defs, img_size, cfg) 289 | self.yolo_layers = get_yolo_layers(self) 290 | # torch_utils.initialize_weights(self) 291 | 292 | # Darknet Header https://github.com/AlexeyAB/darknet/issues/2914#issuecomment-496675346 293 | self.version = np.array([0, 2, 5], dtype=np.int32) # (int32) version info: major, minor, revision 294 | self.seen = np.array([0], dtype=np.int64) # (int64) number of images seen during training 295 | self.info(verbose) if not ONNX_EXPORT else None # print model description 296 | 297 | def forward(self, x, augment=False, verbose=False): 298 | 299 | if not augment: 300 | return 
self.forward_once(x) 301 | else: # Augment images (inference and test only) https://github.com/ultralytics/yolov3/issues/931 302 | img_size = x.shape[-2:] # height, width 303 | s = [0.83, 0.67] # scales 304 | y = [] 305 | for i, xi in enumerate((x, 306 | torch_utils.scale_img(x.flip(3), s[0], same_shape=False), # flip-lr and scale 307 | torch_utils.scale_img(x, s[1], same_shape=False), # scale 308 | )): 309 | # cv2.imwrite('img%g.jpg' % i, 255 * xi[0].numpy().transpose((1, 2, 0))[:, :, ::-1]) 310 | y.append(self.forward_once(xi)[0]) 311 | 312 | y[1][..., :4] /= s[0] # scale 313 | y[1][..., 0] = img_size[1] - y[1][..., 0] # flip lr 314 | y[2][..., :4] /= s[1] # scale 315 | 316 | # for i, yi in enumerate(y): # coco small, medium, large = < 32**2 < 96**2 < 317 | # area = yi[..., 2:4].prod(2)[:, :, None] 318 | # if i == 1: 319 | # yi *= (area < 96. ** 2).float() 320 | # elif i == 2: 321 | # yi *= (area > 32. ** 2).float() 322 | # y[i] = yi 323 | 324 | y = torch.cat(y, 1) 325 | return y, None 326 | 327 | def forward_once(self, x, augment=False, verbose=False): 328 | img_size = x.shape[-2:] # height, width 329 | yolo_out, out = [], [] 330 | if verbose: 331 | print('0', x.shape) 332 | str = '' 333 | 334 | # Augment images (inference and test only) 335 | if augment: # https://github.com/ultralytics/yolov3/issues/931 336 | nb = x.shape[0] # batch size 337 | s = [0.83, 0.67] # scales 338 | x = torch.cat((x, 339 | torch_utils.scale_img(x.flip(3), s[0]), # flip-lr and scale 340 | torch_utils.scale_img(x, s[1]), # scale 341 | ), 0) 342 | 343 | for i, module in enumerate(self.module_list): 344 | name = module.__class__.__name__ 345 | if name in ['WeightedFeatureFusion', 'FeatureConcat']: # sum, concat 346 | if verbose: 347 | l = [i - 1] + module.layers # layers 348 | sh = [list(x.shape)] + [list(out[i].shape) for i in module.layers] # shapes 349 | str = ' >> ' + ' + '.join(['layer %g %s' % x for x in zip(l, sh)]) 350 | x = module(x, out) # WeightedFeatureFusion(), FeatureConcat() 351 | elif name == 'YOLOLayer': 352 | yolo_out.append(module(x, out)) 353 | else: # run module directly, i.e. mtype = 'convolutional', 'upsample', 'maxpool', 'batchnorm2d' etc. 
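# Plain layers (convolutional, upsample, maxpool, se, BatchNorm2d) just transform x in
# sequence; only indices flagged in self.routs get their output cached in `out` below,
# where FeatureConcat, WeightedFeatureFusion and YOLOLayer modules can pick them up later.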
354 | x = module(x) 355 | 356 | out.append(x if self.routs[i] else []) 357 | if verbose: 358 | print('%g/%g %s -' % (i, len(self.module_list), name), list(x.shape), str) 359 | str = '' 360 | 361 | if self.training: # train 362 | return yolo_out 363 | elif ONNX_EXPORT: # export 364 | x = [torch.cat(x, 0) for x in zip(*yolo_out)] 365 | return x[0], torch.cat(x[1:3], 1) # scores, boxes: 3780x80, 3780x4 366 | else: # inference or test 367 | x, p = zip(*yolo_out) # inference output, training output 368 | x = torch.cat(x, 1) # cat yolo outputs 369 | if augment: # de-augment results 370 | x = torch.split(x, nb, dim=0) 371 | x[1][..., :4] /= s[0] # scale 372 | x[1][..., 0] = img_size[1] - x[1][..., 0] # flip lr 373 | x[2][..., :4] /= s[1] # scale 374 | x = torch.cat(x, 1) 375 | return x, p 376 | 377 | def fuse(self): 378 | # Fuse Conv2d + BatchNorm2d layers throughout model 379 | print('Fusing layers...') 380 | fused_list = nn.ModuleList() 381 | for a in list(self.children())[0]: 382 | if isinstance(a, nn.Sequential): 383 | for i, b in enumerate(a): 384 | if isinstance(b, nn.modules.batchnorm.BatchNorm2d): 385 | # fuse this bn layer with the previous conv2d layer 386 | conv = a[i - 1] 387 | fused = torch_utils.fuse_conv_and_bn(conv, b) 388 | a = nn.Sequential(fused, *list(a.children())[i + 1:]) 389 | break 390 | fused_list.append(a) 391 | self.module_list = fused_list 392 | self.info() if not ONNX_EXPORT else None # yolov3-spp reduced from 225 to 152 layers 393 | 394 | def info(self, verbose=False): 395 | torch_utils.model_info(self, verbose) 396 | 397 | 398 | def get_yolo_layers(model): 399 | return [i for i, m in enumerate(model.module_list) if m.__class__.__name__ == 'YOLOLayer'] # [89, 101, 113] 400 | 401 | 402 | def load_darknet_weights(self, weights, cutoff=0): 403 | # Parses and loads the weights stored in 'weights' 404 | 405 | # Establish cutoffs (load layers between 0 and cutoff. if cutoff = -1 all are loaded) 406 | file = Path(weights).name 407 | #print(file) 408 | if file == 'darknet53.conv.74': 409 | cutoff = 75 410 | elif file == 'yolov3-tiny.conv.15': 411 | cutoff = 15 412 | #elif file == 'best.weights': 413 | #print('load coco.weights') 414 | #cutoff = -1 415 | # Read weights file 416 | with open(weights, 'rb') as f: 417 | # Read Header https://github.com/AlexeyAB/darknet/issues/2914#issuecomment-496675346 418 | self.version = np.fromfile(f, dtype=np.int32, count=3) # (int32) version info: major, minor, revision 419 | self.seen = np.fromfile(f, dtype=np.int64, count=1) # (int64) number of images seen during training 420 | 421 | weights = np.fromfile(f, dtype=np.float32) # the rest are weights 422 | 423 | ptr = 0 424 | for i, (mdef, module) in enumerate(zip(self.module_defs[:cutoff], self.module_list[:cutoff])): 425 | 426 | if mdef['type'] == 'convolutional': 427 | conv = module[0] 428 | if mdef['batch_normalize']: 429 | # Load BN bias, weights, running mean and running variance 430 | bn = module[1] 431 | nb = bn.bias.numel() # number of biases 432 | # Bias 433 | bn.bias.data.copy_(torch.from_numpy(weights[ptr:ptr + nb]).view_as(bn.bias)) 434 | ptr += nb 435 | # Weight 436 | bn.weight.data.copy_(torch.from_numpy(weights[ptr:ptr + nb]).view_as(bn.weight)) 437 | ptr += nb 438 | # Running Mean 439 | bn.running_mean.data.copy_(torch.from_numpy(weights[ptr:ptr + nb]).view_as(bn.running_mean)) 440 | ptr += nb 441 | # Running Var 442 | bn.running_var.data.copy_(torch.from_numpy(weights[ptr:ptr + nb]).view_as(bn.running_var)) 443 | ptr += nb 444 | else: 445 | # Load conv. 
bias
446 | nb = conv.bias.numel()
447 | conv_b = torch.from_numpy(weights[ptr:ptr + nb]).view_as(conv.bias)
448 | conv.bias.data.copy_(conv_b)
449 | ptr += nb
450 | # Load conv. weights
451 | nw = conv.weight.numel() # number of weights
452 | conv.weight.data.copy_(torch.from_numpy(weights[ptr:ptr + nw]).view_as(conv.weight))
453 | ptr += nw # advance the pointer by the number of conv weights just copied
454 |
455 | #elif mdef['type'] == 'se':
456 | #se = module[0]
457 | #fc = se.fc
458 | #fc1 = fc[0]
459 | #fc1_num = fc1.weight.numel()
460 | #print(fc1_num)
461 | #fc1_w = torch.from_numpy(weights[ptr:ptr + fc1_num]).view_as(fc1.weight)
462 | #fc1.weight.data.copy_(fc1_w)
463 | #ptr += fc1_num
464 | #fc2 = fc[2]
465 | #fc2_num = fc2.weight.numel()
466 | #fc2_w = torch.from_numpy(weights[ptr:ptr + fc2_num]).view_as(fc2.weight)
467 | #fc2.weight.data.copy_(fc2_w)
468 | #ptr += fc2_num
469 |
470 | #assert ptr == len(weights)
471 |
472 | def save_weights(self, path='model.weights', cutoff=-1):
473 | # Converts a PyTorch model to Darknet format (*.pt to *.weights)
474 | # Note: Does not work if model.fuse() is applied
475 | with open(path, 'wb') as f:
476 | # Write Header https://github.com/AlexeyAB/darknet/issues/2914#issuecomment-496675346
477 | self.version.tofile(f) # (int32) version info: major, minor, revision
478 | self.seen.tofile(f) # (int64) number of images seen during training
479 |
480 | # Iterate through layers
481 | for i, (mdef, module) in enumerate(zip(self.module_defs[:cutoff], self.module_list[:cutoff])):
482 | if mdef['type'] == 'convolutional':
483 | conv_layer = module[0]
484 | # If batch norm, write bn first
485 | if mdef['batch_normalize']:
486 | bn_layer = module[1]
487 | bn_layer.bias.data.cpu().numpy().tofile(f)
488 | bn_layer.weight.data.cpu().numpy().tofile(f)
489 | bn_layer.running_mean.data.cpu().numpy().tofile(f)
490 | bn_layer.running_var.data.cpu().numpy().tofile(f)
491 | # Write conv bias
492 | else:
493 | conv_layer.bias.data.cpu().numpy().tofile(f)
494 | # Write conv weights
495 | conv_layer.weight.data.cpu().numpy().tofile(f)
496 |
497 | elif mdef['type'] == 'se':
498 | se = module[0]
499 | fc = se.fc
500 | fc1 = fc[0]
501 | fc2 = fc[2]
502 | fc1.weight.data.cpu().numpy().tofile(f)
503 | fc2.weight.data.cpu().numpy().tofile(f)
504 |
505 |
506 |
507 |
508 | def convert(cfg='cfg/yolov3-spp.cfg', weights='weights/yolov3-spp.weights'):
509 | # Converts between PyTorch and Darknet format per extension (i.e.
*.weights convert to *.pt and vice versa) 510 | # from models import *; convert('cfg/yolov3-spp.cfg', 'weights/yolov3-spp.weights') 511 | 512 | # Initialize model 513 | model = Darknet(cfg) 514 | 515 | # Load weights and save 516 | if weights.endswith('.pt'): # if PyTorch format 517 | model.load_state_dict(torch.load(weights, map_location='cpu')['model']) 518 | target = weights.rsplit('.', 1)[0] + '.weights' 519 | save_weights(model, path=target, cutoff=-1) 520 | print("Success: converted '%s' to '%s'" % (weights, target)) 521 | 522 | elif weights.endswith('.weights'): # darknet format 523 | _ = load_darknet_weights(model, weights) 524 | 525 | chkpt = {'epoch': -1, 526 | 'best_fitness': None, 527 | 'training_results': None, 528 | 'model': model.state_dict(), 529 | 'optimizer': None} 530 | 531 | target = weights.rsplit('.', 1)[0] + '.pt' 532 | torch.save(chkpt, target) 533 | print("Success: converted '%s' to 's%'" % (weights, target)) 534 | 535 | else: 536 | print('Error: extension not supported.') 537 | 538 | 539 | def attempt_download(weights): 540 | # Attempt to download pretrained weights if not found locally 541 | weights = weights.strip() 542 | msg = weights + ' missing, try downloading from https://drive.google.com/open?id=1LezFG5g3BCW6iYaV89B2i64cqEUZD7e0' 543 | 544 | if len(weights) > 0 and not os.path.isfile(weights): 545 | d = {'yolov3-spp.weights': '16lYS4bcIdM2HdmyJBVDOvt3Trx6N3W2R', 546 | 'yolov3.weights': '1uTlyDWlnaqXcsKOktP5aH_zRDbfcDp-y', 547 | 'yolov3-tiny.weights': '1CCF-iNIIkYesIDzaPvdwlcf7H9zSsKZQ', 548 | 'yolov3-spp.pt': '1f6Ovy3BSq2wYq4UfvFUpxJFNDFfrIDcR', 549 | 'yolov3.pt': '1SHNFyoe5Ni8DajDNEqgB2oVKBb_NoEad', 550 | 'yolov3-tiny.pt': '10m_3MlpQwRtZetQxtksm9jqHrPTHZ6vo', 551 | 'darknet53.conv.74': '1WUVBid-XuoUBmvzBVUCBl_ELrzqwA8dJ', 552 | 'yolov3-tiny.conv.15': '1Bw0kCpplxUqyRYAJr9RY9SGnOJbo9nEj', 553 | 'yolov3-spp-ultralytics.pt': '1UcR-zVoMs7DH5dj3N1bswkiQTA4dmKF4'} 554 | 555 | file = Path(weights).name 556 | if file in d: 557 | r = gdrive_download(id=d[file], name=weights) 558 | else: # download from pjreddie.com 559 | url = 'https://pjreddie.com/media/files/' + file 560 | print('Downloading ' + url) 561 | r = os.system('curl -f ' + url + ' -o ' + weights) 562 | 563 | # Error check 564 | if not (r == 0 and os.path.exists(weights) and os.path.getsize(weights) > 1E6): # weights exist and > 1MB 565 | os.system('rm ' + weights) # remove partial downloads 566 | raise Exception(msg) 567 | 568 | class SELayer(nn.Module): 569 | def __init__(self, channel, reduction=4): 570 | super(SELayer, self).__init__() 571 | self.avg_pool = nn.AdaptiveAvgPool2d(1) 572 | self.fc = nn.Sequential( 573 | nn.Linear(channel, channel // reduction), 574 | nn.ReLU(inplace=True), 575 | nn.Linear(channel // reduction, channel), 576 | nn.Sigmoid()) 577 | 578 | def forward(self, x): 579 | b, c, _, _ = x.size() 580 | y = self.avg_pool(x).view(b, c) 581 | y = self.fc(y).view(b, c, 1, 1) 582 | y = torch.clamp(y, 0, 1) 583 | return x * y 584 | -------------------------------------------------------------------------------- /normal_prune.py: -------------------------------------------------------------------------------- 1 | from models import * 2 | from utils.utils import * 3 | import torch 4 | import numpy as np 5 | from copy import deepcopy 6 | from test import test 7 | from terminaltables import AsciiTable 8 | import time 9 | from utils.utils import * 10 | from utils.prune_utils import * 11 | import os 12 | 13 | os.environ["CUDA_VISIBLE_DEVICES"] = '3' 14 | 15 | class opt(): 16 | model_def = 
"cfg/ghost32-yolov3-visdrone.cfg" 17 | data_config = "data/visdrone.data" 18 | model = 'weights/best/best.pt' 19 | 20 | 21 | #指定GPU 22 | #torch.cuda.set_device(2) 23 | percent = 0.6 24 | 25 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 26 | model = Darknet(opt.model_def).to(device) 27 | 28 | if opt.model: 29 | if opt.model.endswith(".pt"): 30 | model.load_state_dict(torch.load(opt.model, map_location=device)['model']) 31 | else: 32 | _ = load_darknet_weights(model, opt.model) 33 | 34 | 35 | data_config = parse_data_cfg(opt.data_config) 36 | 37 | valid_path = data_config["valid"] 38 | class_names = load_classes(data_config["names"]) 39 | 40 | 41 | eval_model = lambda model:test(model=model, imgsz=640, cfg='cfg/ghost-yolov3-visdrone.cfg', data='data/visdrone.data') 42 | 43 | 44 | 45 | obtain_num_parameters = lambda model:sum([param.nelement() for param in model.parameters()]) 46 | 47 | #这个不应该注释掉,等会要恢复 48 | with torch.no_grad(): 49 | origin_model_metric = eval_model(model) 50 | #results, maps = test.test(cfg=opt.model_def, 51 | #data=opt.data_config, 52 | #batch_size=4, 53 | #img_sz=608, 54 | #model=ema.ema, 55 | #conf_thres=0.001 56 | #save_json=False, 57 | #single_cls=False, 58 | 59 | 60 | origin_nparameters = obtain_num_parameters(model) 61 | 62 | CBL_idx, Conv_idx, prune_idx= parse_module_defs(model.module_defs) 63 | 64 | 65 | #将所有要剪枝的BN层的α参数,拷贝到bn_weights列表 66 | bn_weights = gather_bn_weights(model.module_list, prune_idx) 67 | 68 | #torch.sort返回二维列表,第一维是排序后的值列表,第二维是排序后的值列表对应的索引 69 | sorted_bn = torch.sort(bn_weights)[0] 70 | 71 | 72 | #避免剪掉所有channel的最高阈值(每个BN层的gamma的最大值的最小值即为阈值上限) 73 | highest_thre = [] 74 | for idx in prune_idx: 75 | #.item()可以得到张量里的元素值 76 | highest_thre.append(model.module_list[idx][1].weight.data.abs().max().item()) 77 | highest_thre = min(highest_thre) 78 | 79 | # 找到highest_thre对应的下标对应的百分比 80 | percent_limit = (sorted_bn==highest_thre).nonzero().item()/len(bn_weights) 81 | 82 | print(f'Threshold should be less than {highest_thre:.4f}.') 83 | print(f'The corresponding prune ratio is {percent_limit:.3f}.') 84 | 85 | 86 | # 该函数有很重要的意义: 87 | # ①先用深拷贝将原始模型拷贝下来,得到model_copy 88 | # ②将model_copy中,BN层中低于阈值的α参数赋值为0 89 | # ③在BN层中,输出y=α*x+β,由于α参数的值被赋值为0,因此输入仅加了一个偏置β 90 | # ④很神奇的是,network slimming中是将α参数和β参数都置0,该处只将α参数置0,但效果却很好:其实在另外一篇论文中,已经提到,可以先将β参数的效果移到 91 | # 下一层卷积层,再去剪掉本层的α参数 92 | 93 | # 该函数用最简单的方法,让我们看到了,如何快速看到剪枝后的效果 94 | 95 | 96 | 97 | def prune_and_eval(model, sorted_bn, percent=.0): 98 | model_copy = deepcopy(model) 99 | thre_index = int(len(sorted_bn) * percent) 100 | #获得α参数的阈值,小于该值的α参数对应的通道,全部裁剪掉 101 | thre = sorted_bn[thre_index] 102 | 103 | print(f'Channels with Gamma value less than {thre:.4f} are pruned!') 104 | 105 | remain_num = 0 106 | for idx in prune_idx: 107 | 108 | bn_module = model_copy.module_list[idx][1] 109 | 110 | mask = obtain_bn_mask(bn_module, thre) 111 | 112 | remain_num += int(mask.sum()) 113 | bn_module.weight.data.mul_(mask) 114 | with torch.no_grad(): 115 | mAP = eval_model(model_copy)[1].mean() 116 | 117 | print(f'Number of channels has been reduced from {len(sorted_bn)} to {remain_num}') 118 | print(f'Prune ratio: {1-remain_num/len(sorted_bn):.3f}') 119 | print(f'mAP of the pruned model is {mAP:.4f}') 120 | 121 | return thre 122 | 123 | 124 | threshold = prune_and_eval(model, sorted_bn, percent) 125 | 126 | 127 | 128 | #**************************************************************** 129 | #虽然上面已经能看到剪枝后的效果,但是没有生成剪枝后的模型结构,因此下面的代码是为了生成新的模型结构并拷贝旧模型参数到新模型 130 | 131 | 132 | #%% 133 | def obtain_filters_mask(model, thre, 
CBL_idx, prune_idx): 134 | 135 | pruned = 0 136 | total = 0 137 | num_filters = [] 138 | filters_mask = [] 139 | #CBL_idx存储的是所有带BN的卷积层(YOLO层的前一层卷积层是不带BN的) 140 | for idx in CBL_idx: 141 | bn_module = model.module_list[idx][1] 142 | if idx in prune_idx: 143 | 144 | mask = obtain_bn_mask(bn_module, thre).cpu().numpy() 145 | remain = int(mask.sum()) 146 | pruned = pruned + mask.shape[0] - remain 147 | 148 | if remain == 0: 149 | print("Channels would be all pruned!") 150 | raise Exception 151 | 152 | print(f'layer index: {idx:>3d} \t total channel: {mask.shape[0]:>4d} \t ' 153 | f'remaining channel: {remain:>4d}') 154 | else: 155 | mask = np.ones(bn_module.weight.data.shape) 156 | remain = mask.shape[0] 157 | 158 | total += mask.shape[0] 159 | num_filters.append(remain) 160 | filters_mask.append(mask.copy()) 161 | 162 | #因此,这里求出的prune_ratio,需要裁剪的α参数/cbl_idx中所有的α参数 163 | prune_ratio = pruned / total 164 | print(f'Prune channels: {pruned}\tPrune ratio: {prune_ratio:.3f}') 165 | 166 | return num_filters, filters_mask 167 | 168 | num_filters, filters_mask = obtain_filters_mask(model, threshold, CBL_idx, prune_idx) 169 | 170 | 171 | #CBLidx2mask存储CBL_idx中,每一层BN层对应的mask 172 | CBLidx2mask = {idx: mask for idx, mask in zip(CBL_idx, filters_mask)} 173 | 174 | pruned_model = prune_model_keep_size(model, prune_idx, CBL_idx, CBLidx2mask) 175 | 176 | 177 | 178 | 179 | with torch.no_grad(): 180 | mAP = eval_model(pruned_model)[1].mean() 181 | print('after prune_model_keep_size map is {}'.format(mAP)) 182 | 183 | 184 | #获得原始模型的module_defs,并修改该defs中的卷积核数量 185 | compact_module_defs = deepcopy(model.module_defs) 186 | for idx, num in zip(CBL_idx, num_filters): 187 | assert compact_module_defs[idx]['type'] == 'convolutional' 188 | compact_module_defs[idx]['filters'] = str(num) 189 | 190 | 191 | 192 | #compact_model = Darknet([model.hyp.copy()] + compact_module_defs).to(device) 193 | for i, mdef in enumerate(compact_module_defs): 194 | if mdef['type'] == 'shortcut': 195 | mdef['from'] = str(mdef['from'][0]) 196 | if mdef['type'] == 'route': 197 | if len(mdef['layers']) == 2 : 198 | mdef['layers'] = str(mdef['layers'][0]) + ',' + str(mdef['layers'][1]) 199 | else: 200 | mdef['layers'] = str(mdef['layers'][0]) 201 | if mdef['type'] == 'yolo': 202 | mdef['mask'] = str(mdef['mask'][0]) + ',' + str(mdef['mask'][1]) + ',' + str(mdef['mask'][2]) 203 | mdef['anchors'] = '4,5, 6,10, 14,9, 11,18, 25,15, 21,30, 47,26, 37,53, 87,65' 204 | 205 | pruned_cfg_file = write_cfg('pruned.cfg', [model.hyper.copy()] + compact_module_defs) 206 | 207 | compact_model = Darknet('pruned.cfg').to(device) 208 | print(compact_model) 209 | compact_nparameters = obtain_num_parameters(compact_model) 210 | 211 | init_weights_from_loose_model(compact_model, pruned_model, CBL_idx, Conv_idx, CBLidx2mask) 212 | 213 | random_input = torch.rand((16, 3, 416, 416)).to(device) 214 | 215 | def obtain_avg_forward_time(input, model, repeat=200): 216 | 217 | model.eval() 218 | start = time.time() 219 | with torch.no_grad(): 220 | for i in range(repeat): 221 | output = model(input) 222 | avg_infer_time = (time.time() - start) / repeat 223 | 224 | return avg_infer_time, output 225 | 226 | pruned_forward_time, pruned_output = obtain_avg_forward_time(random_input, pruned_model) 227 | compact_forward_time, compact_output = obtain_avg_forward_time(random_input, compact_model) 228 | 229 | 230 | 231 | # 在测试集上测试剪枝后的模型, 并统计模型的参数数量 232 | with torch.no_grad(): 233 | compact_model_metric = eval_model(compact_model) 234 | 235 | 236 | # 比较剪枝前后参数数量的变化、指标性能的变化 237 | 
metric_table = [ 238 | ["Metric", "Before", "After"], 239 | ["mAP", f'{origin_model_metric[1].mean():.6f}', f'{compact_model_metric[1].mean():.6f}'], 240 | ["Parameters", f"{origin_nparameters}", f"{compact_nparameters}"], 241 | ["Inference", f'{pruned_forward_time:.4f}', f'{compact_forward_time:.4f}'] 242 | ] 243 | print(AsciiTable(metric_table).table) 244 | 245 | 246 | 247 | # 生成剪枝后的cfg文件并保存模型 248 | pruned_cfg_name = opt.model_def.replace('/', f'/prune_{percent}_') 249 | 250 | #由于原始的compact_module_defs将anchor从字符串变为了数组,因此这里将anchors重新变为字符串 251 | 252 | 253 | #compact_model_name = opt.model.replace('/', f'/prune_{percent}_') 254 | compact_model_name = 'weights/yolov3_visdrone_normal_pruning_'+str(percent)+'percent.weights' 255 | 256 | save_weights(compact_model, path=compact_model_name) 257 | print(f'Compact model has been saved: {compact_model_name}') 258 | 259 | 260 | 261 | -------------------------------------------------------------------------------- /output/airplane.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HaloTrouvaille/YOLO-Multi-Backbones-Attention/c867b9b67eca97b1b89b2e5c0a1ed7e75f4f8747/output/airplane.png -------------------------------------------------------------------------------- /output/car.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HaloTrouvaille/YOLO-Multi-Backbones-Attention/c867b9b67eca97b1b89b2e5c0a1ed7e75f4f8747/output/car.png -------------------------------------------------------------------------------- /output/most.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HaloTrouvaille/YOLO-Multi-Backbones-Attention/c867b9b67eca97b1b89b2e5c0a1ed7e75f4f8747/output/most.png -------------------------------------------------------------------------------- /output/test.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /test.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import json 3 | 4 | from torch.utils.data import DataLoader 5 | 6 | from models import * 7 | from utils.datasets import * 8 | from utils.utils import * 9 | import os 10 | os.environ["CUDA_VISIBLE_DEVICES"] = '0' 11 | 12 | hyp = {'degrees' : 0,'translate': 0.05 * 0,'scale': 0.05 * 0,'shear': 0.641 * 0, 'hsv_h': 0.0138,'hsv_s': 0.678,'hsv_v': 0.36} 13 | 14 | def test(cfg, 15 | data, 16 | weights=None, 17 | batch_size=16, 18 | imgsz=416, 19 | conf_thres=0.001, 20 | iou_thres=0.6, # for nms 21 | save_json=False, 22 | single_cls=False, 23 | augment=False, 24 | model=None, 25 | dataloader=None, 26 | multi_label=True): 27 | # Initialize/load model and set device 28 | if model is None: 29 | device = torch_utils.select_device(opt.device, batch_size=batch_size) 30 | verbose = opt.task == 'test' 31 | 32 | # Remove previous 33 | for f in glob.glob('test_batch*.jpg'): 34 | os.remove(f) 35 | 36 | # Initialize model 37 | model = Darknet(cfg, imgsz) 38 | 39 | # Load weights 40 | attempt_download(weights) 41 | if weights.endswith('.pt'): # pytorch format 42 | model.load_state_dict(torch.load(weights, map_location=device)['model']) 43 | else: # darknet format 44 | load_darknet_weights(model, weights) 45 | 46 | # Fuse 47 | #model.fuse() 48 | model.to(device) 49 | 50 | if device.type != 'cpu' and torch.cuda.device_count() > 
1: 51 | model = nn.DataParallel(model) 52 | else: # called by train.py 53 | device = next(model.parameters()).device # get model device 54 | verbose = False 55 | 56 | # Configure run 57 | data = parse_data_cfg(data) 58 | nc = 1 if single_cls else int(data['classes']) # number of classes 59 | path = data['valid'] # path to test images 60 | names = load_classes(data['names']) # class names 61 | iouv = torch.linspace(0.5, 0.95, 10).to(device) # iou vector for mAP@0.5:0.95 62 | iouv = iouv[0].view(1) # comment for mAP@0.5:0.95 63 | niou = iouv.numel() 64 | 65 | # Dataloader 66 | if dataloader is None: 67 | dataset = LoadImagesAndLabels(path, imgsz, batch_size, rect=True, hyp=hyp,single_cls=False) 68 | batch_size = min(batch_size, len(dataset)) 69 | dataloader = DataLoader(dataset, 70 | batch_size=batch_size, 71 | num_workers=min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8]), 72 | pin_memory=True, 73 | collate_fn=dataset.collate_fn) 74 | 75 | seen = 0 76 | model.eval() 77 | _ = model(torch.zeros((1, 3, imgsz, imgsz), device=device)) if device.type != 'cpu' else None # run once 78 | coco91class = coco80_to_coco91_class() 79 | s = ('%20s' + '%10s' * 6) % ('Class', 'Images', 'Targets', 'P', 'R', 'mAP@0.5', 'F1') 80 | p, r, f1, mp, mr, map, mf1, t0, t1 = 0., 0., 0., 0., 0., 0., 0., 0., 0. 81 | loss = torch.zeros(3, device=device) 82 | jdict, stats, ap, ap_class = [], [], [], [] 83 | for batch_i, (imgs, targets, paths, shapes) in enumerate(tqdm(dataloader, desc=s)): 84 | imgs = imgs.to(device).float() / 255.0 # uint8 to float32, 0 - 255 to 0.0 - 1.0 85 | targets = targets.to(device) 86 | nb, _, height, width = imgs.shape # batch size, channels, height, width 87 | whwh = torch.Tensor([width, height, width, height]).to(device) 88 | 89 | # Disable gradients 90 | with torch.no_grad(): 91 | # Run model 92 | t = torch_utils.time_synchronized() 93 | inf_out, train_out = model(imgs, augment=augment) # inference and training outputs 94 | t0 += torch_utils.time_synchronized() - t 95 | 96 | # Compute loss 97 | if hasattr(model, 'hyp'): # if model has loss hyperparameters 98 | loss += compute_loss(train_out, targets, model)[1][:3] # GIoU, obj, cls 99 | 100 | # Run NMS 101 | t = torch_utils.time_synchronized() 102 | output = non_max_suppression(inf_out, conf_thres=conf_thres, iou_thres=iou_thres, multi_label=multi_label) 103 | t1 += torch_utils.time_synchronized() - t 104 | 105 | # Statistics per image 106 | for si, pred in enumerate(output): 107 | labels = targets[targets[:, 0] == si, 1:] 108 | nl = len(labels) 109 | tcls = labels[:, 0].tolist() if nl else [] # target class 110 | seen += 1 111 | 112 | if pred is None: 113 | if nl: 114 | stats.append((torch.zeros(0, niou, dtype=torch.bool), torch.Tensor(), torch.Tensor(), tcls)) 115 | continue 116 | 117 | # Append to text file 118 | # with open('test.txt', 'a') as file: 119 | # [file.write('%11.5g' * 7 % tuple(x) + '\n') for x in pred] 120 | 121 | # Clip boxes to image bounds 122 | clip_coords(pred, (height, width)) 123 | 124 | # Append to pycocotools JSON dictionary 125 | if save_json: 126 | # [{"image_id": 42, "category_id": 18, "bbox": [258.15, 41.29, 348.26, 243.78], "score": 0.236}, ... 
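# Note: the line below assumes COCO-style file names whose stem ends in the numeric
# image id; for the custom datasets used in this repo (e.g. visdrone) save_json is
# forced to False in __main__, so this branch is normally skipped.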
127 | image_id = int(Path(paths[si]).stem.split('_')[-1]) 128 | box = pred[:, :4].clone() # xyxy 129 | scale_coords(imgs[si].shape[1:], box, shapes[si][0], shapes[si][1]) # to original shape 130 | box = xyxy2xywh(box) # xywh 131 | box[:, :2] -= box[:, 2:] / 2 # xy center to top-left corner 132 | for p, b in zip(pred.tolist(), box.tolist()): 133 | jdict.append({'image_id': image_id, 134 | 'category_id': coco91class[int(p[5])], 135 | 'bbox': [round(x, 3) for x in b], 136 | 'score': round(p[4], 5)}) 137 | 138 | # Assign all predictions as incorrect 139 | correct = torch.zeros(pred.shape[0], niou, dtype=torch.bool, device=device) 140 | if nl: 141 | detected = [] # target indices 142 | tcls_tensor = labels[:, 0] 143 | 144 | # target boxes 145 | tbox = xywh2xyxy(labels[:, 1:5]) * whwh 146 | 147 | # Per target class 148 | for cls in torch.unique(tcls_tensor): 149 | ti = (cls == tcls_tensor).nonzero().view(-1) # prediction indices 150 | pi = (cls == pred[:, 5]).nonzero().view(-1) # target indices 151 | 152 | # Search for detections 153 | if pi.shape[0]: 154 | # Prediction to target ious 155 | ious, i = box_iou(pred[pi, :4], tbox[ti]).max(1) # best ious, indices 156 | 157 | # Append detections 158 | for j in (ious > iouv[0]).nonzero(): 159 | d = ti[i[j]] # detected target 160 | if d not in detected: 161 | detected.append(d) 162 | correct[pi[j]] = ious[j] > iouv # iou_thres is 1xn 163 | if len(detected) == nl: # all targets already located in image 164 | break 165 | 166 | # Append statistics (correct, conf, pcls, tcls) 167 | stats.append((correct.cpu(), pred[:, 4].cpu(), pred[:, 5].cpu(), tcls)) 168 | 169 | # Plot images 170 | if batch_i < 1: 171 | f = 'test_batch%g_gt.jpg' % batch_i # filename 172 | plot_images(imgs, targets, paths=paths, names=names, fname=f) # ground truth 173 | f = 'test_batch%g_pred.jpg' % batch_i 174 | plot_images(imgs, output_to_target(output, width, height), paths=paths, names=names, fname=f) # predictions 175 | 176 | # Compute statistics 177 | stats = [np.concatenate(x, 0) for x in zip(*stats)] # to numpy 178 | if len(stats): 179 | p, r, ap, f1, ap_class = ap_per_class(*stats) 180 | if niou > 1: 181 | p, r, ap, f1 = p[:, 0], r[:, 0], ap.mean(1), ap[:, 0] # [P, R, AP@0.5:0.95, AP@0.5] 182 | mp, mr, map, mf1 = p.mean(), r.mean(), ap.mean(), f1.mean() 183 | nt = np.bincount(stats[3].astype(np.int64), minlength=nc) # number of targets per class 184 | else: 185 | nt = torch.zeros(1) 186 | 187 | # Print results 188 | pf = '%20s' + '%10.3g' * 6 # print format 189 | print(pf % ('all', seen, nt.sum(), mp, mr, map, mf1)) 190 | 191 | # Print results per class 192 | if verbose and nc > 1 and len(stats): 193 | for i, c in enumerate(ap_class): 194 | print(pf % (names[c], seen, nt[c], p[i], r[i], ap[i], f1[i])) 195 | 196 | # Print speeds 197 | if verbose or save_json: 198 | t = tuple(x / seen * 1E3 for x in (t0, t1, t0 + t1)) + (imgsz, imgsz, batch_size) # tuple 199 | print('Speed: %.1f/%.1f/%.1f ms inference/NMS/total per %gx%g image at batch-size %g' % t) 200 | 201 | # Save JSON 202 | if save_json and map and len(jdict): 203 | print('\nCOCO mAP with pycocotools...') 204 | imgIds = [int(Path(x).stem.split('_')[-1]) for x in dataloader.dataset.img_files] 205 | with open('results.json', 'w') as file: 206 | json.dump(jdict, file) 207 | 208 | try: 209 | from pycocotools.coco import COCO 210 | from pycocotools.cocoeval import COCOeval 211 | 212 | # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb 213 | cocoGt = 
COCO(glob.glob('../coco/annotations/instances_val*.json')[0]) # initialize COCO ground truth api 214 | cocoDt = cocoGt.loadRes('results.json') # initialize COCO pred api 215 | 216 | cocoEval = COCOeval(cocoGt, cocoDt, 'bbox') 217 | cocoEval.params.imgIds = imgIds # [:32] # only evaluate these images 218 | cocoEval.evaluate() 219 | cocoEval.accumulate() 220 | cocoEval.summarize() 221 | # mf1, map = cocoEval.stats[:2] # update to pycocotools results (mAP@0.5:0.95, mAP@0.5) 222 | except: 223 | print('WARNING: pycocotools must be installed with numpy==1.17 to run correctly. ' 224 | 'See https://github.com/cocodataset/cocoapi/issues/356') 225 | 226 | # Return results 227 | maps = np.zeros(nc) + map 228 | for i, c in enumerate(ap_class): 229 | maps[c] = ap[i] 230 | return (mp, mr, map, mf1, *(loss.cpu() / len(dataloader)).tolist()), maps 231 | 232 | 233 | if __name__ == '__main__': 234 | parser = argparse.ArgumentParser(prog='test.py') 235 | parser.add_argument('--cfg', type=str, default='cfg/yolov3-spp.cfg', help='*.cfg path') 236 | parser.add_argument('--data', type=str, default='data/coco2014.data', help='*.data path') 237 | parser.add_argument('--weights', type=str, default='weights/yolov3-spp-ultralytics.pt', help='weights path') 238 | parser.add_argument('--batch-size', type=int, default=8, help='size of each image batch') 239 | parser.add_argument('--img-size', type=int, default=512, help='inference size (pixels)') 240 | parser.add_argument('--conf-thres', type=float, default=0.001, help='object confidence threshold') 241 | parser.add_argument('--iou-thres', type=float, default=0.5, help='IOU threshold for NMS') 242 | parser.add_argument('--save-json', action='store_true', help='save a cocoapi-compatible JSON results file') 243 | parser.add_argument('--task', default='test', help="'test', 'study', 'benchmark'") 244 | parser.add_argument('--device', default='', help='device id (i.e. 
0 or 0,1) or cpu') 245 | parser.add_argument('--single-cls', action='store_true', help='train as single-class dataset') 246 | parser.add_argument('--augment', action='store_true', help='augmented inference') 247 | opt = parser.parse_args() 248 | #opt.save_json = opt.save_json or any([x in opt.data for x in ['coco.data', 'coco2014.data', 'coco2017.data']]) 249 | opt.save_json = False 250 | opt.cfg = list(glob.iglob('./**/' + opt.cfg, recursive=True))[0] # find file 251 | opt.data = list(glob.iglob('./**/' + opt.data, recursive=True))[0] # find file 252 | print(opt) 253 | 254 | # task = 'test', 'study', 'benchmark' 255 | if opt.task == 'test': # (default) test normally 256 | test(opt.cfg, 257 | opt.data, 258 | opt.weights, 259 | opt.batch_size, 260 | opt.img_size, 261 | opt.conf_thres, 262 | opt.iou_thres, 263 | opt.save_json, 264 | opt.single_cls, 265 | opt.augment) 266 | 267 | elif opt.task == 'benchmark': # mAPs at 256-640 at conf 0.5 and 0.7 268 | y = [] 269 | for i in list(range(256, 640, 128)): # img-size 270 | for j in [0.6, 0.7]: # iou-thres 271 | t = time.time() 272 | r = test(opt.cfg, opt.data, opt.weights, opt.batch_size, i, opt.conf_thres, j, opt.save_json)[0] 273 | y.append(r + (time.time() - t,)) 274 | np.savetxt('benchmark.txt', y, fmt='%10.4g') # y = np.loadtxt('study.txt') 275 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | 3 | import torch.distributed as dist 4 | import torch.optim as optim 5 | import torch.optim.lr_scheduler as lr_scheduler 6 | from torch.utils.tensorboard import SummaryWriter 7 | 8 | import test # import test.py to get mAP after each epoch 9 | from models import * 10 | from utils.datasets import * 11 | from utils.utils import * 12 | from utils.prune_utils import * 13 | import os 14 | 15 | mixed_precision = True 16 | try: # Mixed precision training https://github.com/NVIDIA/apex 17 | from apex import amp 18 | except: 19 | print('Apex recommended for faster mixed precision training: https://github.com/NVIDIA/apex') 20 | mixed_precision = False # not installed 21 | 22 | wdir = 'weights' + os.sep # weights dir 23 | last = wdir + 'last.pt' 24 | best = wdir + 'best.pt' 25 | results_file = 'results.txt' 26 | 27 | # Hyperparameters 28 | hyp = {'giou': 3.54, # giou loss gain 29 | 'cls': 37.4, # cls loss gain 30 | 'cls_pw': 1.0, # cls BCELoss positive_weight 31 | 'obj': 64.3, # obj loss gain (*=img_size/320 if img_size != 320) 32 | 'obj_pw': 1.0, # obj BCELoss positive_weight 33 | 'iou_t': 0.20, # iou training threshold 34 | 'lr0': 0.01, # initial learning rate (SGD=5E-3, Adam=5E-4) 35 | 'lrf': 0.0005, # final learning rate (with cos scheduler) 36 | 'momentum': 0.937, # SGD momentum 37 | 'weight_decay': 0.000484, # optimizer weight decay 38 | 'fl_gamma': 0.0, # focal loss gamma (efficientDet default is gamma=1.5) 39 | 'hsv_h': 0.0138, # image HSV-Hue augmentation (fraction) 40 | 'hsv_s': 0.678, # image HSV-Saturation augmentation (fraction) 41 | 'hsv_v': 0.36, # image HSV-Value augmentation (fraction) 42 | 'degrees': 1.98 * 0, # image rotation (+/- deg) 43 | 'translate': 0.05 * 0, # image translation (+/- fraction) 44 | 'scale': 0.05 * 0, # image scale (+/- gain) 45 | 'shear': 0.641 * 0} # image shear (+/- deg) 46 | 47 | # Overwrite hyp with hyp*.txt (optional) 48 | f = glob.glob('hyp*.txt') 49 | if f: 50 | print('Using %s' % f[0]) 51 | for k, v in zip(hyp.keys(), np.loadtxt(f[0])): 52 | hyp[k] = v 53 | 54 | # Print 
focal loss if gamma > 0 55 | if hyp['fl_gamma']: 56 | print('Using FocalLoss(gamma=%g)' % hyp['fl_gamma']) 57 | 58 | 59 | def train(hyp): 60 | cfg = opt.cfg 61 | t_cfg = opt.t_cfg 62 | data = opt.data 63 | epochs = opt.epochs # 500200 batches at bs 64, 117263 images = 273 epochs 64 | batch_size = opt.batch_size 65 | accumulate = max(round(64 / batch_size), 1) # accumulate n times before optimizer update (bs 64) 66 | weights = opt.weights # initial training weights 67 | t_weights = opt.t_weights # 老师模型权重 68 | imgsz_min, imgsz_max, imgsz_test = opt.img_size # img sizes (min, max, test) 69 | 70 | # Image Sizes 71 | gs = 64 # (pixels) grid size 72 | assert math.fmod(imgsz_min, gs) == 0, '--img-size %g must be a %g-multiple' % (imgsz_min, gs) 73 | opt.multi_scale |= imgsz_min != imgsz_max # multi if different (min, max) 74 | if opt.multi_scale: 75 | if imgsz_min == imgsz_max: 76 | imgsz_min //= 1.5 77 | imgsz_max //= 0.667 78 | grid_min, grid_max = imgsz_min // gs, imgsz_max // gs 79 | imgsz_min, imgsz_max = int(grid_min * gs), int(grid_max * gs) 80 | img_size = imgsz_max # initialize with max size 81 | 82 | # Configure run 83 | init_seeds() 84 | data_dict = parse_data_cfg(data) 85 | train_path = data_dict['train'] 86 | test_path = data_dict['valid'] 87 | nc = 1 if opt.single_cls else int(data_dict['classes']) # number of classes 88 | hyp['cls'] *= nc / 80 # update coco-tuned hyp['cls'] to current dataset 89 | 90 | # Remove previous results 91 | for f in glob.glob('*_batch*.jpg') + glob.glob(results_file): 92 | os.remove(f) 93 | 94 | # Initialize model 95 | model = Darknet(cfg).to(device) 96 | #print(model) 97 | if t_cfg: 98 | t_model = Darknet(t_cfg).to(device) 99 | # Optimizer 100 | pg0, pg1, pg2 = [], [], [] # optimizer parameter groups 101 | for k, v in dict(model.named_parameters()).items(): 102 | if '.bias' in k: 103 | pg2 += [v] # biases 104 | elif 'Conv2d.weight' in k: 105 | pg1 += [v] # apply weight_decay 106 | else: 107 | pg0 += [v] # all else 108 | 109 | if opt.adam: 110 | # hyp['lr0'] *= 0.1 # reduce lr (i.e. SGD=5E-3, Adam=5E-4) 111 | optimizer = optim.Adam(pg0, lr=hyp['lr0']) 112 | # optimizer = AdaBound(pg0, lr=hyp['lr0'], final_lr=0.1) 113 | else: 114 | optimizer = optim.SGD(pg0, lr=hyp['lr0'], momentum=hyp['momentum'], nesterov=True) 115 | optimizer.add_param_group({'params': pg1, 'weight_decay': hyp['weight_decay']}) # add pg1 with weight_decay 116 | optimizer.add_param_group({'params': pg2}) # add pg2 (biases) 117 | print('Optimizer groups: %g .bias, %g Conv2d.weight, %g other' % (len(pg2), len(pg1), len(pg0))) 118 | del pg0, pg1, pg2 119 | 120 | start_epoch = 0 121 | best_fitness = 0.0 122 | 123 | #attempt_download(weights) 124 | # 待修改 125 | pretrain = False 126 | imagenetpre = False 127 | if imagenetpre: 128 | model_dict = model.state_dict() 129 | pretrained_dict = torch.load('weights/checkpoint.t7',map_location='cuda:3') 130 | pretrained_dict = {k: v for k, v in model_dict.items() if k in pretrained_dict} 131 | model_dict.update(pretrained_dict) 132 | model.load_state_dict(model_dict) 133 | print("Load Imagenet pretrain successfully!") 134 | if pretrain: 135 | attempt_download(weights) 136 | if weights.endswith('.pt'): # pytorch format 137 | # possible weights are '*.pt', 'yolov3-spp.pt', 'yolov3-tiny.pt' etc. 
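# A *.pt checkpoint is expected to be a dict with the keys 'epoch', 'best_fitness',
# 'training_results', 'model' (a state_dict) and 'optimizer', matching what convert()
# in models.py produces; the blocks below restore each of these in turn.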
138 | chkpt = torch.load(weights, map_location=device) 139 | 140 | # load model 141 | try: 142 | chkpt['model'] = {k: v for k, v in chkpt['model'].items() if model.state_dict()[k].numel() == v.numel()} 143 | model.load_state_dict(chkpt['model'], strict=False) 144 | except KeyError as e: 145 | s = "%s is not compatible with %s. Specify --weights '' or specify a --cfg compatible with %s. " \ 146 | "See https://github.com/ultralytics/yolov3/issues/657" % (opt.weights, opt.cfg, opt.weights) 147 | raise KeyError(s) from e 148 | 149 | # load optimizer 150 | if chkpt['optimizer'] is not None: 151 | optimizer.load_state_dict(chkpt['optimizer']) 152 | best_fitness = chkpt['best_fitness'] 153 | 154 | # load results 155 | if chkpt.get('training_results') is not None: 156 | with open(results_file, 'w') as file: 157 | file.write(chkpt['training_results']) # write results.txt 158 | 159 | start_epoch = chkpt['epoch'] + 1 160 | del chkpt 161 | 162 | elif len(weights) > 0: # darknet format 163 | # possible weights are '*.weights', 'yolov3-tiny.conv.15', 'darknet53.conv.74' etc. 164 | load_darknet_weights(model, weights) 165 | if t_cfg: 166 | if t_weights.endswith('.pt'): 167 | t_model.load_state_dict(torch.load(t_weights, map_location=device)['model']) 168 | 169 | elif t_weights.endwith('.weights'): 170 | load_darknet_weights(t_model, t_weights) 171 | 172 | else: 173 | raise Exception('Unsupported weight format! Please use .pt or .weights') 174 | 175 | if not mixed_precision: 176 | t_model.eval() 177 | print("<-----------Using Knowledge Distillation---------->") 178 | 179 | if hasattr(model, 'module'): 180 | _,_,prune_idx= parse_module_defs(model.module.module_defs) 181 | else: 182 | _,_,prune_idx= parse_module_defs(model.module_defs) 183 | # Mixed precision training https://github.com/NVIDIA/apex 184 | if mixed_precision: 185 | model, optimizer = amp.initialize(model, optimizer, opt_level='O1', verbosity=0) 186 | 187 | # Scheduler https://arxiv.org/pdf/1812.01187.pdf 188 | lf = lambda x: (((1 + math.cos(x * math.pi / epochs)) / 2) ** 1.0) * 0.95 + 0.05 # cosine 189 | scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf) 190 | scheduler.last_epoch = start_epoch - 1 # see link below 191 | # https://discuss.pytorch.org/t/a-problem-occured-when-resuming-an-optimizer/28822 192 | 193 | # Plot lr schedule 194 | # y = [] 195 | # for _ in range(epochs): 196 | # scheduler.step() 197 | # y.append(optimizer.param_groups[0]['lr']) 198 | # plt.plot(y, '.-', label='LambdaLR') 199 | # plt.xlabel('epoch') 200 | # plt.ylabel('LR') 201 | # plt.tight_layout() 202 | # plt.savefig('LR.png', dpi=300) 203 | 204 | # Initialize distributed training 205 | if device.type != 'cpu' and torch.cuda.device_count() > 1 and torch.distributed.is_available(): 206 | dist.init_process_group(backend='nccl', # 'distributed backend' 207 | init_method='tcp://127.0.0.1:9999', # distributed training init method 208 | world_size=1, # number of nodes for distributed training 209 | rank=0) # distributed training node rank 210 | model = torch.nn.parallel.DistributedDataParallel(model, find_unused_parameters=True) 211 | model.yolo_layers = model.module.yolo_layers # move yolo layer indices to top level 212 | 213 | # Dataset 214 | dataset = LoadImagesAndLabels(train_path, img_size, batch_size, 215 | augment=True, 216 | hyp=hyp, # augmentation hyperparameters 217 | rect=opt.rect, # rectangular training 218 | cache_images=opt.cache_images, 219 | single_cls=opt.single_cls) 220 | 221 | # Dataloader 222 | batch_size = min(batch_size, len(dataset)) 223 | 
nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8]) # number of workers 224 | dataloader = torch.utils.data.DataLoader(dataset, 225 | batch_size=batch_size, 226 | num_workers=nw, 227 | shuffle=not opt.rect, # Shuffle=True unless rectangular training is used 228 | pin_memory=True, 229 | collate_fn=dataset.collate_fn) 230 | 231 | # Testloader 232 | testloader = torch.utils.data.DataLoader(LoadImagesAndLabels(test_path, imgsz_test, batch_size, 233 | hyp=hyp, 234 | rect=True, 235 | cache_images=opt.cache_images, 236 | single_cls=opt.single_cls), 237 | batch_size=batch_size, 238 | num_workers=nw, 239 | pin_memory=True, 240 | collate_fn=dataset.collate_fn) 241 | 242 | # Model parameters 243 | model.nc = nc # attach number of classes to model 244 | model.hyp = hyp # attach hyperparameters to model 245 | model.gr = 1.0 # giou loss ratio (obj_loss = 1.0 or giou) 246 | model.class_weights = labels_to_class_weights(dataset.labels, nc).to(device) # attach class weights 247 | 248 | # Model EMA 249 | ema = torch_utils.ModelEMA(model) 250 | 251 | # Start training 252 | nb = len(dataloader) # number of batches 253 | n_burn = max(3 * nb, 500) # burn-in iterations, max(3 epochs, 500 iterations) 254 | maps = np.zeros(nc) # mAP per class 255 | # torch.autograd.set_detect_anomaly(True) 256 | results = (0, 0, 0, 0, 0, 0, 0) # 'P', 'R', 'mAP', 'F1', 'val GIoU', 'val Objectness', 'val Classification' 257 | t0 = time.time() 258 | print('Image sizes %g - %g train, %g test' % (imgsz_min, imgsz_max, imgsz_test)) 259 | print('Using %g dataloader workers' % nw) 260 | print('Starting training for %g epochs...' % epochs) 261 | for epoch in range(start_epoch, epochs): # epoch ------------------------------------------------------------------ 262 | model.train() 263 | sr_flag = get_sr_flag(epoch, opt.sr) 264 | # Update image weights (optional) 265 | if dataset.image_weights: 266 | w = model.class_weights.cpu().numpy() * (1 - maps) ** 2 # class weights 267 | image_weights = labels_to_image_weights(dataset.labels, nc=nc, class_weights=w) 268 | dataset.indices = random.choices(range(dataset.n), weights=image_weights, k=dataset.n) # rand weighted idx 269 | 270 | mloss = torch.zeros(4).to(device) # mean losses 271 | msoft_target = torch.zeros(1).to(device) 272 | 273 | print(('\n' + '%10s' * 9) % ('Epoch', 'gpu_mem', 'GIoU', 'obj', 'cls', 'total', 'soft', 'targets', 'img_size')) 274 | 275 | pbar = tqdm(enumerate(dataloader), total=nb) # progress bar 276 | for i, (imgs, targets, paths, _) in pbar: # batch ------------------------------------------------------------- 277 | ni = i + nb * epoch # number integrated batches (since train start) 278 | imgs = imgs.to(device).float() / 255.0 # uint8 to float32, 0 - 255 to 0.0 - 1.0 279 | targets = targets.to(device) 280 | 281 | # Burn-in 282 | if ni <= n_burn: 283 | xi = [0, n_burn] # x interp 284 | model.gr = np.interp(ni, xi, [0.0, 1.0]) # giou loss ratio (obj_loss = 1.0 or giou) 285 | accumulate = max(1, np.interp(ni, xi, [1, 64 / batch_size]).round()) 286 | for j, x in enumerate(optimizer.param_groups): 287 | # bias lr falls from 0.1 to lr0, all other lrs rise from 0.0 to lr0 288 | x['lr'] = np.interp(ni, xi, [0.1 if j == 2 else 0.0, x['initial_lr'] * lf(epoch)]) 289 | x['weight_decay'] = np.interp(ni, xi, [0.0, hyp['weight_decay'] if j == 1 else 0.0]) 290 | if 'momentum' in x: 291 | x['momentum'] = np.interp(ni, xi, [0.9, hyp['momentum']]) 292 | 293 | # Multi-Scale 294 | if opt.multi_scale: 295 | if ni / accumulate % 1 == 0: #  adjust img_size (67% - 150%) every 1 
batch 296 | img_size = random.randrange(grid_min, grid_max + 1) * gs 297 | sf = img_size / max(imgs.shape[2:]) # scale factor 298 | if sf != 1: 299 | ns = [math.ceil(x * sf / gs) * gs for x in imgs.shape[2:]] # new shape (stretched to 32-multiple) 300 | imgs = F.interpolate(imgs, size=ns, mode='bilinear', align_corners=False) 301 | 302 | # Forward 303 | pred = model(imgs) # 输出3个YOLO层的输出 304 | 305 | 306 | 307 | # Loss 308 | loss, loss_items = compute_loss(pred, targets, model) 309 | if not torch.isfinite(loss): 310 | print('WARNING: non-finite loss, ending training ', loss_items) 311 | return results 312 | 313 | soft_target = 0 314 | reg_ratio = 0 #表示对于目标回归效果没有老师模型好,针对这部分此时再与ground truth学习 315 | 316 | if t_cfg: 317 | if mixed_precision: 318 | with torch.no_grad(): 319 | t_output = t_model(imgs) 320 | else: 321 | _, t_output = t_model(imgs) 322 | soft_target = distillation_loss1(pred, t_output, nc, imgs.size(0)) 323 | #soft_target, reg_ratio = distillation_loss2(model, targets, pred, t_output) 324 | loss += soft_target 325 | 326 | # Backward 327 | loss *= batch_size / 64 # scale loss 328 | if mixed_precision: 329 | with amp.scale_loss(loss, optimizer) as scaled_loss: 330 | scaled_loss.backward() 331 | else: 332 | loss.backward() 333 | 334 | if hasattr(model, 'module'): 335 | BNOptimizer.updateBN(sr_flag, model.module.module_list, opt.s, prune_idx) 336 | else: 337 | BNOptimizer.updateBN(sr_flag, model.module_list, opt.s, prune_idx) 338 | 339 | 340 | # Optimize 341 | if ni % accumulate == 0: 342 | optimizer.step() 343 | optimizer.zero_grad() 344 | ema.update(model) 345 | 346 | # Print 347 | mloss = (mloss * i + loss_items) / (i + 1) # update mean losses 348 | msoft_target = (msoft_target * i + soft_target) / (i + 1) 349 | 350 | mem = '%.3gG' % (torch.cuda.memory_cached() / 1E9 if torch.cuda.is_available() else 0) # (GB) 351 | s = ('%10s' * 2 + '%10.3g' * 7) % ('%g/%g' % (epoch, epochs - 1), mem, *mloss, msoft_target, len(targets), img_size) 352 | pbar.set_description(s) 353 | 354 | # Plot 355 | if ni < 1: 356 | f = 'train_batch%g.jpg' % i # filename 357 | res = plot_images(images=imgs, targets=targets, paths=paths, fname=f) 358 | if tb_writer: 359 | tb_writer.add_image(f, res, dataformats='HWC', global_step=epoch) 360 | # tb_writer.add_graph(model, imgs) # add model to tensorboard 361 | 362 | # end batch ------------------------------------------------------------------------------------------------ 363 | 364 | # Update scheduler 365 | scheduler.step() 366 | 367 | # Process epoch results 368 | ema.update_attr(model) 369 | final_epoch = epoch + 1 == epochs 370 | if not opt.notest or final_epoch: # Calculate mAP 371 | is_coco = any([x in data for x in ['coco.data', 'coco2014.data', 'coco2017.data']]) and model.nc == 80 372 | results, maps = test.test(cfg, 373 | data, 374 | batch_size=batch_size, 375 | imgsz=imgsz_test, 376 | model=ema.ema, 377 | conf_thres=0.001, 378 | save_json=final_epoch and is_coco, 379 | single_cls=opt.single_cls, 380 | dataloader=testloader, 381 | multi_label=ni > n_burn) 382 | 383 | # Write 384 | with open(results_file, 'a') as f: 385 | f.write(s + '%10.3g' * 7 % results + '\n') # P, R, mAP, F1, test_losses=(GIoU, obj, cls) 386 | if len(opt.name) and opt.bucket: 387 | os.system('gsutil cp results.txt gs://%s/results/results%s.txt' % (opt.bucket, opt.name)) 388 | 389 | # Tensorboard 390 | if tb_writer: 391 | tags = ['train/giou_loss', 'train/obj_loss', 'train/cls_loss', 392 | 'metrics/precision', 'metrics/recall', 'metrics/mAP_0.5', 'metrics/F1', 393 | 'val/giou_loss', 
'val/obj_loss', 'val/cls_loss'] 394 | for x, tag in zip(list(mloss[:-1]) + list(results), tags): 395 | tb_writer.add_scalar(tag, x, epoch) 396 | 397 | # Update best mAP 398 | fi = fitness(np.array(results).reshape(1, -1)) # fitness_i = weighted combination of [P, R, mAP, F1] 399 | if fi > best_fitness: 400 | best_fitness = fi 401 | 402 | # Save model 403 | save = (not opt.nosave) or (final_epoch and not opt.evolve) 404 | if save: 405 | with open(results_file, 'r') as f: # create checkpoint 406 | chkpt = {'epoch': epoch, 407 | 'best_fitness': best_fitness, 408 | 'training_results': f.read(), 409 | 'model': ema.ema.module.state_dict() if hasattr(model, 'module') else ema.ema.state_dict(), 410 | 'optimizer': None if final_epoch else optimizer.state_dict()} 411 | 412 | # Save last, best and delete 413 | torch.save(chkpt, last) 414 | if (best_fitness == fi) and not final_epoch: 415 | torch.save(chkpt, best) 416 | del chkpt 417 | 418 | # end epoch ---------------------------------------------------------------------------------------------------- 419 | # end training 420 | 421 | n = opt.name 422 | if len(n): 423 | n = '_' + n if not n.isnumeric() else n 424 | fresults, flast, fbest = 'results%s.txt' % n, wdir + 'last%s.pt' % n, wdir + 'best%s.pt' % n 425 | for f1, f2 in zip([wdir + 'last.pt', wdir + 'best.pt', 'results.txt'], [flast, fbest, fresults]): 426 | if os.path.exists(f1): 427 | os.rename(f1, f2) # rename 428 | ispt = f2.endswith('.pt') # is *.pt 429 | strip_optimizer(f2) if ispt else None # strip optimizer 430 | os.system('gsutil cp %s gs://%s/weights' % (f2, opt.bucket)) if opt.bucket and ispt else None # upload 431 | 432 | if not opt.evolve: 433 | plot_results() # save as results.png 434 | print('%g epochs completed in %.3f hours.\n' % (epoch - start_epoch + 1, (time.time() - t0) / 3600)) 435 | dist.destroy_process_group() if torch.cuda.device_count() > 1 else None 436 | torch.cuda.empty_cache() 437 | return results 438 | 439 | 440 | if __name__ == '__main__': 441 | parser = argparse.ArgumentParser() 442 | parser.add_argument('--epochs', type=int, default=300) # 500200 batches at bs 16, 117263 COCO images = 273 epochs 443 | parser.add_argument('--batch-size', type=int, default=8) # effective bs = batch_size * accumulate = 16 * 4 = 64 444 | parser.add_argument('--cfg', type=str, default='cfg/yolov3-spp.cfg', help='*.cfg path') 445 | parser.add_argument('--data', type=str, default='data/coco2017.data', help='*.data path') 446 | parser.add_argument('--multi-scale', action='store_true', help='adjust (67%% - 150%%) img_size every 10 batches') 447 | parser.add_argument('--img-size', nargs='+', type=int, default=[320, 640], help='[min_train, max-train, test]') 448 | parser.add_argument('--rect', action='store_true', help='rectangular training') 449 | parser.add_argument('--resume', action='store_true', help='resume training from last.pt') 450 | parser.add_argument('--nosave', action='store_true', help='only save final checkpoint') 451 | parser.add_argument('--notest', action='store_true', help='only test final epoch') 452 | parser.add_argument('--evolve', action='store_true', help='evolve hyperparameters') 453 | parser.add_argument('--bucket', type=str, default='', help='gsutil bucket') 454 | parser.add_argument('--cache-images', action='store_true', help='cache images for faster training') 455 | parser.add_argument('--weights', type=str, default='weights/last.pt', help='initial weights path') 456 | parser.add_argument('--name', default='', help='renames results.txt to results_name.txt if 
supplied') 457 | parser.add_argument('--device', default='', help='device id (i.e. 0 or 0,1 or cpu)') 458 | parser.add_argument('--adam', action='store_true', help='use adam optimizer') 459 | parser.add_argument('--single-cls', action='store_true', help='train as single-class dataset') 460 | parser.add_argument('--sparsity-regularization', '-sr', dest='sr', action='store_true',help='train with channel sparsity regularization') 461 | parser.add_argument('--s', type=float, default=0.001, help='scale sparse rate') 462 | parser.add_argument('--t_cfg', type=str, default='', help='teacher model cfg file path for knowledge distillation') 463 | parser.add_argument('--t_weights', type=str, default='', help='teacher model weights') 464 | opt = parser.parse_args() 465 | opt.weights = last if opt.resume else opt.weights 466 | check_git_status() 467 | opt.cfg = list(glob.iglob('./**/' + opt.cfg, recursive=True))[0] # find file 468 | # opt.data = list(glob.iglob('./**/' + opt.data, recursive=True))[0] # find file 469 | print(opt) 470 | opt.img_size.extend([opt.img_size[-1]] * (3 - len(opt.img_size))) # extend to 3 sizes (min, max, test) 471 | device = torch_utils.select_device(opt.device, apex=mixed_precision, batch_size=opt.batch_size) 472 | if device.type == 'cpu': 473 | mixed_precision = False 474 | 475 | # scale hyp['obj'] by img_size (evolved at 320) 476 | # hyp['obj'] *= opt.img_size[0] / 320. 477 | 478 | tb_writer = None 479 | if not opt.evolve: # Train normally 480 | print('Start Tensorboard with "tensorboard --logdir=runs", view at http://localhost:6006/') 481 | tb_writer = SummaryWriter(comment=opt.name) 482 | train(hyp) # train normally 483 | 484 | else: # Evolve hyperparameters (optional) 485 | opt.notest, opt.nosave = True, True # only test/save final epoch 486 | if opt.bucket: 487 | os.system('gsutil cp gs://%s/evolve.txt .' 
% opt.bucket) # download evolve.txt if exists 488 | 489 | for _ in range(1): # generations to evolve 490 | if os.path.exists('evolve.txt'): # if evolve.txt exists: select best hyps and mutate 491 | # Select parent(s) 492 | parent = 'single' # parent selection method: 'single' or 'weighted' 493 | x = np.loadtxt('evolve.txt', ndmin=2) 494 | n = min(5, len(x)) # number of previous results to consider 495 | x = x[np.argsort(-fitness(x))][:n] # top n mutations 496 | w = fitness(x) - fitness(x).min() # weights 497 | if parent == 'single' or len(x) == 1: 498 | # x = x[random.randint(0, n - 1)] # random selection 499 | x = x[random.choices(range(n), weights=w)[0]] # weighted selection 500 | elif parent == 'weighted': 501 | x = (x * w.reshape(n, 1)).sum(0) / w.sum() # weighted combination 502 | 503 | # Mutate 504 | method, mp, s = 3, 0.9, 0.2 # method, mutation probability, sigma 505 | npr = np.random 506 | npr.seed(int(time.time())) 507 | g = np.array([1, 1, 1, 1, 1, 1, 1, 0, .1, 1, 0, 1, 1, 1, 1, 1, 1, 1]) # gains 508 | ng = len(g) 509 | if method == 1: 510 | v = (npr.randn(ng) * npr.random() * g * s + 1) ** 2.0 511 | elif method == 2: 512 | v = (npr.randn(ng) * npr.random(ng) * g * s + 1) ** 2.0 513 | elif method == 3: 514 | v = np.ones(ng) 515 | while all(v == 1): # mutate until a change occurs (prevent duplicates) 516 | # v = (g * (npr.random(ng) < mp) * npr.randn(ng) * s + 1) ** 2.0 517 | v = (g * (npr.random(ng) < mp) * npr.randn(ng) * npr.random() * s + 1).clip(0.3, 3.0) 518 | for i, k in enumerate(hyp.keys()): # plt.hist(v.ravel(), 300) 519 | hyp[k] = x[i + 7] * v[i] # mutate 520 | 521 | # Clip to limits 522 | keys = ['lr0', 'iou_t', 'momentum', 'weight_decay', 'hsv_s', 'hsv_v', 'translate', 'scale', 'fl_gamma'] 523 | limits = [(1e-5, 1e-2), (0.00, 0.70), (0.60, 0.98), (0, 0.001), (0, .9), (0, .9), (0, .9), (0, .9), (0, 3)] 524 | for k, v in zip(keys, limits): 525 | hyp[k] = np.clip(hyp[k], v[0], v[1]) 526 | 527 | # Train mutation 528 | results = train(hyp.copy()) 529 | 530 | # Write mutation results 531 | print_mutation(hyp, results, opt.bucket) 532 | 533 | # Plot results 534 | # plot_evolution_results(hyp) 535 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /utils/adabound.py: -------------------------------------------------------------------------------- 1 | import math 2 | 3 | import torch 4 | from torch.optim.optimizer import Optimizer 5 | 6 | 7 | class AdaBound(Optimizer): 8 | """Implements AdaBound algorithm. 9 | It has been proposed in `Adaptive Gradient Methods with Dynamic Bound of Learning Rate`_. 10 | Arguments: 11 | params (iterable): iterable of parameters to optimize or dicts defining 12 | parameter groups 13 | lr (float, optional): Adam learning rate (default: 1e-3) 14 | betas (Tuple[float, float], optional): coefficients used for computing 15 | running averages of gradient and its square (default: (0.9, 0.999)) 16 | final_lr (float, optional): final (SGD) learning rate (default: 0.1) 17 | gamma (float, optional): convergence speed of the bound functions (default: 1e-3) 18 | eps (float, optional): term added to the denominator to improve 19 | numerical stability (default: 1e-8) 20 | weight_decay (float, optional): weight decay (L2 penalty) (default: 0) 21 | amsbound (boolean, optional): whether to use the AMSBound variant of this algorithm 22 | .. 
Adaptive Gradient Methods with Dynamic Bound of Learning Rate: 23 | https://openreview.net/forum?id=Bkg3g2R9FX 24 | """ 25 | 26 | def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), final_lr=0.1, gamma=1e-3, 27 | eps=1e-8, weight_decay=0, amsbound=False): 28 | if not 0.0 <= lr: 29 | raise ValueError("Invalid learning rate: {}".format(lr)) 30 | if not 0.0 <= eps: 31 | raise ValueError("Invalid epsilon value: {}".format(eps)) 32 | if not 0.0 <= betas[0] < 1.0: 33 | raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0])) 34 | if not 0.0 <= betas[1] < 1.0: 35 | raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1])) 36 | if not 0.0 <= final_lr: 37 | raise ValueError("Invalid final learning rate: {}".format(final_lr)) 38 | if not 0.0 <= gamma < 1.0: 39 | raise ValueError("Invalid gamma parameter: {}".format(gamma)) 40 | defaults = dict(lr=lr, betas=betas, final_lr=final_lr, gamma=gamma, eps=eps, 41 | weight_decay=weight_decay, amsbound=amsbound) 42 | super(AdaBound, self).__init__(params, defaults) 43 | 44 | self.base_lrs = list(map(lambda group: group['lr'], self.param_groups)) 45 | 46 | def __setstate__(self, state): 47 | super(AdaBound, self).__setstate__(state) 48 | for group in self.param_groups: 49 | group.setdefault('amsbound', False) 50 | 51 | def step(self, closure=None): 52 | """Performs a single optimization step. 53 | Arguments: 54 | closure (callable, optional): A closure that reevaluates the model 55 | and returns the loss. 56 | """ 57 | loss = None 58 | if closure is not None: 59 | loss = closure() 60 | 61 | for group, base_lr in zip(self.param_groups, self.base_lrs): 62 | for p in group['params']: 63 | if p.grad is None: 64 | continue 65 | grad = p.grad.data 66 | if grad.is_sparse: 67 | raise RuntimeError( 68 | 'Adam does not support sparse gradients, please consider SparseAdam instead') 69 | amsbound = group['amsbound'] 70 | 71 | state = self.state[p] 72 | 73 | # State initialization 74 | if len(state) == 0: 75 | state['step'] = 0 76 | # Exponential moving average of gradient values 77 | state['exp_avg'] = torch.zeros_like(p.data) 78 | # Exponential moving average of squared gradient values 79 | state['exp_avg_sq'] = torch.zeros_like(p.data) 80 | if amsbound: 81 | # Maintains max of all exp. moving avg. of sq. grad. values 82 | state['max_exp_avg_sq'] = torch.zeros_like(p.data) 83 | 84 | exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq'] 85 | if amsbound: 86 | max_exp_avg_sq = state['max_exp_avg_sq'] 87 | beta1, beta2 = group['betas'] 88 | 89 | state['step'] += 1 90 | 91 | if group['weight_decay'] != 0: 92 | grad = grad.add(group['weight_decay'], p.data) 93 | 94 | # Decay the first and second moment running average coefficient 95 | exp_avg.mul_(beta1).add_(1 - beta1, grad) 96 | exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad) 97 | if amsbound: 98 | # Maintains the maximum of all 2nd moment running avg. till now 99 | torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq) 100 | # Use the max. for normalizing running avg. 
of gradient 101 | denom = max_exp_avg_sq.sqrt().add_(group['eps']) 102 | else: 103 | denom = exp_avg_sq.sqrt().add_(group['eps']) 104 | 105 | bias_correction1 = 1 - beta1 ** state['step'] 106 | bias_correction2 = 1 - beta2 ** state['step'] 107 | step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1 108 | 109 | # Applies bounds on actual learning rate 110 | # lr_scheduler cannot affect final_lr, this is a workaround to apply lr decay 111 | final_lr = group['final_lr'] * group['lr'] / base_lr 112 | lower_bound = final_lr * (1 - 1 / (group['gamma'] * state['step'] + 1)) 113 | upper_bound = final_lr * (1 + 1 / (group['gamma'] * state['step'])) 114 | step_size = torch.full_like(denom, step_size) 115 | step_size.div_(denom).clamp_(lower_bound, upper_bound).mul_(exp_avg) 116 | 117 | p.data.add_(-step_size) 118 | 119 | return loss 120 | 121 | 122 | class AdaBoundW(Optimizer): 123 | """Implements AdaBound algorithm with Decoupled Weight Decay (arxiv.org/abs/1711.05101) 124 | It has been proposed in `Adaptive Gradient Methods with Dynamic Bound of Learning Rate`_. 125 | Arguments: 126 | params (iterable): iterable of parameters to optimize or dicts defining 127 | parameter groups 128 | lr (float, optional): Adam learning rate (default: 1e-3) 129 | betas (Tuple[float, float], optional): coefficients used for computing 130 | running averages of gradient and its square (default: (0.9, 0.999)) 131 | final_lr (float, optional): final (SGD) learning rate (default: 0.1) 132 | gamma (float, optional): convergence speed of the bound functions (default: 1e-3) 133 | eps (float, optional): term added to the denominator to improve 134 | numerical stability (default: 1e-8) 135 | weight_decay (float, optional): weight decay (L2 penalty) (default: 0) 136 | amsbound (boolean, optional): whether to use the AMSBound variant of this algorithm 137 | .. Adaptive Gradient Methods with Dynamic Bound of Learning Rate: 138 | https://openreview.net/forum?id=Bkg3g2R9FX 139 | """ 140 | 141 | def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), final_lr=0.1, gamma=1e-3, 142 | eps=1e-8, weight_decay=0, amsbound=False): 143 | if not 0.0 <= lr: 144 | raise ValueError("Invalid learning rate: {}".format(lr)) 145 | if not 0.0 <= eps: 146 | raise ValueError("Invalid epsilon value: {}".format(eps)) 147 | if not 0.0 <= betas[0] < 1.0: 148 | raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0])) 149 | if not 0.0 <= betas[1] < 1.0: 150 | raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1])) 151 | if not 0.0 <= final_lr: 152 | raise ValueError("Invalid final learning rate: {}".format(final_lr)) 153 | if not 0.0 <= gamma < 1.0: 154 | raise ValueError("Invalid gamma parameter: {}".format(gamma)) 155 | defaults = dict(lr=lr, betas=betas, final_lr=final_lr, gamma=gamma, eps=eps, 156 | weight_decay=weight_decay, amsbound=amsbound) 157 | super(AdaBoundW, self).__init__(params, defaults) 158 | 159 | self.base_lrs = list(map(lambda group: group['lr'], self.param_groups)) 160 | 161 | def __setstate__(self, state): 162 | super(AdaBoundW, self).__setstate__(state) 163 | for group in self.param_groups: 164 | group.setdefault('amsbound', False) 165 | 166 | def step(self, closure=None): 167 | """Performs a single optimization step. 168 | Arguments: 169 | closure (callable, optional): A closure that reevaluates the model 170 | and returns the loss. 
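        Example (illustrative sketch only; the linear model and random batch below are
        placeholders for demonstration, not part of this repository):
            >>> model = torch.nn.Linear(10, 2)
            >>> optimizer = AdaBoundW(model.parameters(), lr=1e-3, final_lr=0.1, weight_decay=5e-4)
            >>> loss = model(torch.randn(4, 10)).sum()
            >>> optimizer.zero_grad(); loss.backward(); optimizer.step()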
171 | """ 172 | loss = None 173 | if closure is not None: 174 | loss = closure() 175 | 176 | for group, base_lr in zip(self.param_groups, self.base_lrs): 177 | for p in group['params']: 178 | if p.grad is None: 179 | continue 180 | grad = p.grad.data 181 | if grad.is_sparse: 182 | raise RuntimeError( 183 | 'Adam does not support sparse gradients, please consider SparseAdam instead') 184 | amsbound = group['amsbound'] 185 | 186 | state = self.state[p] 187 | 188 | # State initialization 189 | if len(state) == 0: 190 | state['step'] = 0 191 | # Exponential moving average of gradient values 192 | state['exp_avg'] = torch.zeros_like(p.data) 193 | # Exponential moving average of squared gradient values 194 | state['exp_avg_sq'] = torch.zeros_like(p.data) 195 | if amsbound: 196 | # Maintains max of all exp. moving avg. of sq. grad. values 197 | state['max_exp_avg_sq'] = torch.zeros_like(p.data) 198 | 199 | exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq'] 200 | if amsbound: 201 | max_exp_avg_sq = state['max_exp_avg_sq'] 202 | beta1, beta2 = group['betas'] 203 | 204 | state['step'] += 1 205 | 206 | # Decay the first and second moment running average coefficient 207 | exp_avg.mul_(beta1).add_(1 - beta1, grad) 208 | exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad) 209 | if amsbound: 210 | # Maintains the maximum of all 2nd moment running avg. till now 211 | torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq) 212 | # Use the max. for normalizing running avg. of gradient 213 | denom = max_exp_avg_sq.sqrt().add_(group['eps']) 214 | else: 215 | denom = exp_avg_sq.sqrt().add_(group['eps']) 216 | 217 | bias_correction1 = 1 - beta1 ** state['step'] 218 | bias_correction2 = 1 - beta2 ** state['step'] 219 | step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1 220 | 221 | # Applies bounds on actual learning rate 222 | # lr_scheduler cannot affect final_lr, this is a workaround to apply lr decay 223 | final_lr = group['final_lr'] * group['lr'] / base_lr 224 | lower_bound = final_lr * (1 - 1 / (group['gamma'] * state['step'] + 1)) 225 | upper_bound = final_lr * (1 + 1 / (group['gamma'] * state['step'])) 226 | step_size = torch.full_like(denom, step_size) 227 | step_size.div_(denom).clamp_(lower_bound, upper_bound).mul_(exp_avg) 228 | 229 | if group['weight_decay'] != 0: 230 | decayed_weights = torch.mul(p.data, group['weight_decay']) 231 | p.data.add_(-step_size) 232 | p.data.sub_(decayed_weights) 233 | else: 234 | p.data.add_(-step_size) 235 | 236 | return loss 237 | -------------------------------------------------------------------------------- /utils/evolve.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | #for i in 0 1 2 3 3 | #do 4 | # t=ultralytics/yolov3:v139 && sudo docker pull $t && sudo nvidia-docker run -d --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t utils/evolve.sh $i 5 | # sleep 30 6 | #done 7 | 8 | while true; do 9 | # python3 train.py --data ../data/sm4/out.data --img-size 320 --epochs 100 --batch 64 --accum 1 --weights yolov3-tiny.conv.15 --multi --bucket ult/wer --evolve --cache --device $1 --cfg yolov3-tiny3-1cls.cfg --single --adam 10 | # python3 train.py --data ../out/data.data --img-size 608 --epochs 10 --batch 8 --accum 8 --weights ultralytics68.pt --multi --bucket ult/athena --evolve --device $1 --cfg yolov3-spp-1cls.cfg 11 | 12 | python3 train.py --data coco2014.data --img-size 512 608 --epochs 27 --batch 8 --accum 8 --evolve --weights '' --bucket ult/coco/sppa_512 --device $1 --cfg 
yolov3-sppa.cfg --multi 13 | done 14 | 15 | 16 | # coco epoch times --img-size 416 608 --epochs 27 --batch 16 --accum 4 17 | # 36:34 2080ti 18 | # 21:58 V100 19 | # 63:00 T4 -------------------------------------------------------------------------------- /utils/gcp.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # New VM 4 | rm -rf sample_data yolov3 5 | git clone https://github.com/ultralytics/yolov3 6 | # git clone -b test --depth 1 https://github.com/ultralytics/yolov3 test # branch 7 | # sudo apt-get install zip 8 | #git clone https://github.com/NVIDIA/apex && cd apex && pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" . --user && cd .. && rm -rf apex 9 | sudo conda install -yc conda-forge scikit-image pycocotools 10 | # python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('193Zp_ye-3qXMonR1nZj3YyxMtQkMy50k','coco2014.zip')" 11 | python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('1WQT6SOktSe8Uw6r10-2JhbEhMY5DJaph','coco2017.zip')" 12 | python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('1C3HewOG9akA3y456SZLBJZfNDPkBwAto','knife.zip')" 13 | python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('13g3LqdpkNE8sPosVJT6KFXlfoMypzRP4','sm4.zip')" 14 | sudo shutdown 15 | 16 | # Mount local SSD 17 | lsblk 18 | sudo mkfs.ext4 -F /dev/nvme0n1 19 | sudo mkdir -p /mnt/disks/nvme0n1 20 | sudo mount /dev/nvme0n1 /mnt/disks/nvme0n1 21 | sudo chmod a+w /mnt/disks/nvme0n1 22 | cp -r coco /mnt/disks/nvme0n1 23 | 24 | # Kill All 25 | t=ultralytics/yolov3:v1 26 | docker kill $(docker ps -a -q --filter ancestor=$t) 27 | 28 | # Evolve coco 29 | sudo -s 30 | t=ultralytics/yolov3:evolve 31 | # docker kill $(docker ps -a -q --filter ancestor=$t) 32 | for i in 0 1 6 7 33 | do 34 | docker pull $t && docker run --gpus all -d --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t bash utils/evolve.sh $i 35 | sleep 30 36 | done 37 | 38 | #COCO training 39 | n=131 && t=ultralytics/coco:v131 && sudo docker pull $t && sudo docker run -it --gpus all --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t python3 train.py --data coco2014.data --img-size 320 640 --epochs 300 --batch 16 --weights '' --device 0 --cfg yolov3-spp.cfg --bucket ult/coco --name $n && sudo shutdown 40 | n=132 && t=ultralytics/coco:v131 && sudo docker pull $t && sudo docker run -it --gpus all --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t python3 train.py --data coco2014.data --img-size 320 640 --epochs 300 --batch 64 --weights '' --device 0 --cfg yolov3-tiny.cfg --bucket ult/coco --name $n && sudo shutdown 41 | -------------------------------------------------------------------------------- /utils/google_utils.py: -------------------------------------------------------------------------------- 1 | # This file contains google utils: https://cloud.google.com/storage/docs/reference/libraries 2 | # pip install --upgrade google-cloud-storage 3 | 4 | import os 5 | import time 6 | 7 | 8 | # from google.cloud import storage 9 | 10 | 11 | def gdrive_download(id='1HaXkef9z6y5l4vUnCYgdmEAj61c6bfWO', name='coco.zip'): 12 | # https://gist.github.com/tanaikech/f0f2d122e05bf5f971611258c22c110f 13 | # Downloads a file from Google Drive, accepting presented query 14 | # from utils.google_utils import *; gdrive_download() 15 | t = time.time() 16 | 17 | print('Downloading https://drive.google.com/uc?export=download&id=%s as %s... 
' % (id, name), end='') 18 | os.remove(name) if os.path.exists(name) else None # remove existing 19 | os.remove('cookie') if os.path.exists('cookie') else None 20 | 21 | # Attempt file download 22 | os.system("curl -c ./cookie -s -L \"https://drive.google.com/uc?export=download&id=%s\" > /dev/null" % id) 23 | if os.path.exists('cookie'): # large file 24 | s = "curl -Lb ./cookie \"https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=%s\" -o %s" % ( 25 | id, name) 26 | else: # small file 27 | s = "curl -s -L -o %s 'https://drive.google.com/uc?export=download&id=%s'" % (name, id) 28 | r = os.system(s) # execute, capture return values 29 | os.remove('cookie') if os.path.exists('cookie') else None 30 | 31 | # Error check 32 | if r != 0: 33 | os.remove(name) if os.path.exists(name) else None # remove partial 34 | print('Download error ') # raise Exception('Download error') 35 | return r 36 | 37 | # Unzip if archive 38 | if name.endswith('.zip'): 39 | print('unzipping... ', end='') 40 | os.system('unzip -q %s' % name) # unzip 41 | os.remove(name) # remove zip to free space 42 | 43 | print('Done (%.1fs)' % (time.time() - t)) 44 | return r 45 | 46 | 47 | def upload_blob(bucket_name, source_file_name, destination_blob_name): 48 | # Uploads a file to a bucket 49 | # https://cloud.google.com/storage/docs/uploading-objects#storage-upload-object-python 50 | 51 | storage_client = storage.Client() 52 | bucket = storage_client.get_bucket(bucket_name) 53 | blob = bucket.blob(destination_blob_name) 54 | 55 | blob.upload_from_filename(source_file_name) 56 | 57 | print('File {} uploaded to {}.'.format( 58 | source_file_name, 59 | destination_blob_name)) 60 | 61 | 62 | def download_blob(bucket_name, source_blob_name, destination_file_name): 63 | # Uploads a blob from a bucket 64 | storage_client = storage.Client() 65 | bucket = storage_client.get_bucket(bucket_name) 66 | blob = bucket.blob(source_blob_name) 67 | 68 | blob.download_to_filename(destination_file_name) 69 | 70 | print('Blob {} downloaded to {}.'.format( 71 | source_blob_name, 72 | destination_file_name)) 73 | -------------------------------------------------------------------------------- /utils/keepgit: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /utils/layers.py: -------------------------------------------------------------------------------- 1 | import torch.nn.functional as F 2 | 3 | from utils.utils import * 4 | 5 | 6 | def make_divisible(v, divisor): 7 | # Function ensures all layers have a channel number that is divisible by 8 8 | # https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py 9 | return math.ceil(v / divisor) * divisor 10 | 11 | 12 | class Flatten(nn.Module): 13 | # Use after nn.AdaptiveAvgPool2d(1) to remove last 2 dimensions 14 | def forward(self, x): 15 | return x.view(x.size(0), -1) 16 | 17 | 18 | class Concat(nn.Module): 19 | # Concatenate a list of tensors along dimension 20 | def __init__(self, dimension=1): 21 | super(Concat, self).__init__() 22 | self.d = dimension 23 | 24 | def forward(self, x): 25 | return torch.cat(x, self.d) 26 | 27 | 28 | class FeatureConcat(nn.Module): 29 | def __init__(self, layers): 30 | super(FeatureConcat, self).__init__() 31 | self.layers = layers # layer indices 32 | self.multiple = len(layers) > 1 # multiple layers flag 33 | 34 | def forward(self, x, outputs): 35 | return 
torch.cat([outputs[i] for i in self.layers], 1) if self.multiple else outputs[self.layers[0]] 36 | 37 | 38 | class WeightedFeatureFusion(nn.Module): # weighted sum of 2 or more layers https://arxiv.org/abs/1911.09070 39 | def __init__(self, layers, weight=False): 40 | super(WeightedFeatureFusion, self).__init__() 41 | self.layers = layers # layer indices 42 | self.weight = weight # apply weights boolean 43 | self.n = len(layers) + 1 # number of layers 44 | if weight: 45 | self.w = nn.Parameter(torch.zeros(self.n), requires_grad=True) # layer weights 46 | 47 | def forward(self, x, outputs): 48 | # Weights 49 | if self.weight: 50 | w = torch.sigmoid(self.w) * (2 / self.n) # sigmoid weights (0-1) 51 | x = x * w[0] 52 | 53 | # Fusion 54 | nx = x.shape[1] # input channels 55 | for i in range(self.n - 1): 56 | a = outputs[self.layers[i]] * w[i + 1] if self.weight else outputs[self.layers[i]] # feature to add 57 | na = a.shape[1] # feature channels 58 | 59 | # Adjust channels 60 | if nx == na: # same shape 61 | x = x + a 62 | elif nx > na: # slice input 63 | x[:, :na] = x[:, :na] + a # or a = nn.ZeroPad2d((0, 0, 0, 0, 0, dc))(a); x = x + a 64 | else: # slice feature 65 | x = x + a[:, :nx] 66 | 67 | return x 68 | 69 | 70 | class MixConv2d(nn.Module): # MixConv: Mixed Depthwise Convolutional Kernels https://arxiv.org/abs/1907.09595 71 | def __init__(self, in_ch, out_ch, k=(3, 5, 7), stride=1, dilation=1, bias=True, method='equal_params'): 72 | super(MixConv2d, self).__init__() 73 | 74 | groups = len(k) 75 | if method == 'equal_ch': # equal channels per group 76 | i = torch.linspace(0, groups - 1E-6, out_ch).floor() # out_ch indices 77 | ch = [(i == g).sum() for g in range(groups)] 78 | else: # 'equal_params': equal parameter count per group 79 | b = [out_ch] + [0] * groups 80 | a = np.eye(groups + 1, groups, k=-1) 81 | a -= np.roll(a, 1, axis=1) 82 | a *= np.array(k) ** 2 83 | a[0] = 1 84 | ch = np.linalg.lstsq(a, b, rcond=None)[0].round().astype(int) # solve for equal weight indices, ax = b 85 | 86 | self.m = nn.ModuleList([nn.Conv2d(in_channels=in_ch, 87 | out_channels=ch[g], 88 | kernel_size=k[g], 89 | stride=stride, 90 | padding=k[g] // 2, # 'same' pad 91 | dilation=dilation, 92 | bias=bias) for g in range(groups)]) 93 | 94 | def forward(self, x): 95 | return torch.cat([m(x) for m in self.m], 1) 96 | 97 | 98 | # Activation functions below ------------------------------------------------------------------------------------------- 99 | class SwishImplementation(torch.autograd.Function): 100 | @staticmethod 101 | def forward(ctx, x): 102 | ctx.save_for_backward(x) 103 | return x * torch.sigmoid(x) 104 | 105 | @staticmethod 106 | def backward(ctx, grad_output): 107 | x = ctx.saved_tensors[0] 108 | sx = torch.sigmoid(x) # sigmoid(ctx) 109 | return grad_output * (sx * (1 + x * (1 - sx))) 110 | 111 | 112 | class MishImplementation(torch.autograd.Function): 113 | @staticmethod 114 | def forward(ctx, x): 115 | ctx.save_for_backward(x) 116 | return x.mul(torch.tanh(F.softplus(x))) # x * tanh(ln(1 + exp(x))) 117 | 118 | @staticmethod 119 | def backward(ctx, grad_output): 120 | x = ctx.saved_tensors[0] 121 | sx = torch.sigmoid(x) 122 | fx = F.softplus(x).tanh() 123 | return grad_output * (fx + x * sx * (1 - fx * fx)) 124 | 125 | 126 | class MemoryEfficientSwish(nn.Module): 127 | def forward(self, x): 128 | return SwishImplementation.apply(x) 129 | 130 | 131 | class MemoryEfficientMish(nn.Module): 132 | def forward(self, x): 133 | return MishImplementation.apply(x) 134 | 135 | 136 | class Swish(nn.Module): 
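    """Swish activation, x * sigmoid(x) (see MemoryEfficientSwish above for a lower-memory autograd variant)."""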
137 | def forward(self, x): 138 | return x * torch.sigmoid(x) 139 | 140 | 141 | class HardSwish(nn.Module): # https://arxiv.org/pdf/1905.02244.pdf 142 | def forward(self, x): 143 | return x * F.hardtanh(x + 3, 0., 6., True) / 6. 144 | 145 | 146 | class Mish(nn.Module): # https://github.com/digantamisra98/Mish 147 | def forward(self, x): 148 | return x * F.softplus(x).tanh() 149 | -------------------------------------------------------------------------------- /utils/parse_config.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | import numpy as np 4 | 5 | 6 | def parse_model_cfg(path): 7 | # Parse the yolo *.cfg file and return module definitions path may be 'cfg/yolov3.cfg', 'yolov3.cfg', or 'yolov3' 8 | if not path.endswith('.cfg'): # add .cfg suffix if omitted 9 | path += '.cfg' 10 | if not os.path.exists(path) and os.path.exists('cfg' + os.sep + path): # add cfg/ prefix if omitted 11 | path = 'cfg' + os.sep + path 12 | 13 | with open(path, 'r') as f: 14 | lines = f.read().split('\n') 15 | lines = [x for x in lines if x and not x.startswith('#')] 16 | lines = [x.rstrip().lstrip() for x in lines] # get rid of fringe whitespaces 17 | mdefs = [] # module definitions 18 | for line in lines: 19 | if line.startswith('['): # This marks the start of a new block 20 | mdefs.append({}) 21 | mdefs[-1]['type'] = line[1:-1].rstrip() 22 | if mdefs[-1]['type'] == 'convolutional' or mdefs[-1]['type'] == 'quan_convolutional': 23 | mdefs[-1]['batch_normalize'] = 0 # pre-populate with zeros (may be overwritten later) 24 | else: 25 | key, val = line.split("=") 26 | key = key.rstrip() 27 | 28 | if key == 'anchors': # return nparray 29 | mdefs[-1][key] = np.array([float(x) for x in val.split(',')]).reshape((-1, 2)) # np anchors 30 | elif (key in ['from', 'layers', 'mask']) or (key == 'size' and ',' in val): # return array 31 | mdefs[-1][key] = [int(x) for x in val.split(',')] 32 | else: 33 | val = val.strip() 34 | if val.isnumeric(): # return int or float 35 | mdefs[-1][key] = int(val) if (int(val) - float(val)) == 0 else float(val) 36 | else: 37 | mdefs[-1][key] = val # return string 38 | 39 | # Check all fields are supported 40 | supported = ['type', 'batch_normalize', 'filters', 'size', 'stride', 'pad', 'activation', 'layers', 'groups', 41 | 'from', 'mask', 'anchors', 'classes', 'num', 'jitter', 'ignore_thresh', 'truth_thresh', 'random', 42 | 'stride_x', 'stride_y', 'weights_type', 'weights_normalization', 'scale_x_y', 'beta_nms', 'nms_kind', 43 | 'iou_loss', 'iou_normalizer', 'cls_normalizer', 'iou_thresh', 'group', 'reduction','first'] 44 | 45 | f = [] # fields 46 | for x in mdefs[1:]: 47 | [f.append(k) for k in x if k not in f] 48 | u = [x for x in f if x not in supported] # unsupported fields 49 | assert not any(u), "Unsupported fields %s in %s. 
See https://github.com/ultralytics/yolov3/issues/631" % (u, path) 50 | 51 | return mdefs 52 | 53 | 54 | def parse_data_cfg(path): 55 | # Parses the data configuration file 56 | if not os.path.exists(path) and os.path.exists('data' + os.sep + path): # add data/ prefix if omitted 57 | path = 'data' + os.sep + path 58 | 59 | with open(path, 'r') as f: 60 | lines = f.readlines() 61 | 62 | options = dict() 63 | for line in lines: 64 | line = line.strip() 65 | if line == '' or line.startswith('#'): 66 | continue 67 | key, val = line.split('=') 68 | options[key.strip()] = val.strip() 69 | 70 | return options 71 | -------------------------------------------------------------------------------- /utils/prune_utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from terminaltables import AsciiTable 3 | from copy import deepcopy 4 | import numpy as np 5 | import torch.nn.functional as F 6 | 7 | 8 | def get_sr_flag(epoch, sr): 9 | # return epoch >= 5 and sr 10 | return sr 11 | 12 | def parse_module_defs3(module_defs): 13 | 14 | CBL_idx = [] 15 | Conv_idx = [] 16 | for i, module_def in enumerate(module_defs): 17 | if module_def['type'] == 'convolutional': 18 | if module_def['batch_normalize'] == '1': 19 | CBL_idx.append(i) 20 | else: 21 | Conv_idx.append(i) 22 | 23 | ignore_idx = set() 24 | 25 | ignore_idx.add(18) 26 | 27 | 28 | prune_idx = [idx for idx in CBL_idx if idx not in ignore_idx] 29 | 30 | return CBL_idx, Conv_idx, prune_idx 31 | 32 | def parse_module_defs2(module_defs): 33 | 34 | CBL_idx = [] 35 | Conv_idx = [] 36 | shortcut_idx=dict() 37 | shortcut_all=set() 38 | for i, module_def in enumerate(module_defs): 39 | if module_def['type'] == 'convolutional': 40 | if module_def['batch_normalize'] == '1': 41 | CBL_idx.append(i) 42 | else: 43 | Conv_idx.append(i) 44 | 45 | ignore_idx = set() 46 | for i, module_def in enumerate(module_defs): 47 | if module_def['type'] == 'shortcut': 48 | identity_idx = (i + int(module_def['from'])) 49 | if module_defs[identity_idx]['type'] == 'convolutional': 50 | 51 | #ignore_idx.add(identity_idx) 52 | shortcut_idx[i-1]=identity_idx 53 | shortcut_all.add(identity_idx) 54 | elif module_defs[identity_idx]['type'] == 'shortcut': 55 | 56 | #ignore_idx.add(identity_idx - 1) 57 | shortcut_idx[i-1]=identity_idx-1 58 | shortcut_all.add(identity_idx-1) 59 | shortcut_all.add(i-1) 60 | #上采样层前的卷积层不裁剪 61 | ignore_idx.add(84) 62 | ignore_idx.add(96) 63 | 64 | prune_idx = [idx for idx in CBL_idx if idx not in ignore_idx] 65 | 66 | return CBL_idx, Conv_idx, prune_idx,shortcut_idx,shortcut_all 67 | 68 | def parse_module_defs(module_defs): 69 | 70 | CBL_idx = [] 71 | Conv_idx = [] 72 | for i, module_def in enumerate(module_defs): 73 | if i > 139: 74 | if module_def['type'] == 'convolutional': 75 | if module_def['batch_normalize'] == 1: 76 | CBL_idx.append(i) 77 | else: 78 | Conv_idx.append(i) 79 | ignore_idx = set() 80 | #for i, module_def in enumerate(module_defs): 81 | #if module_def['type'] == 'shortcut': 82 | #ignore_idx.add(i-1) 83 | #identity_idx = (i + int(module_def['from'])) 84 | #if module_defs[identity_idx]['type'] == 'convolutional': 85 | #ignore_idx.add(identity_idx) 86 | #elif module_defs[identity_idx]['type'] == 'shortcut': 87 | #ignore_idx.add(identity_idx - 1) 88 | #上采样层前的卷积层不裁剪 89 | ignore_idx.add(149) 90 | ignore_idx.add(161) 91 | 92 | prune_idx = [idx for idx in CBL_idx if idx not in ignore_idx] 93 | 94 | return CBL_idx, Conv_idx, prune_idx 95 | 96 | 97 | def gather_bn_weights(module_list, prune_idx): 98 | 
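    # Concatenates the absolute BatchNorm scale factors (gamma) of every prunable CBL layer
    # (module_list[idx] is Sequential(Conv2d, BatchNorm2d, activation)) into one flat tensor;
    # sorting this vector is what yields the global channel-pruning threshold
    # (cf. obtain_quantiles / obtain_bn_mask below).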
99 | size_list = [module_list[idx][1].weight.data.shape[0] for idx in prune_idx] 100 | 101 | bn_weights = torch.zeros(sum(size_list)) 102 | index = 0 103 | for idx, size in zip(prune_idx, size_list): 104 | bn_weights[index:(index + size)] = module_list[idx][1].weight.data.abs().clone() 105 | index += size 106 | 107 | return bn_weights 108 | 109 | 110 | def write_cfg(cfg_file, module_defs): 111 | 112 | with open(cfg_file, 'w') as f: 113 | for module_def in module_defs: 114 | f.write(f"[{module_def['type']}]\n") 115 | for key, value in module_def.items(): 116 | if key != 'type': 117 | f.write(f"{key}={value}\n") 118 | f.write("\n") 119 | return cfg_file 120 | 121 | 122 | class BNOptimizer(): 123 | 124 | @staticmethod 125 | def updateBN(sr_flag, module_list, s, prune_idx): 126 | if sr_flag: 127 | for idx in prune_idx: 128 | # Squential(Conv, BN, Lrelu) 129 | bn_module = module_list[idx][1] 130 | bn_module.weight.grad.data.add_(s * torch.sign(bn_module.weight.data)) # L1 131 | 132 | 133 | def obtain_quantiles(bn_weights, num_quantile=5): 134 | 135 | sorted_bn_weights, i = torch.sort(bn_weights) 136 | total = sorted_bn_weights.shape[0] 137 | quantiles = sorted_bn_weights.tolist()[-1::-total//num_quantile][::-1] 138 | print("\nBN weights quantile:") 139 | quantile_table = [ 140 | [f'{i}/{num_quantile}' for i in range(1, num_quantile+1)], 141 | ["%.3f" % quantile for quantile in quantiles] 142 | ] 143 | print(AsciiTable(quantile_table).table) 144 | 145 | return quantiles 146 | 147 | 148 | def get_input_mask(module_defs, idx, CBLidx2mask): 149 | 150 | if idx == 140: 151 | return np.ones(960) 152 | 153 | elif module_defs[idx - 1]['type'] == 'convolutional': 154 | return CBLidx2mask[idx - 1] 155 | elif module_defs[idx - 1]['type'] == 'shortcut': 156 | return CBLidx2mask[idx - 2] 157 | elif module_defs[idx - 1]['type'] == 'route': 158 | route_in_idxs = [] 159 | for layer_i in module_defs[idx - 1]['layers']: 160 | if int(layer_i) < 0: 161 | route_in_idxs.append(idx - 1 + int(layer_i)) 162 | else: 163 | route_in_idxs.append(int(layer_i)) 164 | print(route_in_idxs) 165 | if len(route_in_idxs) == 1: 166 | return CBLidx2mask[route_in_idxs[0]] 167 | elif len(route_in_idxs) == 2: 168 | if 96 in route_in_idxs: 169 | #return np.concatenate([CBLidx2mask[in_idx - 1] for in_idx in route_in_idxs]) 170 | return np.concatenate([np.ones(112), CBLidx2mask[149]]) 171 | elif 45 in route_in_idxs: 172 | return np.concatenate([np.ones(40), CBLidx2mask[161]]) 173 | 174 | else: 175 | print("Something wrong with route module!") 176 | raise Exception 177 | 178 | 179 | def init_weights_from_loose_model(compact_model, loose_model, CBL_idx, Conv_idx, CBLidx2mask): 180 | 181 | for idx in CBL_idx: 182 | compact_CBL = compact_model.module_list[idx] 183 | loose_CBL = loose_model.module_list[idx] 184 | out_channel_idx = np.argwhere(CBLidx2mask[idx])[:, 0].tolist() 185 | 186 | compact_bn, loose_bn = compact_CBL[1], loose_CBL[1] 187 | compact_bn.weight.data = loose_bn.weight.data[out_channel_idx].clone() 188 | compact_bn.bias.data = loose_bn.bias.data[out_channel_idx].clone() 189 | compact_bn.running_mean.data = loose_bn.running_mean.data[out_channel_idx].clone() 190 | compact_bn.running_var.data = loose_bn.running_var.data[out_channel_idx].clone() 191 | 192 | input_mask = get_input_mask(loose_model.module_defs, idx, CBLidx2mask) 193 | in_channel_idx = np.argwhere(input_mask)[:, 0].tolist() 194 | compact_conv, loose_conv = compact_CBL[0], loose_CBL[0] 195 | tmp = loose_conv.weight.data[:, in_channel_idx, :, :].clone() 196 | 
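        # Two-step copy: `tmp` keeps only the surviving input channels (dim 1, taken from the
        # previous layer's mask), and the line below keeps only the surviving output channels
        # (dim 0), so the compact conv weight matches the pruned channel layout on both sides.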
compact_conv.weight.data = tmp[out_channel_idx, :, :, :].clone() 197 | 198 | for idx in Conv_idx: 199 | compact_conv = compact_model.module_list[idx][0] 200 | loose_conv = loose_model.module_list[idx][0] 201 | 202 | input_mask = get_input_mask(loose_model.module_defs, idx, CBLidx2mask) 203 | in_channel_idx = np.argwhere(input_mask)[:, 0].tolist() 204 | compact_conv.weight.data = loose_conv.weight.data[:, in_channel_idx, :, :].clone() 205 | compact_conv.bias.data = loose_conv.bias.data.clone() 206 | 207 | 208 | def prune_model_keep_size(model, prune_idx, CBL_idx, CBLidx2mask): 209 | 210 | pruned_model = deepcopy(model) 211 | for idx in prune_idx: 212 | mask = torch.from_numpy(CBLidx2mask[idx]).cuda() 213 | bn_module = pruned_model.module_list[idx][1] 214 | 215 | bn_module.weight.data.mul_(mask) 216 | 217 | activation = F.leaky_relu((1 - mask) * bn_module.bias.data, 0.1) 218 | 219 | # 两个上采样层前的卷积层 220 | next_idx_list = [idx + 1] 221 | if idx == 79: 222 | next_idx_list.append(84) 223 | elif idx == 91: 224 | next_idx_list.append(96) 225 | 226 | for next_idx in next_idx_list: 227 | next_conv = pruned_model.module_list[next_idx][0] 228 | conv_sum = next_conv.weight.data.sum(dim=(2, 3)) 229 | offset = conv_sum.matmul(activation.reshape(-1, 1)).reshape(-1) 230 | if next_idx in CBL_idx: 231 | next_bn = pruned_model.module_list[next_idx][1] 232 | next_bn.running_mean.data.sub_(offset) 233 | else: 234 | #这里需要注意的是,对于convolutionnal,如果有BN,则该层卷积层不使用bias,如果无BN,则使用bias 235 | next_conv.bias.data.add_(offset) 236 | 237 | bn_module.bias.data.mul_(mask) 238 | 239 | return pruned_model 240 | 241 | 242 | def obtain_bn_mask(bn_module, thre): 243 | 244 | thre = thre.cuda() 245 | mask = bn_module.weight.data.abs().ge(thre).float() 246 | 247 | return mask 248 | -------------------------------------------------------------------------------- /utils/quant_dorefa.py: -------------------------------------------------------------------------------- 1 | import math 2 | import time 3 | import torch 4 | import torch.nn as nn 5 | import numpy as np 6 | from torch.autograd import Function 7 | import torch.nn.functional as F 8 | 9 | 10 | 11 | 12 | class ScaleSigner(Function): 13 | """take a real value x, output sign(x)*E(|x|)""" 14 | @staticmethod 15 | def forward(ctx, input): 16 | return torch.sign(input) * torch.mean(torch.abs(input)) 17 | 18 | @staticmethod 19 | def backward(ctx, grad_output): 20 | return grad_output 21 | 22 | 23 | def scale_sign(input): 24 | return ScaleSigner.apply(input) 25 | 26 | 27 | #真正起作用的量化函数 28 | class Quantizer(Function): 29 | @staticmethod 30 | def forward(ctx, input, nbit): 31 | scale = 2 ** nbit - 1 32 | return torch.round(input * scale) / scale 33 | 34 | @staticmethod 35 | def backward(ctx, grad_output): 36 | return grad_output, None 37 | 38 | 39 | def quantize(input, nbit): 40 | return Quantizer.apply(input, nbit) 41 | 42 | 43 | def dorefa_w(w, nbit_w): 44 | if nbit_w == 1: 45 | w = scale_sign(w) 46 | else: 47 | w = torch.tanh(w) 48 | #将权重限制在[0,1]之间 49 | w = w / (2 * torch.max(torch.abs(w))) + 0.5 50 | #权重量化 51 | w = 2 * quantize(w, nbit_w) - 1 52 | 53 | return w 54 | 55 | 56 | def dorefa_a(input, nbit_a): 57 | return quantize(torch.clamp(0.1 * input, 0, 1), nbit_a) 58 | 59 | 60 | class QuanConv(nn.Conv2d): 61 | """docstring for QuanConv""" 62 | def __init__(self, in_channels, out_channels, kernel_size, quan_name_w='dorefa', quan_name_a='dorefa', nbit_w=32, 63 | nbit_a=32, stride=1, 64 | padding=0, dilation=1, groups=1, 65 | bias=True): 66 | super(QuanConv, self).__init__( 67 | in_channels, 
out_channels, kernel_size, stride, padding, dilation, 68 | groups, bias) 69 | self.nbit_w = nbit_w 70 | self.nbit_a = nbit_a 71 | name_w_dict = {'dorefa': dorefa_w} 72 | name_a_dict = {'dorefa': dorefa_a} 73 | self.quan_w = name_w_dict[quan_name_w] 74 | self.quan_a = name_a_dict[quan_name_a] 75 | 76 | # @weak_script_method 77 | def forward(self, input): 78 | if self.nbit_w <=32: 79 | #量化卷积 80 | w = self.quan_w(self.weight, self.nbit_w) 81 | else: 82 | #卷积保持不变 83 | w = self.weight 84 | 85 | if self.nbit_a <=32: 86 | #量化激活 87 | x = self.quan_a(input, self.nbit_a) 88 | else: 89 | #激活保持不变 90 | x = input 91 | # print('x unique',np.unique(x.detach().numpy()).shape) 92 | # print('w unique',np.unique(w.detach().numpy()).shape) 93 | 94 | #做真正的卷积运算 95 | 96 | output = F.conv2d(x, w, self.bias, self.stride, self.padding, self.dilation, self.groups) 97 | 98 | return output 99 | 100 | class Linear_Q(nn.Linear): 101 | def __init__(self, in_features, out_features, bias=True, quan_name_w='dorefa', quan_name_a='dorefa', nbit_w=32, nbit_a=32): 102 | super(Linear_Q, self).__init__(in_features, out_features, bias) 103 | self.nbit_w = nbit_w 104 | self.nbit_a = nbit_a 105 | name_w_dict = {'dorefa': dorefa_w} 106 | name_a_dict = {'dorefa': dorefa_a} 107 | self.quan_w = name_w_dict[quan_name_w] 108 | self.quan_a = name_a_dict[quan_name_a] 109 | 110 | # @weak_script_method 111 | def forward(self, input): 112 | if self.nbit_w < 32: 113 | w = self.quan_w(self.weight, self.nbit_w) 114 | else: 115 | w = self.weight 116 | 117 | if self.nbit_a < 32: 118 | x = self.quan_a(input, self.nbit_a) 119 | else: 120 | x = input 121 | 122 | # print('x unique',np.unique(x.detach().numpy())) 123 | # print('w unique',np.unique(w.detach().numpy())) 124 | 125 | output = F.linear(x, w, self.bias) 126 | 127 | return output 128 | 129 | 130 | -------------------------------------------------------------------------------- /utils/torch_utils.py: -------------------------------------------------------------------------------- 1 | import math 2 | import os 3 | import time 4 | from copy import deepcopy 5 | 6 | import torch 7 | import torch.backends.cudnn as cudnn 8 | import torch.nn as nn 9 | import torch.nn.functional as F 10 | 11 | 12 | def init_seeds(seed=0): 13 | torch.manual_seed(seed) 14 | 15 | # Reduce randomness (may be slower on Tesla GPUs) # https://pytorch.org/docs/stable/notes/randomness.html 16 | if seed == 0: 17 | cudnn.deterministic = False 18 | cudnn.benchmark = True 19 | 20 | 21 | def select_device(device='', apex=False, batch_size=None): 22 | # device = 'cpu' or '0' or '0,1,2,3' 23 | cpu_request = device.lower() == 'cpu' 24 | if device and not cpu_request: # if device requested other than 'cpu' 25 | os.environ['CUDA_VISIBLE_DEVICES'] = device # set environment variable 26 | assert torch.cuda.is_available(), 'CUDA unavailable, invalid device %s requested' % device # check availablity 27 | 28 | cuda = False if cpu_request else torch.cuda.is_available() 29 | if cuda: 30 | c = 1024 ** 2 # bytes to MB 31 | ng = torch.cuda.device_count() 32 | if ng > 1 and batch_size: # check that batch_size is compatible with device_count 33 | assert batch_size % ng == 0, 'batch-size %g not multiple of GPU count %g' % (batch_size, ng) 34 | x = [torch.cuda.get_device_properties(i) for i in range(ng)] 35 | s = 'Using CUDA ' + ('Apex ' if apex else '') # apex for mixed precision https://github.com/NVIDIA/apex 36 | for i in range(0, ng): 37 | if i == 1: 38 | s = ' ' * len(s) 39 | print("%sdevice%g _CudaDeviceProperties(name='%s', total_memory=%dMB)" % 
40 | (s, i, x[i].name, x[i].total_memory / c)) 41 | else: 42 | print('Using CPU') 43 | 44 | print('') # skip a line 45 | return torch.device('cuda:0' if cuda else 'cpu') 46 | 47 | 48 | def time_synchronized(): 49 | torch.cuda.synchronize() if torch.cuda.is_available() else None 50 | return time.time() 51 | 52 | 53 | def initialize_weights(model): 54 | for m in model.modules(): 55 | t = type(m) 56 | if t is nn.Conv2d: 57 | pass # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 58 | elif t is nn.BatchNorm2d: 59 | m.eps = 1e-4 60 | m.momentum = 0.03 61 | elif t in [nn.LeakyReLU, nn.ReLU, nn.ReLU6]: 62 | m.inplace = True 63 | 64 | 65 | def find_modules(model, mclass=nn.Conv2d): 66 | # finds layer indices matching module class 'mclass' 67 | return [i for i, m in enumerate(model.module_list) if isinstance(m, mclass)] 68 | 69 | 70 | def fuse_conv_and_bn(conv, bn): 71 | # https://tehnokv.com/posts/fusing-batchnorm-and-conv/ 72 | with torch.no_grad(): 73 | # init 74 | fusedconv = torch.nn.Conv2d(conv.in_channels, 75 | conv.out_channels, 76 | kernel_size=conv.kernel_size, 77 | stride=conv.stride, 78 | padding=conv.padding, 79 | bias=True) 80 | 81 | # prepare filters 82 | w_conv = conv.weight.clone().view(conv.out_channels, -1) 83 | w_bn = torch.diag(bn.weight.div(torch.sqrt(bn.eps + bn.running_var))) 84 | fusedconv.weight.copy_(torch.mm(w_bn, w_conv).view(fusedconv.weight.size())) 85 | 86 | # prepare spatial bias 87 | if conv.bias is not None: 88 | b_conv = conv.bias 89 | else: 90 | b_conv = torch.zeros(conv.weight.size(0)) 91 | b_bn = bn.bias - bn.weight.mul(bn.running_mean).div(torch.sqrt(bn.running_var + bn.eps)) 92 | fusedconv.bias.copy_(torch.mm(w_bn, b_conv.reshape(-1, 1)).reshape(-1) + b_bn) 93 | 94 | return fusedconv 95 | 96 | 97 | def model_info(model, verbose=False): 98 | # Plots a line-by-line description of a PyTorch model 99 | n_p = sum(x.numel() for x in model.parameters()) # number parameters 100 | n_g = sum(x.numel() for x in model.parameters() if x.requires_grad) # number gradients 101 | if verbose: 102 | print('%5s %40s %9s %12s %20s %10s %10s' % ('layer', 'name', 'gradient', 'parameters', 'shape', 'mu', 'sigma')) 103 | for i, (name, p) in enumerate(model.named_parameters()): 104 | name = name.replace('module_list.', '') 105 | print('%5g %40s %9s %12g %20s %10.3g %10.3g' % 106 | (i, name, p.requires_grad, p.numel(), list(p.shape), p.mean(), p.std())) 107 | 108 | try: # FLOPS 109 | from thop import profile 110 | macs, _ = profile(model, inputs=(torch.zeros(1, 3, 480, 640),), verbose=False) 111 | fs = ', %.1f GFLOPS' % (macs / 1E9 * 2) 112 | except: 113 | fs = '' 114 | 115 | print('Model Summary: %g layers, %g parameters, %g gradients%s' % (len(list(model.parameters())), n_p, n_g, fs)) 116 | 117 | 118 | def load_classifier(name='resnet101', n=2): 119 | # Loads a pretrained model reshaped to n-class output 120 | import pretrainedmodels # https://github.com/Cadene/pretrained-models.pytorch#torchvision 121 | model = pretrainedmodels.__dict__[name](num_classes=1000, pretrained='imagenet') 122 | 123 | # Display model properties 124 | for x in ['model.input_size', 'model.input_space', 'model.input_range', 'model.mean', 'model.std']: 125 | print(x + ' =', eval(x)) 126 | 127 | # Reshape output to n classes 128 | filters = model.last_linear.weight.shape[1] 129 | model.last_linear.bias = torch.nn.Parameter(torch.zeros(n)) 130 | model.last_linear.weight = torch.nn.Parameter(torch.zeros(n, filters)) 131 | model.last_linear.out_features = n 132 | return model 133 | 134 | 
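# Minimal sanity-check sketch for fuse_conv_and_bn(): in eval mode the fused layer should
# reproduce conv followed by bn. It assumes a plain convolution (groups=1, dilation=1), which is
# what the helper above constructs; the layer sizes below are illustrative only and this
# hypothetical helper is not called anywhere else.
def _fuse_conv_and_bn_sanity_check():
    conv = nn.Conv2d(8, 16, kernel_size=3, padding=1, bias=False).eval()
    bn = nn.BatchNorm2d(16).eval()
    bn.running_mean.uniform_(-1, 1)   # give the BN layer non-trivial running statistics
    bn.running_var.uniform_(0.5, 1.5)
    fused = fuse_conv_and_bn(conv, bn)
    x = torch.randn(1, 8, 32, 32)
    assert torch.allclose(bn(conv(x)), fused(x), atol=1e-4)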
135 | def scale_img(img, ratio=1.0, same_shape=True): # img(16,3,256,416), r=ratio 136 | # scales img(bs,3,y,x) by ratio 137 | h, w = img.shape[2:] 138 | s = (int(h * ratio), int(w * ratio)) # new size 139 | img = F.interpolate(img, size=s, mode='bilinear', align_corners=False) # resize 140 | if not same_shape: # pad/crop img 141 | gs = 64 # (pixels) grid size 142 | h, w = [math.ceil(x * ratio / gs) * gs for x in (h, w)] 143 | return F.pad(img, [0, w - s[1], 0, h - s[0]], value=0.447) # value = imagenet mean 144 | 145 | 146 | class ModelEMA: 147 | """ Model Exponential Moving Average from https://github.com/rwightman/pytorch-image-models 148 | Keep a moving average of everything in the model state_dict (parameters and buffers). 149 | This is intended to allow functionality like 150 | https://www.tensorflow.org/api_docs/python/tf/train/ExponentialMovingAverage 151 | A smoothed version of the weights is necessary for some training schemes to perform well. 152 | E.g. Google's hyper-params for training MNASNet, MobileNet-V3, EfficientNet, etc that use 153 | RMSprop with a short 2.4-3 epoch decay period and slow LR decay rate of .96-.99 requires EMA 154 | smoothing of weights to match results. Pay attention to the decay constant you are using 155 | relative to your update count per epoch. 156 | To keep EMA from using GPU resources, set device='cpu'. This will save a bit of memory but 157 | disable validation of the EMA weights. Validation will have to be done manually in a separate 158 | process, or after the training stops converging. 159 | This class is sensitive where it is initialized in the sequence of model init, 160 | GPU assignment and distributed training wrappers. 161 | I've tested with the sequence in my own train.py for torch.DataParallel, apex.DDP, and single-GPU. 162 | """ 163 | 164 | def __init__(self, model, decay=0.9999, device=''): 165 | # make a copy of the model for accumulating moving average of weights 166 | self.ema = deepcopy(model) 167 | self.ema.eval() 168 | self.updates = 0 # number of EMA updates 169 | self.decay = lambda x: decay * (1 - math.exp(-x / 2000)) # decay exponential ramp (to help early epochs) 170 | self.device = device # perform ema on different device from model if set 171 | if device: 172 | self.ema.to(device=device) 173 | for p in self.ema.parameters(): 174 | p.requires_grad_(False) 175 | 176 | def update(self, model): 177 | self.updates += 1 178 | d = self.decay(self.updates) 179 | with torch.no_grad(): 180 | if type(model) in (nn.parallel.DataParallel, nn.parallel.DistributedDataParallel): 181 | msd, esd = model.module.state_dict(), self.ema.module.state_dict() 182 | else: 183 | msd, esd = model.state_dict(), self.ema.state_dict() 184 | 185 | for k, v in esd.items(): 186 | if v.dtype.is_floating_point: 187 | v *= d 188 | v += (1. 
- d) * msd[k].detach() 189 | 190 | def update_attr(self, model): 191 | # Assign attributes (which may change during training) 192 | for k in model.__dict__.keys(): 193 | if not k.startswith('_'): 194 | setattr(self.ema, k, getattr(model, k)) 195 | -------------------------------------------------------------------------------- /utils/util_wqaq.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | from torch import distributed 5 | from torch.nn import init 6 | from torch.nn.parameter import Parameter 7 | from torch.autograd import Function 8 | 9 | # ********************* range trackers (record the value range before quantization) ********************* 10 | class RangeTracker(nn.Module): 11 | def __init__(self, q_level): 12 | super().__init__() 13 | self.q_level = q_level 14 | 15 | def update_range(self, min_val, max_val): 16 | raise NotImplementedError 17 | 18 | @torch.no_grad() 19 | def forward(self, input): 20 | if self.q_level == 'L': # A, min_max_shape=(1, 1, 1, 1), layer-level 21 | min_val = torch.min(input) 22 | max_val = torch.max(input) 23 | elif self.q_level == 'C': # W, min_max_shape=(N, 1, 1, 1), channel-level 24 | min_val = torch.min(torch.min(torch.min(input, 3, keepdim=True)[0], 2, keepdim=True)[0], 1, keepdim=True)[0] 25 | max_val = torch.max(torch.max(torch.max(input, 3, keepdim=True)[0], 2, keepdim=True)[0], 1, keepdim=True)[0] 26 | 27 | self.update_range(min_val, max_val) 28 | class GlobalRangeTracker(RangeTracker): # W, min_max_shape=(N, 1, 1, 1), channel-level, keeps the min/max over the current and all previous updates —— (N, C, W, H) 29 | def __init__(self, q_level, out_channels): 30 | super().__init__(q_level) 31 | self.register_buffer('min_val', torch.zeros(out_channels, 1, 1, 1)) 32 | self.register_buffer('max_val', torch.zeros(out_channels, 1, 1, 1)) 33 | self.register_buffer('first_w', torch.zeros(1)) 34 | 35 | def update_range(self, min_val, max_val): 36 | temp_minval = self.min_val.clone() # clone so the in-place updates below do not alias the stored buffers 37 | temp_maxval = self.max_val.clone() 38 | if self.first_w == 0: 39 | self.first_w.add_(1) 40 | self.min_val.add_(min_val) 41 | self.max_val.add_(max_val) 42 | else: 43 | self.min_val.add_(-temp_minval).add_(torch.min(temp_minval, min_val)) 44 | self.max_val.add_(-temp_maxval).add_(torch.max(temp_maxval, max_val)) 45 | class AveragedRangeTracker(RangeTracker): # A, min_max_shape=(1, 1, 1, 1), layer-level, keeps a running (momentum-averaged) min/max —— (N, C, W, H) 46 | def __init__(self, q_level, momentum=0.1): 47 | super().__init__(q_level) 48 | self.momentum = momentum 49 | self.register_buffer('min_val', torch.zeros(1)) 50 | self.register_buffer('max_val', torch.zeros(1)) 51 | self.register_buffer('first_a', torch.zeros(1)) 52 | 53 | def update_range(self, min_val, max_val): 54 | if self.first_a == 0: 55 | self.first_a.add_(1) 56 | self.min_val.add_(min_val) 57 | self.max_val.add_(max_val) 58 | else: 59 | self.min_val.mul_(1 - self.momentum).add_(min_val * self.momentum) 60 | self.max_val.mul_(1 - self.momentum).add_(max_val * self.momentum) 61 | 62 | # ********************* quantizers ********************* 63 | class Round(Function): 64 | 65 | @staticmethod 66 | def forward(self, input): 67 | output = torch.round(input) 68 | return output 69 | 70 | @staticmethod 71 | def backward(self, grad_output): 72 | grad_input = grad_output.clone() 73 | return grad_input 74 | class Quantizer(nn.Module): 75 | def __init__(self, bits, range_tracker): 76 | super().__init__() 77 | self.bits = bits 78 | self.range_tracker = range_tracker 79 | self.register_buffer('scale', None) # quantization scale factor 80 | 
self.register_buffer('zero_point', None) # quantization zero point 81 | 82 | def update_params(self): 83 | raise NotImplementedError 84 | 85 | # quantize 86 | def quantize(self, input): 87 | output = input * self.scale - self.zero_point 88 | return output 89 | 90 | def round(self, input): 91 | output = Round.apply(input) 92 | return output 93 | 94 | # clamp to the integer range 95 | def clamp(self, input): 96 | output = torch.clamp(input, self.min_val, self.max_val) 97 | return output 98 | 99 | # dequantize 100 | def dequantize(self, input): 101 | output = (input + self.zero_point) / self.scale 102 | return output 103 | 104 | def forward(self, input): 105 | if self.bits == 32: 106 | output = input 107 | elif self.bits == 1: 108 | # binary (1-bit) quantization is not supported 109 | assert self.bits != 1, 'Binary quantization is not supported!' 110 | else: 111 | self.range_tracker(input) 112 | self.update_params() 113 | output = self.quantize(input) # quantize 114 | output = self.round(output) 115 | output = self.clamp(output) # clamp 116 | output = self.dequantize(output)# dequantize 117 | return output 118 | class SignedQuantizer(Quantizer): 119 | def __init__(self, *args, **kwargs): 120 | super().__init__(*args, **kwargs) 121 | self.register_buffer('min_val', torch.tensor(-(1 << (self.bits - 1)))) 122 | self.register_buffer('max_val', torch.tensor((1 << (self.bits - 1)) - 1)) 123 | class UnsignedQuantizer(Quantizer): 124 | def __init__(self, *args, **kwargs): 125 | super().__init__(*args, **kwargs) 126 | self.register_buffer('min_val', torch.tensor(0)) 127 | self.register_buffer('max_val', torch.tensor((1 << self.bits) - 1)) 128 | # symmetric quantization 129 | class SymmetricQuantizer(SignedQuantizer): 130 | 131 | def update_params(self): 132 | quantized_range = torch.min(torch.abs(self.min_val), torch.abs(self.max_val)) # quantized (integer) range 133 | float_range = torch.max(torch.abs(self.range_tracker.min_val), torch.abs(self.range_tracker.max_val)) # float range before quantization 134 | self.scale = quantized_range / float_range # quantization scale factor 135 | self.zero_point = torch.zeros_like(self.scale) # quantization zero point 136 | # asymmetric quantization 137 | class AsymmetricQuantizer(UnsignedQuantizer): 138 | 139 | def update_params(self): 140 | quantized_range = self.max_val - self.min_val # quantized (integer) range 141 | float_range = self.range_tracker.max_val - self.range_tracker.min_val # float range before quantization 142 | self.scale = quantized_range / float_range # quantization scale factor 143 | self.zero_point = torch.round(self.range_tracker.min_val * self.scale) # quantization zero point 144 | 145 | # ********************* quantized convolution (quantize both A and W, then convolve) ********************* 146 | class Conv2d_Q(nn.Conv2d): 147 | def __init__( 148 | self, 149 | in_channels, 150 | out_channels, 151 | kernel_size, 152 | stride=1, 153 | padding=0, 154 | dilation=1, 155 | groups=1, 156 | bias=True, 157 | a_bits=8, 158 | w_bits=8, 159 | q_type=1, 160 | first_layer=0, 161 | ): 162 | super().__init__( 163 | in_channels=in_channels, 164 | out_channels=out_channels, 165 | kernel_size=kernel_size, 166 | stride=stride, 167 | padding=padding, 168 | dilation=dilation, 169 | groups=groups, 170 | bias=bias 171 | ) 172 | # instantiate the quantizers (A: layer-level, W: channel-level) 173 | if q_type == 0: 174 | self.activation_quantizer = SymmetricQuantizer(bits=a_bits, range_tracker=AveragedRangeTracker(q_level='L')) 175 | self.weight_quantizer = SymmetricQuantizer(bits=w_bits, range_tracker=GlobalRangeTracker(q_level='C', out_channels=out_channels)) 176 | else: 177 | self.activation_quantizer = AsymmetricQuantizer(bits=a_bits, range_tracker=AveragedRangeTracker(q_level='L')) 178 | self.weight_quantizer = AsymmetricQuantizer(bits=w_bits, range_tracker=GlobalRangeTracker(q_level='C', out_channels=out_channels)) 179 | self.first_layer = 
first_layer 180 | 181 | def forward(self, input): 182 | # quantize A and W 183 | if not self.first_layer: 184 | input = self.activation_quantizer(input) 185 | q_input = input 186 | q_weight = self.weight_quantizer(self.weight) 187 | # quantized convolution 188 | output = F.conv2d( 189 | input=q_input, 190 | weight=q_weight, 191 | bias=self.bias, 192 | stride=self.stride, 193 | padding=self.padding, 194 | dilation=self.dilation, 195 | groups=self.groups 196 | ) 197 | return output 198 | 199 | def reshape_to_activation(input): 200 | return input.reshape(1, -1, 1, 1) 201 | def reshape_to_weight(input): 202 | return input.reshape(-1, 1, 1, 1) 203 | def reshape_to_bias(input): 204 | return input.reshape(-1) 205 | # ********************* BN-folded quantized convolution (fold BN into the conv, then quantize A/W and convolve) ********************* 206 | class BNFold_Conv2d_Q(Conv2d_Q): 207 | def __init__( 208 | self, 209 | in_channels, 210 | out_channels, 211 | kernel_size, 212 | stride=1, 213 | padding=0, 214 | dilation=1, 215 | groups=1, 216 | bias=False, 217 | eps=1e-5, 218 | momentum=0.01, # momentum is lowered (0.1 -> 0.01) to reduce the weight of the per-batch statistics and damp the jitter introduced by quantization; experiments show better quantization-aware training, with roughly a 1% accuracy gain 219 | a_bits=8, 220 | w_bits=8, 221 | q_type=1, 222 | first_layer=0, 223 | ): 224 | super().__init__( 225 | in_channels=in_channels, 226 | out_channels=out_channels, 227 | kernel_size=kernel_size, 228 | stride=stride, 229 | padding=padding, 230 | dilation=dilation, 231 | groups=groups, 232 | bias=bias 233 | ) 234 | self.eps = eps 235 | self.momentum = momentum 236 | self.gamma = Parameter(torch.Tensor(out_channels)) 237 | self.beta = Parameter(torch.Tensor(out_channels)) 238 | self.register_buffer('running_mean', torch.zeros(out_channels)) 239 | self.register_buffer('running_var', torch.ones(out_channels)) 240 | self.register_buffer('first_bn', torch.zeros(1)) 241 | init.uniform_(self.gamma) 242 | init.zeros_(self.beta) 243 | 244 | # instantiate the quantizers (A: layer-level, W: channel-level) 245 | if q_type == 0: 246 | self.activation_quantizer = SymmetricQuantizer(bits=a_bits, range_tracker=AveragedRangeTracker(q_level='L')) 247 | self.weight_quantizer = SymmetricQuantizer(bits=w_bits, range_tracker=GlobalRangeTracker(q_level='C', out_channels=out_channels)) 248 | else: 249 | self.activation_quantizer = AsymmetricQuantizer(bits=a_bits, range_tracker=AveragedRangeTracker(q_level='L')) 250 | self.weight_quantizer = AsymmetricQuantizer(bits=w_bits, range_tracker=GlobalRangeTracker(q_level='C', out_channels=out_channels)) 251 | self.first_layer = first_layer 252 | 253 | def forward(self, input): 254 | # training mode 255 | if self.training: 256 | # first run an ordinary (float) convolution to obtain the activations needed for the BN statistics 257 | output = F.conv2d( 258 | input=input, 259 | weight=self.weight, 260 | bias=self.bias, 261 | stride=self.stride, 262 | padding=self.padding, 263 | dilation=self.dilation, 264 | groups=self.groups 265 | ) 266 | # update the BN statistics (batch and running) 267 | dims = [dim for dim in range(4) if dim != 1] 268 | batch_mean = torch.mean(output, dim=dims) 269 | batch_var = torch.var(output, dim=dims) 270 | with torch.no_grad(): 271 | if self.first_bn == 0: 272 | self.first_bn.add_(1) 273 | self.running_mean.add_(batch_mean) 274 | self.running_var.add_(batch_var) 275 | else: 276 | self.running_mean.mul_(1 - self.momentum).add_(batch_mean * self.momentum) 277 | self.running_var.mul_(1 - self.momentum).add_(batch_var * self.momentum) 278 | # BN folding 279 | if self.bias is not None: 280 | bias = reshape_to_bias(self.beta + (self.bias - batch_mean) * (self.gamma / torch.sqrt(batch_var + self.eps))) 281 | else: 282 | bias = reshape_to_bias(self.beta - batch_mean * (self.gamma / torch.sqrt(batch_var + self.eps)))# 
bias folded with the batch statistics 283 | weight = self.weight * reshape_to_weight(self.gamma / torch.sqrt(self.running_var + self.eps)) # weight folded with the running statistics 284 | # eval / inference mode 285 | else: 286 | #print(self.running_mean, self.running_var) 287 | # BN folding 288 | if self.bias is not None: 289 | bias = reshape_to_bias(self.beta + (self.bias - self.running_mean) * (self.gamma / torch.sqrt(self.running_var + self.eps))) 290 | else: 291 | bias = reshape_to_bias(self.beta - self.running_mean * (self.gamma / torch.sqrt(self.running_var + self.eps))) # bias folded with the running statistics 292 | weight = self.weight * reshape_to_weight(self.gamma / torch.sqrt(self.running_var + self.eps)) # weight folded with the running statistics 293 | 294 | # quantize A and the BN-folded W 295 | if not self.first_layer: 296 | input = self.activation_quantizer(input) 297 | q_input = input 298 | q_weight = self.weight_quantizer(weight) 299 | # quantized convolution 300 | if self.training: # training mode 301 | output = F.conv2d( 302 | input=q_input, 303 | weight=q_weight, 304 | bias=self.bias, # note: no bias is added here (self.bias is None) 305 | stride=self.stride, 306 | padding=self.padding, 307 | dilation=self.dilation, 308 | groups=self.groups 309 | ) 310 | # (in training mode, convert the effect of folding W with the running statistics into folding with the batch statistics) running -> batch 311 | output *= reshape_to_activation(torch.sqrt(self.running_var + self.eps) / torch.sqrt(batch_var + self.eps)) 312 | output += reshape_to_activation(bias) 313 | else: # eval / inference mode 314 | output = F.conv2d( 315 | input=q_input, 316 | weight=q_weight, 317 | bias=bias, # note: the bias is added here, giving the complete conv+bn 318 | stride=self.stride, 319 | padding=self.padding, 320 | dilation=self.dilation, 321 | groups=self.groups 322 | ) 323 | return output 324 | -------------------------------------------------------------------------------- /weights/download_yolov3_weights.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # make '/weights' directory if it does not exist and cd into it 4 | # mkdir -p weights && cd weights 5 | 6 | # copy darknet weight files, continue '-c' if partially downloaded 7 | # wget -c https://pjreddie.com/media/files/yolov3.weights 8 | # wget -c https://pjreddie.com/media/files/yolov3-tiny.weights 9 | # wget -c https://pjreddie.com/media/files/yolov3-spp.weights 10 | 11 | # yolov3 pytorch weights 12 | # download from Google Drive: https://drive.google.com/drive/folders/1uxgUBemJVw9wZsdpboYbzUN4bcRhsuAI 13 | 14 | # darknet53 weights (first 75 layers only) 15 | # wget -c https://pjreddie.com/media/files/darknet53.conv.74 16 | 17 | # yolov3-tiny weights from darknet (first 16 layers only) 18 | # ./darknet partial cfg/yolov3-tiny.cfg yolov3-tiny.weights yolov3-tiny.conv.15 15 19 | # mv yolov3-tiny.conv.15 ../ 20 | 21 | # new method 22 | python3 -c "from models import *; 23 | attempt_download('weights/yolov3.pt'); 24 | attempt_download('weights/yolov3-spp.pt')" 25 | --------------------------------------------------------------------------------
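As a usage reference, the quantization modules defined in utils/util_wqaq.py can be exercised on their own. The snippet below is only a sketch: the module names come from the file above, but the shapes and settings are arbitrary and it is not the repository's training entry point. Conv2d_Q tracks the activation range per layer and the weight range per channel, then quantizes both before convolving; BNFold_Conv2d_Q additionally folds its BN statistics into the weight and bias before quantization.
```
import torch
from utils.util_wqaq import Conv2d_Q, BNFold_Conv2d_Q

x = torch.randn(2, 16, 64, 64)  # dummy feature map (illustrative shape)

# 8-bit asymmetric quantization (q_type=1) of activations and weights
qconv = Conv2d_Q(16, 32, kernel_size=3, padding=1, a_bits=8, w_bits=8, q_type=1)
y = qconv(x)  # range trackers update, then quantize -> round -> clamp -> dequantize

# BN-folded variant: keeps its own BN statistics and folds them into weight and bias
qconv_bn = BNFold_Conv2d_Q(16, 32, kernel_size=3, padding=1, bias=False, a_bits=8, w_bits=8, q_type=1)
qconv_bn.train()
y = qconv_bn(x)  # training path: fold with batch statistics
qconv_bn.eval()
y = qconv_bn(x)  # inference path: fold with running statistics
print(y.shape)   # torch.Size([2, 32, 64, 64])
```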