├── .idea ├── PyTorch-YOLOv3-ModelArts.iml ├── misc.xml └── modules.xml ├── README.md ├── config ├── classify_rule.json ├── create_custom_model.sh ├── custom.data ├── train.txt ├── train_classes.txt ├── valid.txt ├── yolov3-44.cfg ├── yolov3-tiny.cfg └── yolov3.cfg ├── deploy_scripts ├── config.json └── customize_service.py ├── detect.py ├── models.py ├── my_utils ├── __init__.py ├── augmentations.py ├── datasets.py ├── parse_config.py ├── prepare_datasets.py └── utils.py ├── pip-requirements.txt ├── test.py ├── train.py └── weights └── download_weights.sh
/.idea/PyTorch-YOLOv3-ModelArts.iml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 12 | --------------------------------------------------------------------------------
/.idea/misc.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | --------------------------------------------------------------------------------
/.idea/modules.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | --------------------------------------------------------------------------------
/README.md: --------------------------------------------------------------------------------
1 | # PyTorch-YOLOv3-ModelArts
2 | Deploy a PyTorch implementation of the YOLOv3 object detector on the Huawei Cloud ModelArts platform, covering model training, online inference and competition submission.
3 |
4 |
5 | - Motivation
6 |
7 | I am taking part in the "Huawei Cloud Cup" 2020 Shenzhen Open Data Application Innovation Contest (household garbage image classification), for which the organizers only provide a Keras YOLOv3 baseline.
8 | That baseline scores a mere 0.05, which is alarmingly low and far below what YOLOv3 should achieve.
9 |
10 | - What I did
11 |
12 | Since I rarely use Keras, I did not dig into where the official baseline goes wrong.
13 | Instead I wrote a PyTorch baseline myself; in my tests it performs dramatically better (see the results at the end).
14 | So the official baseline does appear to be broken; anyone interested is welcome to look into it.
15 |
16 |
17 | - source code: https://github.com/eriklindernoren/PyTorch-YOLOv3
18 | - competition page: https://competition.huaweicloud.com/information/1000038439/introduction
19 |
20 | ## Preparation
21 | ##### Unpack the official dataset and build the new dataset
22 |     $ cd PyTorch-YOLOv3-ModelArts/my_utils
23 |     $ python prepare_datasets.py --source_datasets --new_datasets
24 |
25 | ##### Download pretrained weights
26 |     $ cd weights/
27 |     $ bash download_weights.sh
28 |
29 | ##### Create the cfg file for a custom model
30 |     $ cd PyTorch-YOLOv3-ModelArts/config
31 |     $ bash create_custom_model.sh <num_classes>  # already created here as yolov3-44.cfg
32 |
33 | ## Training on ModelArts
34 | 1. Package the new dataset into an archive and use it to replace the original dataset archive.
35 |
36 | 2. The image paths of the training and validation sets are stored in config/train.txt and config/valid.txt by default, one image per line, split 8:2. Note that every path points to a location inside the virtual container: if you re-split the data yourself, change only the trailing file names and never touch the directory part (a sketch follows the Testing section below).
37 |
38 | 3. To use pretrained weights, upload them to your own OBS bucket in advance and add the argument
39 |
40 | `--pretrained_weights = s3://your_bucket/{model}`.
41 |
42 | Here `model` can be an official pretrained model (yolov3.weights or darknet53.conv.74) or a PyTorch model (.pth) you trained yourself.
43 |
44 | 4. Hyper-parameters such as the learning rate are not adjusted during training by default; tune them according to your own experience.
45 |
46 | 5. The remaining steps are the same as in the official competition guide.
47 |
48 | ## Testing
49 | 1. Compared with the official Keras baseline, training is more than twice as fast (the official baseline needs 150 minutes for 10 epochs, this project only 47 minutes); a competition submission is scored in roughly one hour, likewise more than twice as fast.
50 |
51 | 2. The official baseline takes two and a half hours for 10 epochs yet scores only 0.05; this project, training only the detection head for 5 epochs, takes just 17 minutes and scores 0.17 (jaw-dropping).
52 |
53 | 3. Since the competition has only just started, I have not run more extensive tests. My personal estimate is that, with further improvements on this baseline, a final score of around 0.6 is reachable.
54 | Of course, if you are after the prize money, you should probably switch to an R-CNN variant or EfficientDet instead.
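Below is a minimal, illustrative sketch of the 8:2 re-split described in step 2 of "Training on ModelArts". It is not part of this repository, and the `PREFIX` value is a placeholder: copy the real container prefix from the existing config/train.txt instead of inventing one, and run the script from the config/ directory.

```
import random

# Illustrative 8:2 re-split: keep the container path prefix from the existing
# config/train.txt untouched and only vary the trailing image file names.
PREFIX = "/path/inside/container/images"  # placeholder -- copy the real prefix from train.txt

names = []
for list_file in ("train.txt", "valid.txt"):
    with open(list_file) as f:
        names += [line.strip().rsplit("/", 1)[-1] for line in f if line.strip()]

random.seed(0)          # fixed seed so the split is reproducible
random.shuffle(names)
split = int(0.8 * len(names))

with open("train.txt", "w") as f:
    f.write("\n".join(PREFIX + "/" + n for n in names[:split]) + "\n")
with open("valid.txt", "w") as f:
    f.write("\n".join(PREFIX + "/" + n for n in names[split:]) + "\n")
```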
55 |
56 |
57 | ## Credit
58 |
59 | ### YOLOv3: An Incremental Improvement
60 | _Joseph Redmon, Ali Farhadi_ <br>
61 |
62 | **Abstract** <br>
63 | We present some updates to YOLO! We made a bunch 64 | of little design changes to make it better. We also trained 65 | this new network that’s pretty swell. It’s a little bigger than 66 | last time but more accurate. It’s still fast though, don’t 67 | worry. At 320 × 320 YOLOv3 runs in 22 ms at 28.2 mAP, 68 | as accurate as SSD but three times faster. When we look 69 | at the old .5 IOU mAP detection metric YOLOv3 is quite 70 | good. It achieves 57.9 AP50 in 51 ms on a Titan X, compared 71 | to 57.5 AP50 in 198 ms by RetinaNet, similar performance 72 | but 3.8× faster. As always, all the code is online at 73 | https://pjreddie.com/yolo/. 74 | 75 | [[Paper]](https://pjreddie.com/media/files/papers/YOLOv3.pdf) [[Project Webpage]](https://pjreddie.com/darknet/yolo/) [[Authors' Implementation]](https://github.com/pjreddie/darknet) 76 | 77 | ``` 78 | @article{yolov3, 79 | title={YOLOv3: An Incremental Improvement}, 80 | author={Redmon, Joseph and Farhadi, Ali}, 81 | journal = {arXiv}, 82 | year={2018} 83 | } 84 | ``` 85 | -------------------------------------------------------------------------------- /config/classify_rule.json: -------------------------------------------------------------------------------- 1 | { 2 | "可回收物": [ 3 | "充电宝", 4 | "包", 5 | "洗护用品", 6 | "塑料玩具", 7 | "塑料器皿", 8 | "塑料衣架", 9 | "玻璃器皿", 10 | "金属器皿", 11 | "金属衣架", 12 | "快递纸袋", 13 | "插头电线", 14 | "旧衣服", 15 | "易拉罐", 16 | "枕头", 17 | "毛巾", 18 | "毛绒玩具", 19 | "鞋", 20 | "砧板", 21 | "纸盒纸箱", 22 | "纸袋", 23 | "调料瓶", 24 | "酒瓶", 25 | "金属食品罐", 26 | "金属厨具", 27 | "锅", 28 | "食用油桶", 29 | "饮料瓶", 30 | "饮料盒", 31 | "书籍纸张" 32 | ], 33 | "厨余垃圾": [ 34 | "剩饭剩菜", 35 | "大骨头", 36 | "果皮果肉", 37 | "茶叶渣", 38 | "菜帮菜叶", 39 | "蛋壳", 40 | "鱼骨" 41 | ], 42 | "有害垃圾": [ 43 | "干电池", 44 | "锂电池", 45 | "蓄电池", 46 | "纽扣电池", 47 | "灯管" 48 | ], 49 | "其他垃圾": [ 50 | "一次性快餐盒", 51 | "污损塑料", 52 | "烟蒂", 53 | "牙签", 54 | "花盆", 55 | "陶瓷器皿", 56 | "筷子", 57 | "污损用纸" 58 | ] 59 | } -------------------------------------------------------------------------------- /config/create_custom_model.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | NUM_CLASSES=$1 4 | 5 | echo " 6 | [net] 7 | # Testing 8 | #batch=1 9 | #subdivisions=1 10 | # Training 11 | batch=16 12 | subdivisions=1 13 | width=416 14 | height=416 15 | channels=3 16 | momentum=0.9 17 | decay=0.0005 18 | angle=0 19 | saturation = 1.5 20 | exposure = 1.5 21 | hue=.1 22 | 23 | learning_rate=0.001 24 | burn_in=1000 25 | max_batches = 500200 26 | policy=steps 27 | steps=400000,450000 28 | scales=.1,.1 29 | 30 | [convolutional] 31 | batch_normalize=1 32 | filters=32 33 | size=3 34 | stride=1 35 | pad=1 36 | activation=leaky 37 | 38 | # Downsample 39 | 40 | [convolutional] 41 | batch_normalize=1 42 | filters=64 43 | size=3 44 | stride=2 45 | pad=1 46 | activation=leaky 47 | 48 | [convolutional] 49 | batch_normalize=1 50 | filters=32 51 | size=1 52 | stride=1 53 | pad=1 54 | activation=leaky 55 | 56 | [convolutional] 57 | batch_normalize=1 58 | filters=64 59 | size=3 60 | stride=1 61 | pad=1 62 | activation=leaky 63 | 64 | [shortcut] 65 | from=-3 66 | activation=linear 67 | 68 | # Downsample 69 | 70 | [convolutional] 71 | batch_normalize=1 72 | filters=128 73 | size=3 74 | stride=2 75 | pad=1 76 | activation=leaky 77 | 78 | [convolutional] 79 | batch_normalize=1 80 | filters=64 81 | size=1 82 | stride=1 83 | pad=1 84 | activation=leaky 85 | 86 | [convolutional] 87 | batch_normalize=1 88 | filters=128 89 | size=3 90 | stride=1 91 | pad=1 92 | activation=leaky 93 | 94 | [shortcut] 95 | 
from=-3 96 | activation=linear 97 | 98 | [convolutional] 99 | batch_normalize=1 100 | filters=64 101 | size=1 102 | stride=1 103 | pad=1 104 | activation=leaky 105 | 106 | [convolutional] 107 | batch_normalize=1 108 | filters=128 109 | size=3 110 | stride=1 111 | pad=1 112 | activation=leaky 113 | 114 | [shortcut] 115 | from=-3 116 | activation=linear 117 | 118 | # Downsample 119 | 120 | [convolutional] 121 | batch_normalize=1 122 | filters=256 123 | size=3 124 | stride=2 125 | pad=1 126 | activation=leaky 127 | 128 | [convolutional] 129 | batch_normalize=1 130 | filters=128 131 | size=1 132 | stride=1 133 | pad=1 134 | activation=leaky 135 | 136 | [convolutional] 137 | batch_normalize=1 138 | filters=256 139 | size=3 140 | stride=1 141 | pad=1 142 | activation=leaky 143 | 144 | [shortcut] 145 | from=-3 146 | activation=linear 147 | 148 | [convolutional] 149 | batch_normalize=1 150 | filters=128 151 | size=1 152 | stride=1 153 | pad=1 154 | activation=leaky 155 | 156 | [convolutional] 157 | batch_normalize=1 158 | filters=256 159 | size=3 160 | stride=1 161 | pad=1 162 | activation=leaky 163 | 164 | [shortcut] 165 | from=-3 166 | activation=linear 167 | 168 | [convolutional] 169 | batch_normalize=1 170 | filters=128 171 | size=1 172 | stride=1 173 | pad=1 174 | activation=leaky 175 | 176 | [convolutional] 177 | batch_normalize=1 178 | filters=256 179 | size=3 180 | stride=1 181 | pad=1 182 | activation=leaky 183 | 184 | [shortcut] 185 | from=-3 186 | activation=linear 187 | 188 | [convolutional] 189 | batch_normalize=1 190 | filters=128 191 | size=1 192 | stride=1 193 | pad=1 194 | activation=leaky 195 | 196 | [convolutional] 197 | batch_normalize=1 198 | filters=256 199 | size=3 200 | stride=1 201 | pad=1 202 | activation=leaky 203 | 204 | [shortcut] 205 | from=-3 206 | activation=linear 207 | 208 | 209 | [convolutional] 210 | batch_normalize=1 211 | filters=128 212 | size=1 213 | stride=1 214 | pad=1 215 | activation=leaky 216 | 217 | [convolutional] 218 | batch_normalize=1 219 | filters=256 220 | size=3 221 | stride=1 222 | pad=1 223 | activation=leaky 224 | 225 | [shortcut] 226 | from=-3 227 | activation=linear 228 | 229 | [convolutional] 230 | batch_normalize=1 231 | filters=128 232 | size=1 233 | stride=1 234 | pad=1 235 | activation=leaky 236 | 237 | [convolutional] 238 | batch_normalize=1 239 | filters=256 240 | size=3 241 | stride=1 242 | pad=1 243 | activation=leaky 244 | 245 | [shortcut] 246 | from=-3 247 | activation=linear 248 | 249 | [convolutional] 250 | batch_normalize=1 251 | filters=128 252 | size=1 253 | stride=1 254 | pad=1 255 | activation=leaky 256 | 257 | [convolutional] 258 | batch_normalize=1 259 | filters=256 260 | size=3 261 | stride=1 262 | pad=1 263 | activation=leaky 264 | 265 | [shortcut] 266 | from=-3 267 | activation=linear 268 | 269 | [convolutional] 270 | batch_normalize=1 271 | filters=128 272 | size=1 273 | stride=1 274 | pad=1 275 | activation=leaky 276 | 277 | [convolutional] 278 | batch_normalize=1 279 | filters=256 280 | size=3 281 | stride=1 282 | pad=1 283 | activation=leaky 284 | 285 | [shortcut] 286 | from=-3 287 | activation=linear 288 | 289 | # Downsample 290 | 291 | [convolutional] 292 | batch_normalize=1 293 | filters=512 294 | size=3 295 | stride=2 296 | pad=1 297 | activation=leaky 298 | 299 | [convolutional] 300 | batch_normalize=1 301 | filters=256 302 | size=1 303 | stride=1 304 | pad=1 305 | activation=leaky 306 | 307 | [convolutional] 308 | batch_normalize=1 309 | filters=512 310 | size=3 311 | stride=1 312 | pad=1 313 | 
activation=leaky 314 | 315 | [shortcut] 316 | from=-3 317 | activation=linear 318 | 319 | 320 | [convolutional] 321 | batch_normalize=1 322 | filters=256 323 | size=1 324 | stride=1 325 | pad=1 326 | activation=leaky 327 | 328 | [convolutional] 329 | batch_normalize=1 330 | filters=512 331 | size=3 332 | stride=1 333 | pad=1 334 | activation=leaky 335 | 336 | [shortcut] 337 | from=-3 338 | activation=linear 339 | 340 | 341 | [convolutional] 342 | batch_normalize=1 343 | filters=256 344 | size=1 345 | stride=1 346 | pad=1 347 | activation=leaky 348 | 349 | [convolutional] 350 | batch_normalize=1 351 | filters=512 352 | size=3 353 | stride=1 354 | pad=1 355 | activation=leaky 356 | 357 | [shortcut] 358 | from=-3 359 | activation=linear 360 | 361 | 362 | [convolutional] 363 | batch_normalize=1 364 | filters=256 365 | size=1 366 | stride=1 367 | pad=1 368 | activation=leaky 369 | 370 | [convolutional] 371 | batch_normalize=1 372 | filters=512 373 | size=3 374 | stride=1 375 | pad=1 376 | activation=leaky 377 | 378 | [shortcut] 379 | from=-3 380 | activation=linear 381 | 382 | [convolutional] 383 | batch_normalize=1 384 | filters=256 385 | size=1 386 | stride=1 387 | pad=1 388 | activation=leaky 389 | 390 | [convolutional] 391 | batch_normalize=1 392 | filters=512 393 | size=3 394 | stride=1 395 | pad=1 396 | activation=leaky 397 | 398 | [shortcut] 399 | from=-3 400 | activation=linear 401 | 402 | 403 | [convolutional] 404 | batch_normalize=1 405 | filters=256 406 | size=1 407 | stride=1 408 | pad=1 409 | activation=leaky 410 | 411 | [convolutional] 412 | batch_normalize=1 413 | filters=512 414 | size=3 415 | stride=1 416 | pad=1 417 | activation=leaky 418 | 419 | [shortcut] 420 | from=-3 421 | activation=linear 422 | 423 | 424 | [convolutional] 425 | batch_normalize=1 426 | filters=256 427 | size=1 428 | stride=1 429 | pad=1 430 | activation=leaky 431 | 432 | [convolutional] 433 | batch_normalize=1 434 | filters=512 435 | size=3 436 | stride=1 437 | pad=1 438 | activation=leaky 439 | 440 | [shortcut] 441 | from=-3 442 | activation=linear 443 | 444 | [convolutional] 445 | batch_normalize=1 446 | filters=256 447 | size=1 448 | stride=1 449 | pad=1 450 | activation=leaky 451 | 452 | [convolutional] 453 | batch_normalize=1 454 | filters=512 455 | size=3 456 | stride=1 457 | pad=1 458 | activation=leaky 459 | 460 | [shortcut] 461 | from=-3 462 | activation=linear 463 | 464 | # Downsample 465 | 466 | [convolutional] 467 | batch_normalize=1 468 | filters=1024 469 | size=3 470 | stride=2 471 | pad=1 472 | activation=leaky 473 | 474 | [convolutional] 475 | batch_normalize=1 476 | filters=512 477 | size=1 478 | stride=1 479 | pad=1 480 | activation=leaky 481 | 482 | [convolutional] 483 | batch_normalize=1 484 | filters=1024 485 | size=3 486 | stride=1 487 | pad=1 488 | activation=leaky 489 | 490 | [shortcut] 491 | from=-3 492 | activation=linear 493 | 494 | [convolutional] 495 | batch_normalize=1 496 | filters=512 497 | size=1 498 | stride=1 499 | pad=1 500 | activation=leaky 501 | 502 | [convolutional] 503 | batch_normalize=1 504 | filters=1024 505 | size=3 506 | stride=1 507 | pad=1 508 | activation=leaky 509 | 510 | [shortcut] 511 | from=-3 512 | activation=linear 513 | 514 | [convolutional] 515 | batch_normalize=1 516 | filters=512 517 | size=1 518 | stride=1 519 | pad=1 520 | activation=leaky 521 | 522 | [convolutional] 523 | batch_normalize=1 524 | filters=1024 525 | size=3 526 | stride=1 527 | pad=1 528 | activation=leaky 529 | 530 | [shortcut] 531 | from=-3 532 | activation=linear 533 | 534 | 
[convolutional] 535 | batch_normalize=1 536 | filters=512 537 | size=1 538 | stride=1 539 | pad=1 540 | activation=leaky 541 | 542 | [convolutional] 543 | batch_normalize=1 544 | filters=1024 545 | size=3 546 | stride=1 547 | pad=1 548 | activation=leaky 549 | 550 | [shortcut] 551 | from=-3 552 | activation=linear 553 | 554 | ###################### 555 | 556 | [convolutional] 557 | batch_normalize=1 558 | filters=512 559 | size=1 560 | stride=1 561 | pad=1 562 | activation=leaky 563 | 564 | [convolutional] 565 | batch_normalize=1 566 | size=3 567 | stride=1 568 | pad=1 569 | filters=1024 570 | activation=leaky 571 | 572 | [convolutional] 573 | batch_normalize=1 574 | filters=512 575 | size=1 576 | stride=1 577 | pad=1 578 | activation=leaky 579 | 580 | [convolutional] 581 | batch_normalize=1 582 | size=3 583 | stride=1 584 | pad=1 585 | filters=1024 586 | activation=leaky 587 | 588 | [convolutional] 589 | batch_normalize=1 590 | filters=512 591 | size=1 592 | stride=1 593 | pad=1 594 | activation=leaky 595 | 596 | [convolutional] 597 | batch_normalize=1 598 | size=3 599 | stride=1 600 | pad=1 601 | filters=1024 602 | activation=leaky 603 | 604 | [convolutional] 605 | size=1 606 | stride=1 607 | pad=1 608 | filters=$(expr 3 \* $(expr $NUM_CLASSES \+ 5)) 609 | activation=linear 610 | 611 | 612 | [yolo] 613 | mask = 6,7,8 614 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 615 | classes=$NUM_CLASSES 616 | num=9 617 | jitter=.3 618 | ignore_thresh = .7 619 | truth_thresh = 1 620 | random=1 621 | 622 | 623 | [route] 624 | layers = -4 625 | 626 | [convolutional] 627 | batch_normalize=1 628 | filters=256 629 | size=1 630 | stride=1 631 | pad=1 632 | activation=leaky 633 | 634 | [upsample] 635 | stride=2 636 | 637 | [route] 638 | layers = -1, 61 639 | 640 | 641 | 642 | [convolutional] 643 | batch_normalize=1 644 | filters=256 645 | size=1 646 | stride=1 647 | pad=1 648 | activation=leaky 649 | 650 | [convolutional] 651 | batch_normalize=1 652 | size=3 653 | stride=1 654 | pad=1 655 | filters=512 656 | activation=leaky 657 | 658 | [convolutional] 659 | batch_normalize=1 660 | filters=256 661 | size=1 662 | stride=1 663 | pad=1 664 | activation=leaky 665 | 666 | [convolutional] 667 | batch_normalize=1 668 | size=3 669 | stride=1 670 | pad=1 671 | filters=512 672 | activation=leaky 673 | 674 | [convolutional] 675 | batch_normalize=1 676 | filters=256 677 | size=1 678 | stride=1 679 | pad=1 680 | activation=leaky 681 | 682 | [convolutional] 683 | batch_normalize=1 684 | size=3 685 | stride=1 686 | pad=1 687 | filters=512 688 | activation=leaky 689 | 690 | [convolutional] 691 | size=1 692 | stride=1 693 | pad=1 694 | filters=$(expr 3 \* $(expr $NUM_CLASSES \+ 5)) 695 | activation=linear 696 | 697 | 698 | [yolo] 699 | mask = 3,4,5 700 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 701 | classes=$NUM_CLASSES 702 | num=9 703 | jitter=.3 704 | ignore_thresh = .7 705 | truth_thresh = 1 706 | random=1 707 | 708 | 709 | 710 | [route] 711 | layers = -4 712 | 713 | [convolutional] 714 | batch_normalize=1 715 | filters=128 716 | size=1 717 | stride=1 718 | pad=1 719 | activation=leaky 720 | 721 | [upsample] 722 | stride=2 723 | 724 | [route] 725 | layers = -1, 36 726 | 727 | 728 | 729 | [convolutional] 730 | batch_normalize=1 731 | filters=128 732 | size=1 733 | stride=1 734 | pad=1 735 | activation=leaky 736 | 737 | [convolutional] 738 | batch_normalize=1 739 | size=3 740 | stride=1 741 | pad=1 742 | filters=256 743 | activation=leaky 744 | 745 | 
[convolutional] 746 | batch_normalize=1 747 | filters=128 748 | size=1 749 | stride=1 750 | pad=1 751 | activation=leaky 752 | 753 | [convolutional] 754 | batch_normalize=1 755 | size=3 756 | stride=1 757 | pad=1 758 | filters=256 759 | activation=leaky 760 | 761 | [convolutional] 762 | batch_normalize=1 763 | filters=128 764 | size=1 765 | stride=1 766 | pad=1 767 | activation=leaky 768 | 769 | [convolutional] 770 | batch_normalize=1 771 | size=3 772 | stride=1 773 | pad=1 774 | filters=256 775 | activation=leaky 776 | 777 | [convolutional] 778 | size=1 779 | stride=1 780 | pad=1 781 | filters=$(expr 3 \* $(expr $NUM_CLASSES \+ 5)) 782 | activation=linear 783 | 784 | 785 | [yolo] 786 | mask = 0,1,2 787 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 788 | classes=$NUM_CLASSES 789 | num=9 790 | jitter=.3 791 | ignore_thresh = .7 792 | truth_thresh = 1 793 | random=1 794 | " >> yolov3-custom.cfg 795 | -------------------------------------------------------------------------------- /config/custom.data: -------------------------------------------------------------------------------- 1 | classes= 44 2 | train=PyTorch-YOLOv3-ModelArts/config/train.txt 3 | valid=PyTorch-YOLOv3-ModelArts/config/valid.txt 4 | names=PyTorch-YOLOv3-ModelArts/config/train_classes.txt 5 | -------------------------------------------------------------------------------- /config/train_classes.txt: -------------------------------------------------------------------------------- 1 | 一次性快餐盒 2 | 书籍纸张 3 | 充电宝 4 | 剩饭剩菜 5 | 包 6 | 垃圾桶 7 | 塑料器皿 8 | 塑料玩具 9 | 塑料衣架 10 | 大骨头 11 | 干电池 12 | 快递纸袋 13 | 插头电线 14 | 旧衣服 15 | 易拉罐 16 | 枕头 17 | 果皮果肉 18 | 毛绒玩具 19 | 污损塑料 20 | 污损用纸 21 | 洗护用品 22 | 烟蒂 23 | 牙签 24 | 玻璃器皿 25 | 砧板 26 | 筷子 27 | 纸盒纸箱 28 | 花盆 29 | 茶叶渣 30 | 菜帮菜叶 31 | 蛋壳 32 | 调料瓶 33 | 软膏 34 | 过期药物 35 | 酒瓶 36 | 金属厨具 37 | 金属器皿 38 | 金属食品罐 39 | 锅 40 | 陶瓷器皿 41 | 鞋 42 | 食用油桶 43 | 饮料瓶 44 | 鱼骨 -------------------------------------------------------------------------------- /config/yolov3-44.cfg: -------------------------------------------------------------------------------- 1 | 2 | [net] 3 | # Testing 4 | #batch=1 5 | #subdivisions=1 6 | # Training 7 | batch=16 8 | subdivisions=1 9 | width=416 10 | height=416 11 | channels=3 12 | momentum=0.9 13 | decay=0.0005 14 | angle=0 15 | saturation = 1.5 16 | exposure = 1.5 17 | hue=.1 18 | 19 | learning_rate=0.001 20 | burn_in=1000 21 | max_batches = 500200 22 | policy=steps 23 | steps=400000,450000 24 | scales=.1,.1 25 | 26 | [convolutional] 27 | batch_normalize=1 28 | filters=32 29 | size=3 30 | stride=1 31 | pad=1 32 | activation=leaky 33 | 34 | # Downsample 35 | 36 | [convolutional] 37 | batch_normalize=1 38 | filters=64 39 | size=3 40 | stride=2 41 | pad=1 42 | activation=leaky 43 | 44 | [convolutional] 45 | batch_normalize=1 46 | filters=32 47 | size=1 48 | stride=1 49 | pad=1 50 | activation=leaky 51 | 52 | [convolutional] 53 | batch_normalize=1 54 | filters=64 55 | size=3 56 | stride=1 57 | pad=1 58 | activation=leaky 59 | 60 | [shortcut] 61 | from=-3 62 | activation=linear 63 | 64 | # Downsample 65 | 66 | [convolutional] 67 | batch_normalize=1 68 | filters=128 69 | size=3 70 | stride=2 71 | pad=1 72 | activation=leaky 73 | 74 | [convolutional] 75 | batch_normalize=1 76 | filters=64 77 | size=1 78 | stride=1 79 | pad=1 80 | activation=leaky 81 | 82 | [convolutional] 83 | batch_normalize=1 84 | filters=128 85 | size=3 86 | stride=1 87 | pad=1 88 | activation=leaky 89 | 90 | [shortcut] 91 | from=-3 92 | activation=linear 93 | 94 | [convolutional] 95 | batch_normalize=1 
96 | filters=64 97 | size=1 98 | stride=1 99 | pad=1 100 | activation=leaky 101 | 102 | [convolutional] 103 | batch_normalize=1 104 | filters=128 105 | size=3 106 | stride=1 107 | pad=1 108 | activation=leaky 109 | 110 | [shortcut] 111 | from=-3 112 | activation=linear 113 | 114 | # Downsample 115 | 116 | [convolutional] 117 | batch_normalize=1 118 | filters=256 119 | size=3 120 | stride=2 121 | pad=1 122 | activation=leaky 123 | 124 | [convolutional] 125 | batch_normalize=1 126 | filters=128 127 | size=1 128 | stride=1 129 | pad=1 130 | activation=leaky 131 | 132 | [convolutional] 133 | batch_normalize=1 134 | filters=256 135 | size=3 136 | stride=1 137 | pad=1 138 | activation=leaky 139 | 140 | [shortcut] 141 | from=-3 142 | activation=linear 143 | 144 | [convolutional] 145 | batch_normalize=1 146 | filters=128 147 | size=1 148 | stride=1 149 | pad=1 150 | activation=leaky 151 | 152 | [convolutional] 153 | batch_normalize=1 154 | filters=256 155 | size=3 156 | stride=1 157 | pad=1 158 | activation=leaky 159 | 160 | [shortcut] 161 | from=-3 162 | activation=linear 163 | 164 | [convolutional] 165 | batch_normalize=1 166 | filters=128 167 | size=1 168 | stride=1 169 | pad=1 170 | activation=leaky 171 | 172 | [convolutional] 173 | batch_normalize=1 174 | filters=256 175 | size=3 176 | stride=1 177 | pad=1 178 | activation=leaky 179 | 180 | [shortcut] 181 | from=-3 182 | activation=linear 183 | 184 | [convolutional] 185 | batch_normalize=1 186 | filters=128 187 | size=1 188 | stride=1 189 | pad=1 190 | activation=leaky 191 | 192 | [convolutional] 193 | batch_normalize=1 194 | filters=256 195 | size=3 196 | stride=1 197 | pad=1 198 | activation=leaky 199 | 200 | [shortcut] 201 | from=-3 202 | activation=linear 203 | 204 | 205 | [convolutional] 206 | batch_normalize=1 207 | filters=128 208 | size=1 209 | stride=1 210 | pad=1 211 | activation=leaky 212 | 213 | [convolutional] 214 | batch_normalize=1 215 | filters=256 216 | size=3 217 | stride=1 218 | pad=1 219 | activation=leaky 220 | 221 | [shortcut] 222 | from=-3 223 | activation=linear 224 | 225 | [convolutional] 226 | batch_normalize=1 227 | filters=128 228 | size=1 229 | stride=1 230 | pad=1 231 | activation=leaky 232 | 233 | [convolutional] 234 | batch_normalize=1 235 | filters=256 236 | size=3 237 | stride=1 238 | pad=1 239 | activation=leaky 240 | 241 | [shortcut] 242 | from=-3 243 | activation=linear 244 | 245 | [convolutional] 246 | batch_normalize=1 247 | filters=128 248 | size=1 249 | stride=1 250 | pad=1 251 | activation=leaky 252 | 253 | [convolutional] 254 | batch_normalize=1 255 | filters=256 256 | size=3 257 | stride=1 258 | pad=1 259 | activation=leaky 260 | 261 | [shortcut] 262 | from=-3 263 | activation=linear 264 | 265 | [convolutional] 266 | batch_normalize=1 267 | filters=128 268 | size=1 269 | stride=1 270 | pad=1 271 | activation=leaky 272 | 273 | [convolutional] 274 | batch_normalize=1 275 | filters=256 276 | size=3 277 | stride=1 278 | pad=1 279 | activation=leaky 280 | 281 | [shortcut] 282 | from=-3 283 | activation=linear 284 | 285 | # Downsample 286 | 287 | [convolutional] 288 | batch_normalize=1 289 | filters=512 290 | size=3 291 | stride=2 292 | pad=1 293 | activation=leaky 294 | 295 | [convolutional] 296 | batch_normalize=1 297 | filters=256 298 | size=1 299 | stride=1 300 | pad=1 301 | activation=leaky 302 | 303 | [convolutional] 304 | batch_normalize=1 305 | filters=512 306 | size=3 307 | stride=1 308 | pad=1 309 | activation=leaky 310 | 311 | [shortcut] 312 | from=-3 313 | activation=linear 314 | 315 | 316 | 
[convolutional] 317 | batch_normalize=1 318 | filters=256 319 | size=1 320 | stride=1 321 | pad=1 322 | activation=leaky 323 | 324 | [convolutional] 325 | batch_normalize=1 326 | filters=512 327 | size=3 328 | stride=1 329 | pad=1 330 | activation=leaky 331 | 332 | [shortcut] 333 | from=-3 334 | activation=linear 335 | 336 | 337 | [convolutional] 338 | batch_normalize=1 339 | filters=256 340 | size=1 341 | stride=1 342 | pad=1 343 | activation=leaky 344 | 345 | [convolutional] 346 | batch_normalize=1 347 | filters=512 348 | size=3 349 | stride=1 350 | pad=1 351 | activation=leaky 352 | 353 | [shortcut] 354 | from=-3 355 | activation=linear 356 | 357 | 358 | [convolutional] 359 | batch_normalize=1 360 | filters=256 361 | size=1 362 | stride=1 363 | pad=1 364 | activation=leaky 365 | 366 | [convolutional] 367 | batch_normalize=1 368 | filters=512 369 | size=3 370 | stride=1 371 | pad=1 372 | activation=leaky 373 | 374 | [shortcut] 375 | from=-3 376 | activation=linear 377 | 378 | [convolutional] 379 | batch_normalize=1 380 | filters=256 381 | size=1 382 | stride=1 383 | pad=1 384 | activation=leaky 385 | 386 | [convolutional] 387 | batch_normalize=1 388 | filters=512 389 | size=3 390 | stride=1 391 | pad=1 392 | activation=leaky 393 | 394 | [shortcut] 395 | from=-3 396 | activation=linear 397 | 398 | 399 | [convolutional] 400 | batch_normalize=1 401 | filters=256 402 | size=1 403 | stride=1 404 | pad=1 405 | activation=leaky 406 | 407 | [convolutional] 408 | batch_normalize=1 409 | filters=512 410 | size=3 411 | stride=1 412 | pad=1 413 | activation=leaky 414 | 415 | [shortcut] 416 | from=-3 417 | activation=linear 418 | 419 | 420 | [convolutional] 421 | batch_normalize=1 422 | filters=256 423 | size=1 424 | stride=1 425 | pad=1 426 | activation=leaky 427 | 428 | [convolutional] 429 | batch_normalize=1 430 | filters=512 431 | size=3 432 | stride=1 433 | pad=1 434 | activation=leaky 435 | 436 | [shortcut] 437 | from=-3 438 | activation=linear 439 | 440 | [convolutional] 441 | batch_normalize=1 442 | filters=256 443 | size=1 444 | stride=1 445 | pad=1 446 | activation=leaky 447 | 448 | [convolutional] 449 | batch_normalize=1 450 | filters=512 451 | size=3 452 | stride=1 453 | pad=1 454 | activation=leaky 455 | 456 | [shortcut] 457 | from=-3 458 | activation=linear 459 | 460 | # Downsample 461 | 462 | [convolutional] 463 | batch_normalize=1 464 | filters=1024 465 | size=3 466 | stride=2 467 | pad=1 468 | activation=leaky 469 | 470 | [convolutional] 471 | batch_normalize=1 472 | filters=512 473 | size=1 474 | stride=1 475 | pad=1 476 | activation=leaky 477 | 478 | [convolutional] 479 | batch_normalize=1 480 | filters=1024 481 | size=3 482 | stride=1 483 | pad=1 484 | activation=leaky 485 | 486 | [shortcut] 487 | from=-3 488 | activation=linear 489 | 490 | [convolutional] 491 | batch_normalize=1 492 | filters=512 493 | size=1 494 | stride=1 495 | pad=1 496 | activation=leaky 497 | 498 | [convolutional] 499 | batch_normalize=1 500 | filters=1024 501 | size=3 502 | stride=1 503 | pad=1 504 | activation=leaky 505 | 506 | [shortcut] 507 | from=-3 508 | activation=linear 509 | 510 | [convolutional] 511 | batch_normalize=1 512 | filters=512 513 | size=1 514 | stride=1 515 | pad=1 516 | activation=leaky 517 | 518 | [convolutional] 519 | batch_normalize=1 520 | filters=1024 521 | size=3 522 | stride=1 523 | pad=1 524 | activation=leaky 525 | 526 | [shortcut] 527 | from=-3 528 | activation=linear 529 | 530 | [convolutional] 531 | batch_normalize=1 532 | filters=512 533 | size=1 534 | stride=1 535 | pad=1 
536 | activation=leaky 537 | 538 | [convolutional] 539 | batch_normalize=1 540 | filters=1024 541 | size=3 542 | stride=1 543 | pad=1 544 | activation=leaky 545 | 546 | [shortcut] 547 | from=-3 548 | activation=linear 549 | 550 | ###################### 551 | 552 | [convolutional] 553 | batch_normalize=1 554 | filters=512 555 | size=1 556 | stride=1 557 | pad=1 558 | activation=leaky 559 | 560 | [convolutional] 561 | batch_normalize=1 562 | size=3 563 | stride=1 564 | pad=1 565 | filters=1024 566 | activation=leaky 567 | 568 | [convolutional] 569 | batch_normalize=1 570 | filters=512 571 | size=1 572 | stride=1 573 | pad=1 574 | activation=leaky 575 | 576 | [convolutional] 577 | batch_normalize=1 578 | size=3 579 | stride=1 580 | pad=1 581 | filters=1024 582 | activation=leaky 583 | 584 | [convolutional] 585 | batch_normalize=1 586 | filters=512 587 | size=1 588 | stride=1 589 | pad=1 590 | activation=leaky 591 | 592 | [convolutional] 593 | batch_normalize=1 594 | size=3 595 | stride=1 596 | pad=1 597 | filters=1024 598 | activation=leaky 599 | 600 | [convolutional] 601 | size=1 602 | stride=1 603 | pad=1 604 | filters=147 605 | activation=linear 606 | 607 | 608 | [yolo] 609 | mask = 6,7,8 610 | anchors = 25,31, 35,44, 48,56, 59,73, 80,96, 112,132, 144,174, 195,227, 264,337 611 | classes=44 612 | num=9 613 | jitter=.3 614 | ignore_thresh = .7 615 | truth_thresh = 1 616 | random=1 617 | 618 | 619 | [route] 620 | layers = -4 621 | 622 | [convolutional] 623 | batch_normalize=1 624 | filters=256 625 | size=1 626 | stride=1 627 | pad=1 628 | activation=leaky 629 | 630 | [upsample] 631 | stride=2 632 | 633 | [route] 634 | layers = -1, 61 635 | 636 | 637 | 638 | [convolutional] 639 | batch_normalize=1 640 | filters=256 641 | size=1 642 | stride=1 643 | pad=1 644 | activation=leaky 645 | 646 | [convolutional] 647 | batch_normalize=1 648 | size=3 649 | stride=1 650 | pad=1 651 | filters=512 652 | activation=leaky 653 | 654 | [convolutional] 655 | batch_normalize=1 656 | filters=256 657 | size=1 658 | stride=1 659 | pad=1 660 | activation=leaky 661 | 662 | [convolutional] 663 | batch_normalize=1 664 | size=3 665 | stride=1 666 | pad=1 667 | filters=512 668 | activation=leaky 669 | 670 | [convolutional] 671 | batch_normalize=1 672 | filters=256 673 | size=1 674 | stride=1 675 | pad=1 676 | activation=leaky 677 | 678 | [convolutional] 679 | batch_normalize=1 680 | size=3 681 | stride=1 682 | pad=1 683 | filters=512 684 | activation=leaky 685 | 686 | [convolutional] 687 | size=1 688 | stride=1 689 | pad=1 690 | filters=147 691 | activation=linear 692 | 693 | 694 | [yolo] 695 | mask = 3,4,5 696 | anchors = 25,31, 35,44, 48,56, 59,73, 80,96, 112,132, 144,174, 195,227, 264,337 697 | classes=44 698 | num=9 699 | jitter=.3 700 | ignore_thresh = .7 701 | truth_thresh = 1 702 | random=1 703 | 704 | 705 | 706 | [route] 707 | layers = -4 708 | 709 | [convolutional] 710 | batch_normalize=1 711 | filters=128 712 | size=1 713 | stride=1 714 | pad=1 715 | activation=leaky 716 | 717 | [upsample] 718 | stride=2 719 | 720 | [route] 721 | layers = -1, 36 722 | 723 | 724 | 725 | [convolutional] 726 | batch_normalize=1 727 | filters=128 728 | size=1 729 | stride=1 730 | pad=1 731 | activation=leaky 732 | 733 | [convolutional] 734 | batch_normalize=1 735 | size=3 736 | stride=1 737 | pad=1 738 | filters=256 739 | activation=leaky 740 | 741 | [convolutional] 742 | batch_normalize=1 743 | filters=128 744 | size=1 745 | stride=1 746 | pad=1 747 | activation=leaky 748 | 749 | [convolutional] 750 | batch_normalize=1 751 | 
size=3 752 | stride=1 753 | pad=1 754 | filters=256 755 | activation=leaky 756 | 757 | [convolutional] 758 | batch_normalize=1 759 | filters=128 760 | size=1 761 | stride=1 762 | pad=1 763 | activation=leaky 764 | 765 | [convolutional] 766 | batch_normalize=1 767 | size=3 768 | stride=1 769 | pad=1 770 | filters=256 771 | activation=leaky 772 | 773 | [convolutional] 774 | size=1 775 | stride=1 776 | pad=1 777 | filters=147 778 | activation=linear 779 | 780 | 781 | [yolo] 782 | mask = 0,1,2 783 | anchors = 25,31, 35,44, 48,56, 59,73, 80,96, 112,132, 144,174, 195,227, 264,337 784 | classes=44 785 | num=9 786 | jitter=.3 787 | ignore_thresh = .7 788 | truth_thresh = 1 789 | random=1 790 | 791 | -------------------------------------------------------------------------------- /config/yolov3-tiny.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | # batch=64 7 | # subdivisions=2 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | # 0 26 | [convolutional] 27 | batch_normalize=1 28 | filters=16 29 | size=3 30 | stride=1 31 | pad=1 32 | activation=leaky 33 | 34 | # 1 35 | [maxpool] 36 | size=2 37 | stride=2 38 | 39 | # 2 40 | [convolutional] 41 | batch_normalize=1 42 | filters=32 43 | size=3 44 | stride=1 45 | pad=1 46 | activation=leaky 47 | 48 | # 3 49 | [maxpool] 50 | size=2 51 | stride=2 52 | 53 | # 4 54 | [convolutional] 55 | batch_normalize=1 56 | filters=64 57 | size=3 58 | stride=1 59 | pad=1 60 | activation=leaky 61 | 62 | # 5 63 | [maxpool] 64 | size=2 65 | stride=2 66 | 67 | # 6 68 | [convolutional] 69 | batch_normalize=1 70 | filters=128 71 | size=3 72 | stride=1 73 | pad=1 74 | activation=leaky 75 | 76 | # 7 77 | [maxpool] 78 | size=2 79 | stride=2 80 | 81 | # 8 82 | [convolutional] 83 | batch_normalize=1 84 | filters=256 85 | size=3 86 | stride=1 87 | pad=1 88 | activation=leaky 89 | 90 | # 9 91 | [maxpool] 92 | size=2 93 | stride=2 94 | 95 | # 10 96 | [convolutional] 97 | batch_normalize=1 98 | filters=512 99 | size=3 100 | stride=1 101 | pad=1 102 | activation=leaky 103 | 104 | # 11 105 | [maxpool] 106 | size=2 107 | stride=1 108 | 109 | # 12 110 | [convolutional] 111 | batch_normalize=1 112 | filters=1024 113 | size=3 114 | stride=1 115 | pad=1 116 | activation=leaky 117 | 118 | ########### 119 | 120 | # 13 121 | [convolutional] 122 | batch_normalize=1 123 | filters=256 124 | size=1 125 | stride=1 126 | pad=1 127 | activation=leaky 128 | 129 | # 14 130 | [convolutional] 131 | batch_normalize=1 132 | filters=512 133 | size=3 134 | stride=1 135 | pad=1 136 | activation=leaky 137 | 138 | # 15 139 | [convolutional] 140 | size=1 141 | stride=1 142 | pad=1 143 | filters=255 144 | activation=linear 145 | 146 | 147 | 148 | # 16 149 | [yolo] 150 | mask = 3,4,5 151 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 152 | classes=80 153 | num=6 154 | jitter=.3 155 | ignore_thresh = .7 156 | truth_thresh = 1 157 | random=1 158 | 159 | # 17 160 | [route] 161 | layers = -4 162 | 163 | # 18 164 | [convolutional] 165 | batch_normalize=1 166 | filters=128 167 | size=1 168 | stride=1 169 | pad=1 170 | activation=leaky 171 | 172 | # 19 173 | [upsample] 174 | stride=2 175 | 176 | # 20 177 | [route] 178 | layers = -1, 8 179 | 180 | # 21 181 | 
[convolutional] 182 | batch_normalize=1 183 | filters=256 184 | size=3 185 | stride=1 186 | pad=1 187 | activation=leaky 188 | 189 | # 22 190 | [convolutional] 191 | size=1 192 | stride=1 193 | pad=1 194 | filters=255 195 | activation=linear 196 | 197 | # 23 198 | [yolo] 199 | mask = 1,2,3 200 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 201 | classes=80 202 | num=6 203 | jitter=.3 204 | ignore_thresh = .7 205 | truth_thresh = 1 206 | random=1 207 | -------------------------------------------------------------------------------- /config/yolov3.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 
| batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | 
activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 | [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 601 | stride=1 602 | pad=1 603 | filters=255 604 | activation=linear 605 | 606 | 607 | [yolo] 608 | mask = 6,7,8 609 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 610 | classes=80 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .7 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 
| layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | size=3 680 | stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=255 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 3,4,5 695 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 696 | classes=80 697 | num=9 698 | jitter=.3 699 | ignore_thresh = .7 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 | pad=1 746 | activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=255 777 | activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0,1,2 782 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 783 | classes=80 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .7 787 | truth_thresh = 1 788 | random=1 789 | -------------------------------------------------------------------------------- /deploy_scripts/config.json: -------------------------------------------------------------------------------- 1 | { 2 | "model_type": "PyTorch", 3 | "runtime": "python3.6", 4 | "model_algorithm": "object_detection", 5 | "metrics": { 6 | "f1": 0.0, 7 | "accuracy": 0.0, 8 | "precision": 0.0, 9 | "recall": 0.0 10 | }, 11 | "apis": [{ 12 | "protocol": "https", 13 | "url": "/", 14 | "method": "post", 15 | "request": { 16 | "Content-type": "multipart/form-data", 17 | "data": { 18 | "type": "object", 19 | "properties": { 20 | "images": { 21 | "type": "file" 22 | } 23 | } 24 | } 25 | }, 26 | "response": { 27 | "Content-type": "multipart/form-data", 28 | "data": { 
29 | "type": "object", 30 | "properties": { 31 | "detection_classes": { 32 | "type": "list", 33 | "items": [{ 34 | "type": "string" 35 | }] 36 | }, 37 | "detection_scores": { 38 | "type": "list", 39 | "items": [{ 40 | "type": "number" 41 | }] 42 | }, 43 | "detection_boxes": { 44 | "type": "list", 45 | "items": [{ 46 | "type": "list", 47 | "minItems": 4, 48 | "maxItems": 4, 49 | "items": [{ 50 | "type": "number" 51 | }] 52 | }] 53 | } 54 | } 55 | } 56 | } 57 | }], 58 | "dependencies": [{ 59 | "installer": "pip", 60 | "packages": [ 61 | { 62 | "restraint": "EXACT", 63 | "package_version": "5.2.0", 64 | "package_name": "Pillow" 65 | }, 66 | { 67 | "restraint": "EXACT", 68 | "package_version": "1.3.1", 69 | "package_name": "torch" 70 | }, 71 | { 72 | "restraint": "EXACT", 73 | "package_version": "4.32.1", 74 | "package_name": "tqdm" 75 | }, 76 | { 77 | "restraint": "EXACT", 78 | "package_version": "0.4.2", 79 | "package_name": "torchvision" 80 | } 81 | ] 82 | }] 83 | } 84 | -------------------------------------------------------------------------------- /deploy_scripts/customize_service.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import json 3 | import codecs 4 | from collections import OrderedDict 5 | from models import * 6 | from my_utils.utils import * 7 | from my_utils.datasets import * 8 | 9 | 10 | from model_service.pytorch_model_service import PTServingBaseService 11 | 12 | import time 13 | from metric.metrics_manager import MetricsManager 14 | import log 15 | logger = log.getLogger(__name__) 16 | 17 | 18 | class ObjectDetectionService(PTServingBaseService): 19 | def __init__(self, model_name, model_path): 20 | # make sure these files exist 21 | self.model_name = model_name 22 | self.model_path = os.path.join(os.path.dirname(__file__), 'models_best.pth') 23 | self.classes_path = os.path.join(os.path.dirname(__file__), 'train_classes.txt') 24 | self.model_def = os.path.join(os.path.dirname(__file__), 'yolov3-44.cfg') 25 | self.label_map = parse_classify_rule(os.path.join(os.path.dirname(__file__), 'classify_rule.json')) 26 | 27 | self.input_image_key = 'images' 28 | self.score = 0.3 29 | self.iou = 0.45 30 | self.img_size = 416 31 | self.classes = self._get_class() 32 | # define and load YOLOv3 model 33 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 34 | self.model = Darknet(self.model_def, img_size=self.img_size).to(device) 35 | if self.model_path.endswith(".weights"): 36 | # Load darknet weights 37 | self.model.load_darknet_weights(self.model_path) 38 | else: 39 | # Load checkpoint weights 40 | self.model.load_state_dict(torch.load(self.model_path, map_location='cpu')) 41 | print('load weights file success') 42 | self.model.eval() 43 | 44 | def _get_class(self): 45 | classes_path = os.path.expanduser(self.classes_path) 46 | with codecs.open(classes_path, 'r', 'utf-8') as f: 47 | class_names = f.readlines() 48 | class_names = [c.strip() for c in class_names] 49 | return class_names 50 | 51 | def _preprocess(self, data): 52 | preprocessed_data = {} 53 | for k, v in data.items(): 54 | for file_name, file_content in v.items(): 55 | img = Image.open(file_content) 56 | # store image size (height, width) 57 | shape = (img.size[1], img.size[0]) 58 | # convert to tensor 59 | img = transforms.ToTensor()(img) 60 | # Pad to square resolution 61 | img, _ = pad_to_square(img, 0) 62 | # Resize 63 | img = resize(img, 416) 64 | # unsqueeze 65 | img = img.unsqueeze(0) 66 | 67 | preprocessed_data[k] = [img, 
shape] 68 | return preprocessed_data 69 | 70 | def _inference(self, data): 71 | """ 72 | model inference function 73 | Here are a inference example of resnet, if you use another model, please modify this function 74 | """ 75 | img, shape = data[self.input_image_key] 76 | 77 | Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor 78 | input_imgs = Variable(img.type(Tensor)) 79 | 80 | # Get detections 81 | with torch.no_grad(): 82 | detections = self.model(input_imgs) 83 | detections = non_max_suppression(detections, self.score, self.iou) 84 | 85 | result = OrderedDict() 86 | if detections[0] is not None: 87 | detections = rescale_boxes(detections[0], self.img_size, shape) 88 | detections = detections.numpy().tolist() 89 | out_classes = [x[6] for x in detections] 90 | out_scores = [x[5] for x in detections] 91 | out_boxes = [x[:4] for x in detections] 92 | 93 | detection_class_names = [] 94 | for class_id in out_classes: 95 | class_name = self.classes[int(class_id)] 96 | class_name = self.label_map[class_name] + '/' + class_name 97 | detection_class_names.append(class_name) 98 | out_boxes_list = [] 99 | for box in out_boxes: 100 | out_boxes_list.append([round(float(v), 1) for v in box]) 101 | result['detection_classes'] = detection_class_names 102 | result['detection_scores'] = [round(float(v), 4) for v in out_scores] 103 | result['detection_boxes'] = out_boxes_list 104 | else: 105 | result['detection_classes'] = [] 106 | result['detection_scores'] = [] 107 | result['detection_boxes'] = [] 108 | 109 | return result 110 | 111 | def _postprocess(self, data): 112 | return data 113 | 114 | def inference(self, data): 115 | ''' 116 | Wrapper function to run preprocess, inference and postprocess functions. 117 | 118 | Parameters 119 | ---------- 120 | data : map of object 121 | Raw input from request. 122 | 123 | Returns 124 | ------- 125 | list of outputs to be sent back to client. 
126 | data to be sent back 127 | ''' 128 | pre_start_time = time.time() 129 | data = self._preprocess(data) 130 | infer_start_time = time.time() 131 | # Update preprocess latency metric 132 | pre_time_in_ms = (infer_start_time - pre_start_time) * 1000 133 | logger.info('preprocess time: ' + str(pre_time_in_ms) + 'ms') 134 | 135 | if self.model_name + '_LatencyPreprocess' in MetricsManager.metrics: 136 | MetricsManager.metrics[self.model_name + '_LatencyPreprocess'].update(pre_time_in_ms) 137 | 138 | data = self._inference(data) 139 | infer_end_time = time.time() 140 | infer_in_ms = (infer_end_time - infer_start_time) * 1000 141 | 142 | logger.info('infer time: ' + str(infer_in_ms) + 'ms') 143 | data = self._postprocess(data) 144 | 145 | # Update inference latency metric 146 | post_time_in_ms = (time.time() - infer_end_time) * 1000 147 | logger.info('postprocess time: ' + str(post_time_in_ms) + 'ms') 148 | if self.model_name + '_LatencyInference' in MetricsManager.metrics: 149 | MetricsManager.metrics[self.model_name + '_LatencyInference'].update(post_time_in_ms) 150 | 151 | # Update overall latency metric 152 | if self.model_name + '_LatencyOverall' in MetricsManager.metrics: 153 | MetricsManager.metrics[self.model_name + '_LatencyOverall'].update(pre_time_in_ms + post_time_in_ms) 154 | 155 | logger.info('latency: ' + str(pre_time_in_ms + infer_in_ms + post_time_in_ms) + 'ms') 156 | data['latency_time'] = str(round(pre_time_in_ms + infer_in_ms + post_time_in_ms, 1)) + ' ms' 157 | return data 158 | 159 | 160 | def parse_classify_rule(json_path=''): 161 | with codecs.open(json_path, 'r', 'utf-8') as f: 162 | rule = json.load(f) 163 | label_map = {} 164 | for super_label, labels in rule.items(): 165 | for label in labels: 166 | label_map[label] = super_label 167 | return label_map 168 | -------------------------------------------------------------------------------- /detect.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | from models import * 4 | from my_utils.utils import * 5 | from my_utils.datasets import * 6 | 7 | import os 8 | import sys 9 | import time 10 | import datetime 11 | import argparse 12 | 13 | from PIL import Image 14 | 15 | import torch 16 | from torch.utils.data import DataLoader 17 | from torchvision import datasets 18 | from torch.autograd import Variable 19 | 20 | import matplotlib.pyplot as plt 21 | import matplotlib.patches as patches 22 | from matplotlib.ticker import NullLocator 23 | 24 | if __name__ == "__main__": 25 | parser = argparse.ArgumentParser() 26 | parser.add_argument("--image_folder", type=str, default="data/samples", help="path to dataset") 27 | parser.add_argument("--model_def", type=str, default="config/yolov3.cfg", help="path to model definition file") 28 | parser.add_argument("--weights_path", type=str, default="weights/yolov3.weights", help="path to weights file") 29 | parser.add_argument("--class_path", type=str, default="data/coco.names", help="path to class label file") 30 | parser.add_argument("--conf_thres", type=float, default=0.8, help="object confidence threshold") 31 | parser.add_argument("--nms_thres", type=float, default=0.4, help="iou thresshold for non-maximum suppression") 32 | parser.add_argument("--batch_size", type=int, default=1, help="size of the batches") 33 | parser.add_argument("--n_cpu", type=int, default=0, help="number of cpu threads to use during batch generation") 34 | parser.add_argument("--img_size", type=int, default=416, help="size of each 
image dimension") 35 | parser.add_argument("--checkpoint_model", type=str, help="path to checkpoint model") 36 | opt = parser.parse_args() 37 | print(opt) 38 | 39 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 40 | 41 | os.makedirs("output", exist_ok=True) 42 | 43 | # Set up model 44 | model = Darknet(opt.model_def, img_size=opt.img_size).to(device) 45 | 46 | if opt.weights_path.endswith(".weights"): 47 | # Load darknet weights 48 | model.load_darknet_weights(opt.weights_path) 49 | else: 50 | # Load checkpoint weights 51 | model.load_state_dict(torch.load(opt.weights_path)) 52 | 53 | model.eval() # Set in evaluation mode 54 | 55 | dataloader = DataLoader( 56 | ImageFolder(opt.image_folder, img_size=opt.img_size), 57 | batch_size=opt.batch_size, 58 | shuffle=False, 59 | num_workers=opt.n_cpu, 60 | ) 61 | 62 | classes = load_classes(opt.class_path) # Extracts class labels from file 63 | 64 | Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor 65 | 66 | imgs = [] # Stores image paths 67 | img_detections = [] # Stores detections for each image index 68 | 69 | print("\nPerforming object detection:") 70 | prev_time = time.time() 71 | for batch_i, (img_paths, input_imgs) in enumerate(dataloader): 72 | # Configure input 73 | input_imgs = Variable(input_imgs.type(Tensor)) 74 | 75 | # Get detections 76 | with torch.no_grad(): 77 | detections = model(input_imgs) 78 | detections = non_max_suppression(detections, opt.conf_thres, opt.nms_thres) 79 | 80 | # Log progress 81 | current_time = time.time() 82 | inference_time = datetime.timedelta(seconds=current_time - prev_time) 83 | prev_time = current_time 84 | print("\t+ Batch %d, Inference Time: %s" % (batch_i, inference_time)) 85 | 86 | # Save image and detections 87 | imgs.extend(img_paths) 88 | img_detections.extend(detections) 89 | 90 | # Bounding-box colors 91 | cmap = plt.get_cmap("tab20b") 92 | colors = [cmap(i) for i in np.linspace(0, 1, 20)] 93 | 94 | print("\nSaving images:") 95 | # Iterate through images and save plot of detections 96 | for img_i, (path, detections) in enumerate(zip(imgs, img_detections)): 97 | 98 | print("(%d) Image: '%s'" % (img_i, path)) 99 | 100 | # Create plot 101 | img = np.array(Image.open(path)) 102 | plt.figure() 103 | fig, ax = plt.subplots(1) 104 | ax.imshow(img) 105 | 106 | # Draw bounding boxes and labels of detections 107 | if detections is not None: 108 | # Rescale boxes to original image 109 | detections = rescale_boxes(detections, opt.img_size, img.shape[:2]) 110 | unique_labels = detections[:, -1].cpu().unique() 111 | n_cls_preds = len(unique_labels) 112 | bbox_colors = random.sample(colors, n_cls_preds) 113 | for x1, y1, x2, y2, conf, cls_conf, cls_pred in detections: 114 | 115 | print("\t+ Label: %s, Conf: %.5f" % (classes[int(cls_pred)], cls_conf.item())) 116 | 117 | box_w = x2 - x1 118 | box_h = y2 - y1 119 | 120 | color = bbox_colors[int(np.where(unique_labels == int(cls_pred))[0])] 121 | # Create a Rectangle patch 122 | bbox = patches.Rectangle((x1, y1), box_w, box_h, linewidth=2, edgecolor=color, facecolor="none") 123 | # Add the bbox to the plot 124 | ax.add_patch(bbox) 125 | # Add label 126 | plt.text( 127 | x1, 128 | y1, 129 | s=classes[int(cls_pred)], 130 | color="white", 131 | verticalalignment="top", 132 | bbox={"color": color, "pad": 0}, 133 | ) 134 | 135 | # Save generated image with detections 136 | plt.axis("off") 137 | plt.gca().xaxis.set_major_locator(NullLocator()) 138 | plt.gca().yaxis.set_major_locator(NullLocator()) 139 | filename 
= path.split("/")[-1].split(".")[0] 140 | plt.savefig(f"output/{filename}.png", bbox_inches="tight", pad_inches=0.0) 141 | plt.close() 142 | -------------------------------------------------------------------------------- /models.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | from torch.autograd import Variable 7 | import numpy as np 8 | 9 | from my_utils.parse_config import * 10 | from my_utils.utils import build_targets, to_cpu, non_max_suppression 11 | 12 | 13 | def create_modules(module_defs): 14 | """ 15 | Constructs module list of layer blocks from module configuration in module_defs 16 | """ 17 | hyperparams = module_defs.pop(0) 18 | output_filters = [int(hyperparams["channels"])] 19 | module_list = nn.ModuleList() 20 | for module_i, module_def in enumerate(module_defs): 21 | modules = nn.Sequential() 22 | 23 | if module_def["type"] == "convolutional": 24 | bn = int(module_def["batch_normalize"]) 25 | filters = int(module_def["filters"]) 26 | kernel_size = int(module_def["size"]) 27 | pad = (kernel_size - 1) // 2 28 | modules.add_module( 29 | f"conv_{module_i}", 30 | nn.Conv2d( 31 | in_channels=output_filters[-1], 32 | out_channels=filters, 33 | kernel_size=kernel_size, 34 | stride=int(module_def["stride"]), 35 | padding=pad, 36 | bias=not bn, 37 | ), 38 | ) 39 | if bn: 40 | modules.add_module(f"batch_norm_{module_i}", nn.BatchNorm2d(filters, momentum=0.9, eps=1e-5)) 41 | if module_def["activation"] == "leaky": 42 | modules.add_module(f"leaky_{module_i}", nn.LeakyReLU(0.1)) 43 | 44 | elif module_def["type"] == "maxpool": 45 | kernel_size = int(module_def["size"]) 46 | stride = int(module_def["stride"]) 47 | if kernel_size == 2 and stride == 1: 48 | modules.add_module(f"_debug_padding_{module_i}", nn.ZeroPad2d((0, 1, 0, 1))) 49 | maxpool = nn.MaxPool2d(kernel_size=kernel_size, stride=stride, padding=int((kernel_size - 1) // 2)) 50 | modules.add_module(f"maxpool_{module_i}", maxpool) 51 | 52 | elif module_def["type"] == "upsample": 53 | upsample = Upsample(scale_factor=int(module_def["stride"]), mode="nearest") 54 | modules.add_module(f"upsample_{module_i}", upsample) 55 | 56 | elif module_def["type"] == "route": 57 | layers = [int(x) for x in module_def["layers"].split(",")] 58 | filters = sum([output_filters[1:][i] for i in layers]) 59 | modules.add_module(f"route_{module_i}", EmptyLayer()) 60 | 61 | elif module_def["type"] == "shortcut": 62 | filters = output_filters[1:][int(module_def["from"])] 63 | modules.add_module(f"shortcut_{module_i}", EmptyLayer()) 64 | 65 | elif module_def["type"] == "yolo": 66 | anchor_idxs = [int(x) for x in module_def["mask"].split(",")] 67 | # Extract anchors 68 | anchors = [int(x) for x in module_def["anchors"].split(",")] 69 | anchors = [(anchors[i], anchors[i + 1]) for i in range(0, len(anchors), 2)] 70 | anchors = [anchors[i] for i in anchor_idxs] 71 | num_classes = int(module_def["classes"]) 72 | img_size = int(hyperparams["height"]) 73 | # Define detection layer 74 | yolo_layer = YOLOLayer(anchors, num_classes, img_size) 75 | modules.add_module(f"yolo_{module_i}", yolo_layer) 76 | # Register module list and number of output filters 77 | module_list.append(modules) 78 | output_filters.append(filters) 79 | 80 | return hyperparams, module_list 81 | 82 | 83 | class Upsample(nn.Module): 84 | """ nn.Upsample is deprecated """ 85 | 86 | def __init__(self, scale_factor, mode="nearest"): 87 | 
super(Upsample, self).__init__() 88 | self.scale_factor = scale_factor 89 | self.mode = mode 90 | 91 | def forward(self, x): 92 | x = F.interpolate(x, scale_factor=self.scale_factor, mode=self.mode) 93 | return x 94 | 95 | 96 | class EmptyLayer(nn.Module): 97 | """Placeholder for 'route' and 'shortcut' layers""" 98 | 99 | def __init__(self): 100 | super(EmptyLayer, self).__init__() 101 | 102 | 103 | class YOLOLayer(nn.Module): 104 | """Detection layer""" 105 | 106 | def __init__(self, anchors, num_classes, img_dim=416): 107 | super(YOLOLayer, self).__init__() 108 | self.anchors = anchors 109 | self.num_anchors = len(anchors) 110 | self.num_classes = num_classes 111 | self.ignore_thres = 0.5 112 | self.mse_loss = nn.MSELoss() 113 | self.bce_loss = nn.BCELoss() 114 | self.obj_scale = 1 115 | self.noobj_scale = 100 116 | self.metrics = {} 117 | self.img_dim = img_dim 118 | self.grid_size = 0 # grid size 119 | 120 | def compute_grid_offsets(self, grid_size, cuda=True): 121 | self.grid_size = grid_size 122 | g = self.grid_size 123 | FloatTensor = torch.cuda.FloatTensor if cuda else torch.FloatTensor 124 | self.stride = self.img_dim / self.grid_size 125 | # Calculate offsets for each grid 126 | self.grid_x = torch.arange(g).repeat(g, 1).view([1, 1, g, g]).type(FloatTensor) 127 | self.grid_y = torch.arange(g).repeat(g, 1).t().view([1, 1, g, g]).type(FloatTensor) 128 | self.scaled_anchors = FloatTensor([(a_w / self.stride, a_h / self.stride) for a_w, a_h in self.anchors]) 129 | self.anchor_w = self.scaled_anchors[:, 0:1].view((1, self.num_anchors, 1, 1)) 130 | self.anchor_h = self.scaled_anchors[:, 1:2].view((1, self.num_anchors, 1, 1)) 131 | 132 | def forward(self, x, targets=None, img_dim=None): 133 | 134 | # Tensors for cuda support 135 | FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor 136 | LongTensor = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor 137 | ByteTensor = torch.cuda.ByteTensor if x.is_cuda else torch.ByteTensor 138 | 139 | self.img_dim = img_dim 140 | num_samples = x.size(0) 141 | grid_size = x.size(2) 142 | 143 | prediction = ( 144 | x.view(num_samples, self.num_anchors, self.num_classes + 5, grid_size, grid_size) 145 | .permute(0, 1, 3, 4, 2) 146 | .contiguous() 147 | ) 148 | 149 | # Get outputs 150 | x = torch.sigmoid(prediction[..., 0]) # Center x 151 | y = torch.sigmoid(prediction[..., 1]) # Center y 152 | w = prediction[..., 2] # Width 153 | h = prediction[..., 3] # Height 154 | pred_conf = torch.sigmoid(prediction[..., 4]) # Conf 155 | pred_cls = torch.sigmoid(prediction[..., 5:]) # Cls pred. 
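# Note: at this point `prediction` has shape
# (num_samples, num_anchors, grid_size, grid_size, num_classes + 5).
# The sigmoid keeps the x/y centre offsets inside their grid cell, while w/h
# stay unbounded because they are later passed through exp() and scaled by the
# anchor priors in `pred_boxes` below. For a 416x416 input, the coarsest
# detection head has grid_size = 13, i.e. stride = 416 / 13 = 32 pixels.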
156 | 157 | # If grid size does not match current we compute new offsets 158 | if grid_size != self.grid_size: 159 | self.compute_grid_offsets(grid_size, cuda=x.is_cuda) 160 | 161 | # Add offset and scale with anchors 162 | pred_boxes = FloatTensor(prediction[..., :4].shape) 163 | pred_boxes[..., 0] = x.data + self.grid_x 164 | pred_boxes[..., 1] = y.data + self.grid_y 165 | pred_boxes[..., 2] = torch.exp(w.data) * self.anchor_w 166 | pred_boxes[..., 3] = torch.exp(h.data) * self.anchor_h 167 | 168 | output = torch.cat( 169 | ( 170 | pred_boxes.view(num_samples, -1, 4) * self.stride, 171 | pred_conf.view(num_samples, -1, 1), 172 | pred_cls.view(num_samples, -1, self.num_classes), 173 | ), 174 | -1, 175 | ) 176 | 177 | if targets is None: 178 | return output, 0 179 | else: 180 | iou_scores, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tcls, tconf = build_targets( 181 | pred_boxes=pred_boxes, 182 | pred_cls=pred_cls, 183 | target=targets, 184 | anchors=self.scaled_anchors, 185 | ignore_thres=self.ignore_thres, 186 | ) 187 | 188 | obj_mask = obj_mask.bool() # convert int8 to bool 189 | noobj_mask = noobj_mask.bool() # convert int8 to bool 190 | 191 | # Loss : Mask outputs to ignore non-existing objects (except with conf. loss) 192 | loss_x = self.mse_loss(x[obj_mask], tx[obj_mask]) 193 | loss_y = self.mse_loss(y[obj_mask], ty[obj_mask]) 194 | loss_w = self.mse_loss(w[obj_mask], tw[obj_mask]) 195 | loss_h = self.mse_loss(h[obj_mask], th[obj_mask]) 196 | loss_conf_obj = self.bce_loss(pred_conf[obj_mask], tconf[obj_mask]) 197 | loss_conf_noobj = self.bce_loss(pred_conf[noobj_mask], tconf[noobj_mask]) 198 | loss_conf = self.obj_scale * loss_conf_obj + self.noobj_scale * loss_conf_noobj 199 | loss_cls = self.bce_loss(pred_cls[obj_mask], tcls[obj_mask]) 200 | total_loss = loss_x + loss_y + loss_w + loss_h + loss_conf + loss_cls 201 | 202 | # Metrics 203 | cls_acc = 100 * class_mask[obj_mask].mean() 204 | conf_obj = pred_conf[obj_mask].mean() 205 | conf_noobj = pred_conf[noobj_mask].mean() 206 | conf50 = (pred_conf > 0.5).float() 207 | iou50 = (iou_scores > 0.5).float() 208 | iou75 = (iou_scores > 0.75).float() 209 | detected_mask = conf50 * class_mask * tconf 210 | precision = torch.sum(iou50 * detected_mask) / (conf50.sum() + 1e-16) 211 | recall50 = torch.sum(iou50 * detected_mask) / (obj_mask.sum() + 1e-16) 212 | recall75 = torch.sum(iou75 * detected_mask) / (obj_mask.sum() + 1e-16) 213 | 214 | self.metrics = { 215 | "loss": to_cpu(total_loss).item(), 216 | "x": to_cpu(loss_x).item(), 217 | "y": to_cpu(loss_y).item(), 218 | "w": to_cpu(loss_w).item(), 219 | "h": to_cpu(loss_h).item(), 220 | "conf": to_cpu(loss_conf).item(), 221 | "cls": to_cpu(loss_cls).item(), 222 | "cls_acc": to_cpu(cls_acc).item(), 223 | "recall50": to_cpu(recall50).item(), 224 | "recall75": to_cpu(recall75).item(), 225 | "precision": to_cpu(precision).item(), 226 | "conf_obj": to_cpu(conf_obj).item(), 227 | "conf_noobj": to_cpu(conf_noobj).item(), 228 | "grid_size": grid_size, 229 | } 230 | 231 | return output, total_loss 232 | 233 | 234 | class Darknet(nn.Module): 235 | """YOLOv3 object detection model""" 236 | 237 | def __init__(self, config_path, img_size=416): 238 | super(Darknet, self).__init__() 239 | self.module_defs = parse_model_config(config_path) 240 | self.hyperparams, self.module_list = create_modules(self.module_defs) 241 | self.yolo_layers = [layer[0] for layer in self.module_list if hasattr(layer[0], "metrics")] 242 | self.img_size = img_size 243 | self.seen = 0 244 | self.header_info = np.array([0, 0, 0, 
self.seen, 0], dtype=np.int32) 245 | 246 | def forward(self, x, targets=None): 247 | img_dim = x.shape[2] 248 | loss = 0 249 | layer_outputs, yolo_outputs = [], [] 250 | for i, (module_def, module) in enumerate(zip(self.module_defs, self.module_list)): 251 | if module_def["type"] in ["convolutional", "upsample", "maxpool"]: 252 | x = module(x) 253 | elif module_def["type"] == "route": 254 | x = torch.cat([layer_outputs[int(layer_i)] for layer_i in module_def["layers"].split(",")], 1) 255 | elif module_def["type"] == "shortcut": 256 | layer_i = int(module_def["from"]) 257 | x = layer_outputs[-1] + layer_outputs[layer_i] 258 | elif module_def["type"] == "yolo": 259 | x, layer_loss = module[0](x, targets, img_dim) 260 | loss += layer_loss 261 | yolo_outputs.append(x) 262 | layer_outputs.append(x) 263 | yolo_outputs = to_cpu(torch.cat(yolo_outputs, 1)) 264 | return yolo_outputs if targets is None else (loss, yolo_outputs) 265 | 266 | def load_darknet_weights(self, weights_path): 267 | """Parses and loads the weights stored in 'weights_path'""" 268 | 269 | # Open the weights file 270 | with open(weights_path, "rb") as f: 271 | header = np.fromfile(f, dtype=np.int32, count=5) # First five are header values 272 | self.header_info = header # Needed to write header when saving weights 273 | self.seen = header[3] # number of images seen during training 274 | weights = np.fromfile(f, dtype=np.float32) # The rest are weights 275 | 276 | # Establish cutoff for loading backbone weights 277 | cutoff = None 278 | if "darknet53.conv.74" in weights_path: 279 | cutoff = 75 280 | 281 | ptr = 0 282 | for i, (module_def, module) in enumerate(zip(self.module_defs, self.module_list)): 283 | if i == cutoff: 284 | break 285 | if module_def["type"] == "convolutional": 286 | conv_layer = module[0] 287 | if module_def["batch_normalize"]: 288 | # Load BN bias, weights, running mean and running variance 289 | bn_layer = module[1] 290 | num_b = bn_layer.bias.numel() # Number of biases 291 | # Bias 292 | bn_b = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.bias) 293 | bn_layer.bias.data.copy_(bn_b) 294 | ptr += num_b 295 | # Weight 296 | bn_w = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.weight) 297 | bn_layer.weight.data.copy_(bn_w) 298 | ptr += num_b 299 | # Running Mean 300 | bn_rm = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.running_mean) 301 | bn_layer.running_mean.data.copy_(bn_rm) 302 | ptr += num_b 303 | # Running Var 304 | bn_rv = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.running_var) 305 | bn_layer.running_var.data.copy_(bn_rv) 306 | ptr += num_b 307 | else: 308 | # Load conv. bias 309 | num_b = conv_layer.bias.numel() 310 | conv_b = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(conv_layer.bias) 311 | conv_layer.bias.data.copy_(conv_b) 312 | ptr += num_b 313 | # Load conv. 
weights 314 | num_w = conv_layer.weight.numel() 315 | conv_w = torch.from_numpy(weights[ptr : ptr + num_w]).view_as(conv_layer.weight) 316 | conv_layer.weight.data.copy_(conv_w) 317 | ptr += num_w 318 | 319 | def save_darknet_weights(self, path, cutoff=-1): 320 | """ 321 | @:param path - path of the new weights file 322 | @:param cutoff - save layers between 0 and cutoff (cutoff = -1 -> all are saved) 323 | """ 324 | fp = open(path, "wb") 325 | self.header_info[3] = self.seen 326 | self.header_info.tofile(fp) 327 | 328 | # Iterate through layers 329 | for i, (module_def, module) in enumerate(zip(self.module_defs[:cutoff], self.module_list[:cutoff])): 330 | if module_def["type"] == "convolutional": 331 | conv_layer = module[0] 332 | # If batch norm, load bn first 333 | if module_def["batch_normalize"]: 334 | bn_layer = module[1] 335 | bn_layer.bias.data.cpu().numpy().tofile(fp) 336 | bn_layer.weight.data.cpu().numpy().tofile(fp) 337 | bn_layer.running_mean.data.cpu().numpy().tofile(fp) 338 | bn_layer.running_var.data.cpu().numpy().tofile(fp) 339 | # Load conv bias 340 | else: 341 | conv_layer.bias.data.cpu().numpy().tofile(fp) 342 | # Load conv weights 343 | conv_layer.weight.data.cpu().numpy().tofile(fp) 344 | 345 | fp.close() 346 | -------------------------------------------------------------------------------- /my_utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/edwardning/PyTorch-YOLOv3-ModelArts/878bdc232da0691939d92806927ea62cc15cb282/my_utils/__init__.py -------------------------------------------------------------------------------- /my_utils/augmentations.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn.functional as F 3 | import numpy as np 4 | 5 | 6 | def horisontal_flip(images, targets): 7 | images = torch.flip(images, [-1]) 8 | targets[:, 2] = 1 - targets[:, 2] 9 | return images, targets 10 | -------------------------------------------------------------------------------- /my_utils/datasets.py: -------------------------------------------------------------------------------- 1 | import glob 2 | import random 3 | import os 4 | import sys 5 | import numpy as np 6 | from PIL import Image 7 | import torch 8 | import torch.nn.functional as F 9 | 10 | from my_utils.augmentations import horisontal_flip 11 | from torch.utils.data import Dataset 12 | import torchvision.transforms as transforms 13 | 14 | 15 | def pad_to_square(img, pad_value): 16 | c, h, w = img.shape 17 | dim_diff = np.abs(h - w) 18 | # (upper / left) padding and (lower / right) padding 19 | pad1, pad2 = dim_diff // 2, dim_diff - dim_diff // 2 20 | # Determine padding 21 | pad = (0, 0, pad1, pad2) if h <= w else (pad1, pad2, 0, 0) 22 | # Add padding 23 | img = F.pad(img, pad, "constant", value=pad_value) 24 | 25 | return img, pad 26 | 27 | 28 | def resize(image, size): 29 | image = F.interpolate(image.unsqueeze(0), size=size, mode="nearest").squeeze(0) 30 | return image 31 | 32 | 33 | def random_resize(images, min_size=288, max_size=448): 34 | new_size = random.sample(list(range(min_size, max_size + 1, 32)), 1)[0] 35 | images = F.interpolate(images, size=new_size, mode="nearest") 36 | return images 37 | 38 | 39 | class ImageFolder(Dataset): 40 | def __init__(self, folder_path, img_size=416): 41 | self.files = sorted(glob.glob("%s/*.*" % folder_path)) 42 | self.img_size = img_size 43 | 44 | def __getitem__(self, index): 45 | img_path = self.files[index % 
len(self.files)] 46 | # Extract image as PyTorch tensor 47 | img = transforms.ToTensor()(Image.open(img_path)) 48 | # Pad to square resolution 49 | img, _ = pad_to_square(img, 0) 50 | # Resize 51 | img = resize(img, self.img_size) 52 | 53 | return img_path, img 54 | 55 | def __len__(self): 56 | return len(self.files) 57 | 58 | 59 | class ListDataset(Dataset): 60 | def __init__(self, list_path, img_size=416, augment=True, multiscale=True, normalized_labels=True): 61 | with open(list_path, "r") as file: 62 | self.img_files = file.readlines() 63 | 64 | self.label_files = [ 65 | path.replace("images", "labels").replace(".png", ".txt").replace(".jpg", ".txt") 66 | for path in self.img_files 67 | ] 68 | self.img_size = img_size 69 | self.max_objects = 100 70 | self.augment = augment 71 | self.multiscale = multiscale 72 | self.normalized_labels = normalized_labels 73 | self.min_size = self.img_size - 3 * 32 74 | self.max_size = self.img_size + 3 * 32 75 | self.batch_count = 0 76 | 77 | def __getitem__(self, index): 78 | 79 | # --------- 80 | # Image 81 | # --------- 82 | 83 | img_path = self.img_files[index % len(self.img_files)].rstrip() 84 | 85 | # Extract image as PyTorch tensor 86 | img = transforms.ToTensor()(Image.open(img_path).convert('RGB')) 87 | 88 | # Handle images with less than three channels 89 | if len(img.shape) != 3: 90 | img = img.unsqueeze(0) 91 | img = img.expand((3, img.shape[1:])) 92 | 93 | _, h, w = img.shape 94 | h_factor, w_factor = (h, w) if self.normalized_labels else (1, 1) 95 | # Pad to square resolution 96 | img, pad = pad_to_square(img, 0) 97 | _, padded_h, padded_w = img.shape 98 | 99 | # --------- 100 | # Label 101 | # --------- 102 | 103 | label_path = self.label_files[index % len(self.img_files)].rstrip() 104 | 105 | targets = None 106 | if os.path.exists(label_path): 107 | boxes = torch.from_numpy(np.loadtxt(label_path).reshape(-1, 5)) 108 | # Extract coordinates for unpadded + unscaled image 109 | x1 = w_factor * (boxes[:, 1] - boxes[:, 3] / 2) 110 | y1 = h_factor * (boxes[:, 2] - boxes[:, 4] / 2) 111 | x2 = w_factor * (boxes[:, 1] + boxes[:, 3] / 2) 112 | y2 = h_factor * (boxes[:, 2] + boxes[:, 4] / 2) 113 | # Adjust for added padding 114 | x1 += pad[0] 115 | y1 += pad[2] 116 | x2 += pad[1] 117 | y2 += pad[3] 118 | # Returns (x, y, w, h) 119 | boxes[:, 1] = ((x1 + x2) / 2) / padded_w 120 | boxes[:, 2] = ((y1 + y2) / 2) / padded_h 121 | boxes[:, 3] *= w_factor / padded_w 122 | boxes[:, 4] *= h_factor / padded_h 123 | 124 | targets = torch.zeros((len(boxes), 6)) 125 | targets[:, 1:] = boxes 126 | 127 | # Apply augmentations 128 | if self.augment: 129 | if np.random.random() < 0.5: 130 | img, targets = horisontal_flip(img, targets) 131 | 132 | return img_path, img, targets 133 | 134 | def collate_fn(self, batch): 135 | paths, imgs, targets = list(zip(*batch)) 136 | # Remove empty placeholder targets 137 | targets = [boxes for boxes in targets if boxes is not None] 138 | # Add sample index to targets 139 | for i, boxes in enumerate(targets): 140 | boxes[:, 0] = i 141 | targets = torch.cat(targets, 0) 142 | # Selects new image size every tenth batch 143 | if self.multiscale and self.batch_count % 10 == 0: 144 | self.img_size = random.choice(range(self.min_size, self.max_size + 1, 32)) 145 | # Resize images to input shape 146 | imgs = torch.stack([resize(img, self.img_size) for img in imgs]) 147 | self.batch_count += 1 148 | return paths, imgs, targets 149 | 150 | def __len__(self): 151 | return len(self.img_files) 152 | 
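# Note on the label format consumed by ListDataset: each line of a label file
# holds "class_id x_center y_center width height", with all coordinates
# normalized to [0, 1] relative to the image (these files are generated by
# my_utils/prepare_datasets.py). An illustrative line for class 12 might look
# like "12 0.5075 0.36 0.215 0.42".
# When multiscale is enabled, collate_fn re-samples the network input size
# from {img_size - 96, ..., img_size + 96} in steps of 32 every 10 batches;
# this is what the --multiscale_training flag in train.py controls.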
-------------------------------------------------------------------------------- /my_utils/parse_config.py: -------------------------------------------------------------------------------- 1 | 2 | 3 | def parse_model_config(path): 4 | """Parses the yolo-v3 layer configuration file and returns module definitions""" 5 | file = open(path, 'r') 6 | lines = file.read().split('\n') 7 | lines = [x for x in lines if x and not x.startswith('#')] 8 | lines = [x.rstrip().lstrip() for x in lines] # get rid of fringe whitespaces 9 | module_defs = [] 10 | for line in lines: 11 | if line.startswith('['): # This marks the start of a new block 12 | module_defs.append({}) 13 | module_defs[-1]['type'] = line[1:-1].rstrip() 14 | if module_defs[-1]['type'] == 'convolutional': 15 | module_defs[-1]['batch_normalize'] = 0 16 | else: 17 | key, value = line.split("=") 18 | value = value.strip() 19 | module_defs[-1][key.rstrip()] = value.strip() 20 | 21 | return module_defs 22 | 23 | def parse_data_config(path): 24 | """Parses the data configuration file""" 25 | options = dict() 26 | options['gpus'] = '0,1,2,3' 27 | options['num_workers'] = '10' 28 | with open(path, 'r') as fp: 29 | lines = fp.readlines() 30 | for line in lines: 31 | line = line.strip() 32 | if line == '' or line.startswith('#'): 33 | continue 34 | key, value = line.split('=') 35 | options[key.strip()] = value.strip() 36 | return options 37 | -------------------------------------------------------------------------------- /my_utils/prepare_datasets.py: -------------------------------------------------------------------------------- 1 | # After this script runs successfully, a folder with the following structure is generated: 2 | # trainval/ 3 | # -images 4 | # -0001.jpg 5 | # -0002.jpg 6 | # -0003.jpg 7 | # -labels 8 | # -0001.txt 9 | # -0002.txt 10 | # -0003.txt 11 | # Pack the trainval folder as trainval.zip and upload it to OBS for later use. 12 | import os 13 | import codecs 14 | import xml.etree.ElementTree as ET 15 | from tqdm import tqdm 16 | import shutil 17 | import argparse 18 | 19 | 20 | def get_classes(classes_path): 21 | '''loads the classes''' 22 | with codecs.open(classes_path, 'r', 'utf-8') as f: 23 | class_names = f.readlines() 24 | class_names = [c.strip() for c in class_names] 25 | return class_names 26 | 27 | 28 | def creat_label_txt(soure_datasets, new_datasets): 29 | annotations = os.path.join(soure_datasets, 'VOC2007\Annotations') 30 | txt_path = os.path.join(new_datasets, 'labels') 31 | class_names = get_classes(os.path.join(soure_datasets, 'train_classes.txt')) 32 | 33 | xmls = os.listdir(annotations) 34 | for xml in tqdm(xmls): 35 | txt_anno_path = os.path.join(txt_path, xml.replace('xml', 'txt')) 36 | xml = os.path.join(annotations, xml) 37 | tree = ET.parse(xml) 38 | root = tree.getroot() 39 | 40 | size = root.find('size') 41 | w = int(size.find('width').text) 42 | h = int(size.find('height').text) 43 | line = '' 44 | for obj in root.iter('object'): 45 | cls = obj.find('name').text 46 | if cls not in class_names: 47 | print('name error', xml) 48 | continue 49 | cls_id = class_names.index(cls) 50 | xmlbox = obj.find('bndbox') 51 | box = [int(xmlbox.find('xmin').text), int(xmlbox.find('ymin').text), 52 | int(xmlbox.find('xmax').text), int(xmlbox.find('ymax').text)] 53 | width = round((box[2] - box[0]) / w, 6) 54 | height = round((box[3] - box[1]) / h, 6) 55 | x_center = round(((box[2] + box[0]) / 2) / w, 6) 56 | y_center = round(((box[3] + box[1]) / 2) / h, 6) 57 | line = line + str(cls_id) + ' ' + ' '.join(str(v) for v in [x_center, y_center, width, height])+'\n' 58 | if box[2] > w or box[3] > h: 59 | print('Image with
annotation error:', xml) 60 | if box[0] < 0 or box[1] < 0: 61 | print('Image with annotation error:', xml) 62 | with open(txt_anno_path, 'w') as f: 63 | f.writelines(line) 64 | 65 | 66 | def creat_new_datasets(source_datasets, new_datasets): 67 | if not os.path.exists(source_datasets): 68 | print('could not find source datasets, please make sure the path exists') 69 | return 70 | 71 | if new_datasets.endswith('trainval'): 72 | if not os.path.exists(new_datasets): 73 | os.makedirs(new_datasets) 74 | os.makedirs(new_datasets + '\labels') 75 | print('copying images......') 76 | shutil.copytree(source_datasets + '\VOC2007\JPEGImages', new_datasets + '\images') 77 | else: 78 | print('the last directory must be named trainval and must be an empty folder') 79 | return 80 | print('creating txt labels:') 81 | creat_label_txt(source_datasets, new_datasets) 82 | return 83 | 84 | 85 | if __name__ == "__main__": 86 | parser = argparse.ArgumentParser() 87 | parser.add_argument("--soure_datasets", "-sd", type=str, help="directory of the unzipped official SODiC dataset") 88 | parser.add_argument("--new_datasets", "-nd", type=str, help="path of the new dataset; it must end with trainval and be an empty folder") 89 | opt = parser.parse_args() 90 | # creat_new_datasets(opt.soure_datasets, opt.new_datasets) 91 | 92 | soure_datasets = r'D:\trainval' 93 | new_datasets = r'D:\SODiC\trainval' 94 | creat_new_datasets(soure_datasets, new_datasets) 95 | -------------------------------------------------------------------------------- /my_utils/utils.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | import os 3 | import codecs 4 | import math 5 | import time 6 | import tqdm 7 | import torch 8 | import torch.nn as nn 9 | import torch.nn.functional as F 10 | from torch.autograd import Variable 11 | import numpy as np 12 | 13 | 14 | def to_cpu(tensor): 15 | return tensor.detach().cpu() 16 | 17 | 18 | def load_classes(classes_path): 19 | """ 20 | Loads class labels at 'path' 21 | """ 22 | classes_path = os.path.expanduser(classes_path) 23 | with codecs.open(classes_path, 'r', 'utf-8') as f: 24 | class_names = f.readlines() 25 | class_names = [c.strip() for c in class_names] 26 | return class_names 27 | 28 | 29 | def weights_init_normal(m): 30 | classname = m.__class__.__name__ 31 | if classname.find("Conv") != -1: 32 | torch.nn.init.normal_(m.weight.data, 0.0, 0.02) 33 | elif classname.find("BatchNorm2d") != -1: 34 | torch.nn.init.normal_(m.weight.data, 1.0, 0.02) 35 | torch.nn.init.constant_(m.bias.data, 0.0) 36 | 37 | 38 | def rescale_boxes(boxes, current_dim, original_shape): 39 | """ Rescales bounding boxes to the original shape """ 40 | orig_h, orig_w = original_shape 41 | # The amount of padding that was added 42 | pad_x = max(orig_h - orig_w, 0) * (current_dim / max(original_shape)) 43 | pad_y = max(orig_w - orig_h, 0) * (current_dim / max(original_shape)) 44 | # Image height and width after padding is removed 45 | unpad_h = current_dim - pad_y 46 | unpad_w = current_dim - pad_x 47 | # Rescale bounding boxes to dimension of original image 48 | boxes[:, 0] = ((boxes[:, 0] - pad_x // 2) / unpad_w) * orig_w 49 | boxes[:, 1] = ((boxes[:, 1] - pad_y // 2) / unpad_h) * orig_h 50 | boxes[:, 2] = ((boxes[:, 2] - pad_x // 2) / unpad_w) * orig_w 51 | boxes[:, 3] = ((boxes[:, 3] - pad_y // 2) / unpad_h) * orig_h 52 | return boxes 53 | 54 | 55 | def xywh2xyxy(x): 56 | y = x.new(x.shape) 57 | y[..., 0] = x[..., 0] - x[..., 2] / 2 58 | y[..., 1] = x[..., 1] - x[..., 3] / 2 59 | y[..., 2] = x[..., 0] + x[..., 2] / 2 60 | y[..., 3] = x[..., 1] + x[..., 3] / 2 61 | return y 62 | 63 |
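# Worked example for xywh2xyxy: a box given as
# (center_x=50, center_y=40, w=20, h=10) becomes the corner form
# (x1=40, y1=35, x2=60, y2=45).
# rescale_boxes above is the inverse of the letterboxing done by pad_to_square
# in my_utils/datasets.py: it subtracts half of the square padding and rescales
# the coordinates from the padded current_dim frame back to the original image
# height and width.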
64 | def ap_per_class(tp, conf, pred_cls, target_cls): 65 | """ Compute the average precision, given the recall and precision curves. 66 | Source: https://github.com/rafaelpadilla/Object-Detection-Metrics. 67 | # Arguments 68 | tp: True positives (list). 69 | conf: Objectness value from 0-1 (list). 70 | pred_cls: Predicted object classes (list). 71 | target_cls: True object classes (list). 72 | # Returns 73 | The average precision as computed in py-faster-rcnn. 74 | """ 75 | 76 | # Sort by objectness 77 | i = np.argsort(-conf) 78 | tp, conf, pred_cls = tp[i], conf[i], pred_cls[i] 79 | 80 | # Find unique classes 81 | unique_classes = np.unique(target_cls) 82 | 83 | # Create Precision-Recall curve and compute AP for each class 84 | ap, p, r = [], [], [] 85 | for c in tqdm.tqdm(unique_classes, desc="Computing AP"): 86 | i = pred_cls == c 87 | n_gt = (target_cls == c).sum() # Number of ground truth objects 88 | n_p = i.sum() # Number of predicted objects 89 | 90 | if n_p == 0 and n_gt == 0: 91 | continue 92 | elif n_p == 0 or n_gt == 0: 93 | ap.append(0) 94 | r.append(0) 95 | p.append(0) 96 | else: 97 | # Accumulate FPs and TPs 98 | fpc = (1 - tp[i]).cumsum() 99 | tpc = (tp[i]).cumsum() 100 | 101 | # Recall 102 | recall_curve = tpc / (n_gt + 1e-16) 103 | r.append(recall_curve[-1]) 104 | 105 | # Precision 106 | precision_curve = tpc / (tpc + fpc) 107 | p.append(precision_curve[-1]) 108 | 109 | # AP from recall-precision curve 110 | ap.append(compute_ap(recall_curve, precision_curve)) 111 | 112 | # Compute F1 score (harmonic mean of precision and recall) 113 | p, r, ap = np.array(p), np.array(r), np.array(ap) 114 | f1 = 2 * p * r / (p + r + 1e-16) 115 | 116 | return p, r, ap, f1, unique_classes.astype("int32") 117 | 118 | 119 | def compute_ap(recall, precision): 120 | """ Compute the average precision, given the recall and precision curves. 121 | Code originally from https://github.com/rbgirshick/py-faster-rcnn. 122 | 123 | # Arguments 124 | recall: The recall curve (list). 125 | precision: The precision curve (list). 126 | # Returns 127 | The average precision as computed in py-faster-rcnn. 
128 | """ 129 | # correct AP calculation 130 | # first append sentinel values at the end 131 | mrec = np.concatenate(([0.0], recall, [1.0])) 132 | mpre = np.concatenate(([0.0], precision, [0.0])) 133 | 134 | # compute the precision envelope 135 | for i in range(mpre.size - 1, 0, -1): 136 | mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i]) 137 | 138 | # to calculate area under PR curve, look for points 139 | # where X axis (recall) changes value 140 | i = np.where(mrec[1:] != mrec[:-1])[0] 141 | 142 | # and sum (\Delta recall) * prec 143 | ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]) 144 | return ap 145 | 146 | 147 | def get_batch_statistics(outputs, targets, iou_threshold): 148 | """ Compute true positives, predicted scores and predicted labels per sample """ 149 | batch_metrics = [] 150 | for sample_i in range(len(outputs)): 151 | 152 | if outputs[sample_i] is None: 153 | continue 154 | 155 | output = outputs[sample_i] 156 | pred_boxes = output[:, :4] 157 | pred_scores = output[:, 4] 158 | pred_labels = output[:, -1] 159 | 160 | true_positives = np.zeros(pred_boxes.shape[0]) 161 | 162 | annotations = targets[targets[:, 0] == sample_i][:, 1:] 163 | target_labels = annotations[:, 0] if len(annotations) else [] 164 | if len(annotations): 165 | detected_boxes = [] 166 | target_boxes = annotations[:, 1:] 167 | 168 | for pred_i, (pred_box, pred_label) in enumerate(zip(pred_boxes, pred_labels)): 169 | 170 | # If targets are found break 171 | if len(detected_boxes) == len(annotations): 172 | break 173 | 174 | # Ignore if label is not one of the target labels 175 | if pred_label not in target_labels: 176 | continue 177 | 178 | iou, box_index = bbox_iou(pred_box.unsqueeze(0), target_boxes).max(0) 179 | if iou >= iou_threshold and box_index not in detected_boxes: 180 | true_positives[pred_i] = 1 181 | detected_boxes += [box_index] 182 | batch_metrics.append([true_positives, pred_scores, pred_labels]) 183 | return batch_metrics 184 | 185 | 186 | def bbox_wh_iou(wh1, wh2): 187 | wh2 = wh2.t() 188 | w1, h1 = wh1[0], wh1[1] 189 | w2, h2 = wh2[0], wh2[1] 190 | inter_area = torch.min(w1, w2) * torch.min(h1, h2) 191 | union_area = (w1 * h1 + 1e-16) + w2 * h2 - inter_area 192 | return inter_area / union_area 193 | 194 | 195 | def bbox_iou(box1, box2, x1y1x2y2=True): 196 | """ 197 | Returns the IoU of two bounding boxes 198 | """ 199 | if not x1y1x2y2: 200 | # Transform from center and width to exact coordinates 201 | b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2 202 | b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2 203 | b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2 204 | b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2 205 | else: 206 | # Get the coordinates of bounding boxes 207 | b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3] 208 | b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3] 209 | 210 | # get the corrdinates of the intersection rectangle 211 | inter_rect_x1 = torch.max(b1_x1, b2_x1) 212 | inter_rect_y1 = torch.max(b1_y1, b2_y1) 213 | inter_rect_x2 = torch.min(b1_x2, b2_x2) 214 | inter_rect_y2 = torch.min(b1_y2, b2_y2) 215 | # Intersection area 216 | inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp( 217 | inter_rect_y2 - inter_rect_y1 + 1, min=0 218 | ) 219 | # Union Area 220 | b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1) 221 | b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1) 222 | 223 | iou = inter_area 
/ (b1_area + b2_area - inter_area + 1e-16) 224 | 225 | return iou 226 | 227 | 228 | def non_max_suppression(prediction, conf_thres=0.5, nms_thres=0.4): 229 | """ 230 | Removes detections with lower object confidence score than 'conf_thres' and performs 231 | Non-Maximum Suppression to further filter detections. 232 | Returns detections with shape: 233 | (x1, y1, x2, y2, object_conf, class_score, class_pred) 234 | """ 235 | 236 | # From (center x, center y, width, height) to (x1, y1, x2, y2) 237 | prediction[..., :4] = xywh2xyxy(prediction[..., :4]) 238 | output = [None for _ in range(len(prediction))] 239 | for image_i, image_pred in enumerate(prediction): 240 | # Filter out confidence scores below threshold 241 | image_pred = image_pred[image_pred[:, 4] >= conf_thres] 242 | # If none are remaining => process next image 243 | if not image_pred.size(0): 244 | continue 245 | # Object confidence times class confidence 246 | score = image_pred[:, 4] * image_pred[:, 5:].max(1)[0] 247 | # Sort by it 248 | image_pred = image_pred[(-score).argsort()] 249 | class_confs, class_preds = image_pred[:, 5:].max(1, keepdim=True) 250 | detections = torch.cat((image_pred[:, :5], class_confs.float(), class_preds.float()), 1) 251 | # Perform non-maximum suppression 252 | keep_boxes = [] 253 | while detections.size(0): 254 | large_overlap = bbox_iou(detections[0, :4].unsqueeze(0), detections[:, :4]) > nms_thres 255 | label_match = detections[0, -1] == detections[:, -1] 256 | # Indices of boxes with lower confidence scores, large IOUs and matching labels 257 | invalid = large_overlap & label_match 258 | weights = detections[invalid, 4:5] 259 | # Merge overlapping bboxes by order of confidence 260 | detections[0, :4] = (weights * detections[invalid, :4]).sum(0) / weights.sum() 261 | keep_boxes += [detections[0]] 262 | detections = detections[~invalid] 263 | if keep_boxes: 264 | output[image_i] = torch.stack(keep_boxes) 265 | 266 | return output 267 | 268 | 269 | def build_targets(pred_boxes, pred_cls, target, anchors, ignore_thres): 270 | 271 | ByteTensor = torch.cuda.ByteTensor if pred_boxes.is_cuda else torch.ByteTensor 272 | FloatTensor = torch.cuda.FloatTensor if pred_boxes.is_cuda else torch.FloatTensor 273 | 274 | nB = pred_boxes.size(0) 275 | nA = pred_boxes.size(1) 276 | nC = pred_cls.size(-1) 277 | nG = pred_boxes.size(2) 278 | 279 | # Output tensors 280 | obj_mask = ByteTensor(nB, nA, nG, nG).fill_(0) 281 | noobj_mask = ByteTensor(nB, nA, nG, nG).fill_(1) 282 | class_mask = FloatTensor(nB, nA, nG, nG).fill_(0) 283 | iou_scores = FloatTensor(nB, nA, nG, nG).fill_(0) 284 | tx = FloatTensor(nB, nA, nG, nG).fill_(0) 285 | ty = FloatTensor(nB, nA, nG, nG).fill_(0) 286 | tw = FloatTensor(nB, nA, nG, nG).fill_(0) 287 | th = FloatTensor(nB, nA, nG, nG).fill_(0) 288 | tcls = FloatTensor(nB, nA, nG, nG, nC).fill_(0) 289 | 290 | # Convert to position relative to box 291 | target_boxes = target[:, 2:6] * nG 292 | gxy = target_boxes[:, :2] 293 | gwh = target_boxes[:, 2:] 294 | # Get anchors with best iou 295 | ious = torch.stack([bbox_wh_iou(anchor, gwh) for anchor in anchors]) 296 | best_ious, best_n = ious.max(0) 297 | # Separate target values 298 | b, target_labels = target[:, :2].long().t() 299 | gx, gy = gxy.t() 300 | gw, gh = gwh.t() 301 | gi, gj = gxy.long().t() 302 | # Set masks 303 | obj_mask[b, best_n, gj, gi] = 1 304 | noobj_mask[b, best_n, gj, gi] = 0 305 | 306 | # Set noobj mask to zero where iou exceeds ignore threshold 307 | for i, anchor_ious in enumerate(ious.t()): 308 | noobj_mask[b[i], 
anchor_ious > ignore_thres, gj[i], gi[i]] = 0 309 | 310 | # Coordinates 311 | tx[b, best_n, gj, gi] = gx - gx.floor() 312 | ty[b, best_n, gj, gi] = gy - gy.floor() 313 | # Width and height 314 | tw[b, best_n, gj, gi] = torch.log(gw / anchors[best_n][:, 0] + 1e-16) 315 | th[b, best_n, gj, gi] = torch.log(gh / anchors[best_n][:, 1] + 1e-16) 316 | # One-hot encoding of label 317 | tcls[b, best_n, gj, gi, target_labels] = 1 318 | # Compute label correctness and iou at best anchor 319 | class_mask[b, best_n, gj, gi] = (pred_cls[b, best_n, gj, gi].argmax(-1) == target_labels).float() 320 | iou_scores[b, best_n, gj, gi] = bbox_iou(pred_boxes[b, best_n, gj, gi], target_boxes, x1y1x2y2=False) 321 | 322 | tconf = obj_mask.float() 323 | return iou_scores, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tcls, tconf 324 | -------------------------------------------------------------------------------- /pip-requirements.txt: -------------------------------------------------------------------------------- 1 | terminaltables==3.1.0 -------------------------------------------------------------------------------- /test.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | from models import * 4 | from my_utils.utils import * 5 | from my_utils.datasets import * 6 | from my_utils.parse_config import * 7 | 8 | import os 9 | import sys 10 | import time 11 | import datetime 12 | import argparse 13 | import tqdm 14 | 15 | import torch 16 | from torch.utils.data import DataLoader 17 | from torchvision import datasets 18 | from torchvision import transforms 19 | from torch.autograd import Variable 20 | import torch.optim as optim 21 | 22 | 23 | def evaluate(model, path, iou_thres, conf_thres, nms_thres, img_size, batch_size): 24 | model.eval() 25 | 26 | # Get dataloader 27 | dataset = ListDataset(path, img_size=img_size, augment=False, multiscale=False) 28 | dataloader = torch.utils.data.DataLoader( 29 | dataset, batch_size=batch_size, shuffle=False, num_workers=1, collate_fn=dataset.collate_fn 30 | ) 31 | 32 | Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor 33 | 34 | labels = [] 35 | sample_metrics = [] # List of tuples (TP, confs, pred) 36 | for batch_i, (_, imgs, targets) in enumerate(tqdm.tqdm(dataloader, desc="Detecting objects")): 37 | 38 | # Extract labels 39 | labels += targets[:, 1].tolist() 40 | # Rescale target 41 | targets[:, 2:] = xywh2xyxy(targets[:, 2:]) 42 | targets[:, 2:] *= img_size 43 | 44 | imgs = Variable(imgs.type(Tensor), requires_grad=False) 45 | 46 | with torch.no_grad(): 47 | outputs = model(imgs) 48 | outputs = non_max_suppression(outputs, conf_thres=conf_thres, nms_thres=nms_thres) 49 | 50 | sample_metrics += get_batch_statistics(outputs, targets, iou_threshold=iou_thres) 51 | 52 | # Concatenate sample statistics 53 | if len(sample_metrics) > 0: 54 | true_positives, pred_scores, pred_labels = [np.concatenate(x, 0) for x in list(zip(*sample_metrics))] 55 | precision, recall, AP, f1, ap_class = ap_per_class(true_positives, pred_scores, pred_labels, labels) 56 | assert len(ap_class) == len(AP) 57 | else: 58 | ap_class = np.unique(labels).astype("int32") 59 | precision = np.zeros(len(ap_class), dtype=np.float) 60 | recall = precision 61 | AP = precision 62 | f1 = 2 * precision * recall / (precision + recall + 1e-16) 63 | # true_positives, pred_scores, pred_labels = [np.concatenate(x, 0) for x in list(zip(*sample_metrics))] 64 | # precision, recall, AP, f1, ap_class = 
ap_per_class(true_positives, pred_scores, pred_labels, labels) 65 | 66 | return precision, recall, AP, f1, ap_class 67 | 68 | 69 | if __name__ == "__main__": 70 | parser = argparse.ArgumentParser() 71 | parser.add_argument("--batch_size", type=int, default=8, help="size of each image batch") 72 | parser.add_argument("--model_def", type=str, default="config/yolov3.cfg", help="path to model definition file") 73 | parser.add_argument("--data_config", type=str, default="config/coco.data", help="path to data config file") 74 | parser.add_argument("--weights_path", type=str, default="weights/yolov3.weights", help="path to weights file") 75 | parser.add_argument("--class_path", type=str, default="data/coco.names", help="path to class label file") 76 | parser.add_argument("--iou_thres", type=float, default=0.5, help="iou threshold required to qualify as detected") 77 | parser.add_argument("--conf_thres", type=float, default=0.001, help="object confidence threshold") 78 | parser.add_argument("--nms_thres", type=float, default=0.5, help="iou threshold for non-maximum suppression") 79 | parser.add_argument("--n_cpu", type=int, default=8, help="number of cpu threads to use during batch generation") 80 | parser.add_argument("--img_size", type=int, default=416, help="size of each image dimension") 81 | opt = parser.parse_args() 82 | print(opt) 83 | 84 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 85 | 86 | data_config = parse_data_config(opt.data_config) 87 | valid_path = data_config["valid"] 88 | class_names = load_classes(data_config["names"]) 89 | 90 | # Initiate model 91 | model = Darknet(opt.model_def).to(device) 92 | if opt.weights_path.endswith(".weights"): 93 | # Load darknet weights 94 | model.load_darknet_weights(opt.weights_path) 95 | else: 96 | # Load checkpoint weights 97 | model.load_state_dict(torch.load(opt.weights_path)) 98 | 99 | print("Compute mAP...") 100 | 101 | precision, recall, AP, f1, ap_class = evaluate( 102 | model, 103 | path=valid_path, 104 | iou_thres=opt.iou_thres, 105 | conf_thres=opt.conf_thres, 106 | nms_thres=opt.nms_thres, 107 | img_size=opt.img_size, 108 | batch_size=8, 109 | ) 110 | 111 | print("Average Precisions:") 112 | for i, c in enumerate(ap_class): 113 | print(f"+ Class '{c}' ({class_names[c]}) - AP: {AP[i]}") 114 | 115 | print(f"mAP: {AP.mean()}") 116 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | from models import * 4 | from my_utils.utils import * 5 | from my_utils.datasets import * 6 | from my_utils.parse_config import * 7 | from test import evaluate 8 | 9 | from terminaltables import AsciiTable 10 | 11 | import os 12 | import sys 13 | import time 14 | import datetime 15 | import argparse 16 | 17 | import torch 18 | from torch.utils.data import DataLoader 19 | from torchvision import datasets 20 | from torchvision import transforms 21 | from torch.autograd import Variable 22 | 23 | try: 24 | import moxing as mox 25 | except: 26 | print('moxing is not available, running outside ModelArts') 27 | 28 | 29 | def prepare_data_on_modelarts(args): 30 | """ 31 | Copy data from OBS to the ModelArts local cache. 32 | """ 33 | # Copy the pre-trained weights file 34 | 35 | # By default, the following three paths on ModelArts are used to store data: 36 | # 0) /cache/model: if a pre-trained model is used, stores the weights copied from OBS 37 | # 1) /cache/datasets: stores the training data copied from OBS 38 | # 2) /cache/log: stores training logs and checkpoints; after training finishes, everything in this directory is copied back to OBS 39 | if args.pretrained_weights: 40 | _, weights_name = os.path.split(args.pretrained_weights) 41 |
mox.file.copy(args.pretrained_weights, os.path.join(args.local_data_root, 'model/'+weights_name)) 42 | args.pretrained_weights = os.path.join(args.local_data_root, 'model/'+weights_name) 43 | if not (args.data_url.startswith('s3://') or args.data_url.startswith('obs://')): 44 | args.data_local = args.data_url 45 | else: 46 | args.data_local = os.path.join(args.local_data_root, 'datasets/trainval') 47 | if not os.path.exists(args.data_local): 48 | data_dir = os.path.join(args.local_data_root, 'datasets') 49 | mox.file.copy_parallel(args.data_url, data_dir) 50 | os.system('cd %s;unzip trainval.zip' % data_dir) # 训练集已提前打包为trainval.zip 51 | if os.path.isdir(args.data_local): 52 | os.system('cd %s;rm trainval.zip' % data_dir) 53 | print('unzip trainval.zip success, args.data_local is', args.data_local) 54 | else: 55 | raise Exception('unzip trainval.zip Failed') 56 | else: 57 | print('args.data_local: %s is already exist, skip copy' % args.data_local) 58 | 59 | if not (args.train_url.startswith('s3://') or args.train_url.startswith('obs://')): 60 | args.train_local = args.train_url 61 | else: 62 | args.train_local = os.path.join(args.local_data_root, 'log/') 63 | if not os.path.exists(args.train_local): 64 | os.mkdir(args.train_local) 65 | 66 | return args 67 | 68 | 69 | def gen_model_dir(args, model_best_path): 70 | current_dir = os.path.dirname(__file__) 71 | mox.file.copy_parallel(os.path.join(current_dir, 'deploy_scripts'), 72 | os.path.join(args.train_url, 'model')) 73 | mox.file.copy_parallel(os.path.join(current_dir, 'my_utils'), 74 | os.path.join(args.train_url, 'model/my_utils')) 75 | mox.file.copy(os.path.join(current_dir, 'config/yolov3-44.cfg'), 76 | os.path.join(args.train_url, 'model/yolov3-44.cfg')) 77 | mox.file.copy(os.path.join(current_dir, 'config/train_classes.txt'), 78 | os.path.join(args.train_url, 'model/train_classes.txt')) 79 | mox.file.copy(os.path.join(current_dir, 'config/classify_rule.json'), 80 | os.path.join(args.train_url, 'model/classify_rule.json')) 81 | mox.file.copy(os.path.join(current_dir, 'models.py'), 82 | os.path.join(args.train_url, 'model/models.py')) 83 | mox.file.copy(model_best_path, 84 | os.path.join(args.train_url, 'model/models_best.pth')) 85 | print('gen_model_dir success, model dir is at', os.path.join(args.train_url, 'model')) 86 | 87 | 88 | def freeze_body(model, freeze_body): 89 | # input: freeze_body.type = int, .choose = 0, 1, 2 90 | # return: modified model.parameters() 91 | # notes: 92 | # 0: do not freeze any layers 93 | # 1: freeze Darknet53 only 94 | # 2: freeze all but three detection layers 95 | # three detection layers is [81, 93, 105], refer to https://blog.csdn.net/litt1e/article/details/88907542 96 | 97 | for name, value in model.named_parameters(): 98 | value.requires_grad = True 99 | 100 | if freeze_body == 0: 101 | print('using original model without any freeze body') 102 | elif freeze_body == 1: 103 | print('using fitting model with backbone(Darknet53) frozen') 104 | for name, value in model.named_parameters(): 105 | layers = int(name.split('.')[1]) 106 | if layers < 74: 107 | value.requires_grad = False 108 | elif freeze_body == 2: 109 | print('using fitting model with all but three detection layers frozen') 110 | for name, value in model.named_parameters(): 111 | layers = int(name.split('.')[1]) 112 | if layers not in [81, 93, 105]: 113 | value.requires_grad = False 114 | else: 115 | print('Type error for freeze_body. 
Thus no layer is frozen') 116 | 117 | new_params = filter(lambda p: p.requires_grad, model.parameters()) 118 | return new_params 119 | 120 | 121 | def train(model, dataloader, optimizer, epoch, opt, device): 122 | model.train() 123 | start_time = time.time() 124 | metrics = [ 125 | "grid_size", 126 | "loss", 127 | "x", 128 | "y", 129 | "w", 130 | "h", 131 | "conf", 132 | "cls", 133 | "cls_acc", 134 | "recall50", 135 | "recall75", 136 | "precision", 137 | "conf_obj", 138 | "conf_noobj", 139 | ] 140 | for batch_i, (_, imgs, targets) in enumerate(dataloader): 141 | batches_done = len(dataloader) * epoch + batch_i 142 | 143 | imgs = Variable(imgs.to(device)) 144 | targets = Variable(targets.to(device), requires_grad=False) 145 | 146 | loss, outputs = model(imgs, targets) 147 | loss.backward() 148 | 149 | if batches_done % opt.gradient_accumulations: 150 | # Accumulates gradient before each step 151 | optimizer.step() 152 | optimizer.zero_grad() 153 | 154 | # Log progress 155 | log_str = "\n---- [Epoch %d/%d, Batch %d/%d] ----\n" % (epoch, opt.max_epochs_2, batch_i, len(dataloader)) 156 | metric_table = [["Metrics", *[f"YOLO Layer {i}" for i in range(len(model.yolo_layers))]]] 157 | 158 | # Log metrics at each YOLO layer 159 | for i, metric in enumerate(metrics): 160 | formats = {m: "%.6f" for m in metrics} 161 | formats["grid_size"] = "%2d" 162 | formats["cls_acc"] = "%.2f%%" 163 | row_metrics = [formats[metric] % yolo.metrics.get(metric, 0) for yolo in model.yolo_layers] 164 | metric_table += [[metric, *row_metrics]] 165 | 166 | ''' 167 | # Tensorboard logging 168 | tensorboard_log = [] 169 | for j, yolo in enumerate(model.yolo_layers): 170 | for name, metric in yolo.metrics.items(): 171 | if name != "grid_size": 172 | tensorboard_log += [(f"{name}_{j+1}", metric)] 173 | tensorboard_log += [("loss", loss.item())] 174 | logger.list_of_scalars_summary(tensorboard_log, batches_done) 175 | ''' 176 | 177 | log_str += AsciiTable(metric_table).table 178 | log_str += f"\nTotal loss {loss.item()}" 179 | 180 | # Determine approximate time left for epoch 181 | epoch_batches_left = len(dataloader) - (batch_i + 1) 182 | time_left = datetime.timedelta(seconds=epoch_batches_left * (time.time() - start_time) / (batch_i + 1)) 183 | log_str += f"\n---- ETA {time_left}" 184 | 185 | print(log_str) 186 | 187 | model.seen += imgs.size(0) 188 | 189 | 190 | def valid(model, path, class_names, opt): 191 | print("\n---- Evaluating Model ----") 192 | # Evaluate the model on the validation set 193 | precision, recall, AP, f1, ap_class = evaluate( 194 | model, 195 | path=path, 196 | iou_thres=0.5, 197 | conf_thres=0.5, 198 | nms_thres=0.5, 199 | img_size=opt.img_size, 200 | batch_size=32, 201 | ) 202 | evaluation_metrics = [ 203 | ("val_precision", precision.mean()), 204 | ("val_recall", recall.mean()), 205 | ("val_mAP", AP.mean()), 206 | ("val_f1", f1.mean()), 207 | ] 208 | # logger.list_of_scalars_summary(evaluation_metrics, epoch) 209 | 210 | # Print class APs and mAP 211 | ap_table = [["Index", "Class name", "AP"]] 212 | for i, c in enumerate(ap_class): 213 | ap_table += [[c, class_names[c], "%.5f" % AP[i]]] 214 | print(AsciiTable(ap_table).table) 215 | print(f"---- mAP {AP.mean()}") 216 | return AP 217 | 218 | 219 | if __name__ == "__main__": 220 | parser = argparse.ArgumentParser() 221 | parser.add_argument('--max_epochs_1', default=5, type=int, help='number of total epochs to run in stage one') 222 | parser.add_argument('--max_epochs_2', default=5, type=int, help='number of total epochs to run in total') 223 | 
parser.add_argument("--freeze_body_1", type=int, default=2, help="frozen specific layers for stage one") 224 | parser.add_argument("--freeze_body_2", type=int, default=0, help="frozen specific layers for stage two") 225 | parser.add_argument("--lr_1", type=float, default=1e-3, help="initial learning rate for stage one") 226 | parser.add_argument("--lr_2", type=float, default=1e-5, help="initial learning rate for stage two") 227 | parser.add_argument("--batch_size", type=int, default=32, help="size of each image batch") 228 | parser.add_argument("--gradient_accumulations", type=int, default=2, help="number of gradient accums before step") 229 | parser.add_argument("--model_def", type=str, default="PyTorch-YOLOv3-ModelArts/config/yolov3-44.cfg", help="path to model definition file") 230 | parser.add_argument("--data_config", type=str, default="PyTorch-YOLOv3-ModelArts/config/custom.data", help="path to data config file") 231 | parser.add_argument("--pretrained_weights", type=str, help="if specified starts from checkpoint model") 232 | parser.add_argument("--n_cpu", type=int, default=8, help="number of cpu threads to use during batch generation") 233 | parser.add_argument("--img_size", type=int, default=416, help="size of each image dimension") 234 | parser.add_argument("--checkpoint_interval", type=int, default=1, help="interval between saving model weights, " 235 | "here set same to evaluation_interval") 236 | parser.add_argument("--evaluation_interval", type=int, default=1, help="interval evaluations on validation set") 237 | parser.add_argument("--compute_map", default=False, help="if True computes mAP every tenth batch") 238 | parser.add_argument("--multiscale_training", default=True, help="allow for multi-scale training") 239 | parser.add_argument('--local_data_root', default='/cache/', type=str, 240 | help='a directory used for transfer data between local path and OBS path') 241 | parser.add_argument('--data_url', required=True, type=str, help='the training and validation data path') 242 | parser.add_argument('--data_local', default='', type=str, help='the training and validation data path on local') 243 | parser.add_argument('--train_url', required=True, type=str, help='the path to save training outputs') 244 | parser.add_argument('--train_local', default='', type=str, help='the training output results on local') 245 | parser.add_argument('--init_method', default='', type=str, help='the training output results on local') 246 | 247 | opt = parser.parse_args() 248 | print(opt) 249 | opt = prepare_data_on_modelarts(opt) 250 | 251 | # logger = Logger("logs") 252 | 253 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 254 | 255 | # Get data configuration 256 | data_config = parse_data_config(opt.data_config) 257 | train_path = data_config["train"] 258 | valid_path = data_config["valid"] 259 | class_names = load_classes(data_config["names"]) 260 | 261 | # Initiate model 262 | model = Darknet(opt.model_def, opt.img_size).to(device) 263 | model.apply(weights_init_normal) 264 | 265 | # If specified we start from checkpoint 266 | if opt.pretrained_weights: 267 | if opt.pretrained_weights.endswith(".pth"): 268 | model.load_state_dict(torch.load(opt.pretrained_weights)) 269 | else: 270 | model.load_darknet_weights(opt.pretrained_weights) 271 | 272 | # Get dataloader 273 | dataset = ListDataset(train_path, img_size=opt.img_size, augment=True, multiscale=opt.multiscale_training) 274 | dataloader = torch.utils.data.DataLoader( 275 | dataset, 276 | batch_size=opt.batch_size, 277 
| shuffle=True, 278 | num_workers=opt.n_cpu, 279 | pin_memory=True, 280 | collate_fn=dataset.collate_fn, 281 | ) 282 | 283 | # store the name of model with best mAP 284 | model_best = {'mAP': 0, 'name': ''} 285 | 286 | # first stage training to get a relatively stable model 287 | optimizer_1 = torch.optim.Adam(freeze_body(model, opt.freeze_body_1), lr=opt.lr_1) 288 | for epoch in range(opt.max_epochs_1): 289 | 290 | train(model, dataloader, optimizer_1, epoch, opt, device) 291 | 292 | if epoch % opt.evaluation_interval == 0: 293 | AP = valid(model, valid_path, class_names, opt) 294 | 295 | temp_model_name = f"ckpt_%d_%.2f.pth" % (epoch, 100 * AP.mean()) 296 | ckpt_name = os.path.join(opt.train_local, temp_model_name) 297 | torch.save(model.state_dict(), ckpt_name) 298 | mox.file.copy_parallel(ckpt_name, os.path.join(opt.train_url, temp_model_name)) 299 | 300 | if AP.mean() > model_best['mAP']: 301 | model_best['mAP'] = AP.mean() 302 | model_best['name'] = ckpt_name 303 | 304 | # second stage training to achieve higher mAP 305 | optimizer_2 = torch.optim.Adam(freeze_body(model, opt.freeze_body_2), lr=opt.lr_2) 306 | scheduler = torch.optim.lr_scheduler.StepLR(optimizer_2, step_size=10) 307 | for epoch in range(opt.max_epochs_1, opt.max_epochs_2): 308 | 309 | train(model, dataloader, optimizer_2, epoch, opt, device) 310 | 311 | if epoch % opt.evaluation_interval == 0: 312 | AP = valid(model, valid_path, class_names, opt) 313 | 314 | temp_model_name = f"ckpt_%d_%.2f.pth" % (epoch, 100 * AP.mean()) 315 | ckpt_name = os.path.join(opt.train_local, temp_model_name) 316 | torch.save(model.state_dict(), ckpt_name) 317 | mox.file.copy_parallel(ckpt_name, os.path.join(opt.train_url, temp_model_name)) 318 | 319 | if AP.mean() > model_best['mAP']: 320 | model_best['mAP'] = AP.mean() 321 | model_best['name'] = ckpt_name 322 | 323 | scheduler.step(epoch) 324 | 325 | print('The current learning rate is: ', scheduler.get_lr()[0]) 326 | 327 | gen_model_dir(opt, model_best['name']) 328 | -------------------------------------------------------------------------------- /weights/download_weights.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Download weights for vanilla YOLOv3 3 | wget -c https://pjreddie.com/media/files/yolov3.weights 4 | # # Download weights for tiny YOLOv3 5 | wget -c https://pjreddie.com/media/files/yolov3-tiny.weights 6 | # Download weights for backbone network 7 | wget -c https://pjreddie.com/media/files/darknet53.conv.74 8 | --------------------------------------------------------------------------------
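As a quick reference, the standalone sketch below (not part of the repository files above) strings the exported pieces together for single-image inference with a trained checkpoint. The checkpoint path and image path are placeholders; the cfg and class files are the ones shipped in config/, and the thresholds are arbitrary example values.

```
import torch
from PIL import Image
import torchvision.transforms as transforms

from models import Darknet
from my_utils.datasets import pad_to_square, resize
from my_utils.utils import load_classes, non_max_suppression, rescale_boxes

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Build the 44-class model and load a trained checkpoint (placeholder path).
model = Darknet("config/yolov3-44.cfg", img_size=416).to(device)
model.load_state_dict(torch.load("models_best.pth", map_location=device))
model.eval()

# Letterbox the image to a 416x416 square, the same way ImageFolder/ListDataset do.
img = Image.open("sample.jpg").convert("RGB")  # placeholder image path
tensor = transforms.ToTensor()(img)
tensor, _ = pad_to_square(tensor, 0)
tensor = resize(tensor, 416).unsqueeze(0).to(device)

# Forward pass + NMS; each detection row is (x1, y1, x2, y2, conf, cls_conf, cls_pred).
with torch.no_grad():
    detections = non_max_suppression(model(tensor), conf_thres=0.5, nms_thres=0.4)[0]

if detections is not None:
    # Map boxes from the 416x416 letterboxed frame back to the original image.
    detections = rescale_boxes(detections, 416, img.size[::-1])  # original (h, w)
    classes = load_classes("config/train_classes.txt")
    for x1, y1, x2, y2, conf, cls_conf, cls_pred in detections:
        print(classes[int(cls_pred)], round(float(conf), 4),
              [round(float(v), 1) for v in (x1, y1, x2, y2)])
```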