├── .idea
│   ├── PyTorch-YOLOv3-ModelArts.iml
│   ├── misc.xml
│   └── modules.xml
├── README.md
├── config
│   ├── classify_rule.json
│   ├── create_custom_model.sh
│   ├── custom.data
│   ├── train.txt
│   ├── train_classes.txt
│   ├── valid.txt
│   ├── yolov3-44.cfg
│   ├── yolov3-tiny.cfg
│   └── yolov3.cfg
├── deploy_scripts
│   ├── config.json
│   └── customize_service.py
├── detect.py
├── models.py
├── my_utils
│   ├── __init__.py
│   ├── augmentations.py
│   ├── datasets.py
│   ├── parse_config.py
│   ├── prepare_datasets.py
│   └── utils.py
├── pip-requirements.txt
├── test.py
├── train.py
└── weights
    └── download_weights.sh
/README.md:
--------------------------------------------------------------------------------
1 | # PyTorch-YOLOv3-ModelArts
2 | Deploy a PyTorch implementation of the YOLOv3 object detector on the Huawei Cloud ModelArts platform, covering model training, online inference, and competition submission.
3 |
4 |
5 | - Motivation
6 |
7 | I am taking part in the "Huawei Cloud Cup" 2020 Shenzhen Open Data Application Innovation Contest (household garbage image classification), for which the organizers only provide a Keras YOLOv3 baseline.
8 | That baseline scores only 0.05, which is alarmingly low and far below what YOLOv3 should achieve.
9 |
10 | - What I did
11 |
12 | I rarely use Keras, so I did not dig into what is wrong with the official baseline.
13 | Instead I wrote a PyTorch baseline of my own. In my tests, performance improves dramatically (see the results at the end).
14 | The official baseline clearly has a problem somewhere; anyone interested is welcome to track it down.
15 |
16 |
17 | - source code: https://github.com/eriklindernoren/PyTorch-YOLOv3
18 | - Competition page: https://competition.huaweicloud.com/information/1000038439/introduction
19 |
20 | ## Preparation
21 | ##### Unpack the official raw dataset and build the new dataset
22 | $ cd PyTorch-YOLOv3-ModelArts/my_utils
23 | $ python prepare_datasets.py --source_datasets --new_datasets
24 |
25 | ##### Download pretrained weights
26 | $ cd weights/
27 | $ bash download_weights.sh
28 |
29 | ##### Create the cfg file for the custom model
30 | $ cd PyTorch-YOLOv3-ModelArts/config
31 | $ bash create_custom_model.sh 44  # the script takes the class count; the 44-class cfg is already provided as yolov3-44.cfg
32 |
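`create_custom_model.sh` takes the class count as its only argument and writes `filters = 3 * (num_classes + 5)` into the convolutional layer in front of each `[yolo]` head (3 anchors per scale, each predicting 4 box coordinates, 1 objectness score, and the per-class scores). A quick check of the value used in this repo's `yolov3-44.cfg`:

```python
# 3 anchors per scale * (4 box coordinates + 1 objectness score + num_classes)
num_classes = 44
filters = 3 * (num_classes + 5)
print(filters)  # 147, the filter count before each [yolo] layer in yolov3-44.cfg
```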
33 | ## Training on ModelArts
34 | 1. Package the new dataset into an archive and use it to replace the original dataset archive.
35 |
36 | 2. The image paths of the training and validation sets are stored in config/train.txt and config/valid.txt by default, one image per line, split 8:2. Note that each path points to a location inside the virtual container: if you re-split the data yourself, only change the image file name at the end of each line and never touch the path itself (a minimal re-split sketch follows this list).
37 |
38 | 3. To use pretrained weights, upload them to your own OBS bucket in advance and add the argument
39 |
40 | `--pretrained_weights=s3://your_bucket/{model}`.
41 |
42 | Here {model} can be an official pretrained model (yolov3.weights or darknet53.conv.74) or a PyTorch checkpoint you trained yourself (.pth).
43 |
44 | 4. Hyperparameters such as the learning rate are not adjusted automatically during training; tune them based on your own experience.
45 |
46 | 5. The remaining steps are the same as in the official competition guide.
47 |
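A minimal re-split sketch, assuming the one-path-per-line format of `config/train.txt` and `config/valid.txt` described in step 2; it only redistributes the existing lines between the two files, so every container path is kept verbatim:

```python
import random

# Pool all image paths from the current 8:2 split.
with open("config/train.txt") as f_train, open("config/valid.txt") as f_valid:
    lines = [line.strip() for line in list(f_train) + list(f_valid) if line.strip()]

# Shuffle and rewrite both lists with the same 8:2 ratio.
random.seed(0)
random.shuffle(lines)
split = int(0.8 * len(lines))
with open("config/train.txt", "w") as f:
    f.write("\n".join(lines[:split]) + "\n")
with open("config/valid.txt", "w") as f:
    f.write("\n".join(lines[split:]) + "\n")
```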
48 | ## Test results
49 | 1. Compared with the official Keras baseline, training is roughly three times as fast (the official baseline needs 150 minutes for 10 epochs; this project needs only 47). A competition submission is scored in about one hour, again more than twice as fast.
50 |
51 | 2. The official baseline took two and a half hours for 10 epochs yet scored only 0.05; this project, training only the detection head for 5 epochs (17 minutes), already scores 0.17.
52 |
53 | 3. Since the competition has just started, I have not run extensive tests. My estimate is that, with improvements on top of this baseline, a final score of around 0.6 is reachable.
54 | If you are after the prize money, though, consider switching to an R-CNN variant or EfficientDet instead.
55 |
56 |
57 | ## Credit
58 |
59 | ### YOLOv3: An Incremental Improvement
60 | _Joseph Redmon, Ali Farhadi_
61 |
62 | **Abstract**
63 | We present some updates to YOLO! We made a bunch
64 | of little design changes to make it better. We also trained
65 | this new network that’s pretty swell. It’s a little bigger than
66 | last time but more accurate. It’s still fast though, don’t
67 | worry. At 320 × 320 YOLOv3 runs in 22 ms at 28.2 mAP,
68 | as accurate as SSD but three times faster. When we look
69 | at the old .5 IOU mAP detection metric YOLOv3 is quite
70 | good. It achieves 57.9 AP50 in 51 ms on a Titan X, compared
71 | to 57.5 AP50 in 198 ms by RetinaNet, similar performance
72 | but 3.8× faster. As always, all the code is online at
73 | https://pjreddie.com/yolo/.
74 |
75 | [[Paper]](https://pjreddie.com/media/files/papers/YOLOv3.pdf) [[Project Webpage]](https://pjreddie.com/darknet/yolo/) [[Authors' Implementation]](https://github.com/pjreddie/darknet)
76 |
77 | ```
78 | @article{yolov3,
79 | title={YOLOv3: An Incremental Improvement},
80 | author={Redmon, Joseph and Farhadi, Ali},
81 | journal = {arXiv},
82 | year={2018}
83 | }
84 | ```
85 |
--------------------------------------------------------------------------------
/config/classify_rule.json:
--------------------------------------------------------------------------------
1 | {
2 | "可回收物": [
3 | "充电宝",
4 | "包",
5 | "洗护用品",
6 | "塑料玩具",
7 | "塑料器皿",
8 | "塑料衣架",
9 | "玻璃器皿",
10 | "金属器皿",
11 | "金属衣架",
12 | "快递纸袋",
13 | "插头电线",
14 | "旧衣服",
15 | "易拉罐",
16 | "枕头",
17 | "毛巾",
18 | "毛绒玩具",
19 | "鞋",
20 | "砧板",
21 | "纸盒纸箱",
22 | "纸袋",
23 | "调料瓶",
24 | "酒瓶",
25 | "金属食品罐",
26 | "金属厨具",
27 | "锅",
28 | "食用油桶",
29 | "饮料瓶",
30 | "饮料盒",
31 | "书籍纸张"
32 | ],
33 | "厨余垃圾": [
34 | "剩饭剩菜",
35 | "大骨头",
36 | "果皮果肉",
37 | "茶叶渣",
38 | "菜帮菜叶",
39 | "蛋壳",
40 | "鱼骨"
41 | ],
42 | "有害垃圾": [
43 | "干电池",
44 | "锂电池",
45 | "蓄电池",
46 | "纽扣电池",
47 | "灯管"
48 | ],
49 | "其他垃圾": [
50 | "一次性快餐盒",
51 | "污损塑料",
52 | "烟蒂",
53 | "牙签",
54 | "花盆",
55 | "陶瓷器皿",
56 | "筷子",
57 | "污损用纸"
58 | ]
59 | }
--------------------------------------------------------------------------------
/config/create_custom_model.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | NUM_CLASSES=$1
4 |
5 | echo "
6 | [net]
7 | # Testing
8 | #batch=1
9 | #subdivisions=1
10 | # Training
11 | batch=16
12 | subdivisions=1
13 | width=416
14 | height=416
15 | channels=3
16 | momentum=0.9
17 | decay=0.0005
18 | angle=0
19 | saturation = 1.5
20 | exposure = 1.5
21 | hue=.1
22 |
23 | learning_rate=0.001
24 | burn_in=1000
25 | max_batches = 500200
26 | policy=steps
27 | steps=400000,450000
28 | scales=.1,.1
29 |
30 | [convolutional]
31 | batch_normalize=1
32 | filters=32
33 | size=3
34 | stride=1
35 | pad=1
36 | activation=leaky
37 |
38 | # Downsample
39 |
40 | [convolutional]
41 | batch_normalize=1
42 | filters=64
43 | size=3
44 | stride=2
45 | pad=1
46 | activation=leaky
47 |
48 | [convolutional]
49 | batch_normalize=1
50 | filters=32
51 | size=1
52 | stride=1
53 | pad=1
54 | activation=leaky
55 |
56 | [convolutional]
57 | batch_normalize=1
58 | filters=64
59 | size=3
60 | stride=1
61 | pad=1
62 | activation=leaky
63 |
64 | [shortcut]
65 | from=-3
66 | activation=linear
67 |
68 | # Downsample
69 |
70 | [convolutional]
71 | batch_normalize=1
72 | filters=128
73 | size=3
74 | stride=2
75 | pad=1
76 | activation=leaky
77 |
78 | [convolutional]
79 | batch_normalize=1
80 | filters=64
81 | size=1
82 | stride=1
83 | pad=1
84 | activation=leaky
85 |
86 | [convolutional]
87 | batch_normalize=1
88 | filters=128
89 | size=3
90 | stride=1
91 | pad=1
92 | activation=leaky
93 |
94 | [shortcut]
95 | from=-3
96 | activation=linear
97 |
98 | [convolutional]
99 | batch_normalize=1
100 | filters=64
101 | size=1
102 | stride=1
103 | pad=1
104 | activation=leaky
105 |
106 | [convolutional]
107 | batch_normalize=1
108 | filters=128
109 | size=3
110 | stride=1
111 | pad=1
112 | activation=leaky
113 |
114 | [shortcut]
115 | from=-3
116 | activation=linear
117 |
118 | # Downsample
119 |
120 | [convolutional]
121 | batch_normalize=1
122 | filters=256
123 | size=3
124 | stride=2
125 | pad=1
126 | activation=leaky
127 |
128 | [convolutional]
129 | batch_normalize=1
130 | filters=128
131 | size=1
132 | stride=1
133 | pad=1
134 | activation=leaky
135 |
136 | [convolutional]
137 | batch_normalize=1
138 | filters=256
139 | size=3
140 | stride=1
141 | pad=1
142 | activation=leaky
143 |
144 | [shortcut]
145 | from=-3
146 | activation=linear
147 |
148 | [convolutional]
149 | batch_normalize=1
150 | filters=128
151 | size=1
152 | stride=1
153 | pad=1
154 | activation=leaky
155 |
156 | [convolutional]
157 | batch_normalize=1
158 | filters=256
159 | size=3
160 | stride=1
161 | pad=1
162 | activation=leaky
163 |
164 | [shortcut]
165 | from=-3
166 | activation=linear
167 |
168 | [convolutional]
169 | batch_normalize=1
170 | filters=128
171 | size=1
172 | stride=1
173 | pad=1
174 | activation=leaky
175 |
176 | [convolutional]
177 | batch_normalize=1
178 | filters=256
179 | size=3
180 | stride=1
181 | pad=1
182 | activation=leaky
183 |
184 | [shortcut]
185 | from=-3
186 | activation=linear
187 |
188 | [convolutional]
189 | batch_normalize=1
190 | filters=128
191 | size=1
192 | stride=1
193 | pad=1
194 | activation=leaky
195 |
196 | [convolutional]
197 | batch_normalize=1
198 | filters=256
199 | size=3
200 | stride=1
201 | pad=1
202 | activation=leaky
203 |
204 | [shortcut]
205 | from=-3
206 | activation=linear
207 |
208 |
209 | [convolutional]
210 | batch_normalize=1
211 | filters=128
212 | size=1
213 | stride=1
214 | pad=1
215 | activation=leaky
216 |
217 | [convolutional]
218 | batch_normalize=1
219 | filters=256
220 | size=3
221 | stride=1
222 | pad=1
223 | activation=leaky
224 |
225 | [shortcut]
226 | from=-3
227 | activation=linear
228 |
229 | [convolutional]
230 | batch_normalize=1
231 | filters=128
232 | size=1
233 | stride=1
234 | pad=1
235 | activation=leaky
236 |
237 | [convolutional]
238 | batch_normalize=1
239 | filters=256
240 | size=3
241 | stride=1
242 | pad=1
243 | activation=leaky
244 |
245 | [shortcut]
246 | from=-3
247 | activation=linear
248 |
249 | [convolutional]
250 | batch_normalize=1
251 | filters=128
252 | size=1
253 | stride=1
254 | pad=1
255 | activation=leaky
256 |
257 | [convolutional]
258 | batch_normalize=1
259 | filters=256
260 | size=3
261 | stride=1
262 | pad=1
263 | activation=leaky
264 |
265 | [shortcut]
266 | from=-3
267 | activation=linear
268 |
269 | [convolutional]
270 | batch_normalize=1
271 | filters=128
272 | size=1
273 | stride=1
274 | pad=1
275 | activation=leaky
276 |
277 | [convolutional]
278 | batch_normalize=1
279 | filters=256
280 | size=3
281 | stride=1
282 | pad=1
283 | activation=leaky
284 |
285 | [shortcut]
286 | from=-3
287 | activation=linear
288 |
289 | # Downsample
290 |
291 | [convolutional]
292 | batch_normalize=1
293 | filters=512
294 | size=3
295 | stride=2
296 | pad=1
297 | activation=leaky
298 |
299 | [convolutional]
300 | batch_normalize=1
301 | filters=256
302 | size=1
303 | stride=1
304 | pad=1
305 | activation=leaky
306 |
307 | [convolutional]
308 | batch_normalize=1
309 | filters=512
310 | size=3
311 | stride=1
312 | pad=1
313 | activation=leaky
314 |
315 | [shortcut]
316 | from=-3
317 | activation=linear
318 |
319 |
320 | [convolutional]
321 | batch_normalize=1
322 | filters=256
323 | size=1
324 | stride=1
325 | pad=1
326 | activation=leaky
327 |
328 | [convolutional]
329 | batch_normalize=1
330 | filters=512
331 | size=3
332 | stride=1
333 | pad=1
334 | activation=leaky
335 |
336 | [shortcut]
337 | from=-3
338 | activation=linear
339 |
340 |
341 | [convolutional]
342 | batch_normalize=1
343 | filters=256
344 | size=1
345 | stride=1
346 | pad=1
347 | activation=leaky
348 |
349 | [convolutional]
350 | batch_normalize=1
351 | filters=512
352 | size=3
353 | stride=1
354 | pad=1
355 | activation=leaky
356 |
357 | [shortcut]
358 | from=-3
359 | activation=linear
360 |
361 |
362 | [convolutional]
363 | batch_normalize=1
364 | filters=256
365 | size=1
366 | stride=1
367 | pad=1
368 | activation=leaky
369 |
370 | [convolutional]
371 | batch_normalize=1
372 | filters=512
373 | size=3
374 | stride=1
375 | pad=1
376 | activation=leaky
377 |
378 | [shortcut]
379 | from=-3
380 | activation=linear
381 |
382 | [convolutional]
383 | batch_normalize=1
384 | filters=256
385 | size=1
386 | stride=1
387 | pad=1
388 | activation=leaky
389 |
390 | [convolutional]
391 | batch_normalize=1
392 | filters=512
393 | size=3
394 | stride=1
395 | pad=1
396 | activation=leaky
397 |
398 | [shortcut]
399 | from=-3
400 | activation=linear
401 |
402 |
403 | [convolutional]
404 | batch_normalize=1
405 | filters=256
406 | size=1
407 | stride=1
408 | pad=1
409 | activation=leaky
410 |
411 | [convolutional]
412 | batch_normalize=1
413 | filters=512
414 | size=3
415 | stride=1
416 | pad=1
417 | activation=leaky
418 |
419 | [shortcut]
420 | from=-3
421 | activation=linear
422 |
423 |
424 | [convolutional]
425 | batch_normalize=1
426 | filters=256
427 | size=1
428 | stride=1
429 | pad=1
430 | activation=leaky
431 |
432 | [convolutional]
433 | batch_normalize=1
434 | filters=512
435 | size=3
436 | stride=1
437 | pad=1
438 | activation=leaky
439 |
440 | [shortcut]
441 | from=-3
442 | activation=linear
443 |
444 | [convolutional]
445 | batch_normalize=1
446 | filters=256
447 | size=1
448 | stride=1
449 | pad=1
450 | activation=leaky
451 |
452 | [convolutional]
453 | batch_normalize=1
454 | filters=512
455 | size=3
456 | stride=1
457 | pad=1
458 | activation=leaky
459 |
460 | [shortcut]
461 | from=-3
462 | activation=linear
463 |
464 | # Downsample
465 |
466 | [convolutional]
467 | batch_normalize=1
468 | filters=1024
469 | size=3
470 | stride=2
471 | pad=1
472 | activation=leaky
473 |
474 | [convolutional]
475 | batch_normalize=1
476 | filters=512
477 | size=1
478 | stride=1
479 | pad=1
480 | activation=leaky
481 |
482 | [convolutional]
483 | batch_normalize=1
484 | filters=1024
485 | size=3
486 | stride=1
487 | pad=1
488 | activation=leaky
489 |
490 | [shortcut]
491 | from=-3
492 | activation=linear
493 |
494 | [convolutional]
495 | batch_normalize=1
496 | filters=512
497 | size=1
498 | stride=1
499 | pad=1
500 | activation=leaky
501 |
502 | [convolutional]
503 | batch_normalize=1
504 | filters=1024
505 | size=3
506 | stride=1
507 | pad=1
508 | activation=leaky
509 |
510 | [shortcut]
511 | from=-3
512 | activation=linear
513 |
514 | [convolutional]
515 | batch_normalize=1
516 | filters=512
517 | size=1
518 | stride=1
519 | pad=1
520 | activation=leaky
521 |
522 | [convolutional]
523 | batch_normalize=1
524 | filters=1024
525 | size=3
526 | stride=1
527 | pad=1
528 | activation=leaky
529 |
530 | [shortcut]
531 | from=-3
532 | activation=linear
533 |
534 | [convolutional]
535 | batch_normalize=1
536 | filters=512
537 | size=1
538 | stride=1
539 | pad=1
540 | activation=leaky
541 |
542 | [convolutional]
543 | batch_normalize=1
544 | filters=1024
545 | size=3
546 | stride=1
547 | pad=1
548 | activation=leaky
549 |
550 | [shortcut]
551 | from=-3
552 | activation=linear
553 |
554 | ######################
555 |
556 | [convolutional]
557 | batch_normalize=1
558 | filters=512
559 | size=1
560 | stride=1
561 | pad=1
562 | activation=leaky
563 |
564 | [convolutional]
565 | batch_normalize=1
566 | size=3
567 | stride=1
568 | pad=1
569 | filters=1024
570 | activation=leaky
571 |
572 | [convolutional]
573 | batch_normalize=1
574 | filters=512
575 | size=1
576 | stride=1
577 | pad=1
578 | activation=leaky
579 |
580 | [convolutional]
581 | batch_normalize=1
582 | size=3
583 | stride=1
584 | pad=1
585 | filters=1024
586 | activation=leaky
587 |
588 | [convolutional]
589 | batch_normalize=1
590 | filters=512
591 | size=1
592 | stride=1
593 | pad=1
594 | activation=leaky
595 |
596 | [convolutional]
597 | batch_normalize=1
598 | size=3
599 | stride=1
600 | pad=1
601 | filters=1024
602 | activation=leaky
603 |
604 | [convolutional]
605 | size=1
606 | stride=1
607 | pad=1
608 | filters=$(expr 3 \* $(expr $NUM_CLASSES \+ 5))
609 | activation=linear
610 |
611 |
612 | [yolo]
613 | mask = 6,7,8
614 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
615 | classes=$NUM_CLASSES
616 | num=9
617 | jitter=.3
618 | ignore_thresh = .7
619 | truth_thresh = 1
620 | random=1
621 |
622 |
623 | [route]
624 | layers = -4
625 |
626 | [convolutional]
627 | batch_normalize=1
628 | filters=256
629 | size=1
630 | stride=1
631 | pad=1
632 | activation=leaky
633 |
634 | [upsample]
635 | stride=2
636 |
637 | [route]
638 | layers = -1, 61
639 |
640 |
641 |
642 | [convolutional]
643 | batch_normalize=1
644 | filters=256
645 | size=1
646 | stride=1
647 | pad=1
648 | activation=leaky
649 |
650 | [convolutional]
651 | batch_normalize=1
652 | size=3
653 | stride=1
654 | pad=1
655 | filters=512
656 | activation=leaky
657 |
658 | [convolutional]
659 | batch_normalize=1
660 | filters=256
661 | size=1
662 | stride=1
663 | pad=1
664 | activation=leaky
665 |
666 | [convolutional]
667 | batch_normalize=1
668 | size=3
669 | stride=1
670 | pad=1
671 | filters=512
672 | activation=leaky
673 |
674 | [convolutional]
675 | batch_normalize=1
676 | filters=256
677 | size=1
678 | stride=1
679 | pad=1
680 | activation=leaky
681 |
682 | [convolutional]
683 | batch_normalize=1
684 | size=3
685 | stride=1
686 | pad=1
687 | filters=512
688 | activation=leaky
689 |
690 | [convolutional]
691 | size=1
692 | stride=1
693 | pad=1
694 | filters=$(expr 3 \* $(expr $NUM_CLASSES \+ 5))
695 | activation=linear
696 |
697 |
698 | [yolo]
699 | mask = 3,4,5
700 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
701 | classes=$NUM_CLASSES
702 | num=9
703 | jitter=.3
704 | ignore_thresh = .7
705 | truth_thresh = 1
706 | random=1
707 |
708 |
709 |
710 | [route]
711 | layers = -4
712 |
713 | [convolutional]
714 | batch_normalize=1
715 | filters=128
716 | size=1
717 | stride=1
718 | pad=1
719 | activation=leaky
720 |
721 | [upsample]
722 | stride=2
723 |
724 | [route]
725 | layers = -1, 36
726 |
727 |
728 |
729 | [convolutional]
730 | batch_normalize=1
731 | filters=128
732 | size=1
733 | stride=1
734 | pad=1
735 | activation=leaky
736 |
737 | [convolutional]
738 | batch_normalize=1
739 | size=3
740 | stride=1
741 | pad=1
742 | filters=256
743 | activation=leaky
744 |
745 | [convolutional]
746 | batch_normalize=1
747 | filters=128
748 | size=1
749 | stride=1
750 | pad=1
751 | activation=leaky
752 |
753 | [convolutional]
754 | batch_normalize=1
755 | size=3
756 | stride=1
757 | pad=1
758 | filters=256
759 | activation=leaky
760 |
761 | [convolutional]
762 | batch_normalize=1
763 | filters=128
764 | size=1
765 | stride=1
766 | pad=1
767 | activation=leaky
768 |
769 | [convolutional]
770 | batch_normalize=1
771 | size=3
772 | stride=1
773 | pad=1
774 | filters=256
775 | activation=leaky
776 |
777 | [convolutional]
778 | size=1
779 | stride=1
780 | pad=1
781 | filters=$(expr 3 \* $(expr $NUM_CLASSES \+ 5))
782 | activation=linear
783 |
784 |
785 | [yolo]
786 | mask = 0,1,2
787 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
788 | classes=$NUM_CLASSES
789 | num=9
790 | jitter=.3
791 | ignore_thresh = .7
792 | truth_thresh = 1
793 | random=1
794 | " >> yolov3-custom.cfg
795 |
--------------------------------------------------------------------------------
/config/custom.data:
--------------------------------------------------------------------------------
1 | classes= 44
2 | train=PyTorch-YOLOv3-ModelArts/config/train.txt
3 | valid=PyTorch-YOLOv3-ModelArts/config/valid.txt
4 | names=PyTorch-YOLOv3-ModelArts/config/train_classes.txt
5 |
--------------------------------------------------------------------------------
/config/train_classes.txt:
--------------------------------------------------------------------------------
1 | 一次性快餐盒
2 | 书籍纸张
3 | 充电宝
4 | 剩饭剩菜
5 | 包
6 | 垃圾桶
7 | 塑料器皿
8 | 塑料玩具
9 | 塑料衣架
10 | 大骨头
11 | 干电池
12 | 快递纸袋
13 | 插头电线
14 | 旧衣服
15 | 易拉罐
16 | 枕头
17 | 果皮果肉
18 | 毛绒玩具
19 | 污损塑料
20 | 污损用纸
21 | 洗护用品
22 | 烟蒂
23 | 牙签
24 | 玻璃器皿
25 | 砧板
26 | 筷子
27 | 纸盒纸箱
28 | 花盆
29 | 茶叶渣
30 | 菜帮菜叶
31 | 蛋壳
32 | 调料瓶
33 | 软膏
34 | 过期药物
35 | 酒瓶
36 | 金属厨具
37 | 金属器皿
38 | 金属食品罐
39 | 锅
40 | 陶瓷器皿
41 | 鞋
42 | 食用油桶
43 | 饮料瓶
44 | 鱼骨
--------------------------------------------------------------------------------
/config/yolov3-44.cfg:
--------------------------------------------------------------------------------
1 |
2 | [net]
3 | # Testing
4 | #batch=1
5 | #subdivisions=1
6 | # Training
7 | batch=16
8 | subdivisions=1
9 | width=416
10 | height=416
11 | channels=3
12 | momentum=0.9
13 | decay=0.0005
14 | angle=0
15 | saturation = 1.5
16 | exposure = 1.5
17 | hue=.1
18 |
19 | learning_rate=0.001
20 | burn_in=1000
21 | max_batches = 500200
22 | policy=steps
23 | steps=400000,450000
24 | scales=.1,.1
25 |
26 | [convolutional]
27 | batch_normalize=1
28 | filters=32
29 | size=3
30 | stride=1
31 | pad=1
32 | activation=leaky
33 |
34 | # Downsample
35 |
36 | [convolutional]
37 | batch_normalize=1
38 | filters=64
39 | size=3
40 | stride=2
41 | pad=1
42 | activation=leaky
43 |
44 | [convolutional]
45 | batch_normalize=1
46 | filters=32
47 | size=1
48 | stride=1
49 | pad=1
50 | activation=leaky
51 |
52 | [convolutional]
53 | batch_normalize=1
54 | filters=64
55 | size=3
56 | stride=1
57 | pad=1
58 | activation=leaky
59 |
60 | [shortcut]
61 | from=-3
62 | activation=linear
63 |
64 | # Downsample
65 |
66 | [convolutional]
67 | batch_normalize=1
68 | filters=128
69 | size=3
70 | stride=2
71 | pad=1
72 | activation=leaky
73 |
74 | [convolutional]
75 | batch_normalize=1
76 | filters=64
77 | size=1
78 | stride=1
79 | pad=1
80 | activation=leaky
81 |
82 | [convolutional]
83 | batch_normalize=1
84 | filters=128
85 | size=3
86 | stride=1
87 | pad=1
88 | activation=leaky
89 |
90 | [shortcut]
91 | from=-3
92 | activation=linear
93 |
94 | [convolutional]
95 | batch_normalize=1
96 | filters=64
97 | size=1
98 | stride=1
99 | pad=1
100 | activation=leaky
101 |
102 | [convolutional]
103 | batch_normalize=1
104 | filters=128
105 | size=3
106 | stride=1
107 | pad=1
108 | activation=leaky
109 |
110 | [shortcut]
111 | from=-3
112 | activation=linear
113 |
114 | # Downsample
115 |
116 | [convolutional]
117 | batch_normalize=1
118 | filters=256
119 | size=3
120 | stride=2
121 | pad=1
122 | activation=leaky
123 |
124 | [convolutional]
125 | batch_normalize=1
126 | filters=128
127 | size=1
128 | stride=1
129 | pad=1
130 | activation=leaky
131 |
132 | [convolutional]
133 | batch_normalize=1
134 | filters=256
135 | size=3
136 | stride=1
137 | pad=1
138 | activation=leaky
139 |
140 | [shortcut]
141 | from=-3
142 | activation=linear
143 |
144 | [convolutional]
145 | batch_normalize=1
146 | filters=128
147 | size=1
148 | stride=1
149 | pad=1
150 | activation=leaky
151 |
152 | [convolutional]
153 | batch_normalize=1
154 | filters=256
155 | size=3
156 | stride=1
157 | pad=1
158 | activation=leaky
159 |
160 | [shortcut]
161 | from=-3
162 | activation=linear
163 |
164 | [convolutional]
165 | batch_normalize=1
166 | filters=128
167 | size=1
168 | stride=1
169 | pad=1
170 | activation=leaky
171 |
172 | [convolutional]
173 | batch_normalize=1
174 | filters=256
175 | size=3
176 | stride=1
177 | pad=1
178 | activation=leaky
179 |
180 | [shortcut]
181 | from=-3
182 | activation=linear
183 |
184 | [convolutional]
185 | batch_normalize=1
186 | filters=128
187 | size=1
188 | stride=1
189 | pad=1
190 | activation=leaky
191 |
192 | [convolutional]
193 | batch_normalize=1
194 | filters=256
195 | size=3
196 | stride=1
197 | pad=1
198 | activation=leaky
199 |
200 | [shortcut]
201 | from=-3
202 | activation=linear
203 |
204 |
205 | [convolutional]
206 | batch_normalize=1
207 | filters=128
208 | size=1
209 | stride=1
210 | pad=1
211 | activation=leaky
212 |
213 | [convolutional]
214 | batch_normalize=1
215 | filters=256
216 | size=3
217 | stride=1
218 | pad=1
219 | activation=leaky
220 |
221 | [shortcut]
222 | from=-3
223 | activation=linear
224 |
225 | [convolutional]
226 | batch_normalize=1
227 | filters=128
228 | size=1
229 | stride=1
230 | pad=1
231 | activation=leaky
232 |
233 | [convolutional]
234 | batch_normalize=1
235 | filters=256
236 | size=3
237 | stride=1
238 | pad=1
239 | activation=leaky
240 |
241 | [shortcut]
242 | from=-3
243 | activation=linear
244 |
245 | [convolutional]
246 | batch_normalize=1
247 | filters=128
248 | size=1
249 | stride=1
250 | pad=1
251 | activation=leaky
252 |
253 | [convolutional]
254 | batch_normalize=1
255 | filters=256
256 | size=3
257 | stride=1
258 | pad=1
259 | activation=leaky
260 |
261 | [shortcut]
262 | from=-3
263 | activation=linear
264 |
265 | [convolutional]
266 | batch_normalize=1
267 | filters=128
268 | size=1
269 | stride=1
270 | pad=1
271 | activation=leaky
272 |
273 | [convolutional]
274 | batch_normalize=1
275 | filters=256
276 | size=3
277 | stride=1
278 | pad=1
279 | activation=leaky
280 |
281 | [shortcut]
282 | from=-3
283 | activation=linear
284 |
285 | # Downsample
286 |
287 | [convolutional]
288 | batch_normalize=1
289 | filters=512
290 | size=3
291 | stride=2
292 | pad=1
293 | activation=leaky
294 |
295 | [convolutional]
296 | batch_normalize=1
297 | filters=256
298 | size=1
299 | stride=1
300 | pad=1
301 | activation=leaky
302 |
303 | [convolutional]
304 | batch_normalize=1
305 | filters=512
306 | size=3
307 | stride=1
308 | pad=1
309 | activation=leaky
310 |
311 | [shortcut]
312 | from=-3
313 | activation=linear
314 |
315 |
316 | [convolutional]
317 | batch_normalize=1
318 | filters=256
319 | size=1
320 | stride=1
321 | pad=1
322 | activation=leaky
323 |
324 | [convolutional]
325 | batch_normalize=1
326 | filters=512
327 | size=3
328 | stride=1
329 | pad=1
330 | activation=leaky
331 |
332 | [shortcut]
333 | from=-3
334 | activation=linear
335 |
336 |
337 | [convolutional]
338 | batch_normalize=1
339 | filters=256
340 | size=1
341 | stride=1
342 | pad=1
343 | activation=leaky
344 |
345 | [convolutional]
346 | batch_normalize=1
347 | filters=512
348 | size=3
349 | stride=1
350 | pad=1
351 | activation=leaky
352 |
353 | [shortcut]
354 | from=-3
355 | activation=linear
356 |
357 |
358 | [convolutional]
359 | batch_normalize=1
360 | filters=256
361 | size=1
362 | stride=1
363 | pad=1
364 | activation=leaky
365 |
366 | [convolutional]
367 | batch_normalize=1
368 | filters=512
369 | size=3
370 | stride=1
371 | pad=1
372 | activation=leaky
373 |
374 | [shortcut]
375 | from=-3
376 | activation=linear
377 |
378 | [convolutional]
379 | batch_normalize=1
380 | filters=256
381 | size=1
382 | stride=1
383 | pad=1
384 | activation=leaky
385 |
386 | [convolutional]
387 | batch_normalize=1
388 | filters=512
389 | size=3
390 | stride=1
391 | pad=1
392 | activation=leaky
393 |
394 | [shortcut]
395 | from=-3
396 | activation=linear
397 |
398 |
399 | [convolutional]
400 | batch_normalize=1
401 | filters=256
402 | size=1
403 | stride=1
404 | pad=1
405 | activation=leaky
406 |
407 | [convolutional]
408 | batch_normalize=1
409 | filters=512
410 | size=3
411 | stride=1
412 | pad=1
413 | activation=leaky
414 |
415 | [shortcut]
416 | from=-3
417 | activation=linear
418 |
419 |
420 | [convolutional]
421 | batch_normalize=1
422 | filters=256
423 | size=1
424 | stride=1
425 | pad=1
426 | activation=leaky
427 |
428 | [convolutional]
429 | batch_normalize=1
430 | filters=512
431 | size=3
432 | stride=1
433 | pad=1
434 | activation=leaky
435 |
436 | [shortcut]
437 | from=-3
438 | activation=linear
439 |
440 | [convolutional]
441 | batch_normalize=1
442 | filters=256
443 | size=1
444 | stride=1
445 | pad=1
446 | activation=leaky
447 |
448 | [convolutional]
449 | batch_normalize=1
450 | filters=512
451 | size=3
452 | stride=1
453 | pad=1
454 | activation=leaky
455 |
456 | [shortcut]
457 | from=-3
458 | activation=linear
459 |
460 | # Downsample
461 |
462 | [convolutional]
463 | batch_normalize=1
464 | filters=1024
465 | size=3
466 | stride=2
467 | pad=1
468 | activation=leaky
469 |
470 | [convolutional]
471 | batch_normalize=1
472 | filters=512
473 | size=1
474 | stride=1
475 | pad=1
476 | activation=leaky
477 |
478 | [convolutional]
479 | batch_normalize=1
480 | filters=1024
481 | size=3
482 | stride=1
483 | pad=1
484 | activation=leaky
485 |
486 | [shortcut]
487 | from=-3
488 | activation=linear
489 |
490 | [convolutional]
491 | batch_normalize=1
492 | filters=512
493 | size=1
494 | stride=1
495 | pad=1
496 | activation=leaky
497 |
498 | [convolutional]
499 | batch_normalize=1
500 | filters=1024
501 | size=3
502 | stride=1
503 | pad=1
504 | activation=leaky
505 |
506 | [shortcut]
507 | from=-3
508 | activation=linear
509 |
510 | [convolutional]
511 | batch_normalize=1
512 | filters=512
513 | size=1
514 | stride=1
515 | pad=1
516 | activation=leaky
517 |
518 | [convolutional]
519 | batch_normalize=1
520 | filters=1024
521 | size=3
522 | stride=1
523 | pad=1
524 | activation=leaky
525 |
526 | [shortcut]
527 | from=-3
528 | activation=linear
529 |
530 | [convolutional]
531 | batch_normalize=1
532 | filters=512
533 | size=1
534 | stride=1
535 | pad=1
536 | activation=leaky
537 |
538 | [convolutional]
539 | batch_normalize=1
540 | filters=1024
541 | size=3
542 | stride=1
543 | pad=1
544 | activation=leaky
545 |
546 | [shortcut]
547 | from=-3
548 | activation=linear
549 |
550 | ######################
551 |
552 | [convolutional]
553 | batch_normalize=1
554 | filters=512
555 | size=1
556 | stride=1
557 | pad=1
558 | activation=leaky
559 |
560 | [convolutional]
561 | batch_normalize=1
562 | size=3
563 | stride=1
564 | pad=1
565 | filters=1024
566 | activation=leaky
567 |
568 | [convolutional]
569 | batch_normalize=1
570 | filters=512
571 | size=1
572 | stride=1
573 | pad=1
574 | activation=leaky
575 |
576 | [convolutional]
577 | batch_normalize=1
578 | size=3
579 | stride=1
580 | pad=1
581 | filters=1024
582 | activation=leaky
583 |
584 | [convolutional]
585 | batch_normalize=1
586 | filters=512
587 | size=1
588 | stride=1
589 | pad=1
590 | activation=leaky
591 |
592 | [convolutional]
593 | batch_normalize=1
594 | size=3
595 | stride=1
596 | pad=1
597 | filters=1024
598 | activation=leaky
599 |
600 | [convolutional]
601 | size=1
602 | stride=1
603 | pad=1
604 | filters=147
605 | activation=linear
606 |
607 |
608 | [yolo]
609 | mask = 6,7,8
610 | anchors = 25,31, 35,44, 48,56, 59,73, 80,96, 112,132, 144,174, 195,227, 264,337
611 | classes=44
612 | num=9
613 | jitter=.3
614 | ignore_thresh = .7
615 | truth_thresh = 1
616 | random=1
617 |
618 |
619 | [route]
620 | layers = -4
621 |
622 | [convolutional]
623 | batch_normalize=1
624 | filters=256
625 | size=1
626 | stride=1
627 | pad=1
628 | activation=leaky
629 |
630 | [upsample]
631 | stride=2
632 |
633 | [route]
634 | layers = -1, 61
635 |
636 |
637 |
638 | [convolutional]
639 | batch_normalize=1
640 | filters=256
641 | size=1
642 | stride=1
643 | pad=1
644 | activation=leaky
645 |
646 | [convolutional]
647 | batch_normalize=1
648 | size=3
649 | stride=1
650 | pad=1
651 | filters=512
652 | activation=leaky
653 |
654 | [convolutional]
655 | batch_normalize=1
656 | filters=256
657 | size=1
658 | stride=1
659 | pad=1
660 | activation=leaky
661 |
662 | [convolutional]
663 | batch_normalize=1
664 | size=3
665 | stride=1
666 | pad=1
667 | filters=512
668 | activation=leaky
669 |
670 | [convolutional]
671 | batch_normalize=1
672 | filters=256
673 | size=1
674 | stride=1
675 | pad=1
676 | activation=leaky
677 |
678 | [convolutional]
679 | batch_normalize=1
680 | size=3
681 | stride=1
682 | pad=1
683 | filters=512
684 | activation=leaky
685 |
686 | [convolutional]
687 | size=1
688 | stride=1
689 | pad=1
690 | filters=147
691 | activation=linear
692 |
693 |
694 | [yolo]
695 | mask = 3,4,5
696 | anchors = 25,31, 35,44, 48,56, 59,73, 80,96, 112,132, 144,174, 195,227, 264,337
697 | classes=44
698 | num=9
699 | jitter=.3
700 | ignore_thresh = .7
701 | truth_thresh = 1
702 | random=1
703 |
704 |
705 |
706 | [route]
707 | layers = -4
708 |
709 | [convolutional]
710 | batch_normalize=1
711 | filters=128
712 | size=1
713 | stride=1
714 | pad=1
715 | activation=leaky
716 |
717 | [upsample]
718 | stride=2
719 |
720 | [route]
721 | layers = -1, 36
722 |
723 |
724 |
725 | [convolutional]
726 | batch_normalize=1
727 | filters=128
728 | size=1
729 | stride=1
730 | pad=1
731 | activation=leaky
732 |
733 | [convolutional]
734 | batch_normalize=1
735 | size=3
736 | stride=1
737 | pad=1
738 | filters=256
739 | activation=leaky
740 |
741 | [convolutional]
742 | batch_normalize=1
743 | filters=128
744 | size=1
745 | stride=1
746 | pad=1
747 | activation=leaky
748 |
749 | [convolutional]
750 | batch_normalize=1
751 | size=3
752 | stride=1
753 | pad=1
754 | filters=256
755 | activation=leaky
756 |
757 | [convolutional]
758 | batch_normalize=1
759 | filters=128
760 | size=1
761 | stride=1
762 | pad=1
763 | activation=leaky
764 |
765 | [convolutional]
766 | batch_normalize=1
767 | size=3
768 | stride=1
769 | pad=1
770 | filters=256
771 | activation=leaky
772 |
773 | [convolutional]
774 | size=1
775 | stride=1
776 | pad=1
777 | filters=147
778 | activation=linear
779 |
780 |
781 | [yolo]
782 | mask = 0,1,2
783 | anchors = 25,31, 35,44, 48,56, 59,73, 80,96, 112,132, 144,174, 195,227, 264,337
784 | classes=44
785 | num=9
786 | jitter=.3
787 | ignore_thresh = .7
788 | truth_thresh = 1
789 | random=1
790 |
791 |
--------------------------------------------------------------------------------
/config/yolov3-tiny.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | batch=1
4 | subdivisions=1
5 | # Training
6 | # batch=64
7 | # subdivisions=2
8 | width=416
9 | height=416
10 | channels=3
11 | momentum=0.9
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.001
19 | burn_in=1000
20 | max_batches = 500200
21 | policy=steps
22 | steps=400000,450000
23 | scales=.1,.1
24 |
25 | # 0
26 | [convolutional]
27 | batch_normalize=1
28 | filters=16
29 | size=3
30 | stride=1
31 | pad=1
32 | activation=leaky
33 |
34 | # 1
35 | [maxpool]
36 | size=2
37 | stride=2
38 |
39 | # 2
40 | [convolutional]
41 | batch_normalize=1
42 | filters=32
43 | size=3
44 | stride=1
45 | pad=1
46 | activation=leaky
47 |
48 | # 3
49 | [maxpool]
50 | size=2
51 | stride=2
52 |
53 | # 4
54 | [convolutional]
55 | batch_normalize=1
56 | filters=64
57 | size=3
58 | stride=1
59 | pad=1
60 | activation=leaky
61 |
62 | # 5
63 | [maxpool]
64 | size=2
65 | stride=2
66 |
67 | # 6
68 | [convolutional]
69 | batch_normalize=1
70 | filters=128
71 | size=3
72 | stride=1
73 | pad=1
74 | activation=leaky
75 |
76 | # 7
77 | [maxpool]
78 | size=2
79 | stride=2
80 |
81 | # 8
82 | [convolutional]
83 | batch_normalize=1
84 | filters=256
85 | size=3
86 | stride=1
87 | pad=1
88 | activation=leaky
89 |
90 | # 9
91 | [maxpool]
92 | size=2
93 | stride=2
94 |
95 | # 10
96 | [convolutional]
97 | batch_normalize=1
98 | filters=512
99 | size=3
100 | stride=1
101 | pad=1
102 | activation=leaky
103 |
104 | # 11
105 | [maxpool]
106 | size=2
107 | stride=1
108 |
109 | # 12
110 | [convolutional]
111 | batch_normalize=1
112 | filters=1024
113 | size=3
114 | stride=1
115 | pad=1
116 | activation=leaky
117 |
118 | ###########
119 |
120 | # 13
121 | [convolutional]
122 | batch_normalize=1
123 | filters=256
124 | size=1
125 | stride=1
126 | pad=1
127 | activation=leaky
128 |
129 | # 14
130 | [convolutional]
131 | batch_normalize=1
132 | filters=512
133 | size=3
134 | stride=1
135 | pad=1
136 | activation=leaky
137 |
138 | # 15
139 | [convolutional]
140 | size=1
141 | stride=1
142 | pad=1
143 | filters=255
144 | activation=linear
145 |
146 |
147 |
148 | # 16
149 | [yolo]
150 | mask = 3,4,5
151 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
152 | classes=80
153 | num=6
154 | jitter=.3
155 | ignore_thresh = .7
156 | truth_thresh = 1
157 | random=1
158 |
159 | # 17
160 | [route]
161 | layers = -4
162 |
163 | # 18
164 | [convolutional]
165 | batch_normalize=1
166 | filters=128
167 | size=1
168 | stride=1
169 | pad=1
170 | activation=leaky
171 |
172 | # 19
173 | [upsample]
174 | stride=2
175 |
176 | # 20
177 | [route]
178 | layers = -1, 8
179 |
180 | # 21
181 | [convolutional]
182 | batch_normalize=1
183 | filters=256
184 | size=3
185 | stride=1
186 | pad=1
187 | activation=leaky
188 |
189 | # 22
190 | [convolutional]
191 | size=1
192 | stride=1
193 | pad=1
194 | filters=255
195 | activation=linear
196 |
197 | # 23
198 | [yolo]
199 | mask = 1,2,3
200 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
201 | classes=80
202 | num=6
203 | jitter=.3
204 | ignore_thresh = .7
205 | truth_thresh = 1
206 | random=1
207 |
--------------------------------------------------------------------------------
/config/yolov3.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | #batch=1
4 | #subdivisions=1
5 | # Training
6 | batch=16
7 | subdivisions=1
8 | width=416
9 | height=416
10 | channels=3
11 | momentum=0.9
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.001
19 | burn_in=1000
20 | max_batches = 500200
21 | policy=steps
22 | steps=400000,450000
23 | scales=.1,.1
24 |
25 | [convolutional]
26 | batch_normalize=1
27 | filters=32
28 | size=3
29 | stride=1
30 | pad=1
31 | activation=leaky
32 |
33 | # Downsample
34 |
35 | [convolutional]
36 | batch_normalize=1
37 | filters=64
38 | size=3
39 | stride=2
40 | pad=1
41 | activation=leaky
42 |
43 | [convolutional]
44 | batch_normalize=1
45 | filters=32
46 | size=1
47 | stride=1
48 | pad=1
49 | activation=leaky
50 |
51 | [convolutional]
52 | batch_normalize=1
53 | filters=64
54 | size=3
55 | stride=1
56 | pad=1
57 | activation=leaky
58 |
59 | [shortcut]
60 | from=-3
61 | activation=linear
62 |
63 | # Downsample
64 |
65 | [convolutional]
66 | batch_normalize=1
67 | filters=128
68 | size=3
69 | stride=2
70 | pad=1
71 | activation=leaky
72 |
73 | [convolutional]
74 | batch_normalize=1
75 | filters=64
76 | size=1
77 | stride=1
78 | pad=1
79 | activation=leaky
80 |
81 | [convolutional]
82 | batch_normalize=1
83 | filters=128
84 | size=3
85 | stride=1
86 | pad=1
87 | activation=leaky
88 |
89 | [shortcut]
90 | from=-3
91 | activation=linear
92 |
93 | [convolutional]
94 | batch_normalize=1
95 | filters=64
96 | size=1
97 | stride=1
98 | pad=1
99 | activation=leaky
100 |
101 | [convolutional]
102 | batch_normalize=1
103 | filters=128
104 | size=3
105 | stride=1
106 | pad=1
107 | activation=leaky
108 |
109 | [shortcut]
110 | from=-3
111 | activation=linear
112 |
113 | # Downsample
114 |
115 | [convolutional]
116 | batch_normalize=1
117 | filters=256
118 | size=3
119 | stride=2
120 | pad=1
121 | activation=leaky
122 |
123 | [convolutional]
124 | batch_normalize=1
125 | filters=128
126 | size=1
127 | stride=1
128 | pad=1
129 | activation=leaky
130 |
131 | [convolutional]
132 | batch_normalize=1
133 | filters=256
134 | size=3
135 | stride=1
136 | pad=1
137 | activation=leaky
138 |
139 | [shortcut]
140 | from=-3
141 | activation=linear
142 |
143 | [convolutional]
144 | batch_normalize=1
145 | filters=128
146 | size=1
147 | stride=1
148 | pad=1
149 | activation=leaky
150 |
151 | [convolutional]
152 | batch_normalize=1
153 | filters=256
154 | size=3
155 | stride=1
156 | pad=1
157 | activation=leaky
158 |
159 | [shortcut]
160 | from=-3
161 | activation=linear
162 |
163 | [convolutional]
164 | batch_normalize=1
165 | filters=128
166 | size=1
167 | stride=1
168 | pad=1
169 | activation=leaky
170 |
171 | [convolutional]
172 | batch_normalize=1
173 | filters=256
174 | size=3
175 | stride=1
176 | pad=1
177 | activation=leaky
178 |
179 | [shortcut]
180 | from=-3
181 | activation=linear
182 |
183 | [convolutional]
184 | batch_normalize=1
185 | filters=128
186 | size=1
187 | stride=1
188 | pad=1
189 | activation=leaky
190 |
191 | [convolutional]
192 | batch_normalize=1
193 | filters=256
194 | size=3
195 | stride=1
196 | pad=1
197 | activation=leaky
198 |
199 | [shortcut]
200 | from=-3
201 | activation=linear
202 |
203 |
204 | [convolutional]
205 | batch_normalize=1
206 | filters=128
207 | size=1
208 | stride=1
209 | pad=1
210 | activation=leaky
211 |
212 | [convolutional]
213 | batch_normalize=1
214 | filters=256
215 | size=3
216 | stride=1
217 | pad=1
218 | activation=leaky
219 |
220 | [shortcut]
221 | from=-3
222 | activation=linear
223 |
224 | [convolutional]
225 | batch_normalize=1
226 | filters=128
227 | size=1
228 | stride=1
229 | pad=1
230 | activation=leaky
231 |
232 | [convolutional]
233 | batch_normalize=1
234 | filters=256
235 | size=3
236 | stride=1
237 | pad=1
238 | activation=leaky
239 |
240 | [shortcut]
241 | from=-3
242 | activation=linear
243 |
244 | [convolutional]
245 | batch_normalize=1
246 | filters=128
247 | size=1
248 | stride=1
249 | pad=1
250 | activation=leaky
251 |
252 | [convolutional]
253 | batch_normalize=1
254 | filters=256
255 | size=3
256 | stride=1
257 | pad=1
258 | activation=leaky
259 |
260 | [shortcut]
261 | from=-3
262 | activation=linear
263 |
264 | [convolutional]
265 | batch_normalize=1
266 | filters=128
267 | size=1
268 | stride=1
269 | pad=1
270 | activation=leaky
271 |
272 | [convolutional]
273 | batch_normalize=1
274 | filters=256
275 | size=3
276 | stride=1
277 | pad=1
278 | activation=leaky
279 |
280 | [shortcut]
281 | from=-3
282 | activation=linear
283 |
284 | # Downsample
285 |
286 | [convolutional]
287 | batch_normalize=1
288 | filters=512
289 | size=3
290 | stride=2
291 | pad=1
292 | activation=leaky
293 |
294 | [convolutional]
295 | batch_normalize=1
296 | filters=256
297 | size=1
298 | stride=1
299 | pad=1
300 | activation=leaky
301 |
302 | [convolutional]
303 | batch_normalize=1
304 | filters=512
305 | size=3
306 | stride=1
307 | pad=1
308 | activation=leaky
309 |
310 | [shortcut]
311 | from=-3
312 | activation=linear
313 |
314 |
315 | [convolutional]
316 | batch_normalize=1
317 | filters=256
318 | size=1
319 | stride=1
320 | pad=1
321 | activation=leaky
322 |
323 | [convolutional]
324 | batch_normalize=1
325 | filters=512
326 | size=3
327 | stride=1
328 | pad=1
329 | activation=leaky
330 |
331 | [shortcut]
332 | from=-3
333 | activation=linear
334 |
335 |
336 | [convolutional]
337 | batch_normalize=1
338 | filters=256
339 | size=1
340 | stride=1
341 | pad=1
342 | activation=leaky
343 |
344 | [convolutional]
345 | batch_normalize=1
346 | filters=512
347 | size=3
348 | stride=1
349 | pad=1
350 | activation=leaky
351 |
352 | [shortcut]
353 | from=-3
354 | activation=linear
355 |
356 |
357 | [convolutional]
358 | batch_normalize=1
359 | filters=256
360 | size=1
361 | stride=1
362 | pad=1
363 | activation=leaky
364 |
365 | [convolutional]
366 | batch_normalize=1
367 | filters=512
368 | size=3
369 | stride=1
370 | pad=1
371 | activation=leaky
372 |
373 | [shortcut]
374 | from=-3
375 | activation=linear
376 |
377 | [convolutional]
378 | batch_normalize=1
379 | filters=256
380 | size=1
381 | stride=1
382 | pad=1
383 | activation=leaky
384 |
385 | [convolutional]
386 | batch_normalize=1
387 | filters=512
388 | size=3
389 | stride=1
390 | pad=1
391 | activation=leaky
392 |
393 | [shortcut]
394 | from=-3
395 | activation=linear
396 |
397 |
398 | [convolutional]
399 | batch_normalize=1
400 | filters=256
401 | size=1
402 | stride=1
403 | pad=1
404 | activation=leaky
405 |
406 | [convolutional]
407 | batch_normalize=1
408 | filters=512
409 | size=3
410 | stride=1
411 | pad=1
412 | activation=leaky
413 |
414 | [shortcut]
415 | from=-3
416 | activation=linear
417 |
418 |
419 | [convolutional]
420 | batch_normalize=1
421 | filters=256
422 | size=1
423 | stride=1
424 | pad=1
425 | activation=leaky
426 |
427 | [convolutional]
428 | batch_normalize=1
429 | filters=512
430 | size=3
431 | stride=1
432 | pad=1
433 | activation=leaky
434 |
435 | [shortcut]
436 | from=-3
437 | activation=linear
438 |
439 | [convolutional]
440 | batch_normalize=1
441 | filters=256
442 | size=1
443 | stride=1
444 | pad=1
445 | activation=leaky
446 |
447 | [convolutional]
448 | batch_normalize=1
449 | filters=512
450 | size=3
451 | stride=1
452 | pad=1
453 | activation=leaky
454 |
455 | [shortcut]
456 | from=-3
457 | activation=linear
458 |
459 | # Downsample
460 |
461 | [convolutional]
462 | batch_normalize=1
463 | filters=1024
464 | size=3
465 | stride=2
466 | pad=1
467 | activation=leaky
468 |
469 | [convolutional]
470 | batch_normalize=1
471 | filters=512
472 | size=1
473 | stride=1
474 | pad=1
475 | activation=leaky
476 |
477 | [convolutional]
478 | batch_normalize=1
479 | filters=1024
480 | size=3
481 | stride=1
482 | pad=1
483 | activation=leaky
484 |
485 | [shortcut]
486 | from=-3
487 | activation=linear
488 |
489 | [convolutional]
490 | batch_normalize=1
491 | filters=512
492 | size=1
493 | stride=1
494 | pad=1
495 | activation=leaky
496 |
497 | [convolutional]
498 | batch_normalize=1
499 | filters=1024
500 | size=3
501 | stride=1
502 | pad=1
503 | activation=leaky
504 |
505 | [shortcut]
506 | from=-3
507 | activation=linear
508 |
509 | [convolutional]
510 | batch_normalize=1
511 | filters=512
512 | size=1
513 | stride=1
514 | pad=1
515 | activation=leaky
516 |
517 | [convolutional]
518 | batch_normalize=1
519 | filters=1024
520 | size=3
521 | stride=1
522 | pad=1
523 | activation=leaky
524 |
525 | [shortcut]
526 | from=-3
527 | activation=linear
528 |
529 | [convolutional]
530 | batch_normalize=1
531 | filters=512
532 | size=1
533 | stride=1
534 | pad=1
535 | activation=leaky
536 |
537 | [convolutional]
538 | batch_normalize=1
539 | filters=1024
540 | size=3
541 | stride=1
542 | pad=1
543 | activation=leaky
544 |
545 | [shortcut]
546 | from=-3
547 | activation=linear
548 |
549 | ######################
550 |
551 | [convolutional]
552 | batch_normalize=1
553 | filters=512
554 | size=1
555 | stride=1
556 | pad=1
557 | activation=leaky
558 |
559 | [convolutional]
560 | batch_normalize=1
561 | size=3
562 | stride=1
563 | pad=1
564 | filters=1024
565 | activation=leaky
566 |
567 | [convolutional]
568 | batch_normalize=1
569 | filters=512
570 | size=1
571 | stride=1
572 | pad=1
573 | activation=leaky
574 |
575 | [convolutional]
576 | batch_normalize=1
577 | size=3
578 | stride=1
579 | pad=1
580 | filters=1024
581 | activation=leaky
582 |
583 | [convolutional]
584 | batch_normalize=1
585 | filters=512
586 | size=1
587 | stride=1
588 | pad=1
589 | activation=leaky
590 |
591 | [convolutional]
592 | batch_normalize=1
593 | size=3
594 | stride=1
595 | pad=1
596 | filters=1024
597 | activation=leaky
598 |
599 | [convolutional]
600 | size=1
601 | stride=1
602 | pad=1
603 | filters=255
604 | activation=linear
605 |
606 |
607 | [yolo]
608 | mask = 6,7,8
609 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
610 | classes=80
611 | num=9
612 | jitter=.3
613 | ignore_thresh = .7
614 | truth_thresh = 1
615 | random=1
616 |
617 |
618 | [route]
619 | layers = -4
620 |
621 | [convolutional]
622 | batch_normalize=1
623 | filters=256
624 | size=1
625 | stride=1
626 | pad=1
627 | activation=leaky
628 |
629 | [upsample]
630 | stride=2
631 |
632 | [route]
633 | layers = -1, 61
634 |
635 |
636 |
637 | [convolutional]
638 | batch_normalize=1
639 | filters=256
640 | size=1
641 | stride=1
642 | pad=1
643 | activation=leaky
644 |
645 | [convolutional]
646 | batch_normalize=1
647 | size=3
648 | stride=1
649 | pad=1
650 | filters=512
651 | activation=leaky
652 |
653 | [convolutional]
654 | batch_normalize=1
655 | filters=256
656 | size=1
657 | stride=1
658 | pad=1
659 | activation=leaky
660 |
661 | [convolutional]
662 | batch_normalize=1
663 | size=3
664 | stride=1
665 | pad=1
666 | filters=512
667 | activation=leaky
668 |
669 | [convolutional]
670 | batch_normalize=1
671 | filters=256
672 | size=1
673 | stride=1
674 | pad=1
675 | activation=leaky
676 |
677 | [convolutional]
678 | batch_normalize=1
679 | size=3
680 | stride=1
681 | pad=1
682 | filters=512
683 | activation=leaky
684 |
685 | [convolutional]
686 | size=1
687 | stride=1
688 | pad=1
689 | filters=255
690 | activation=linear
691 |
692 |
693 | [yolo]
694 | mask = 3,4,5
695 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
696 | classes=80
697 | num=9
698 | jitter=.3
699 | ignore_thresh = .7
700 | truth_thresh = 1
701 | random=1
702 |
703 |
704 |
705 | [route]
706 | layers = -4
707 |
708 | [convolutional]
709 | batch_normalize=1
710 | filters=128
711 | size=1
712 | stride=1
713 | pad=1
714 | activation=leaky
715 |
716 | [upsample]
717 | stride=2
718 |
719 | [route]
720 | layers = -1, 36
721 |
722 |
723 |
724 | [convolutional]
725 | batch_normalize=1
726 | filters=128
727 | size=1
728 | stride=1
729 | pad=1
730 | activation=leaky
731 |
732 | [convolutional]
733 | batch_normalize=1
734 | size=3
735 | stride=1
736 | pad=1
737 | filters=256
738 | activation=leaky
739 |
740 | [convolutional]
741 | batch_normalize=1
742 | filters=128
743 | size=1
744 | stride=1
745 | pad=1
746 | activation=leaky
747 |
748 | [convolutional]
749 | batch_normalize=1
750 | size=3
751 | stride=1
752 | pad=1
753 | filters=256
754 | activation=leaky
755 |
756 | [convolutional]
757 | batch_normalize=1
758 | filters=128
759 | size=1
760 | stride=1
761 | pad=1
762 | activation=leaky
763 |
764 | [convolutional]
765 | batch_normalize=1
766 | size=3
767 | stride=1
768 | pad=1
769 | filters=256
770 | activation=leaky
771 |
772 | [convolutional]
773 | size=1
774 | stride=1
775 | pad=1
776 | filters=255
777 | activation=linear
778 |
779 |
780 | [yolo]
781 | mask = 0,1,2
782 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
783 | classes=80
784 | num=9
785 | jitter=.3
786 | ignore_thresh = .7
787 | truth_thresh = 1
788 | random=1
789 |
--------------------------------------------------------------------------------
/deploy_scripts/config.json:
--------------------------------------------------------------------------------
1 | {
2 | "model_type": "PyTorch",
3 | "runtime": "python3.6",
4 | "model_algorithm": "object_detection",
5 | "metrics": {
6 | "f1": 0.0,
7 | "accuracy": 0.0,
8 | "precision": 0.0,
9 | "recall": 0.0
10 | },
11 | "apis": [{
12 | "protocol": "https",
13 | "url": "/",
14 | "method": "post",
15 | "request": {
16 | "Content-type": "multipart/form-data",
17 | "data": {
18 | "type": "object",
19 | "properties": {
20 | "images": {
21 | "type": "file"
22 | }
23 | }
24 | }
25 | },
26 | "response": {
27 | "Content-type": "multipart/form-data",
28 | "data": {
29 | "type": "object",
30 | "properties": {
31 | "detection_classes": {
32 | "type": "list",
33 | "items": [{
34 | "type": "string"
35 | }]
36 | },
37 | "detection_scores": {
38 | "type": "list",
39 | "items": [{
40 | "type": "number"
41 | }]
42 | },
43 | "detection_boxes": {
44 | "type": "list",
45 | "items": [{
46 | "type": "list",
47 | "minItems": 4,
48 | "maxItems": 4,
49 | "items": [{
50 | "type": "number"
51 | }]
52 | }]
53 | }
54 | }
55 | }
56 | }
57 | }],
58 | "dependencies": [{
59 | "installer": "pip",
60 | "packages": [
61 | {
62 | "restraint": "EXACT",
63 | "package_version": "5.2.0",
64 | "package_name": "Pillow"
65 | },
66 | {
67 | "restraint": "EXACT",
68 | "package_version": "1.3.1",
69 | "package_name": "torch"
70 | },
71 | {
72 | "restraint": "EXACT",
73 | "package_version": "4.32.1",
74 | "package_name": "tqdm"
75 | },
76 | {
77 | "restraint": "EXACT",
78 | "package_version": "0.4.2",
79 | "package_name": "torchvision"
80 | }
81 | ]
82 | }]
83 | }
84 |
--------------------------------------------------------------------------------
/deploy_scripts/customize_service.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | import json
3 | import codecs
4 | from collections import OrderedDict
5 | from models import *
6 | from my_utils.utils import *
7 | from my_utils.datasets import *
8 |
9 |
10 | from model_service.pytorch_model_service import PTServingBaseService
11 |
12 | import time
13 | from metric.metrics_manager import MetricsManager
14 | import log
15 | logger = log.getLogger(__name__)
16 |
17 |
18 | class ObjectDetectionService(PTServingBaseService):
19 | def __init__(self, model_name, model_path):
20 | # make sure these files exist
21 | self.model_name = model_name
22 | self.model_path = os.path.join(os.path.dirname(__file__), 'models_best.pth')
23 | self.classes_path = os.path.join(os.path.dirname(__file__), 'train_classes.txt')
24 | self.model_def = os.path.join(os.path.dirname(__file__), 'yolov3-44.cfg')
25 | self.label_map = parse_classify_rule(os.path.join(os.path.dirname(__file__), 'classify_rule.json'))
26 |
27 | self.input_image_key = 'images'
28 | self.score = 0.3
29 | self.iou = 0.45
30 | self.img_size = 416
31 | self.classes = self._get_class()
32 | # define and load YOLOv3 model
33 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
34 | self.model = Darknet(self.model_def, img_size=self.img_size).to(device)
35 | if self.model_path.endswith(".weights"):
36 | # Load darknet weights
37 | self.model.load_darknet_weights(self.model_path)
38 | else:
39 | # Load checkpoint weights
40 | self.model.load_state_dict(torch.load(self.model_path, map_location='cpu'))
41 | print('load weights file success')
42 | self.model.eval()
43 |
44 | def _get_class(self):
45 | classes_path = os.path.expanduser(self.classes_path)
46 | with codecs.open(classes_path, 'r', 'utf-8') as f:
47 | class_names = f.readlines()
48 | class_names = [c.strip() for c in class_names]
49 | return class_names
50 |
51 | def _preprocess(self, data):
52 | preprocessed_data = {}
53 | for k, v in data.items():
54 | for file_name, file_content in v.items():
55 | img = Image.open(file_content)
56 | # store image size (height, width)
57 | shape = (img.size[1], img.size[0])
58 | # convert to tensor
59 | img = transforms.ToTensor()(img)
60 | # Pad to square resolution
61 | img, _ = pad_to_square(img, 0)
62 | # Resize
63 | img = resize(img, 416)
64 | # unsqueeze
65 | img = img.unsqueeze(0)
66 |
67 | preprocessed_data[k] = [img, shape]
68 | return preprocessed_data
69 |
70 | def _inference(self, data):
71 | """
72 | model inference function
73 | Here is an inference example (the template originally targeted ResNet); if you use another model, please modify this function
74 | """
75 | img, shape = data[self.input_image_key]
76 |
77 | Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor
78 | input_imgs = Variable(img.type(Tensor))
79 |
80 | # Get detections
81 | with torch.no_grad():
82 | detections = self.model(input_imgs)
83 | detections = non_max_suppression(detections, self.score, self.iou)
84 |
85 | result = OrderedDict()
86 | if detections[0] is not None:
87 | detections = rescale_boxes(detections[0], self.img_size, shape)
88 | detections = detections.numpy().tolist()
89 | out_classes = [x[6] for x in detections]  # predicted class index
90 | out_scores = [x[5] for x in detections]  # class confidence
91 | out_boxes = [x[:4] for x in detections]  # box corners (x1, y1, x2, y2)
92 |
93 | detection_class_names = []
94 | for class_id in out_classes:
95 | class_name = self.classes[int(class_id)]
96 | class_name = self.label_map[class_name] + '/' + class_name
97 | detection_class_names.append(class_name)
98 | out_boxes_list = []
99 | for box in out_boxes:
100 | out_boxes_list.append([round(float(v), 1) for v in box])
101 | result['detection_classes'] = detection_class_names
102 | result['detection_scores'] = [round(float(v), 4) for v in out_scores]
103 | result['detection_boxes'] = out_boxes_list
104 | else:
105 | result['detection_classes'] = []
106 | result['detection_scores'] = []
107 | result['detection_boxes'] = []
108 |
109 | return result
110 |
111 | def _postprocess(self, data):
112 | return data
113 |
114 | def inference(self, data):
115 | '''
116 | Wrapper function to run preprocess, inference and postprocess functions.
117 |
118 | Parameters
119 | ----------
120 | data : map of object
121 | Raw input from request.
122 |
123 | Returns
124 | -------
125 | list of outputs to be sent back to client.
126 | data to be sent back
127 | '''
128 | pre_start_time = time.time()
129 | data = self._preprocess(data)
130 | infer_start_time = time.time()
131 | # Update preprocess latency metric
132 | pre_time_in_ms = (infer_start_time - pre_start_time) * 1000
133 | logger.info('preprocess time: ' + str(pre_time_in_ms) + 'ms')
134 |
135 | if self.model_name + '_LatencyPreprocess' in MetricsManager.metrics:
136 | MetricsManager.metrics[self.model_name + '_LatencyPreprocess'].update(pre_time_in_ms)
137 |
138 | data = self._inference(data)
139 | infer_end_time = time.time()
140 | infer_in_ms = (infer_end_time - infer_start_time) * 1000
141 |
142 | logger.info('infer time: ' + str(infer_in_ms) + 'ms')
143 | data = self._postprocess(data)
144 |
145 | # Update inference latency metric
146 | post_time_in_ms = (time.time() - infer_end_time) * 1000
147 | logger.info('postprocess time: ' + str(post_time_in_ms) + 'ms')
148 | if self.model_name + '_LatencyInference' in MetricsManager.metrics:
149 | MetricsManager.metrics[self.model_name + '_LatencyInference'].update(infer_in_ms)
150 |
151 | # Update overall latency metric
152 | if self.model_name + '_LatencyOverall' in MetricsManager.metrics:
153 | MetricsManager.metrics[self.model_name + '_LatencyOverall'].update(pre_time_in_ms + infer_in_ms + post_time_in_ms)
154 |
155 | logger.info('latency: ' + str(pre_time_in_ms + infer_in_ms + post_time_in_ms) + 'ms')
156 | data['latency_time'] = str(round(pre_time_in_ms + infer_in_ms + post_time_in_ms, 1)) + ' ms'
157 | return data
158 |
159 |
160 | def parse_classify_rule(json_path=''):
161 | with codecs.open(json_path, 'r', 'utf-8') as f:
162 | rule = json.load(f)
163 | label_map = {}
164 | for super_label, labels in rule.items():
165 | for label in labels:
166 | label_map[label] = super_label
167 | return label_map
168 |
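# Note: parse_classify_rule() inverts classify_rule.json into a
# {class name: super-category} map; _inference() then reports each detected
# class as "super-category/class name" (for example "可回收物/充电宝"),
# which is what the detection_classes field declared in
# deploy_scripts/config.json carries.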
--------------------------------------------------------------------------------
/detect.py:
--------------------------------------------------------------------------------
1 | from __future__ import division
2 |
3 | from models import *
4 | from my_utils.utils import *
5 | from my_utils.datasets import *
6 |
7 | import os
8 | import sys
9 | import time
10 | import datetime
11 | import argparse
12 |
13 | from PIL import Image
14 |
15 | import torch
16 | from torch.utils.data import DataLoader
17 | from torchvision import datasets
18 | from torch.autograd import Variable
19 |
20 | import matplotlib.pyplot as plt
21 | import matplotlib.patches as patches
22 | from matplotlib.ticker import NullLocator
23 |
24 | if __name__ == "__main__":
25 | parser = argparse.ArgumentParser()
26 | parser.add_argument("--image_folder", type=str, default="data/samples", help="path to dataset")
27 | parser.add_argument("--model_def", type=str, default="config/yolov3.cfg", help="path to model definition file")
28 | parser.add_argument("--weights_path", type=str, default="weights/yolov3.weights", help="path to weights file")
29 | parser.add_argument("--class_path", type=str, default="data/coco.names", help="path to class label file")
30 | parser.add_argument("--conf_thres", type=float, default=0.8, help="object confidence threshold")
31 |     parser.add_argument("--nms_thres", type=float, default=0.4, help="iou threshold for non-maximum suppression")
32 | parser.add_argument("--batch_size", type=int, default=1, help="size of the batches")
33 | parser.add_argument("--n_cpu", type=int, default=0, help="number of cpu threads to use during batch generation")
34 | parser.add_argument("--img_size", type=int, default=416, help="size of each image dimension")
35 | parser.add_argument("--checkpoint_model", type=str, help="path to checkpoint model")
36 | opt = parser.parse_args()
37 | print(opt)
38 |
39 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
40 |
41 | os.makedirs("output", exist_ok=True)
42 |
43 | # Set up model
44 | model = Darknet(opt.model_def, img_size=opt.img_size).to(device)
45 |
46 | if opt.weights_path.endswith(".weights"):
47 | # Load darknet weights
48 | model.load_darknet_weights(opt.weights_path)
49 | else:
50 | # Load checkpoint weights
51 | model.load_state_dict(torch.load(opt.weights_path))
52 |
53 | model.eval() # Set in evaluation mode
54 |
55 | dataloader = DataLoader(
56 | ImageFolder(opt.image_folder, img_size=opt.img_size),
57 | batch_size=opt.batch_size,
58 | shuffle=False,
59 | num_workers=opt.n_cpu,
60 | )
61 |
62 | classes = load_classes(opt.class_path) # Extracts class labels from file
63 |
64 | Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor
65 |
66 | imgs = [] # Stores image paths
67 | img_detections = [] # Stores detections for each image index
68 |
69 | print("\nPerforming object detection:")
70 | prev_time = time.time()
71 | for batch_i, (img_paths, input_imgs) in enumerate(dataloader):
72 | # Configure input
73 | input_imgs = Variable(input_imgs.type(Tensor))
74 |
75 | # Get detections
76 | with torch.no_grad():
77 | detections = model(input_imgs)
78 | detections = non_max_suppression(detections, opt.conf_thres, opt.nms_thres)
79 |
80 | # Log progress
81 | current_time = time.time()
82 | inference_time = datetime.timedelta(seconds=current_time - prev_time)
83 | prev_time = current_time
84 | print("\t+ Batch %d, Inference Time: %s" % (batch_i, inference_time))
85 |
86 | # Save image and detections
87 | imgs.extend(img_paths)
88 | img_detections.extend(detections)
89 |
90 | # Bounding-box colors
91 | cmap = plt.get_cmap("tab20b")
92 | colors = [cmap(i) for i in np.linspace(0, 1, 20)]
93 |
94 | print("\nSaving images:")
95 | # Iterate through images and save plot of detections
96 | for img_i, (path, detections) in enumerate(zip(imgs, img_detections)):
97 |
98 | print("(%d) Image: '%s'" % (img_i, path))
99 |
100 | # Create plot
101 | img = np.array(Image.open(path))
102 | plt.figure()
103 | fig, ax = plt.subplots(1)
104 | ax.imshow(img)
105 |
106 | # Draw bounding boxes and labels of detections
107 | if detections is not None:
108 | # Rescale boxes to original image
109 | detections = rescale_boxes(detections, opt.img_size, img.shape[:2])
110 | unique_labels = detections[:, -1].cpu().unique()
111 | n_cls_preds = len(unique_labels)
112 | bbox_colors = random.sample(colors, n_cls_preds)
113 | for x1, y1, x2, y2, conf, cls_conf, cls_pred in detections:
114 |
115 | print("\t+ Label: %s, Conf: %.5f" % (classes[int(cls_pred)], cls_conf.item()))
116 |
117 | box_w = x2 - x1
118 | box_h = y2 - y1
119 |
120 | color = bbox_colors[int(np.where(unique_labels == int(cls_pred))[0])]
121 | # Create a Rectangle patch
122 | bbox = patches.Rectangle((x1, y1), box_w, box_h, linewidth=2, edgecolor=color, facecolor="none")
123 | # Add the bbox to the plot
124 | ax.add_patch(bbox)
125 | # Add label
126 | plt.text(
127 | x1,
128 | y1,
129 | s=classes[int(cls_pred)],
130 | color="white",
131 | verticalalignment="top",
132 | bbox={"color": color, "pad": 0},
133 | )
134 |
135 | # Save generated image with detections
136 | plt.axis("off")
137 | plt.gca().xaxis.set_major_locator(NullLocator())
138 | plt.gca().yaxis.set_major_locator(NullLocator())
139 | filename = path.split("/")[-1].split(".")[0]
140 | plt.savefig(f"output/{filename}.png", bbox_inches="tight", pad_inches=0.0)
141 | plt.close()
142 |
--------------------------------------------------------------------------------
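
Column-layout sketch for detect.py: every row kept by non_max_suppression() has seven values, and the drawing loop above unpacks them in this order. A minimal sketch with one dummy detection:

    import torch

    # (x1, y1, x2, y2, object_conf, class_score, class_pred), as unpacked in the plotting loop
    det = torch.tensor([[50.0, 60.0, 200.0, 220.0, 0.91, 0.88, 3.0]])
    x1, y1, x2, y2, conf, cls_conf, cls_pred = det[0]
    box_w, box_h = x2 - x1, y2 - y1  # rectangle size handed to matplotlib's Rectangle patch
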
/models.py:
--------------------------------------------------------------------------------
1 | from __future__ import division
2 |
3 | import torch
4 | import torch.nn as nn
5 | import torch.nn.functional as F
6 | from torch.autograd import Variable
7 | import numpy as np
8 |
9 | from my_utils.parse_config import *
10 | from my_utils.utils import build_targets, to_cpu, non_max_suppression
11 |
12 |
13 | def create_modules(module_defs):
14 | """
15 | Constructs module list of layer blocks from module configuration in module_defs
16 | """
17 | hyperparams = module_defs.pop(0)
18 | output_filters = [int(hyperparams["channels"])]
19 | module_list = nn.ModuleList()
20 | for module_i, module_def in enumerate(module_defs):
21 | modules = nn.Sequential()
22 |
23 | if module_def["type"] == "convolutional":
24 | bn = int(module_def["batch_normalize"])
25 | filters = int(module_def["filters"])
26 | kernel_size = int(module_def["size"])
27 | pad = (kernel_size - 1) // 2
28 | modules.add_module(
29 | f"conv_{module_i}",
30 | nn.Conv2d(
31 | in_channels=output_filters[-1],
32 | out_channels=filters,
33 | kernel_size=kernel_size,
34 | stride=int(module_def["stride"]),
35 | padding=pad,
36 | bias=not bn,
37 | ),
38 | )
39 | if bn:
40 | modules.add_module(f"batch_norm_{module_i}", nn.BatchNorm2d(filters, momentum=0.9, eps=1e-5))
41 | if module_def["activation"] == "leaky":
42 | modules.add_module(f"leaky_{module_i}", nn.LeakyReLU(0.1))
43 |
44 | elif module_def["type"] == "maxpool":
45 | kernel_size = int(module_def["size"])
46 | stride = int(module_def["stride"])
47 | if kernel_size == 2 and stride == 1:
48 | modules.add_module(f"_debug_padding_{module_i}", nn.ZeroPad2d((0, 1, 0, 1)))
49 | maxpool = nn.MaxPool2d(kernel_size=kernel_size, stride=stride, padding=int((kernel_size - 1) // 2))
50 | modules.add_module(f"maxpool_{module_i}", maxpool)
51 |
52 | elif module_def["type"] == "upsample":
53 | upsample = Upsample(scale_factor=int(module_def["stride"]), mode="nearest")
54 | modules.add_module(f"upsample_{module_i}", upsample)
55 |
56 | elif module_def["type"] == "route":
57 | layers = [int(x) for x in module_def["layers"].split(",")]
58 | filters = sum([output_filters[1:][i] for i in layers])
59 | modules.add_module(f"route_{module_i}", EmptyLayer())
60 |
61 | elif module_def["type"] == "shortcut":
62 | filters = output_filters[1:][int(module_def["from"])]
63 | modules.add_module(f"shortcut_{module_i}", EmptyLayer())
64 |
65 | elif module_def["type"] == "yolo":
66 | anchor_idxs = [int(x) for x in module_def["mask"].split(",")]
67 | # Extract anchors
68 | anchors = [int(x) for x in module_def["anchors"].split(",")]
69 | anchors = [(anchors[i], anchors[i + 1]) for i in range(0, len(anchors), 2)]
70 | anchors = [anchors[i] for i in anchor_idxs]
71 | num_classes = int(module_def["classes"])
72 | img_size = int(hyperparams["height"])
73 | # Define detection layer
74 | yolo_layer = YOLOLayer(anchors, num_classes, img_size)
75 | modules.add_module(f"yolo_{module_i}", yolo_layer)
76 | # Register module list and number of output filters
77 | module_list.append(modules)
78 | output_filters.append(filters)
79 |
80 | return hyperparams, module_list
81 |
82 |
83 | class Upsample(nn.Module):
84 | """ nn.Upsample is deprecated """
85 |
86 | def __init__(self, scale_factor, mode="nearest"):
87 | super(Upsample, self).__init__()
88 | self.scale_factor = scale_factor
89 | self.mode = mode
90 |
91 | def forward(self, x):
92 | x = F.interpolate(x, scale_factor=self.scale_factor, mode=self.mode)
93 | return x
94 |
95 |
96 | class EmptyLayer(nn.Module):
97 | """Placeholder for 'route' and 'shortcut' layers"""
98 |
99 | def __init__(self):
100 | super(EmptyLayer, self).__init__()
101 |
102 |
103 | class YOLOLayer(nn.Module):
104 | """Detection layer"""
105 |
106 | def __init__(self, anchors, num_classes, img_dim=416):
107 | super(YOLOLayer, self).__init__()
108 | self.anchors = anchors
109 | self.num_anchors = len(anchors)
110 | self.num_classes = num_classes
111 | self.ignore_thres = 0.5
112 | self.mse_loss = nn.MSELoss()
113 | self.bce_loss = nn.BCELoss()
114 | self.obj_scale = 1
115 | self.noobj_scale = 100
116 | self.metrics = {}
117 | self.img_dim = img_dim
118 | self.grid_size = 0 # grid size
119 |
120 | def compute_grid_offsets(self, grid_size, cuda=True):
121 | self.grid_size = grid_size
122 | g = self.grid_size
123 | FloatTensor = torch.cuda.FloatTensor if cuda else torch.FloatTensor
124 | self.stride = self.img_dim / self.grid_size
125 | # Calculate offsets for each grid
126 | self.grid_x = torch.arange(g).repeat(g, 1).view([1, 1, g, g]).type(FloatTensor)
127 | self.grid_y = torch.arange(g).repeat(g, 1).t().view([1, 1, g, g]).type(FloatTensor)
128 | self.scaled_anchors = FloatTensor([(a_w / self.stride, a_h / self.stride) for a_w, a_h in self.anchors])
129 | self.anchor_w = self.scaled_anchors[:, 0:1].view((1, self.num_anchors, 1, 1))
130 | self.anchor_h = self.scaled_anchors[:, 1:2].view((1, self.num_anchors, 1, 1))
131 |
132 | def forward(self, x, targets=None, img_dim=None):
133 |
134 | # Tensors for cuda support
135 | FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
136 | LongTensor = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor
137 | ByteTensor = torch.cuda.ByteTensor if x.is_cuda else torch.ByteTensor
138 |
139 | self.img_dim = img_dim
140 | num_samples = x.size(0)
141 | grid_size = x.size(2)
142 |
143 | prediction = (
144 | x.view(num_samples, self.num_anchors, self.num_classes + 5, grid_size, grid_size)
145 | .permute(0, 1, 3, 4, 2)
146 | .contiguous()
147 | )
148 |
149 | # Get outputs
150 | x = torch.sigmoid(prediction[..., 0]) # Center x
151 | y = torch.sigmoid(prediction[..., 1]) # Center y
152 | w = prediction[..., 2] # Width
153 | h = prediction[..., 3] # Height
154 | pred_conf = torch.sigmoid(prediction[..., 4]) # Conf
155 | pred_cls = torch.sigmoid(prediction[..., 5:]) # Cls pred.
156 |
157 | # If grid size does not match current we compute new offsets
158 | if grid_size != self.grid_size:
159 | self.compute_grid_offsets(grid_size, cuda=x.is_cuda)
160 |
161 | # Add offset and scale with anchors
162 | pred_boxes = FloatTensor(prediction[..., :4].shape)
163 | pred_boxes[..., 0] = x.data + self.grid_x
164 | pred_boxes[..., 1] = y.data + self.grid_y
165 | pred_boxes[..., 2] = torch.exp(w.data) * self.anchor_w
166 | pred_boxes[..., 3] = torch.exp(h.data) * self.anchor_h
167 |
168 | output = torch.cat(
169 | (
170 | pred_boxes.view(num_samples, -1, 4) * self.stride,
171 | pred_conf.view(num_samples, -1, 1),
172 | pred_cls.view(num_samples, -1, self.num_classes),
173 | ),
174 | -1,
175 | )
176 |
177 | if targets is None:
178 | return output, 0
179 | else:
180 | iou_scores, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tcls, tconf = build_targets(
181 | pred_boxes=pred_boxes,
182 | pred_cls=pred_cls,
183 | target=targets,
184 | anchors=self.scaled_anchors,
185 | ignore_thres=self.ignore_thres,
186 | )
187 |
188 | obj_mask = obj_mask.bool() # convert int8 to bool
189 | noobj_mask = noobj_mask.bool() # convert int8 to bool
190 |
191 | # Loss : Mask outputs to ignore non-existing objects (except with conf. loss)
192 | loss_x = self.mse_loss(x[obj_mask], tx[obj_mask])
193 | loss_y = self.mse_loss(y[obj_mask], ty[obj_mask])
194 | loss_w = self.mse_loss(w[obj_mask], tw[obj_mask])
195 | loss_h = self.mse_loss(h[obj_mask], th[obj_mask])
196 | loss_conf_obj = self.bce_loss(pred_conf[obj_mask], tconf[obj_mask])
197 | loss_conf_noobj = self.bce_loss(pred_conf[noobj_mask], tconf[noobj_mask])
198 | loss_conf = self.obj_scale * loss_conf_obj + self.noobj_scale * loss_conf_noobj
199 | loss_cls = self.bce_loss(pred_cls[obj_mask], tcls[obj_mask])
200 | total_loss = loss_x + loss_y + loss_w + loss_h + loss_conf + loss_cls
201 |
202 | # Metrics
203 | cls_acc = 100 * class_mask[obj_mask].mean()
204 | conf_obj = pred_conf[obj_mask].mean()
205 | conf_noobj = pred_conf[noobj_mask].mean()
206 | conf50 = (pred_conf > 0.5).float()
207 | iou50 = (iou_scores > 0.5).float()
208 | iou75 = (iou_scores > 0.75).float()
209 | detected_mask = conf50 * class_mask * tconf
210 | precision = torch.sum(iou50 * detected_mask) / (conf50.sum() + 1e-16)
211 | recall50 = torch.sum(iou50 * detected_mask) / (obj_mask.sum() + 1e-16)
212 | recall75 = torch.sum(iou75 * detected_mask) / (obj_mask.sum() + 1e-16)
213 |
214 | self.metrics = {
215 | "loss": to_cpu(total_loss).item(),
216 | "x": to_cpu(loss_x).item(),
217 | "y": to_cpu(loss_y).item(),
218 | "w": to_cpu(loss_w).item(),
219 | "h": to_cpu(loss_h).item(),
220 | "conf": to_cpu(loss_conf).item(),
221 | "cls": to_cpu(loss_cls).item(),
222 | "cls_acc": to_cpu(cls_acc).item(),
223 | "recall50": to_cpu(recall50).item(),
224 | "recall75": to_cpu(recall75).item(),
225 | "precision": to_cpu(precision).item(),
226 | "conf_obj": to_cpu(conf_obj).item(),
227 | "conf_noobj": to_cpu(conf_noobj).item(),
228 | "grid_size": grid_size,
229 | }
230 |
231 | return output, total_loss
232 |
233 |
234 | class Darknet(nn.Module):
235 | """YOLOv3 object detection model"""
236 |
237 | def __init__(self, config_path, img_size=416):
238 | super(Darknet, self).__init__()
239 | self.module_defs = parse_model_config(config_path)
240 | self.hyperparams, self.module_list = create_modules(self.module_defs)
241 | self.yolo_layers = [layer[0] for layer in self.module_list if hasattr(layer[0], "metrics")]
242 | self.img_size = img_size
243 | self.seen = 0
244 | self.header_info = np.array([0, 0, 0, self.seen, 0], dtype=np.int32)
245 |
246 | def forward(self, x, targets=None):
247 | img_dim = x.shape[2]
248 | loss = 0
249 | layer_outputs, yolo_outputs = [], []
250 | for i, (module_def, module) in enumerate(zip(self.module_defs, self.module_list)):
251 | if module_def["type"] in ["convolutional", "upsample", "maxpool"]:
252 | x = module(x)
253 | elif module_def["type"] == "route":
254 | x = torch.cat([layer_outputs[int(layer_i)] for layer_i in module_def["layers"].split(",")], 1)
255 | elif module_def["type"] == "shortcut":
256 | layer_i = int(module_def["from"])
257 | x = layer_outputs[-1] + layer_outputs[layer_i]
258 | elif module_def["type"] == "yolo":
259 | x, layer_loss = module[0](x, targets, img_dim)
260 | loss += layer_loss
261 | yolo_outputs.append(x)
262 | layer_outputs.append(x)
263 | yolo_outputs = to_cpu(torch.cat(yolo_outputs, 1))
264 | return yolo_outputs if targets is None else (loss, yolo_outputs)
265 |
266 | def load_darknet_weights(self, weights_path):
267 | """Parses and loads the weights stored in 'weights_path'"""
268 |
269 | # Open the weights file
270 | with open(weights_path, "rb") as f:
271 | header = np.fromfile(f, dtype=np.int32, count=5) # First five are header values
272 | self.header_info = header # Needed to write header when saving weights
273 | self.seen = header[3] # number of images seen during training
274 | weights = np.fromfile(f, dtype=np.float32) # The rest are weights
275 |
276 | # Establish cutoff for loading backbone weights
277 | cutoff = None
278 | if "darknet53.conv.74" in weights_path:
279 | cutoff = 75
280 |
281 | ptr = 0
282 | for i, (module_def, module) in enumerate(zip(self.module_defs, self.module_list)):
283 | if i == cutoff:
284 | break
285 | if module_def["type"] == "convolutional":
286 | conv_layer = module[0]
287 | if module_def["batch_normalize"]:
288 | # Load BN bias, weights, running mean and running variance
289 | bn_layer = module[1]
290 | num_b = bn_layer.bias.numel() # Number of biases
291 | # Bias
292 | bn_b = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.bias)
293 | bn_layer.bias.data.copy_(bn_b)
294 | ptr += num_b
295 | # Weight
296 | bn_w = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.weight)
297 | bn_layer.weight.data.copy_(bn_w)
298 | ptr += num_b
299 | # Running Mean
300 | bn_rm = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.running_mean)
301 | bn_layer.running_mean.data.copy_(bn_rm)
302 | ptr += num_b
303 | # Running Var
304 | bn_rv = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.running_var)
305 | bn_layer.running_var.data.copy_(bn_rv)
306 | ptr += num_b
307 | else:
308 | # Load conv. bias
309 | num_b = conv_layer.bias.numel()
310 | conv_b = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(conv_layer.bias)
311 | conv_layer.bias.data.copy_(conv_b)
312 | ptr += num_b
313 | # Load conv. weights
314 | num_w = conv_layer.weight.numel()
315 | conv_w = torch.from_numpy(weights[ptr : ptr + num_w]).view_as(conv_layer.weight)
316 | conv_layer.weight.data.copy_(conv_w)
317 | ptr += num_w
318 |
319 | def save_darknet_weights(self, path, cutoff=-1):
320 | """
321 |         :param path: path of the new weights file
322 |         :param cutoff: save layers between 0 and cutoff (cutoff = -1 -> all are saved)
323 | """
324 | fp = open(path, "wb")
325 | self.header_info[3] = self.seen
326 | self.header_info.tofile(fp)
327 |
328 | # Iterate through layers
329 | for i, (module_def, module) in enumerate(zip(self.module_defs[:cutoff], self.module_list[:cutoff])):
330 | if module_def["type"] == "convolutional":
331 | conv_layer = module[0]
332 |                 # If batch norm, save bn parameters first
333 | if module_def["batch_normalize"]:
334 | bn_layer = module[1]
335 | bn_layer.bias.data.cpu().numpy().tofile(fp)
336 | bn_layer.weight.data.cpu().numpy().tofile(fp)
337 | bn_layer.running_mean.data.cpu().numpy().tofile(fp)
338 | bn_layer.running_var.data.cpu().numpy().tofile(fp)
339 |                 # Save conv bias
340 | else:
341 | conv_layer.bias.data.cpu().numpy().tofile(fp)
342 |                 # Save conv weights
343 | conv_layer.weight.data.cpu().numpy().tofile(fp)
344 |
345 | fp.close()
346 |
--------------------------------------------------------------------------------
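
Usage sketch for models.py: Darknet builds the network from a cfg file; calling it without targets returns the concatenated predictions, while passing targets returns (loss, predictions) as used in train.py. Paths below are placeholders:

    import torch
    from models import Darknet

    model = Darknet("config/yolov3-44.cfg", img_size=416)
    model.load_darknet_weights("weights/darknet53.conv.74")  # backbone-only weights (cutoff at layer 75)
    model.eval()

    dummy = torch.zeros((1, 3, 416, 416))
    with torch.no_grad():
        detections = model(dummy)  # shape (1, num_boxes, 5 + num_classes) in (x, y, w, h, conf, cls...) order
    # model(imgs, targets) instead returns (loss, detections) during training.
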
/my_utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/edwardning/PyTorch-YOLOv3-ModelArts/878bdc232da0691939d92806927ea62cc15cb282/my_utils/__init__.py
--------------------------------------------------------------------------------
/my_utils/augmentations.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn.functional as F
3 | import numpy as np
4 |
5 |
6 | def horisontal_flip(images, targets):
7 | images = torch.flip(images, [-1])
8 | targets[:, 2] = 1 - targets[:, 2]
9 | return images, targets
10 |
--------------------------------------------------------------------------------
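
Usage sketch for my_utils/augmentations.py: horisontal_flip() mirrors the image along its last (width) axis and reflects the normalized x-centre stored in column 2 of each target row:

    import torch
    from my_utils.augmentations import horisontal_flip

    img = torch.rand(3, 416, 416)                              # C x H x W
    targets = torch.tensor([[0.0, 7.0, 0.25, 0.5, 0.1, 0.2]])  # (sample_idx, class, x, y, w, h)
    img, targets = horisontal_flip(img, targets)
    assert targets[0, 2] == 0.75                               # x-centre becomes 1 - 0.25
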
/my_utils/datasets.py:
--------------------------------------------------------------------------------
1 | import glob
2 | import random
3 | import os
4 | import sys
5 | import numpy as np
6 | from PIL import Image
7 | import torch
8 | import torch.nn.functional as F
9 |
10 | from my_utils.augmentations import horisontal_flip
11 | from torch.utils.data import Dataset
12 | import torchvision.transforms as transforms
13 |
14 |
15 | def pad_to_square(img, pad_value):
16 | c, h, w = img.shape
17 | dim_diff = np.abs(h - w)
18 | # (upper / left) padding and (lower / right) padding
19 | pad1, pad2 = dim_diff // 2, dim_diff - dim_diff // 2
20 | # Determine padding
21 | pad = (0, 0, pad1, pad2) if h <= w else (pad1, pad2, 0, 0)
22 | # Add padding
23 | img = F.pad(img, pad, "constant", value=pad_value)
24 |
25 | return img, pad
26 |
27 |
28 | def resize(image, size):
29 | image = F.interpolate(image.unsqueeze(0), size=size, mode="nearest").squeeze(0)
30 | return image
31 |
32 |
33 | def random_resize(images, min_size=288, max_size=448):
34 | new_size = random.sample(list(range(min_size, max_size + 1, 32)), 1)[0]
35 | images = F.interpolate(images, size=new_size, mode="nearest")
36 | return images
37 |
38 |
39 | class ImageFolder(Dataset):
40 | def __init__(self, folder_path, img_size=416):
41 | self.files = sorted(glob.glob("%s/*.*" % folder_path))
42 | self.img_size = img_size
43 |
44 | def __getitem__(self, index):
45 | img_path = self.files[index % len(self.files)]
46 | # Extract image as PyTorch tensor
47 | img = transforms.ToTensor()(Image.open(img_path))
48 | # Pad to square resolution
49 | img, _ = pad_to_square(img, 0)
50 | # Resize
51 | img = resize(img, self.img_size)
52 |
53 | return img_path, img
54 |
55 | def __len__(self):
56 | return len(self.files)
57 |
58 |
59 | class ListDataset(Dataset):
60 | def __init__(self, list_path, img_size=416, augment=True, multiscale=True, normalized_labels=True):
61 | with open(list_path, "r") as file:
62 | self.img_files = file.readlines()
63 |
64 | self.label_files = [
65 | path.replace("images", "labels").replace(".png", ".txt").replace(".jpg", ".txt")
66 | for path in self.img_files
67 | ]
68 | self.img_size = img_size
69 | self.max_objects = 100
70 | self.augment = augment
71 | self.multiscale = multiscale
72 | self.normalized_labels = normalized_labels
73 | self.min_size = self.img_size - 3 * 32
74 | self.max_size = self.img_size + 3 * 32
75 | self.batch_count = 0
76 |
77 | def __getitem__(self, index):
78 |
79 | # ---------
80 | # Image
81 | # ---------
82 |
83 | img_path = self.img_files[index % len(self.img_files)].rstrip()
84 |
85 | # Extract image as PyTorch tensor
86 | img = transforms.ToTensor()(Image.open(img_path).convert('RGB'))
87 |
88 | # Handle images with less than three channels
89 | if len(img.shape) != 3:
90 | img = img.unsqueeze(0)
91 |             img = img.expand((3, *img.shape[1:]))  # broadcast a single-channel image to 3 channels
92 |
93 | _, h, w = img.shape
94 | h_factor, w_factor = (h, w) if self.normalized_labels else (1, 1)
95 | # Pad to square resolution
96 | img, pad = pad_to_square(img, 0)
97 | _, padded_h, padded_w = img.shape
98 |
99 | # ---------
100 | # Label
101 | # ---------
102 |
103 | label_path = self.label_files[index % len(self.img_files)].rstrip()
104 |
105 | targets = None
106 | if os.path.exists(label_path):
107 | boxes = torch.from_numpy(np.loadtxt(label_path).reshape(-1, 5))
108 | # Extract coordinates for unpadded + unscaled image
109 | x1 = w_factor * (boxes[:, 1] - boxes[:, 3] / 2)
110 | y1 = h_factor * (boxes[:, 2] - boxes[:, 4] / 2)
111 | x2 = w_factor * (boxes[:, 1] + boxes[:, 3] / 2)
112 | y2 = h_factor * (boxes[:, 2] + boxes[:, 4] / 2)
113 | # Adjust for added padding
114 | x1 += pad[0]
115 | y1 += pad[2]
116 | x2 += pad[1]
117 | y2 += pad[3]
118 | # Returns (x, y, w, h)
119 | boxes[:, 1] = ((x1 + x2) / 2) / padded_w
120 | boxes[:, 2] = ((y1 + y2) / 2) / padded_h
121 | boxes[:, 3] *= w_factor / padded_w
122 | boxes[:, 4] *= h_factor / padded_h
123 |
124 | targets = torch.zeros((len(boxes), 6))
125 | targets[:, 1:] = boxes
126 |
127 | # Apply augmentations
128 | if self.augment:
129 | if np.random.random() < 0.5:
130 | img, targets = horisontal_flip(img, targets)
131 |
132 | return img_path, img, targets
133 |
134 | def collate_fn(self, batch):
135 | paths, imgs, targets = list(zip(*batch))
136 | # Remove empty placeholder targets
137 | targets = [boxes for boxes in targets if boxes is not None]
138 | # Add sample index to targets
139 | for i, boxes in enumerate(targets):
140 | boxes[:, 0] = i
141 | targets = torch.cat(targets, 0)
142 | # Selects new image size every tenth batch
143 | if self.multiscale and self.batch_count % 10 == 0:
144 | self.img_size = random.choice(range(self.min_size, self.max_size + 1, 32))
145 | # Resize images to input shape
146 | imgs = torch.stack([resize(img, self.img_size) for img in imgs])
147 | self.batch_count += 1
148 | return paths, imgs, targets
149 |
150 | def __len__(self):
151 | return len(self.img_files)
152 |
--------------------------------------------------------------------------------
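
Usage sketch for my_utils/datasets.py: ListDataset reads image paths from a list file (config/train.txt here), looks for a matching labels/*.txt next to each image, and its collate_fn writes the in-batch sample index into column 0 of every target row. The list-file path below is a placeholder:

    from torch.utils.data import DataLoader
    from my_utils.datasets import ListDataset

    dataset = ListDataset("config/train.txt", img_size=416, augment=True, multiscale=True)
    loader = DataLoader(dataset, batch_size=4, shuffle=True, collate_fn=dataset.collate_fn)

    paths, imgs, targets = next(iter(loader))
    # imgs:    (4, 3, S, S), with S re-drawn from {320, ..., 512} every 10th batch when multiscale=True
    # targets: (n_boxes, 6) rows of (sample_idx, class, x, y, w, h), box values normalized to [0, 1]
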
/my_utils/parse_config.py:
--------------------------------------------------------------------------------
1 |
2 |
3 | def parse_model_config(path):
4 | """Parses the yolo-v3 layer configuration file and returns module definitions"""
5 |     with open(path, 'r') as file:
6 |         lines = file.read().split('\n')
7 | lines = [x for x in lines if x and not x.startswith('#')]
8 | lines = [x.rstrip().lstrip() for x in lines] # get rid of fringe whitespaces
9 | module_defs = []
10 | for line in lines:
11 | if line.startswith('['): # This marks the start of a new block
12 | module_defs.append({})
13 | module_defs[-1]['type'] = line[1:-1].rstrip()
14 | if module_defs[-1]['type'] == 'convolutional':
15 | module_defs[-1]['batch_normalize'] = 0
16 | else:
17 | key, value = line.split("=")
18 | value = value.strip()
19 | module_defs[-1][key.rstrip()] = value.strip()
20 |
21 | return module_defs
22 |
23 | def parse_data_config(path):
24 | """Parses the data configuration file"""
25 | options = dict()
26 | options['gpus'] = '0,1,2,3'
27 | options['num_workers'] = '10'
28 | with open(path, 'r') as fp:
29 | lines = fp.readlines()
30 | for line in lines:
31 | line = line.strip()
32 | if line == '' or line.startswith('#'):
33 | continue
34 | key, value = line.split('=')
35 | options[key.strip()] = value.strip()
36 | return options
37 |
--------------------------------------------------------------------------------
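
Usage sketch for my_utils/parse_config.py: parse_model_config() turns every [block] of the cfg into a dict of string values; the leading [net] block becomes the hyperparameter dict that create_modules() pops off first:

    from my_utils.parse_config import parse_model_config

    module_defs = parse_model_config("config/yolov3-44.cfg")
    hyperparams = module_defs[0]                          # the [net] block: batch, height, channels, ...
    print(hyperparams["height"], module_defs[1]["type"])  # -> e.g. 416 convolutional (values are strings)
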
/my_utils/prepare_datasets.py:
--------------------------------------------------------------------------------
1 | # After running successfully, this script creates a folder with the following structure:
2 | # trainval/
3 | # -images
4 | # -0001.jpg
5 | # -0002.jpg
6 | # -0003.jpg
7 | # -labels
8 | # -0001.txt
9 | # -0002.txt
10 | # -0003.txt
11 | # Pack the trainval folder as trainval.zip and upload it to OBS for later use.
12 | import os
13 | import codecs
14 | import xml.etree.ElementTree as ET
15 | from tqdm import tqdm
16 | import shutil
17 | import argparse
18 |
19 |
20 | def get_classes(classes_path):
21 | '''loads the classes'''
22 | with codecs.open(classes_path, 'r', 'utf-8') as f:
23 | class_names = f.readlines()
24 | class_names = [c.strip() for c in class_names]
25 | return class_names
26 |
27 |
28 | def creat_label_txt(source_datasets, new_datasets):
29 |     annotations = os.path.join(source_datasets, 'VOC2007', 'Annotations')
30 |     txt_path = os.path.join(new_datasets, 'labels')
31 |     class_names = get_classes(os.path.join(source_datasets, 'train_classes.txt'))
32 |
33 | xmls = os.listdir(annotations)
34 | for xml in tqdm(xmls):
35 | txt_anno_path = os.path.join(txt_path, xml.replace('xml', 'txt'))
36 | xml = os.path.join(annotations, xml)
37 | tree = ET.parse(xml)
38 | root = tree.getroot()
39 |
40 | size = root.find('size')
41 | w = int(size.find('width').text)
42 | h = int(size.find('height').text)
43 | line = ''
44 | for obj in root.iter('object'):
45 | cls = obj.find('name').text
46 | if cls not in class_names:
47 | print('name error', xml)
48 | continue
49 | cls_id = class_names.index(cls)
50 | xmlbox = obj.find('bndbox')
51 | box = [int(xmlbox.find('xmin').text), int(xmlbox.find('ymin').text),
52 | int(xmlbox.find('xmax').text), int(xmlbox.find('ymax').text)]
53 | width = round((box[2] - box[0]) / w, 6)
54 | height = round((box[3] - box[1]) / h, 6)
55 | x_center = round(((box[2] + box[0]) / 2) / w, 6)
56 | y_center = round(((box[3] + box[1]) / 2) / h, 6)
57 | line = line + str(cls_id) + ' ' + ' '.join(str(v) for v in [x_center, y_center, width, height])+'\n'
58 | if box[2] > w or box[3] > h:
59 | print('Image with annotation error:', xml)
60 | if box[0] < 0 or box[1] < 0:
61 | print('Image with annotation error:', xml)
62 | with open(txt_anno_path, 'w') as f:
63 | f.writelines(line)
64 |
65 |
66 | def creat_new_datasets(source_datasets, new_datasets):
67 | if not os.path.exists(source_datasets):
68 |         print('could not find the source dataset, please make sure it exists')
69 | return
70 |
71 | if new_datasets.endswith('trainval'):
72 | if not os.path.exists(new_datasets):
73 | os.makedirs(new_datasets)
74 |             os.makedirs(os.path.join(new_datasets, 'labels'))
75 | print('copying images......')
76 |             shutil.copytree(os.path.join(source_datasets, 'VOC2007', 'JPEGImages'), os.path.join(new_datasets, 'images'))
77 | else:
78 |         print("the last path component must be 'trainval' and the folder must be empty")
79 | return
80 | print('creating txt labels:')
81 | creat_label_txt(source_datasets, new_datasets)
82 | return
83 |
84 |
85 | if __name__ == "__main__":
86 | parser = argparse.ArgumentParser()
87 |     parser.add_argument("--source_datasets", "-sd", type=str, help="directory of the unzipped official SODiC dataset")
88 |     parser.add_argument("--new_datasets", "-nd", type=str, help="path of the new dataset; must end with 'trainval' and be an empty folder")
89 | opt = parser.parse_args()
90 |     # creat_new_datasets(opt.source_datasets, opt.new_datasets)
91 |
92 |     source_datasets = r'D:\trainval'
93 |     new_datasets = r'D:\SODiC\trainval'
94 |     creat_new_datasets(source_datasets, new_datasets)
95 |
--------------------------------------------------------------------------------
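
Label-format sketch for my_utils/prepare_datasets.py: creat_label_txt() converts each VOC bounding box into one YOLO-style line, class index followed by the box centre and size normalized by the image dimensions. A minimal sketch of the same arithmetic with made-up numbers:

    # Hypothetical image size and box, mirroring the conversion done in creat_label_txt()
    w, h = 1920, 1080                            # from the XML <size> tag
    xmin, ymin, xmax, ymax = 100, 200, 400, 600
    x_center = round(((xmax + xmin) / 2) / w, 6)
    y_center = round(((ymax + ymin) / 2) / h, 6)
    width = round((xmax - xmin) / w, 6)
    height = round((ymax - ymin) / h, 6)
    line = '12 ' + ' '.join(str(v) for v in [x_center, y_center, width, height]) + '\n'  # class_id 12 is hypothetical
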
/my_utils/utils.py:
--------------------------------------------------------------------------------
1 | from __future__ import division
2 | import os
3 | import codecs
4 | import math
5 | import time
6 | import tqdm
7 | import torch
8 | import torch.nn as nn
9 | import torch.nn.functional as F
10 | from torch.autograd import Variable
11 | import numpy as np
12 |
13 |
14 | def to_cpu(tensor):
15 | return tensor.detach().cpu()
16 |
17 |
18 | def load_classes(classes_path):
19 | """
20 | Loads class labels at 'path'
21 | """
22 | classes_path = os.path.expanduser(classes_path)
23 | with codecs.open(classes_path, 'r', 'utf-8') as f:
24 | class_names = f.readlines()
25 | class_names = [c.strip() for c in class_names]
26 | return class_names
27 |
28 |
29 | def weights_init_normal(m):
30 | classname = m.__class__.__name__
31 | if classname.find("Conv") != -1:
32 | torch.nn.init.normal_(m.weight.data, 0.0, 0.02)
33 | elif classname.find("BatchNorm2d") != -1:
34 | torch.nn.init.normal_(m.weight.data, 1.0, 0.02)
35 | torch.nn.init.constant_(m.bias.data, 0.0)
36 |
37 |
38 | def rescale_boxes(boxes, current_dim, original_shape):
39 | """ Rescales bounding boxes to the original shape """
40 | orig_h, orig_w = original_shape
41 | # The amount of padding that was added
42 | pad_x = max(orig_h - orig_w, 0) * (current_dim / max(original_shape))
43 | pad_y = max(orig_w - orig_h, 0) * (current_dim / max(original_shape))
44 | # Image height and width after padding is removed
45 | unpad_h = current_dim - pad_y
46 | unpad_w = current_dim - pad_x
47 | # Rescale bounding boxes to dimension of original image
48 | boxes[:, 0] = ((boxes[:, 0] - pad_x // 2) / unpad_w) * orig_w
49 | boxes[:, 1] = ((boxes[:, 1] - pad_y // 2) / unpad_h) * orig_h
50 | boxes[:, 2] = ((boxes[:, 2] - pad_x // 2) / unpad_w) * orig_w
51 | boxes[:, 3] = ((boxes[:, 3] - pad_y // 2) / unpad_h) * orig_h
52 | return boxes
53 |
54 |
55 | def xywh2xyxy(x):
56 | y = x.new(x.shape)
57 | y[..., 0] = x[..., 0] - x[..., 2] / 2
58 | y[..., 1] = x[..., 1] - x[..., 3] / 2
59 | y[..., 2] = x[..., 0] + x[..., 2] / 2
60 | y[..., 3] = x[..., 1] + x[..., 3] / 2
61 | return y
62 |
63 |
64 | def ap_per_class(tp, conf, pred_cls, target_cls):
65 | """ Compute the average precision, given the recall and precision curves.
66 | Source: https://github.com/rafaelpadilla/Object-Detection-Metrics.
67 | # Arguments
68 | tp: True positives (list).
69 | conf: Objectness value from 0-1 (list).
70 | pred_cls: Predicted object classes (list).
71 | target_cls: True object classes (list).
72 | # Returns
73 | The average precision as computed in py-faster-rcnn.
74 | """
75 |
76 | # Sort by objectness
77 | i = np.argsort(-conf)
78 | tp, conf, pred_cls = tp[i], conf[i], pred_cls[i]
79 |
80 | # Find unique classes
81 | unique_classes = np.unique(target_cls)
82 |
83 | # Create Precision-Recall curve and compute AP for each class
84 | ap, p, r = [], [], []
85 | for c in tqdm.tqdm(unique_classes, desc="Computing AP"):
86 | i = pred_cls == c
87 | n_gt = (target_cls == c).sum() # Number of ground truth objects
88 | n_p = i.sum() # Number of predicted objects
89 |
90 | if n_p == 0 and n_gt == 0:
91 | continue
92 | elif n_p == 0 or n_gt == 0:
93 | ap.append(0)
94 | r.append(0)
95 | p.append(0)
96 | else:
97 | # Accumulate FPs and TPs
98 | fpc = (1 - tp[i]).cumsum()
99 | tpc = (tp[i]).cumsum()
100 |
101 | # Recall
102 | recall_curve = tpc / (n_gt + 1e-16)
103 | r.append(recall_curve[-1])
104 |
105 | # Precision
106 | precision_curve = tpc / (tpc + fpc)
107 | p.append(precision_curve[-1])
108 |
109 | # AP from recall-precision curve
110 | ap.append(compute_ap(recall_curve, precision_curve))
111 |
112 | # Compute F1 score (harmonic mean of precision and recall)
113 | p, r, ap = np.array(p), np.array(r), np.array(ap)
114 | f1 = 2 * p * r / (p + r + 1e-16)
115 |
116 | return p, r, ap, f1, unique_classes.astype("int32")
117 |
118 |
119 | def compute_ap(recall, precision):
120 | """ Compute the average precision, given the recall and precision curves.
121 | Code originally from https://github.com/rbgirshick/py-faster-rcnn.
122 |
123 | # Arguments
124 | recall: The recall curve (list).
125 | precision: The precision curve (list).
126 | # Returns
127 | The average precision as computed in py-faster-rcnn.
128 | """
129 | # correct AP calculation
130 | # first append sentinel values at the end
131 | mrec = np.concatenate(([0.0], recall, [1.0]))
132 | mpre = np.concatenate(([0.0], precision, [0.0]))
133 |
134 | # compute the precision envelope
135 | for i in range(mpre.size - 1, 0, -1):
136 | mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
137 |
138 | # to calculate area under PR curve, look for points
139 | # where X axis (recall) changes value
140 | i = np.where(mrec[1:] != mrec[:-1])[0]
141 |
142 | # and sum (\Delta recall) * prec
143 | ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
144 | return ap
145 |
146 |
147 | def get_batch_statistics(outputs, targets, iou_threshold):
148 | """ Compute true positives, predicted scores and predicted labels per sample """
149 | batch_metrics = []
150 | for sample_i in range(len(outputs)):
151 |
152 | if outputs[sample_i] is None:
153 | continue
154 |
155 | output = outputs[sample_i]
156 | pred_boxes = output[:, :4]
157 | pred_scores = output[:, 4]
158 | pred_labels = output[:, -1]
159 |
160 | true_positives = np.zeros(pred_boxes.shape[0])
161 |
162 | annotations = targets[targets[:, 0] == sample_i][:, 1:]
163 | target_labels = annotations[:, 0] if len(annotations) else []
164 | if len(annotations):
165 | detected_boxes = []
166 | target_boxes = annotations[:, 1:]
167 |
168 | for pred_i, (pred_box, pred_label) in enumerate(zip(pred_boxes, pred_labels)):
169 |
170 | # If targets are found break
171 | if len(detected_boxes) == len(annotations):
172 | break
173 |
174 | # Ignore if label is not one of the target labels
175 | if pred_label not in target_labels:
176 | continue
177 |
178 | iou, box_index = bbox_iou(pred_box.unsqueeze(0), target_boxes).max(0)
179 | if iou >= iou_threshold and box_index not in detected_boxes:
180 | true_positives[pred_i] = 1
181 | detected_boxes += [box_index]
182 | batch_metrics.append([true_positives, pred_scores, pred_labels])
183 | return batch_metrics
184 |
185 |
186 | def bbox_wh_iou(wh1, wh2):
187 | wh2 = wh2.t()
188 | w1, h1 = wh1[0], wh1[1]
189 | w2, h2 = wh2[0], wh2[1]
190 | inter_area = torch.min(w1, w2) * torch.min(h1, h2)
191 | union_area = (w1 * h1 + 1e-16) + w2 * h2 - inter_area
192 | return inter_area / union_area
193 |
194 |
195 | def bbox_iou(box1, box2, x1y1x2y2=True):
196 | """
197 | Returns the IoU of two bounding boxes
198 | """
199 | if not x1y1x2y2:
200 | # Transform from center and width to exact coordinates
201 | b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2
202 | b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2
203 | b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2
204 | b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2
205 | else:
206 | # Get the coordinates of bounding boxes
207 | b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3]
208 | b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3]
209 |
210 |     # get the coordinates of the intersection rectangle
211 | inter_rect_x1 = torch.max(b1_x1, b2_x1)
212 | inter_rect_y1 = torch.max(b1_y1, b2_y1)
213 | inter_rect_x2 = torch.min(b1_x2, b2_x2)
214 | inter_rect_y2 = torch.min(b1_y2, b2_y2)
215 | # Intersection area
216 | inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(
217 | inter_rect_y2 - inter_rect_y1 + 1, min=0
218 | )
219 | # Union Area
220 | b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1)
221 | b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1)
222 |
223 | iou = inter_area / (b1_area + b2_area - inter_area + 1e-16)
224 |
225 | return iou
226 |
227 |
228 | def non_max_suppression(prediction, conf_thres=0.5, nms_thres=0.4):
229 | """
230 | Removes detections with lower object confidence score than 'conf_thres' and performs
231 | Non-Maximum Suppression to further filter detections.
232 | Returns detections with shape:
233 | (x1, y1, x2, y2, object_conf, class_score, class_pred)
234 | """
235 |
236 | # From (center x, center y, width, height) to (x1, y1, x2, y2)
237 | prediction[..., :4] = xywh2xyxy(prediction[..., :4])
238 | output = [None for _ in range(len(prediction))]
239 | for image_i, image_pred in enumerate(prediction):
240 | # Filter out confidence scores below threshold
241 | image_pred = image_pred[image_pred[:, 4] >= conf_thres]
242 | # If none are remaining => process next image
243 | if not image_pred.size(0):
244 | continue
245 | # Object confidence times class confidence
246 | score = image_pred[:, 4] * image_pred[:, 5:].max(1)[0]
247 | # Sort by it
248 | image_pred = image_pred[(-score).argsort()]
249 | class_confs, class_preds = image_pred[:, 5:].max(1, keepdim=True)
250 | detections = torch.cat((image_pred[:, :5], class_confs.float(), class_preds.float()), 1)
251 | # Perform non-maximum suppression
252 | keep_boxes = []
253 | while detections.size(0):
254 | large_overlap = bbox_iou(detections[0, :4].unsqueeze(0), detections[:, :4]) > nms_thres
255 | label_match = detections[0, -1] == detections[:, -1]
256 | # Indices of boxes with lower confidence scores, large IOUs and matching labels
257 | invalid = large_overlap & label_match
258 | weights = detections[invalid, 4:5]
259 | # Merge overlapping bboxes by order of confidence
260 | detections[0, :4] = (weights * detections[invalid, :4]).sum(0) / weights.sum()
261 | keep_boxes += [detections[0]]
262 | detections = detections[~invalid]
263 | if keep_boxes:
264 | output[image_i] = torch.stack(keep_boxes)
265 |
266 | return output
267 |
268 |
269 | def build_targets(pred_boxes, pred_cls, target, anchors, ignore_thres):
270 |
271 | ByteTensor = torch.cuda.ByteTensor if pred_boxes.is_cuda else torch.ByteTensor
272 | FloatTensor = torch.cuda.FloatTensor if pred_boxes.is_cuda else torch.FloatTensor
273 |
274 | nB = pred_boxes.size(0)
275 | nA = pred_boxes.size(1)
276 | nC = pred_cls.size(-1)
277 | nG = pred_boxes.size(2)
278 |
279 | # Output tensors
280 | obj_mask = ByteTensor(nB, nA, nG, nG).fill_(0)
281 | noobj_mask = ByteTensor(nB, nA, nG, nG).fill_(1)
282 | class_mask = FloatTensor(nB, nA, nG, nG).fill_(0)
283 | iou_scores = FloatTensor(nB, nA, nG, nG).fill_(0)
284 | tx = FloatTensor(nB, nA, nG, nG).fill_(0)
285 | ty = FloatTensor(nB, nA, nG, nG).fill_(0)
286 | tw = FloatTensor(nB, nA, nG, nG).fill_(0)
287 | th = FloatTensor(nB, nA, nG, nG).fill_(0)
288 | tcls = FloatTensor(nB, nA, nG, nG, nC).fill_(0)
289 |
290 | # Convert to position relative to box
291 | target_boxes = target[:, 2:6] * nG
292 | gxy = target_boxes[:, :2]
293 | gwh = target_boxes[:, 2:]
294 | # Get anchors with best iou
295 | ious = torch.stack([bbox_wh_iou(anchor, gwh) for anchor in anchors])
296 | best_ious, best_n = ious.max(0)
297 | # Separate target values
298 | b, target_labels = target[:, :2].long().t()
299 | gx, gy = gxy.t()
300 | gw, gh = gwh.t()
301 | gi, gj = gxy.long().t()
302 | # Set masks
303 | obj_mask[b, best_n, gj, gi] = 1
304 | noobj_mask[b, best_n, gj, gi] = 0
305 |
306 | # Set noobj mask to zero where iou exceeds ignore threshold
307 | for i, anchor_ious in enumerate(ious.t()):
308 | noobj_mask[b[i], anchor_ious > ignore_thres, gj[i], gi[i]] = 0
309 |
310 | # Coordinates
311 | tx[b, best_n, gj, gi] = gx - gx.floor()
312 | ty[b, best_n, gj, gi] = gy - gy.floor()
313 | # Width and height
314 | tw[b, best_n, gj, gi] = torch.log(gw / anchors[best_n][:, 0] + 1e-16)
315 | th[b, best_n, gj, gi] = torch.log(gh / anchors[best_n][:, 1] + 1e-16)
316 | # One-hot encoding of label
317 | tcls[b, best_n, gj, gi, target_labels] = 1
318 | # Compute label correctness and iou at best anchor
319 | class_mask[b, best_n, gj, gi] = (pred_cls[b, best_n, gj, gi].argmax(-1) == target_labels).float()
320 | iou_scores[b, best_n, gj, gi] = bbox_iou(pred_boxes[b, best_n, gj, gi], target_boxes, x1y1x2y2=False)
321 |
322 | tconf = obj_mask.float()
323 | return iou_scores, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tcls, tconf
324 |
--------------------------------------------------------------------------------
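
Usage sketch for my_utils/utils.py: xywh2xyxy() converts centre-format boxes to corner format and bbox_iou() compares them, the same pairing used inside non_max_suppression() and get_batch_statistics():

    import torch
    from my_utils.utils import xywh2xyxy, bbox_iou

    boxes_xywh = torch.tensor([[0.5, 0.5, 0.4, 0.4],
                               [0.5, 0.5, 0.2, 0.2]]) * 416   # scale to pixels
    boxes_xyxy = xywh2xyxy(boxes_xywh)
    iou = bbox_iou(boxes_xyxy[0].unsqueeze(0), boxes_xyxy[1].unsqueeze(0))
    # the smaller box lies fully inside the larger one, so iou is roughly the area ratio (~0.25)
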
/pip-requirements.txt:
--------------------------------------------------------------------------------
1 | terminaltables==3.1.0
--------------------------------------------------------------------------------
/test.py:
--------------------------------------------------------------------------------
1 | from __future__ import division
2 |
3 | from models import *
4 | from my_utils.utils import *
5 | from my_utils.datasets import *
6 | from my_utils.parse_config import *
7 |
8 | import os
9 | import sys
10 | import time
11 | import datetime
12 | import argparse
13 | import tqdm
14 |
15 | import torch
16 | from torch.utils.data import DataLoader
17 | from torchvision import datasets
18 | from torchvision import transforms
19 | from torch.autograd import Variable
20 | import torch.optim as optim
21 |
22 |
23 | def evaluate(model, path, iou_thres, conf_thres, nms_thres, img_size, batch_size):
24 | model.eval()
25 |
26 | # Get dataloader
27 | dataset = ListDataset(path, img_size=img_size, augment=False, multiscale=False)
28 | dataloader = torch.utils.data.DataLoader(
29 | dataset, batch_size=batch_size, shuffle=False, num_workers=1, collate_fn=dataset.collate_fn
30 | )
31 |
32 | Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor
33 |
34 | labels = []
35 | sample_metrics = [] # List of tuples (TP, confs, pred)
36 | for batch_i, (_, imgs, targets) in enumerate(tqdm.tqdm(dataloader, desc="Detecting objects")):
37 |
38 | # Extract labels
39 | labels += targets[:, 1].tolist()
40 | # Rescale target
41 | targets[:, 2:] = xywh2xyxy(targets[:, 2:])
42 | targets[:, 2:] *= img_size
43 |
44 | imgs = Variable(imgs.type(Tensor), requires_grad=False)
45 |
46 | with torch.no_grad():
47 | outputs = model(imgs)
48 | outputs = non_max_suppression(outputs, conf_thres=conf_thres, nms_thres=nms_thres)
49 |
50 | sample_metrics += get_batch_statistics(outputs, targets, iou_threshold=iou_thres)
51 |
52 | # Concatenate sample statistics
53 | if len(sample_metrics) > 0:
54 | true_positives, pred_scores, pred_labels = [np.concatenate(x, 0) for x in list(zip(*sample_metrics))]
55 | precision, recall, AP, f1, ap_class = ap_per_class(true_positives, pred_scores, pred_labels, labels)
56 | assert len(ap_class) == len(AP)
57 | else:
58 | ap_class = np.unique(labels).astype("int32")
59 |         precision = np.zeros(len(ap_class), dtype=np.float64)
60 | recall = precision
61 | AP = precision
62 | f1 = 2 * precision * recall / (precision + recall + 1e-16)
63 | # true_positives, pred_scores, pred_labels = [np.concatenate(x, 0) for x in list(zip(*sample_metrics))]
64 | # precision, recall, AP, f1, ap_class = ap_per_class(true_positives, pred_scores, pred_labels, labels)
65 |
66 | return precision, recall, AP, f1, ap_class
67 |
68 |
69 | if __name__ == "__main__":
70 | parser = argparse.ArgumentParser()
71 | parser.add_argument("--batch_size", type=int, default=8, help="size of each image batch")
72 | parser.add_argument("--model_def", type=str, default="config/yolov3.cfg", help="path to model definition file")
73 | parser.add_argument("--data_config", type=str, default="config/coco.data", help="path to data config file")
74 | parser.add_argument("--weights_path", type=str, default="weights/yolov3.weights", help="path to weights file")
75 | parser.add_argument("--class_path", type=str, default="data/coco.names", help="path to class label file")
76 | parser.add_argument("--iou_thres", type=float, default=0.5, help="iou threshold required to qualify as detected")
77 | parser.add_argument("--conf_thres", type=float, default=0.001, help="object confidence threshold")
78 |     parser.add_argument("--nms_thres", type=float, default=0.5, help="iou threshold for non-maximum suppression")
79 | parser.add_argument("--n_cpu", type=int, default=8, help="number of cpu threads to use during batch generation")
80 | parser.add_argument("--img_size", type=int, default=416, help="size of each image dimension")
81 | opt = parser.parse_args()
82 | print(opt)
83 |
84 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
85 |
86 | data_config = parse_data_config(opt.data_config)
87 | valid_path = data_config["valid"]
88 | class_names = load_classes(data_config["names"])
89 |
90 | # Initiate model
91 | model = Darknet(opt.model_def).to(device)
92 | if opt.weights_path.endswith(".weights"):
93 | # Load darknet weights
94 | model.load_darknet_weights(opt.weights_path)
95 | else:
96 | # Load checkpoint weights
97 | model.load_state_dict(torch.load(opt.weights_path))
98 |
99 | print("Compute mAP...")
100 |
101 | precision, recall, AP, f1, ap_class = evaluate(
102 | model,
103 | path=valid_path,
104 | iou_thres=opt.iou_thres,
105 | conf_thres=opt.conf_thres,
106 | nms_thres=opt.nms_thres,
107 | img_size=opt.img_size,
108 | batch_size=8,
109 | )
110 |
111 | print("Average Precisions:")
112 | for i, c in enumerate(ap_class):
113 | print(f"+ Class '{c}' ({class_names[c]}) - AP: {AP[i]}")
114 |
115 | print(f"mAP: {AP.mean()}")
116 |
--------------------------------------------------------------------------------
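
Data-layout sketch for test.py: before calling get_batch_statistics(), evaluate() converts the normalized (x, y, w, h) targets into pixel-space corner boxes, so predictions and ground truth share the same (x1, y1, x2, y2) convention:

    import torch
    from my_utils.utils import xywh2xyxy

    img_size = 416
    targets = torch.tensor([[0.0, 3.0, 0.5, 0.5, 0.2, 0.4]])  # (sample_idx, class, x, y, w, h)
    targets[:, 2:] = xywh2xyxy(targets[:, 2:]) * img_size     # -> pixel-space x1, y1, x2, y2
    # targets is now approximately [[0., 3., 166.4, 124.8, 249.6, 291.2]]
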
/train.py:
--------------------------------------------------------------------------------
1 | from __future__ import division
2 |
3 | from models import *
4 | from my_utils.utils import *
5 | from my_utils.datasets import *
6 | from my_utils.parse_config import *
7 | from test import evaluate
8 |
9 | from terminaltables import AsciiTable
10 |
11 | import os
12 | import sys
13 | import time
14 | import datetime
15 | import argparse
16 |
17 | import torch
18 | from torch.utils.data import DataLoader
19 | from torchvision import datasets
20 | from torchvision import transforms
21 | from torch.autograd import Variable
22 |
23 | try:
24 | import moxing as mox
25 | except ImportError:
26 |     print('moxing is not available, running outside ModelArts')
27 |
28 |
29 | def prepare_data_on_modelarts(args):
30 | """
31 |     Copy data from OBS to the local ModelArts container
32 | """
33 |     # Copy the pretrained weights file
34 |
35 |     # By default the following paths inside ModelArts are used to store data:
36 |     # 0) /cache/model: stores the pretrained model copied from OBS (if one is used)
37 |     # 1) /cache/datasets: stores the training data copied from OBS
38 |     # 2) /cache/log: stores training logs and checkpoints; after training its contents are all copied back to OBS
39 | if args.pretrained_weights:
40 | _, weights_name = os.path.split(args.pretrained_weights)
41 | mox.file.copy(args.pretrained_weights, os.path.join(args.local_data_root, 'model/'+weights_name))
42 | args.pretrained_weights = os.path.join(args.local_data_root, 'model/'+weights_name)
43 | if not (args.data_url.startswith('s3://') or args.data_url.startswith('obs://')):
44 | args.data_local = args.data_url
45 | else:
46 | args.data_local = os.path.join(args.local_data_root, 'datasets/trainval')
47 | if not os.path.exists(args.data_local):
48 | data_dir = os.path.join(args.local_data_root, 'datasets')
49 | mox.file.copy_parallel(args.data_url, data_dir)
50 |             os.system('cd %s;unzip trainval.zip' % data_dir)  # the training set is packed in advance as trainval.zip
51 | if os.path.isdir(args.data_local):
52 | os.system('cd %s;rm trainval.zip' % data_dir)
53 | print('unzip trainval.zip success, args.data_local is', args.data_local)
54 | else:
55 | raise Exception('unzip trainval.zip Failed')
56 | else:
57 | print('args.data_local: %s is already exist, skip copy' % args.data_local)
58 |
59 | if not (args.train_url.startswith('s3://') or args.train_url.startswith('obs://')):
60 | args.train_local = args.train_url
61 | else:
62 | args.train_local = os.path.join(args.local_data_root, 'log/')
63 | if not os.path.exists(args.train_local):
64 | os.mkdir(args.train_local)
65 |
66 | return args
67 |
68 |
69 | def gen_model_dir(args, model_best_path):
70 | current_dir = os.path.dirname(__file__)
71 | mox.file.copy_parallel(os.path.join(current_dir, 'deploy_scripts'),
72 | os.path.join(args.train_url, 'model'))
73 | mox.file.copy_parallel(os.path.join(current_dir, 'my_utils'),
74 | os.path.join(args.train_url, 'model/my_utils'))
75 | mox.file.copy(os.path.join(current_dir, 'config/yolov3-44.cfg'),
76 | os.path.join(args.train_url, 'model/yolov3-44.cfg'))
77 | mox.file.copy(os.path.join(current_dir, 'config/train_classes.txt'),
78 | os.path.join(args.train_url, 'model/train_classes.txt'))
79 | mox.file.copy(os.path.join(current_dir, 'config/classify_rule.json'),
80 | os.path.join(args.train_url, 'model/classify_rule.json'))
81 | mox.file.copy(os.path.join(current_dir, 'models.py'),
82 | os.path.join(args.train_url, 'model/models.py'))
83 | mox.file.copy(model_best_path,
84 | os.path.join(args.train_url, 'model/models_best.pth'))
85 | print('gen_model_dir success, model dir is at', os.path.join(args.train_url, 'model'))
86 |
87 |
88 | def freeze_body(model, freeze_body):
89 |     # input: freeze_body (int), one of 0, 1, 2
90 |     # return: an iterator over the parameters that remain trainable
91 |     # notes:
92 |     # 0: do not freeze any layers
93 |     # 1: freeze Darknet53 (the backbone) only
94 |     # 2: freeze all but the three detection layers
95 |     # the three detection layers are [81, 93, 105], refer to https://blog.csdn.net/litt1e/article/details/88907542
96 |
97 | for name, value in model.named_parameters():
98 | value.requires_grad = True
99 |
100 | if freeze_body == 0:
101 | print('using original model without any freeze body')
102 | elif freeze_body == 1:
103 | print('using fitting model with backbone(Darknet53) frozen')
104 | for name, value in model.named_parameters():
105 | layers = int(name.split('.')[1])
106 | if layers < 74:
107 | value.requires_grad = False
108 | elif freeze_body == 2:
109 | print('using fitting model with all but three detection layers frozen')
110 | for name, value in model.named_parameters():
111 | layers = int(name.split('.')[1])
112 | if layers not in [81, 93, 105]:
113 | value.requires_grad = False
114 | else:
115 |         print('Invalid value for freeze_body, so no layers are frozen')
116 |
117 | new_params = filter(lambda p: p.requires_grad, model.parameters())
118 | return new_params
119 |
120 |
121 | def train(model, dataloader, optimizer, epoch, opt, device):
122 | model.train()
123 | start_time = time.time()
124 | metrics = [
125 | "grid_size",
126 | "loss",
127 | "x",
128 | "y",
129 | "w",
130 | "h",
131 | "conf",
132 | "cls",
133 | "cls_acc",
134 | "recall50",
135 | "recall75",
136 | "precision",
137 | "conf_obj",
138 | "conf_noobj",
139 | ]
140 | for batch_i, (_, imgs, targets) in enumerate(dataloader):
141 | batches_done = len(dataloader) * epoch + batch_i
142 |
143 | imgs = Variable(imgs.to(device))
144 | targets = Variable(targets.to(device), requires_grad=False)
145 |
146 | loss, outputs = model(imgs, targets)
147 | loss.backward()
148 |
149 | if batches_done % opt.gradient_accumulations:
150 | # Accumulates gradient before each step
151 | optimizer.step()
152 | optimizer.zero_grad()
153 |
154 | # Log progress
155 | log_str = "\n---- [Epoch %d/%d, Batch %d/%d] ----\n" % (epoch, opt.max_epochs_2, batch_i, len(dataloader))
156 | metric_table = [["Metrics", *[f"YOLO Layer {i}" for i in range(len(model.yolo_layers))]]]
157 |
158 | # Log metrics at each YOLO layer
159 | for i, metric in enumerate(metrics):
160 | formats = {m: "%.6f" for m in metrics}
161 | formats["grid_size"] = "%2d"
162 | formats["cls_acc"] = "%.2f%%"
163 | row_metrics = [formats[metric] % yolo.metrics.get(metric, 0) for yolo in model.yolo_layers]
164 | metric_table += [[metric, *row_metrics]]
165 |
166 | '''
167 | # Tensorboard logging
168 | tensorboard_log = []
169 | for j, yolo in enumerate(model.yolo_layers):
170 | for name, metric in yolo.metrics.items():
171 | if name != "grid_size":
172 | tensorboard_log += [(f"{name}_{j+1}", metric)]
173 | tensorboard_log += [("loss", loss.item())]
174 | logger.list_of_scalars_summary(tensorboard_log, batches_done)
175 | '''
176 |
177 | log_str += AsciiTable(metric_table).table
178 | log_str += f"\nTotal loss {loss.item()}"
179 |
180 | # Determine approximate time left for epoch
181 | epoch_batches_left = len(dataloader) - (batch_i + 1)
182 | time_left = datetime.timedelta(seconds=epoch_batches_left * (time.time() - start_time) / (batch_i + 1))
183 | log_str += f"\n---- ETA {time_left}"
184 |
185 | print(log_str)
186 |
187 | model.seen += imgs.size(0)
188 |
189 |
190 | def valid(model, path, class_names, opt):
191 | print("\n---- Evaluating Model ----")
192 | # Evaluate the model on the validation set
193 | precision, recall, AP, f1, ap_class = evaluate(
194 | model,
195 | path=path,
196 | iou_thres=0.5,
197 | conf_thres=0.5,
198 | nms_thres=0.5,
199 | img_size=opt.img_size,
200 | batch_size=32,
201 | )
202 | evaluation_metrics = [
203 | ("val_precision", precision.mean()),
204 | ("val_recall", recall.mean()),
205 | ("val_mAP", AP.mean()),
206 | ("val_f1", f1.mean()),
207 | ]
208 | # logger.list_of_scalars_summary(evaluation_metrics, epoch)
209 |
210 | # Print class APs and mAP
211 | ap_table = [["Index", "Class name", "AP"]]
212 | for i, c in enumerate(ap_class):
213 | ap_table += [[c, class_names[c], "%.5f" % AP[i]]]
214 | print(AsciiTable(ap_table).table)
215 | print(f"---- mAP {AP.mean()}")
216 | return AP
217 |
218 |
219 | if __name__ == "__main__":
220 | parser = argparse.ArgumentParser()
221 |     parser.add_argument('--max_epochs_1', default=5, type=int, help='number of epochs to run in stage one')
222 |     parser.add_argument('--max_epochs_2', default=5, type=int, help='total number of epochs across both stages')
223 |     parser.add_argument("--freeze_body_1", type=int, default=2, help="which layers to freeze in stage one (see freeze_body)")
224 |     parser.add_argument("--freeze_body_2", type=int, default=0, help="which layers to freeze in stage two (see freeze_body)")
225 | parser.add_argument("--lr_1", type=float, default=1e-3, help="initial learning rate for stage one")
226 | parser.add_argument("--lr_2", type=float, default=1e-5, help="initial learning rate for stage two")
227 | parser.add_argument("--batch_size", type=int, default=32, help="size of each image batch")
228 | parser.add_argument("--gradient_accumulations", type=int, default=2, help="number of gradient accums before step")
229 | parser.add_argument("--model_def", type=str, default="PyTorch-YOLOv3-ModelArts/config/yolov3-44.cfg", help="path to model definition file")
230 | parser.add_argument("--data_config", type=str, default="PyTorch-YOLOv3-ModelArts/config/custom.data", help="path to data config file")
231 | parser.add_argument("--pretrained_weights", type=str, help="if specified starts from checkpoint model")
232 | parser.add_argument("--n_cpu", type=int, default=8, help="number of cpu threads to use during batch generation")
233 | parser.add_argument("--img_size", type=int, default=416, help="size of each image dimension")
234 | parser.add_argument("--checkpoint_interval", type=int, default=1, help="interval between saving model weights, "
235 | "here set same to evaluation_interval")
236 | parser.add_argument("--evaluation_interval", type=int, default=1, help="interval evaluations on validation set")
237 | parser.add_argument("--compute_map", default=False, help="if True computes mAP every tenth batch")
238 | parser.add_argument("--multiscale_training", default=True, help="allow for multi-scale training")
239 | parser.add_argument('--local_data_root', default='/cache/', type=str,
240 | help='a directory used for transfer data between local path and OBS path')
241 | parser.add_argument('--data_url', required=True, type=str, help='the training and validation data path')
242 |     parser.add_argument('--data_local', default='', type=str, help='the local path of the training and validation data')
243 | parser.add_argument('--train_url', required=True, type=str, help='the path to save training outputs')
244 |     parser.add_argument('--train_local', default='', type=str, help='the local path for training outputs')
245 |     parser.add_argument('--init_method', default='', type=str, help='the initialization method for distributed training')
246 |
247 | opt = parser.parse_args()
248 | print(opt)
249 | opt = prepare_data_on_modelarts(opt)
250 |
251 | # logger = Logger("logs")
252 |
253 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
254 |
255 | # Get data configuration
256 | data_config = parse_data_config(opt.data_config)
257 | train_path = data_config["train"]
258 | valid_path = data_config["valid"]
259 | class_names = load_classes(data_config["names"])
260 |
261 | # Initiate model
262 | model = Darknet(opt.model_def, opt.img_size).to(device)
263 | model.apply(weights_init_normal)
264 |
265 | # If specified we start from checkpoint
266 | if opt.pretrained_weights:
267 | if opt.pretrained_weights.endswith(".pth"):
268 | model.load_state_dict(torch.load(opt.pretrained_weights))
269 | else:
270 | model.load_darknet_weights(opt.pretrained_weights)
271 |
272 | # Get dataloader
273 | dataset = ListDataset(train_path, img_size=opt.img_size, augment=True, multiscale=opt.multiscale_training)
274 | dataloader = torch.utils.data.DataLoader(
275 | dataset,
276 | batch_size=opt.batch_size,
277 | shuffle=True,
278 | num_workers=opt.n_cpu,
279 | pin_memory=True,
280 | collate_fn=dataset.collate_fn,
281 | )
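    # NOTE (assumption, see my_utils/datasets.py): dataset.collate_fn is expected to stack
    # the variable-length label tensors of each batch and, when multiscale training is
    # enabled, to periodically sample a new input resolution between batches.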
282 |
283 |     # store the name of the checkpoint with the best validation mAP
284 | model_best = {'mAP': 0, 'name': ''}
285 |
286 |     # stage one: train with the body frozen (see --freeze_body_1) to get a relatively stable model
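    # freeze_body (defined earlier in this file) is expected to freeze the selected layers
    # and return only the parameters that remain trainable, so the optimizer below never
    # updates the frozen part of the network during stage one.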
287 | optimizer_1 = torch.optim.Adam(freeze_body(model, opt.freeze_body_1), lr=opt.lr_1)
288 | for epoch in range(opt.max_epochs_1):
289 |
290 | train(model, dataloader, optimizer_1, epoch, opt, device)
291 |
292 | if epoch % opt.evaluation_interval == 0:
293 | AP = valid(model, valid_path, class_names, opt)
294 |
295 |             temp_model_name = "ckpt_%d_%.2f.pth" % (epoch, 100 * AP.mean())
296 | ckpt_name = os.path.join(opt.train_local, temp_model_name)
297 | torch.save(model.state_dict(), ckpt_name)
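            # mox.file.copy_parallel uploads the checkpoint from the local cache to the
            # OBS path given by --train_url, so it is preserved after the training job ends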
298 | mox.file.copy_parallel(ckpt_name, os.path.join(opt.train_url, temp_model_name))
299 |
300 | if AP.mean() > model_best['mAP']:
301 | model_best['mAP'] = AP.mean()
302 | model_best['name'] = ckpt_name
303 |
304 |     # stage two: unfreeze more layers (see --freeze_body_2) and fine-tune at a lower learning rate for higher mAP
305 | optimizer_2 = torch.optim.Adam(freeze_body(model, opt.freeze_body_2), lr=opt.lr_2)
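    # StepLR multiplies the learning rate by its default gamma of 0.1 every 10 epochs;
    # note that scheduler.step(epoch) below is given the absolute epoch index, so the
    # decay is counted from epoch 0 rather than from the start of stage two.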
306 | scheduler = torch.optim.lr_scheduler.StepLR(optimizer_2, step_size=10)
307 | for epoch in range(opt.max_epochs_1, opt.max_epochs_2):
308 |
309 | train(model, dataloader, optimizer_2, epoch, opt, device)
310 |
311 | if epoch % opt.evaluation_interval == 0:
312 | AP = valid(model, valid_path, class_names, opt)
313 |
314 |             temp_model_name = "ckpt_%d_%.2f.pth" % (epoch, 100 * AP.mean())
315 | ckpt_name = os.path.join(opt.train_local, temp_model_name)
316 | torch.save(model.state_dict(), ckpt_name)
317 | mox.file.copy_parallel(ckpt_name, os.path.join(opt.train_url, temp_model_name))
318 |
319 | if AP.mean() > model_best['mAP']:
320 | model_best['mAP'] = AP.mean()
321 | model_best['name'] = ckpt_name
322 |
323 | scheduler.step(epoch)
324 |
325 | print('The current learning rate is: ', scheduler.get_lr()[0])
326 |
327 | gen_model_dir(opt, model_best['name'])
328 |
--------------------------------------------------------------------------------
/weights/download_weights.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | # Download weights for vanilla YOLOv3
3 | wget -c https://pjreddie.com/media/files/yolov3.weights
4 | # Download weights for tiny YOLOv3
5 | wget -c https://pjreddie.com/media/files/yolov3-tiny.weights
6 | # Download weights for backbone network
7 | wget -c https://pjreddie.com/media/files/darknet53.conv.74
8 |
--------------------------------------------------------------------------------