├── .gitignore ├── HRSC2016 └── train │ ├── 100001675.jpg │ └── 100001675.txt ├── README.md ├── cfg ├── HRSC+ │ ├── hyp.py │ ├── yolov3_512.cfg │ ├── yolov3_512_ma.cfg │ ├── yolov3_512_matrix.cfg │ └── yolov3_512_se.cfg ├── HRSC │ ├── hyp.py │ ├── yolov3-416.cfg │ └── yolov3-m.cfg ├── ICDAR │ ├── hyp.py │ ├── yolov3_608.cfg │ └── yolov3_608_se.cfg ├── hyp_template.py ├── yolov3-m.cfg ├── yolov3-tiny.cfg └── yolov3.cfg ├── data ├── IC_eval │ └── ic15 │ │ └── rrc_evaluation_funcs.pyc ├── coco.data ├── coco.names ├── hrsc.data ├── hrsc.name ├── icdar.name ├── icdar_13+15.data ├── icdar_15.data └── icdar_15_all.data ├── demo.png ├── detect.py ├── experiment ├── HRSC+ │ ├── 1000_SE_nosample.txt │ └── 1000_normal.txt ├── HRSC │ ├── context │ │ ├── context-0.8.txt │ │ ├── context-1.25.txt │ │ └── context-1.6.txt │ ├── hyp.py │ ├── hyper │ │ ├── iou_0.05_ang_12.txt │ │ ├── iou_0.1-ang_12.txt │ │ ├── iou_0.1-ang_24.txt │ │ └── iou_0.3-ang_12.txt │ └── mul-scale │ │ ├── hyp.txt │ │ └── results.txt ├── IC15 │ ├── 0.1_12.txt │ ├── 0.3_12.txt │ ├── 0.5_12.txt │ ├── 0.5_6.txt │ ├── 0.7_12.txt │ └── ablation.png ├── ga-attention(_4).png └── tiny_test_gax4_o8_dh.png ├── make.sh ├── model ├── __init__.py ├── layer │ ├── DCNv2 │ │ ├── .gitignore │ │ ├── __init__.py │ │ ├── dcn_test.py │ │ ├── dcn_v2.py │ │ ├── make.sh │ │ ├── setup.py │ │ └── src │ │ │ ├── cpu │ │ │ ├── dcn_v2_cpu.cpp │ │ │ └── vision.h │ │ │ ├── cuda │ │ │ ├── dcn_v2_cuda.cu │ │ │ ├── dcn_v2_im2col_cuda.cu │ │ │ ├── dcn_v2_im2col_cuda.h │ │ │ ├── dcn_v2_psroi_pooling_cuda.cu │ │ │ └── vision.h │ │ │ ├── dcn_v2.h │ │ │ └── vision.cpp │ └── __init__.py ├── loss.py ├── model_utils.py ├── models.py └── sampler_ratio.png ├── study.txt ├── test.py ├── train.py └── utils ├── ICDAR ├── ICDAR2yolo.py └── icdar_utils.py ├── adabound.py ├── augment.py ├── datasets.py ├── gcp.sh ├── google_utils.py ├── init.py ├── kmeans ├── 416 │ ├── 3 │ │ ├── anchor_clusters.png │ │ ├── area_cluster.png │ │ ├── kmeans.png │ │ └── ratio_cluster.png │ └── 6 │ │ ├── 2019-10-31 09-02-05屏幕截图.png │ │ ├── anchor_clusters.png │ │ ├── area_cluster.png │ │ └── ratio_cluster.png ├── hrsc_512.txt ├── icdar_608_all.txt ├── icdar_608_care.txt └── kmeans.py ├── nms ├── __init__.py ├── make.sh ├── nms.py ├── nms_wrapper_test.py ├── setup.py └── src │ ├── rotate_polygon_nms.cpp │ └── rotate_polygon_nms_kernel.cu ├── parse_config.py ├── torch_utils.py └── utils.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.pt 2 | *.pth 3 | 4 | 5 | -------------------------------------------------------------------------------- /HRSC2016/train/100001675.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/HRSC2016/train/100001675.jpg -------------------------------------------------------------------------------- /HRSC2016/train/100001675.txt: -------------------------------------------------------------------------------- 1 | 0 0.748569 0.577177 0.488319 0.121866 1.415746 2 | 0 0.844500 0.501634 0.343307 0.049861 1.398959 3 | 0 0.221619 0.527416 0.332149 0.103440 -1.19584 4 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## Rotated-Yolov3 2 | 3 | Rotaion object detection implemented with yolov3. 
4 | 5 | --- 6 | 7 | Hello, the plain [ryolov3](https://github.com/ming71/yolov3-polygon) is available now. Although it does not come with as many tricks as this repo, it still achieves good results and is friendly for beginners to learn from. Good luck. 8 | 9 | ## Update 10 | 11 | The latest code has been uploaded. Unfortunately, due to my negligence, I incorrectly modified some parts of the code last year and did not keep the historical version, which makes it hard to reproduce the previous high performance. The problem tentatively appears to lie in the loss calculation part. 12 | 13 | But the experimental results left over from last year show that yolov3 is well suited to rotation detection. With several tricks (attention, ORN, Mish, etc.) it achieved good performance. More of the previous experiment results can be found [here](https://github.com/ming71/rotate-yolo/blob/master/experiment). 14 | 15 | ## Support 16 | * SEBlock 17 | * CUDA RNMS 18 | * riou loss 19 | * Inception module 20 | * DCNv2 21 | * ORN 22 | * SeparableConv 23 | * Mish/Swish 24 | * GlobalAttention 25 | 26 | ## Detection Results 27 | 28 | The detection results from rotated yolov3 left over from last year: 29 | 30 | 31 | 32 | ## Q&A 33 | 34 | The following questions are frequently asked. If anything is still unclear, don't hesitate to contact me by opening an issue. 35 | 36 | * Q: How can I obtain `icdar_608_care.txt`? 37 | 38 | A: `icdar_608_care.txt` stores the initial anchors generated via k-means; you need to run `kmeans.py`, referring to my implementation [here](https://github.com/ming71/toolbox/blob/master/kmeans.py). You can also check `utils/parse_config.py` for more details. A minimal sketch of the idea follows below.
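For reference, here is a minimal, self-contained sketch of the anchor-clustering idea behind the answer above: plain k-means over ground-truth box sizes with an IoU-based assignment. It is only an illustration under those assumptions, not the linked `kmeans.py` itself; the function names are illustrative and the input data is random so the snippet stays runnable.

```python
import numpy as np

def iou_wh(wh, anchors):
    # IoU between (w, h) pairs and anchor (w, h) pairs, assuming aligned top-left corners.
    inter = np.minimum(wh[:, None, 0], anchors[None, :, 0]) * np.minimum(wh[:, None, 1], anchors[None, :, 1])
    union = wh[:, 0:1] * wh[:, 1:2] + anchors[None, :, 0] * anchors[None, :, 1] - inter
    return inter / union

def kmeans_anchors(wh, k=9, iters=300, seed=0):
    # Plain k-means on box sizes, assigning each box to its best-IoU anchor.
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), size=k, replace=False)].astype(float)
    assign = np.full(len(wh), -1)
    for _ in range(iters):
        new_assign = np.argmax(iou_wh(wh, anchors), axis=1)   # best-matching anchor per box
        if (new_assign == assign).all():
            break                                             # converged
        assign = new_assign
        for j in range(k):
            members = wh[assign == j]
            if len(members):
                anchors[j] = members.mean(axis=0)             # move centre to cluster mean
    return anchors[np.argsort(anchors.prod(axis=1))]          # sorted by area, small to large

if __name__ == "__main__":
    # In practice wh would be the ground-truth box sizes scaled to the network input
    # (e.g. 512 for HRSC in this repo); random values just keep the sketch self-contained.
    wh = np.abs(np.random.randn(2000, 2)) * 80 + 10
    print("\n".join(f"{w:.1f},{h:.1f}" for w, h in kmeans_anchors(wh, k=9)))
```

The resulting `w,h` pairs are the kind of values that the `anchors =` entries in the cfg files point to (e.g. `utils/kmeans/hrsc_512.txt`).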
39 | 40 | * Q: How do I train the model on my own dataset? 41 | 42 | A: This ryolo implementation is based on this [repo](https://github.com/ultralytics/yolov3); the training and evaluation pipelines are the same as in that repo. 43 | 44 | * Q: Where is the ORN code? 45 | 46 | A: I'll release the whole codebase when I return to school; in the meantime, this [repo](https://github.com/ming71/CUDA/tree/master/ORN) may help. 47 | 48 | * Q: I cannot reproduce the results you reported (80 mAP for HRSC and 0.7 F1 for IC15). 49 | * A: Refer to my reply [here](https://github.com/ming71/rotate-yolov3/issues/14#issuecomment-663328130). This is only a backup repo; the overall model is fine, but **directly running it does not necessarily guarantee good results**, because it is not the latest version and some parameters may be problematic, so you need to adjust some details and parameter settings yourself. 50 | I will upload the complete executable code as soon as I return to school in September (if I am lucky). 51 | 52 | ## In the end 53 | There is neither the need nor the time to maintain this codebase to reproduce the previous performance. If you are interested in this work, you are welcome to fix the bugs in this codebase; the trained models are available [here](https://pan.baidu.com/s/1EXhyGSiuUIPnkZ7cwpfCbQ) with extraction code `5noq`. I'll reimplement a rotated yolov4 or yolov5 if time permits in the future. -------------------------------------------------------------------------------- /cfg/HRSC+/hyp.py: -------------------------------------------------------------------------------- 1 | giou: 0.1 # giou loss gain 1.582 2 | cls: 27.76 # cls loss gain (CE=~1.0, uCE=~20) 3 | cls_pw: 1.446 # cls BCELoss positive_weight 4 | obj: 20.35 # obj loss gain (*=80 for uBCE with 80 classes) 5 | obj_pw: 3.941 # obj BCELoss positive_weight 6 | iou_t: 0.5 # iou training threshold 7 | ang_t: 3.1415926/12 8 | reg: 1.0 9 | fl_gamma: 0.5 # focal loss gamma 10 | context_factor: 1.0 # set according to the short side h; w and h are enlarged by the same factor; set it to the reciprocal during debugging to detect directly 11 | 12 | 13 | # lr 14 | lr0: 0.0001 15 | multiplier:10 16 | warm_epoch:5 17 | lrf: -4.
# final LambdaLR learning rate = lr0 * (10 ** lrf) 18 | momentum: 0.97 # SGD momentum 19 | weight_decay: 0.0004569 # optimizer weight decay 20 | 21 | 22 | # aug 23 | hsv_s: 0.5 # image HSV-Saturation augmentation (fraction) 24 | hsv_v: 0.3 # image HSV-Value augmentation (fraction) 25 | degrees: 5.0 # image rotation (+/- deg) 26 | translate: 0.1 # image translation (+/- fraction) 27 | scale: 0.1 # image scale (+/- gain) 28 | shear: 0.0 29 | gamma: 0.2 30 | blur: 1.3 31 | noise: 0.01 32 | contrast: 0.15 33 | sharpen: 0.15 34 | copypaste: 0.1 # 船身 h 的 3sigma 段位以内 35 | grayscale: 0.3 # 灰度强度为0.3-1.0 36 | 37 | 38 | # training 39 | epochs: 1000 40 | batch_size: 4 41 | save_interval: 300 42 | test_interval: 5 43 | -------------------------------------------------------------------------------- /cfg/HRSC+/yolov3_512.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=512 9 | height=512 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 
| pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 
388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 | [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 601 | stride=1 602 | pad=1 603 | filters=504 604 | activation=linear 605 | 606 | # 
角度定义为x+ 0; 顺时针; 正负0.5pi 607 | [yolo] 608 | mask = 144-215 609 | anchors = utils/kmeans/hrsc_512.txt 610 | classes=1 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .7 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 | layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | size=3 680 | stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=504 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 72-143 695 | anchors = utils/kmeans/hrsc_512.txt 696 | classes=1 697 | num=9 698 | jitter=.3 699 | ignore_thresh = .7 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 | pad=1 746 | activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=504 777 | activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0-71 782 | anchors = utils/kmeans/hrsc_512.txt 783 | classes=1 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .7 787 | truth_thresh = 1 788 | random=1 789 | -------------------------------------------------------------------------------- /cfg/HRSC+/yolov3_512_ma.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=512 9 | height=512 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | 
[convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | 
stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | 
[convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 | [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 601 | stride=1 602 | pad=1 603 | filters=504 604 | activation=linear 605 | 606 | # 角度定义为x+ 0; 顺时针; 正负0.5pi 607 | [yolo] 608 | mask = 144-215 609 | anchors = ara 879, 3170, 6813, 11599, 20813, 28065 / 4.0, 6.4, 9.2 / -75, -60, -45, -30, -15 ,0,15, 30,45, 60,75, 90 610 | classes=1 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .7 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 | layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | size=3 680 | 
stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=504 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 72-143 695 | anchors = ara 879, 3170, 6813, 11599, 20813, 28065 / 4.0, 6.4, 9.2 / -75, -60, -45, -30, -15 ,0,15, 30,45, 60,75, 90 696 | classes=1 697 | num=9 698 | jitter=.3 699 | ignore_thresh = .7 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 | pad=1 746 | activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=504 777 | activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0-71 782 | anchors = ara 879, 3170, 6813, 11599, 20813, 28065 / 4.0, 6.4, 9.2 / -75, -60, -45, -30, -15 ,0,15, 30,45, 60,75, 90 783 | classes=1 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .7 787 | truth_thresh = 1 788 | random=1 789 | -------------------------------------------------------------------------------- /cfg/HRSC+/yolov3_512_se.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=512 9 | height=512 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | 
batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | # s=8 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | # --SE block--- 124 | [se] 125 | channels=256 126 | 127 | [convolutional] 128 | batch_normalize=1 129 | filters=128 130 | size=1 131 | stride=1 132 | pad=1 133 | activation=leaky 134 | 135 | [convolutional] 136 | batch_normalize=1 137 | filters=256 138 | size=3 139 | stride=1 140 | pad=1 141 | activation=leaky 142 | 143 | [shortcut] 144 | from=-4 145 | activation=linear 146 | 147 | [se] 148 | channels=256 149 | 150 | [convolutional] 151 | batch_normalize=1 152 | filters=128 153 | size=1 154 | stride=1 155 | pad=1 156 | activation=leaky 157 | 158 | [convolutional] 159 | batch_normalize=1 160 | filters=256 161 | size=3 162 | stride=1 163 | pad=1 164 | activation=leaky 165 | 166 | [shortcut] 167 | from=-4 168 | activation=linear 169 | 170 | [se] 171 | channels=256 172 | 173 | [convolutional] 174 | batch_normalize=1 175 | filters=128 176 | size=1 177 | stride=1 178 | pad=1 179 | activation=leaky 180 | 181 | [convolutional] 182 | batch_normalize=1 183 | filters=256 184 | size=3 185 | stride=1 186 | pad=1 187 | activation=leaky 188 | 189 | [shortcut] 190 | from=-4 191 | activation=linear 192 | 193 | [se] 194 | channels=256 195 | 196 | [convolutional] 197 | batch_normalize=1 198 | filters=128 199 | size=1 200 | stride=1 201 | pad=1 202 | activation=leaky 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=256 207 | size=3 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [shortcut] 213 | from=-4 214 | activation=linear 215 | 216 | [se] 217 | channels=256 218 | 219 | [convolutional] 220 | batch_normalize=1 221 | filters=128 222 | size=1 223 | stride=1 224 | pad=1 225 | activation=leaky 226 | 227 | [convolutional] 228 | batch_normalize=1 229 | filters=256 230 | size=3 231 | stride=1 232 | pad=1 233 | activation=leaky 234 | 235 | [shortcut] 236 | from=-4 237 | activation=linear 238 | 239 | [se] 240 | channels=256 241 | 242 | [convolutional] 243 | batch_normalize=1 244 | filters=128 245 | size=1 246 | stride=1 247 | pad=1 248 | activation=leaky 249 | 250 | [convolutional] 251 | batch_normalize=1 252 | filters=256 253 | size=3 254 | stride=1 255 | pad=1 256 | activation=leaky 257 | 258 | [shortcut] 259 | from=-4 260 | activation=linear 261 | 262 | [se] 263 | channels=256 264 | 265 | [convolutional] 266 | batch_normalize=1 267 | filters=128 268 | size=1 269 | stride=1 270 | pad=1 271 | activation=leaky 272 | 273 | [convolutional] 274 | batch_normalize=1 275 | filters=256 276 | size=3 277 | stride=1 278 | pad=1 279 | activation=leaky 280 | 281 | [shortcut] 282 | from=-4 283 | activation=linear 284 | 285 | [se] 286 | channels=256 287 | 288 | [convolutional] 289 | batch_normalize=1 290 | filters=128 291 | size=1 292 | stride=1 293 | pad=1 294 | activation=leaky 295 | 296 | [convolutional] 297 | batch_normalize=1 298 | filters=256 299 | size=3 300 | stride=1 301 | pad=1 302 | activation=leaky 303 | 304 | [shortcut] 305 | from=-4 306 | activation=linear 307 | 308 | # Downsample 309 | # s=16 310 | [convolutional] 311 | batch_normalize=1 312 | filters=512 313 | size=3 314 | stride=2 315 | pad=1 316 | activation=leaky 317 | 318 | 
[se] 319 | channels=512 320 | 321 | [convolutional] 322 | batch_normalize=1 323 | filters=256 324 | size=1 325 | stride=1 326 | pad=1 327 | activation=leaky 328 | 329 | [convolutional] 330 | batch_normalize=1 331 | filters=512 332 | size=3 333 | stride=1 334 | pad=1 335 | activation=leaky 336 | 337 | [shortcut] 338 | from=-4 339 | activation=linear 340 | 341 | [se] 342 | channels=512 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=256 347 | size=1 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [convolutional] 353 | batch_normalize=1 354 | filters=512 355 | size=3 356 | stride=1 357 | pad=1 358 | activation=leaky 359 | 360 | [shortcut] 361 | from=-4 362 | activation=linear 363 | 364 | [se] 365 | channels=512 366 | 367 | [convolutional] 368 | batch_normalize=1 369 | filters=256 370 | size=1 371 | stride=1 372 | pad=1 373 | activation=leaky 374 | 375 | [convolutional] 376 | batch_normalize=1 377 | filters=512 378 | size=3 379 | stride=1 380 | pad=1 381 | activation=leaky 382 | 383 | [shortcut] 384 | from=-4 385 | activation=linear 386 | 387 | [se] 388 | channels=512 389 | 390 | [convolutional] 391 | batch_normalize=1 392 | filters=256 393 | size=1 394 | stride=1 395 | pad=1 396 | activation=leaky 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=512 401 | size=3 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [shortcut] 407 | from=-4 408 | activation=linear 409 | 410 | [se] 411 | channels=512 412 | 413 | [convolutional] 414 | batch_normalize=1 415 | filters=256 416 | size=1 417 | stride=1 418 | pad=1 419 | activation=leaky 420 | 421 | [convolutional] 422 | batch_normalize=1 423 | filters=512 424 | size=3 425 | stride=1 426 | pad=1 427 | activation=leaky 428 | 429 | [shortcut] 430 | from=-4 431 | activation=linear 432 | 433 | [se] 434 | channels=512 435 | 436 | [convolutional] 437 | batch_normalize=1 438 | filters=256 439 | size=1 440 | stride=1 441 | pad=1 442 | activation=leaky 443 | 444 | [convolutional] 445 | batch_normalize=1 446 | filters=512 447 | size=3 448 | stride=1 449 | pad=1 450 | activation=leaky 451 | 452 | [shortcut] 453 | from=-4 454 | activation=linear 455 | 456 | [se] 457 | channels=512 458 | 459 | [convolutional] 460 | batch_normalize=1 461 | filters=256 462 | size=1 463 | stride=1 464 | pad=1 465 | activation=leaky 466 | 467 | [convolutional] 468 | batch_normalize=1 469 | filters=512 470 | size=3 471 | stride=1 472 | pad=1 473 | activation=leaky 474 | 475 | [shortcut] 476 | from=-4 477 | activation=linear 478 | 479 | [se] 480 | channels=512 481 | 482 | [convolutional] 483 | batch_normalize=1 484 | filters=256 485 | size=1 486 | stride=1 487 | pad=1 488 | activation=leaky 489 | 490 | [convolutional] 491 | batch_normalize=1 492 | filters=512 493 | size=3 494 | stride=1 495 | pad=1 496 | activation=leaky 497 | 498 | [shortcut] 499 | from=-4 500 | activation=linear 501 | 502 | # Downsample 503 | # s=32 504 | [convolutional] 505 | batch_normalize=1 506 | filters=1024 507 | size=3 508 | stride=2 509 | pad=1 510 | activation=leaky 511 | 512 | [se] 513 | channels=1024 514 | 515 | [convolutional] 516 | batch_normalize=1 517 | filters=512 518 | size=1 519 | stride=1 520 | pad=1 521 | activation=leaky 522 | 523 | [convolutional] 524 | batch_normalize=1 525 | filters=1024 526 | size=3 527 | stride=1 528 | pad=1 529 | activation=leaky 530 | 531 | [shortcut] 532 | from=-4 533 | activation=linear 534 | 535 | [se] 536 | channels=1024 537 | 538 | [convolutional] 539 | batch_normalize=1 540 | filters=512 541 | size=1 542 | 
stride=1 543 | pad=1 544 | activation=leaky 545 | 546 | [convolutional] 547 | batch_normalize=1 548 | filters=1024 549 | size=3 550 | stride=1 551 | pad=1 552 | activation=leaky 553 | 554 | [shortcut] 555 | from=-4 556 | activation=linear 557 | 558 | [se] 559 | channels=1024 560 | 561 | [convolutional] 562 | batch_normalize=1 563 | filters=512 564 | size=1 565 | stride=1 566 | pad=1 567 | activation=leaky 568 | 569 | [convolutional] 570 | batch_normalize=1 571 | filters=1024 572 | size=3 573 | stride=1 574 | pad=1 575 | activation=leaky 576 | 577 | [shortcut] 578 | from=-4 579 | activation=linear 580 | 581 | 582 | [se] 583 | channels=1024 584 | 585 | [convolutional] 586 | batch_normalize=1 587 | filters=512 588 | size=1 589 | stride=1 590 | pad=1 591 | activation=leaky 592 | 593 | [convolutional] 594 | batch_normalize=1 595 | filters=1024 596 | size=3 597 | stride=1 598 | pad=1 599 | activation=leaky 600 | 601 | [shortcut] 602 | from=-4 603 | activation=linear 604 | 605 | ######## backbone到此为止 ############## 606 | 607 | [convolutional] 608 | batch_normalize=1 609 | filters=512 610 | size=1 611 | stride=1 612 | pad=1 613 | activation=leaky 614 | 615 | [convolutional] 616 | batch_normalize=1 617 | size=3 618 | stride=1 619 | pad=1 620 | filters=1024 621 | activation=leaky 622 | 623 | [convolutional] 624 | batch_normalize=1 625 | filters=512 626 | size=1 627 | stride=1 628 | pad=1 629 | activation=leaky 630 | 631 | [convolutional] 632 | batch_normalize=1 633 | size=3 634 | stride=1 635 | pad=1 636 | filters=1024 637 | activation=leaky 638 | 639 | [convolutional] 640 | batch_normalize=1 641 | filters=512 642 | size=1 643 | stride=1 644 | pad=1 645 | activation=leaky 646 | 647 | [convolutional] 648 | batch_normalize=1 649 | size=3 650 | stride=1 651 | pad=1 652 | filters=1024 653 | activation=leaky 654 | 655 | [convolutional] 656 | size=1 657 | stride=1 658 | pad=1 659 | filters=504 660 | activation=linear 661 | 662 | # 角度定义为x+ 0; 顺时针; 正负0.5pi 663 | [yolo] 664 | mask = 144-215 665 | anchors = utils/kmeans/hrsc_512.txt 666 | classes=1 667 | num=9 668 | jitter=.3 669 | ignore_thresh = .7 670 | truth_thresh = 1 671 | random=1 672 | 673 | 674 | [route] 675 | layers = -4 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | filters=256 680 | size=1 681 | stride=1 682 | pad=1 683 | activation=leaky 684 | 685 | [upsample] 686 | stride=2 687 | 688 | [route] 689 | layers = -1, 61 690 | 691 | 692 | 693 | [convolutional] 694 | batch_normalize=1 695 | filters=256 696 | size=1 697 | stride=1 698 | pad=1 699 | activation=leaky 700 | 701 | [convolutional] 702 | batch_normalize=1 703 | size=3 704 | stride=1 705 | pad=1 706 | filters=512 707 | activation=leaky 708 | 709 | [convolutional] 710 | batch_normalize=1 711 | filters=256 712 | size=1 713 | stride=1 714 | pad=1 715 | activation=leaky 716 | 717 | [convolutional] 718 | batch_normalize=1 719 | size=3 720 | stride=1 721 | pad=1 722 | filters=512 723 | activation=leaky 724 | 725 | [convolutional] 726 | batch_normalize=1 727 | filters=256 728 | size=1 729 | stride=1 730 | pad=1 731 | activation=leaky 732 | 733 | [convolutional] 734 | batch_normalize=1 735 | size=3 736 | stride=1 737 | pad=1 738 | filters=512 739 | activation=leaky 740 | 741 | [convolutional] 742 | size=1 743 | stride=1 744 | pad=1 745 | filters=504 746 | activation=linear 747 | 748 | 749 | [yolo] 750 | mask = 72-143 751 | anchors = utils/kmeans/hrsc_512.txt 752 | classes=1 753 | num=9 754 | jitter=.3 755 | ignore_thresh = .7 756 | truth_thresh = 1 757 | random=1 758 | 759 | 760 | 761 | 
[route] 762 | layers = -4 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | filters=128 767 | size=1 768 | stride=1 769 | pad=1 770 | activation=leaky 771 | 772 | [upsample] 773 | stride=2 774 | 775 | [route] 776 | layers = -1, 36 777 | 778 | 779 | 780 | [convolutional] 781 | batch_normalize=1 782 | filters=128 783 | size=1 784 | stride=1 785 | pad=1 786 | activation=leaky 787 | 788 | [convolutional] 789 | batch_normalize=1 790 | size=3 791 | stride=1 792 | pad=1 793 | filters=256 794 | activation=leaky 795 | 796 | [convolutional] 797 | batch_normalize=1 798 | filters=128 799 | size=1 800 | stride=1 801 | pad=1 802 | activation=leaky 803 | 804 | [convolutional] 805 | batch_normalize=1 806 | size=3 807 | stride=1 808 | pad=1 809 | filters=256 810 | activation=leaky 811 | 812 | [convolutional] 813 | batch_normalize=1 814 | filters=128 815 | size=1 816 | stride=1 817 | pad=1 818 | activation=leaky 819 | 820 | [convolutional] 821 | batch_normalize=1 822 | size=3 823 | stride=1 824 | pad=1 825 | filters=256 826 | activation=leaky 827 | 828 | [convolutional] 829 | size=1 830 | stride=1 831 | pad=1 832 | filters=504 833 | activation=linear 834 | 835 | 836 | [yolo] 837 | mask = 0-71 838 | anchors = utils/kmeans/hrsc_512.txt 839 | classes=1 840 | num=9 841 | jitter=.3 842 | ignore_thresh = .7 843 | truth_thresh = 1 844 | random=1 845 | -------------------------------------------------------------------------------- /cfg/HRSC/hyp.py: -------------------------------------------------------------------------------- 1 | giou: 0.1 # giou loss gain 1.582 2 | cls: 27.76 # cls loss gain (CE=~1.0, uCE=~20) 3 | cls_pw: 1.446 # cls BCELoss positive_weight 4 | obj: 20.35 # obj loss gain (*=80 for uBCE with 80 classes) 5 | obj_pw: 3.941 # obj BCELoss positive_weight 6 | iou_t: 0.4 # iou training threshold 7 | ang_t: 3.1415926/6 8 | reg: 1.0 9 | fl_gamma: 0.5 # focal loss gamma 10 | context_factor: 1.0 # 按照短边h来设置的,wh的增幅相同; 调试时设为倒数直接检测 11 | grayscale: 0.3 # 灰度强度为0.3-1.0 12 | 13 | 14 | # lr 15 | lr0: 0.00001 16 | multiplier:10 17 | warm_epoch:1 18 | lrf: -4. 
# final LambdaLR learning rate = lr0 * (10 ** lrf) 19 | momentum: 0.97 # SGD momentum 20 | weight_decay: 0.0004569 # optimizer weight decay 21 | 22 | 23 | # aug 24 | hsv_s: 0.5 # image HSV-Saturation augmentation (fraction) 25 | hsv_v: 0.3 # image HSV-Value augmentation (fraction) 26 | degrees: 5.0 # image rotation (+/- deg) 27 | translate: 0.1 # image translation (+/- fraction) 28 | scale: 0.15 # image scale (+/- gain) 29 | shear: 0.0 30 | gamma: 0.2 31 | blur: 1.3 32 | noise: 0.01 33 | contrast: 0.15 34 | sharpen: 0.15 35 | # copypaste: 0.3 # 船身 h 的 3sigma 段位以内 36 | 37 | 38 | # training 39 | epochs: 1000 40 | batch_size: 1 41 | save_interval: 100 42 | test_interval: 10 43 | -------------------------------------------------------------------------------- /cfg/HRSC/yolov3-416.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 
170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 
| pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 | [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 601 | stride=1 602 | pad=1 603 | filters=504 604 | activation=linear 605 | 606 | # 角度定义为x+ 0; 顺时针; 正负0.5pi 607 | [yolo] 
608 | mask = 144-215 609 | anchors = /py/rotated-yolo/utils/kmeans/hrsc_512.txt 610 | classes=1 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .7 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 | layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | size=3 680 | stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=504 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 72-143 695 | anchors = /py/rotated-yolo/utils/kmeans/hrsc_512.txt 696 | classes=1 697 | num=18 698 | jitter=.3 699 | ignore_thresh = .7 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 | pad=1 746 | activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=504 777 | activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0-71 782 | anchors = /py/rotated-yolo/utils/kmeans/hrsc_512.txt 783 | classes=1 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .7 787 | truth_thresh = 1 788 | random=1 789 | -------------------------------------------------------------------------------- /cfg/ICDAR/hyp.py: -------------------------------------------------------------------------------- 1 | giou: 0.1 # giou loss gain 1.582 2 | cls: 27.76 # cls loss gain (CE=~1.0, uCE=~20) 3 | cls_pw: 1.446 # cls BCELoss positive_weight 4 | obj: 20.35 # obj loss gain (*=80 for uBCE with 80 classes) 5 | obj_pw: 3.941 # obj BCELoss positive_weight 6 | iou_t: 0.3 # iou training threshold 7 | ang_t: 3.1415926/12 8 | reg: 1.0 9 | fl_gamma: 0.5 # focal loss gamma 10 | context_factor: 1.0 # 
按照短边h来设置的,wh的增幅相同; 调试时设为倒数直接检测 11 | 12 | 13 | # lr 14 | lr0: 0.00008 15 | multiplier:10 16 | warm_epoch:5 17 | lrf: -4. # final LambdaLR learning rate = lr0 * (10 ** lrf) 18 | momentum: 0.97 # SGD momentum 19 | weight_decay: 0.0004569 # optimizer weight decay 20 | 21 | 22 | # aug 23 | hsv_s: 0.5 # image HSV-Saturation augmentation (fraction) 24 | hsv_v: 0.3 # image HSV-Value augmentation (fraction) 25 | degrees: 50.0 # image rotation (+/- deg) 26 | translate: 0.2 # image translation (+/- fraction) 27 | scale: 0.2 # image scale (+/- gain) 28 | shear: 0.0 29 | gamma: 0.2 30 | blur: 1.2 31 | noise: 0.005 32 | contrast: 0.0 33 | sharpen: 0.0 34 | copypaste: 0.0 35 | grayscale: 0.05 36 | 37 | 38 | # training 39 | epochs: 1500 40 | batch_size: 4 41 | save_interval: 50 42 | test_interval: 1 43 | -------------------------------------------------------------------------------- /cfg/ICDAR/yolov3_608.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=512 9 | height=512 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | 
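# The backbone here keeps the standard Darknet-53 residual pattern: a 1x1 "reduce" convolution,
# a 3x3 convolution, then [shortcut] from=-3, which adds the block input back as a residual
# connection (activation=linear leaves the sum untouched).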
[convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 
| activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 | [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 
601 | stride=1 602 | pad=1 603 | filters=504 604 | activation=linear 605 | 606 | # 角度定义为x+ 0; 顺时针; 正负0.5pi 607 | [yolo] 608 | mask = 144-215 609 | anchors = utils/kmeans/icdar_608_care.txt 610 | classes=1 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .7 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 | layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | size=3 680 | stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=504 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 72-143 695 | anchors = utils/kmeans/icdar_608_care.txt 696 | classes=1 697 | num=9 698 | jitter=.3 699 | ignore_thresh = .7 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 | pad=1 746 | activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=504 777 | activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0-71 782 | anchors = utils/kmeans/icdar_608_care.txt 783 | classes=1 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .7 787 | truth_thresh = 1 788 | random=1 789 | -------------------------------------------------------------------------------- /cfg/ICDAR/yolov3_608_se.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=512 9 | height=512 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 
| max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | # s=8 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | # --SE block--- 124 | [se] 125 | channels=256 126 | 127 | [convolutional] 128 | batch_normalize=1 129 | filters=128 130 | size=1 131 | stride=1 132 | pad=1 133 | activation=leaky 134 | 135 | [convolutional] 136 | batch_normalize=1 137 | filters=256 138 | size=3 139 | stride=1 140 | pad=1 141 | activation=leaky 142 | 143 | [shortcut] 144 | from=-4 145 | activation=linear 146 | 147 | [se] 148 | channels=256 149 | 150 | [convolutional] 151 | batch_normalize=1 152 | filters=128 153 | size=1 154 | stride=1 155 | pad=1 156 | activation=leaky 157 | 158 | [convolutional] 159 | batch_normalize=1 160 | filters=256 161 | size=3 162 | stride=1 163 | pad=1 164 | activation=leaky 165 | 166 | [shortcut] 167 | from=-4 168 | activation=linear 169 | 170 | [se] 171 | channels=256 172 | 173 | [convolutional] 174 | batch_normalize=1 175 | filters=128 176 | size=1 177 | stride=1 178 | pad=1 179 | activation=leaky 180 | 181 | [convolutional] 182 | batch_normalize=1 183 | filters=256 184 | size=3 185 | stride=1 186 | pad=1 187 | activation=leaky 188 | 189 | [shortcut] 190 | from=-4 191 | activation=linear 192 | 193 | [se] 194 | channels=256 195 | 196 | [convolutional] 197 | batch_normalize=1 198 | filters=128 199 | size=1 200 | stride=1 201 | pad=1 202 | activation=leaky 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=256 207 | size=3 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [shortcut] 213 | from=-4 214 | activation=linear 215 | 216 | [se] 217 | channels=256 218 | 219 | [convolutional] 220 | batch_normalize=1 221 | filters=128 222 | size=1 223 | stride=1 224 | pad=1 225 | activation=leaky 226 | 227 | [convolutional] 228 | batch_normalize=1 229 | filters=256 230 | size=3 231 | stride=1 232 | pad=1 233 | activation=leaky 234 | 235 | [shortcut] 236 | from=-4 237 | activation=linear 238 | 239 | [se] 240 | channels=256 241 | 242 | [convolutional] 243 | batch_normalize=1 244 | filters=128 245 | size=1 
246 | stride=1 247 | pad=1 248 | activation=leaky 249 | 250 | [convolutional] 251 | batch_normalize=1 252 | filters=256 253 | size=3 254 | stride=1 255 | pad=1 256 | activation=leaky 257 | 258 | [shortcut] 259 | from=-4 260 | activation=linear 261 | 262 | [se] 263 | channels=256 264 | 265 | [convolutional] 266 | batch_normalize=1 267 | filters=128 268 | size=1 269 | stride=1 270 | pad=1 271 | activation=leaky 272 | 273 | [convolutional] 274 | batch_normalize=1 275 | filters=256 276 | size=3 277 | stride=1 278 | pad=1 279 | activation=leaky 280 | 281 | [shortcut] 282 | from=-4 283 | activation=linear 284 | 285 | [se] 286 | channels=256 287 | 288 | [convolutional] 289 | batch_normalize=1 290 | filters=128 291 | size=1 292 | stride=1 293 | pad=1 294 | activation=leaky 295 | 296 | [convolutional] 297 | batch_normalize=1 298 | filters=256 299 | size=3 300 | stride=1 301 | pad=1 302 | activation=leaky 303 | 304 | [shortcut] 305 | from=-4 306 | activation=linear 307 | 308 | # Downsample 309 | # s=16 310 | [convolutional] 311 | batch_normalize=1 312 | filters=512 313 | size=3 314 | stride=2 315 | pad=1 316 | activation=leaky 317 | 318 | [se] 319 | channels=512 320 | 321 | [convolutional] 322 | batch_normalize=1 323 | filters=256 324 | size=1 325 | stride=1 326 | pad=1 327 | activation=leaky 328 | 329 | [convolutional] 330 | batch_normalize=1 331 | filters=512 332 | size=3 333 | stride=1 334 | pad=1 335 | activation=leaky 336 | 337 | [shortcut] 338 | from=-4 339 | activation=linear 340 | 341 | [se] 342 | channels=512 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=256 347 | size=1 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [convolutional] 353 | batch_normalize=1 354 | filters=512 355 | size=3 356 | stride=1 357 | pad=1 358 | activation=leaky 359 | 360 | [shortcut] 361 | from=-4 362 | activation=linear 363 | 364 | [se] 365 | channels=512 366 | 367 | [convolutional] 368 | batch_normalize=1 369 | filters=256 370 | size=1 371 | stride=1 372 | pad=1 373 | activation=leaky 374 | 375 | [convolutional] 376 | batch_normalize=1 377 | filters=512 378 | size=3 379 | stride=1 380 | pad=1 381 | activation=leaky 382 | 383 | [shortcut] 384 | from=-4 385 | activation=linear 386 | 387 | [se] 388 | channels=512 389 | 390 | [convolutional] 391 | batch_normalize=1 392 | filters=256 393 | size=1 394 | stride=1 395 | pad=1 396 | activation=leaky 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=512 401 | size=3 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [shortcut] 407 | from=-4 408 | activation=linear 409 | 410 | [se] 411 | channels=512 412 | 413 | [convolutional] 414 | batch_normalize=1 415 | filters=256 416 | size=1 417 | stride=1 418 | pad=1 419 | activation=leaky 420 | 421 | [convolutional] 422 | batch_normalize=1 423 | filters=512 424 | size=3 425 | stride=1 426 | pad=1 427 | activation=leaky 428 | 429 | [shortcut] 430 | from=-4 431 | activation=linear 432 | 433 | [se] 434 | channels=512 435 | 436 | [convolutional] 437 | batch_normalize=1 438 | filters=256 439 | size=1 440 | stride=1 441 | pad=1 442 | activation=leaky 443 | 444 | [convolutional] 445 | batch_normalize=1 446 | filters=512 447 | size=3 448 | stride=1 449 | pad=1 450 | activation=leaky 451 | 452 | [shortcut] 453 | from=-4 454 | activation=linear 455 | 456 | [se] 457 | channels=512 458 | 459 | [convolutional] 460 | batch_normalize=1 461 | filters=256 462 | size=1 463 | stride=1 464 | pad=1 465 | activation=leaky 466 | 467 | [convolutional] 468 | batch_normalize=1 469 | filters=512 
470 | size=3 471 | stride=1 472 | pad=1 473 | activation=leaky 474 | 475 | [shortcut] 476 | from=-4 477 | activation=linear 478 | 479 | [se] 480 | channels=512 481 | 482 | [convolutional] 483 | batch_normalize=1 484 | filters=256 485 | size=1 486 | stride=1 487 | pad=1 488 | activation=leaky 489 | 490 | [convolutional] 491 | batch_normalize=1 492 | filters=512 493 | size=3 494 | stride=1 495 | pad=1 496 | activation=leaky 497 | 498 | [shortcut] 499 | from=-4 500 | activation=linear 501 | 502 | # Downsample 503 | # s=32 504 | [convolutional] 505 | batch_normalize=1 506 | filters=1024 507 | size=3 508 | stride=2 509 | pad=1 510 | activation=leaky 511 | 512 | [se] 513 | channels=1024 514 | 515 | [convolutional] 516 | batch_normalize=1 517 | filters=512 518 | size=1 519 | stride=1 520 | pad=1 521 | activation=leaky 522 | 523 | [convolutional] 524 | batch_normalize=1 525 | filters=1024 526 | size=3 527 | stride=1 528 | pad=1 529 | activation=leaky 530 | 531 | [shortcut] 532 | from=-4 533 | activation=linear 534 | 535 | [se] 536 | channels=1024 537 | 538 | [convolutional] 539 | batch_normalize=1 540 | filters=512 541 | size=1 542 | stride=1 543 | pad=1 544 | activation=leaky 545 | 546 | [convolutional] 547 | batch_normalize=1 548 | filters=1024 549 | size=3 550 | stride=1 551 | pad=1 552 | activation=leaky 553 | 554 | [shortcut] 555 | from=-4 556 | activation=linear 557 | 558 | [se] 559 | channels=1024 560 | 561 | [convolutional] 562 | batch_normalize=1 563 | filters=512 564 | size=1 565 | stride=1 566 | pad=1 567 | activation=leaky 568 | 569 | [convolutional] 570 | batch_normalize=1 571 | filters=1024 572 | size=3 573 | stride=1 574 | pad=1 575 | activation=leaky 576 | 577 | [shortcut] 578 | from=-4 579 | activation=linear 580 | 581 | 582 | [se] 583 | channels=1024 584 | 585 | [convolutional] 586 | batch_normalize=1 587 | filters=512 588 | size=1 589 | stride=1 590 | pad=1 591 | activation=leaky 592 | 593 | [convolutional] 594 | batch_normalize=1 595 | filters=1024 596 | size=3 597 | stride=1 598 | pad=1 599 | activation=leaky 600 | 601 | [shortcut] 602 | from=-4 603 | activation=linear 604 | 605 | ######## backbone到此为止 ############## 606 | 607 | [convolutional] 608 | batch_normalize=1 609 | filters=512 610 | size=1 611 | stride=1 612 | pad=1 613 | activation=leaky 614 | 615 | [convolutional] 616 | batch_normalize=1 617 | size=3 618 | stride=1 619 | pad=1 620 | filters=1024 621 | activation=leaky 622 | 623 | [convolutional] 624 | batch_normalize=1 625 | filters=512 626 | size=1 627 | stride=1 628 | pad=1 629 | activation=leaky 630 | 631 | [convolutional] 632 | batch_normalize=1 633 | size=3 634 | stride=1 635 | pad=1 636 | filters=1024 637 | activation=leaky 638 | 639 | [convolutional] 640 | batch_normalize=1 641 | filters=512 642 | size=1 643 | stride=1 644 | pad=1 645 | activation=leaky 646 | 647 | [convolutional] 648 | batch_normalize=1 649 | size=3 650 | stride=1 651 | pad=1 652 | filters=1024 653 | activation=leaky 654 | 655 | [convolutional] 656 | size=1 657 | stride=1 658 | pad=1 659 | filters=504 660 | activation=linear 661 | 662 | # 角度定义为x+ 0; 顺时针; 正负0.5pi 663 | [yolo] 664 | mask = 144-215 665 | anchors = utils/kmeans/icdar_608_care.txt 666 | classes=1 667 | num=9 668 | jitter=.3 669 | ignore_thresh = .7 670 | truth_thresh = 1 671 | random=1 672 | 673 | 674 | [route] 675 | layers = -4 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | filters=256 680 | size=1 681 | stride=1 682 | pad=1 683 | activation=leaky 684 | 685 | [upsample] 686 | stride=2 687 | 688 | [route] 689 | layers 
= -1, 61 690 | 691 | 692 | 693 | [convolutional] 694 | batch_normalize=1 695 | filters=256 696 | size=1 697 | stride=1 698 | pad=1 699 | activation=leaky 700 | 701 | [convolutional] 702 | batch_normalize=1 703 | size=3 704 | stride=1 705 | pad=1 706 | filters=512 707 | activation=leaky 708 | 709 | [convolutional] 710 | batch_normalize=1 711 | filters=256 712 | size=1 713 | stride=1 714 | pad=1 715 | activation=leaky 716 | 717 | [convolutional] 718 | batch_normalize=1 719 | size=3 720 | stride=1 721 | pad=1 722 | filters=512 723 | activation=leaky 724 | 725 | [convolutional] 726 | batch_normalize=1 727 | filters=256 728 | size=1 729 | stride=1 730 | pad=1 731 | activation=leaky 732 | 733 | [convolutional] 734 | batch_normalize=1 735 | size=3 736 | stride=1 737 | pad=1 738 | filters=512 739 | activation=leaky 740 | 741 | [convolutional] 742 | size=1 743 | stride=1 744 | pad=1 745 | filters=504 746 | activation=linear 747 | 748 | 749 | [yolo] 750 | mask = 72-143 751 | anchors = utils/kmeans/icdar_608_care.txt 752 | classes=1 753 | num=9 754 | jitter=.3 755 | ignore_thresh = .7 756 | truth_thresh = 1 757 | random=1 758 | 759 | 760 | 761 | [route] 762 | layers = -4 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | filters=128 767 | size=1 768 | stride=1 769 | pad=1 770 | activation=leaky 771 | 772 | [upsample] 773 | stride=2 774 | 775 | [route] 776 | layers = -1, 36 777 | 778 | 779 | 780 | [convolutional] 781 | batch_normalize=1 782 | filters=128 783 | size=1 784 | stride=1 785 | pad=1 786 | activation=leaky 787 | 788 | [convolutional] 789 | batch_normalize=1 790 | size=3 791 | stride=1 792 | pad=1 793 | filters=256 794 | activation=leaky 795 | 796 | [convolutional] 797 | batch_normalize=1 798 | filters=128 799 | size=1 800 | stride=1 801 | pad=1 802 | activation=leaky 803 | 804 | [convolutional] 805 | batch_normalize=1 806 | size=3 807 | stride=1 808 | pad=1 809 | filters=256 810 | activation=leaky 811 | 812 | [convolutional] 813 | batch_normalize=1 814 | filters=128 815 | size=1 816 | stride=1 817 | pad=1 818 | activation=leaky 819 | 820 | [convolutional] 821 | batch_normalize=1 822 | size=3 823 | stride=1 824 | pad=1 825 | filters=256 826 | activation=leaky 827 | 828 | [convolutional] 829 | size=1 830 | stride=1 831 | pad=1 832 | filters=504 833 | activation=linear 834 | 835 | 836 | [yolo] 837 | mask = 0-71 838 | anchors = utils/kmeans/icdar_608_care.txt 839 | classes=1 840 | num=9 841 | jitter=.3 842 | ignore_thresh = .7 843 | truth_thresh = 1 844 | random=1 845 | -------------------------------------------------------------------------------- /cfg/hyp_template.py: -------------------------------------------------------------------------------- 1 | giou: 0.1 # giou loss gain 1.582 2 | cls: 27.76 # cls loss gain (CE=~1.0, uCE=~20) 3 | cls_pw: 1.446 # cls BCELoss positive_weight 4 | obj: 20.35 # obj loss gain (*=80 for uBCE with 80 classes) 5 | obj_pw: 3.941 # obj BCELoss positive_weight 6 | iou_t: 0.5 # iou training threshold 7 | ang_t: 3.1415926/12 8 | reg: 1.0 9 | # fl_gamma: 0.5 # focal loss gamma 10 | context_factor: 1.0 # 按照短边h来设置的,wh的增幅相同; 调试时设为倒数直接检测 11 | 12 | 13 | # lr 14 | lr0: 0.0001 15 | multiplier:10 16 | warm_epoch:5 17 | lrf: -4. 
# final LambdaLR learning rate = lr0 * (10 ** lrf) 18 | momentum: 0.97 # SGD momentum 19 | weight_decay: 0.0004569 # optimizer weight decay 20 | 21 | 22 | # aug 23 | hsv_s: 0.5 # image HSV-Saturation augmentation (fraction) 24 | hsv_v: 0.3 # image HSV-Value augmentation (fraction) 25 | degrees: 5.0 # image rotation (+/- deg) 26 | translate: 0.1 # image translation (+/- fraction) 27 | scale: 0.1 # image scale (+/- gain) 28 | shear: 0.0 29 | gamma: 0.2 30 | blur: 1.3 31 | noise: 0.01 32 | contrast: 0.15 33 | sharpen: 0.15 34 | copypaste: 0.1 # 船身 h 的 3sigma 段位以内 35 | grayscale: 0.3 # 灰度强度为0.3-1.0 36 | 37 | 38 | # training 39 | epochs: 100 40 | batch_size: 8 41 | save_interval: 300 42 | test_interval: 5 43 | -------------------------------------------------------------------------------- /cfg/yolov3-tiny.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | # batch=64 7 | # subdivisions=2 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=16 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | [maxpool] 34 | size=2 35 | stride=2 36 | 37 | [convolutional] 38 | batch_normalize=1 39 | filters=32 40 | size=3 41 | stride=1 42 | pad=1 43 | activation=leaky 44 | 45 | [maxpool] 46 | size=2 47 | stride=2 48 | 49 | [convolutional] 50 | batch_normalize=1 51 | filters=64 52 | size=3 53 | stride=1 54 | pad=1 55 | activation=leaky 56 | 57 | [maxpool] 58 | size=2 59 | stride=2 60 | 61 | [convolutional] 62 | batch_normalize=1 63 | filters=128 64 | size=3 65 | stride=1 66 | pad=1 67 | activation=leaky 68 | 69 | [maxpool] 70 | size=2 71 | stride=2 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=256 76 | size=3 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [maxpool] 82 | size=2 83 | stride=2 84 | 85 | [convolutional] 86 | batch_normalize=1 87 | filters=512 88 | size=3 89 | stride=1 90 | pad=1 91 | activation=leaky 92 | 93 | [maxpool] 94 | size=2 95 | stride=1 96 | 97 | [convolutional] 98 | batch_normalize=1 99 | filters=1024 100 | size=3 101 | stride=1 102 | pad=1 103 | activation=leaky 104 | 105 | ########### 106 | 107 | [convolutional] 108 | batch_normalize=1 109 | filters=256 110 | size=1 111 | stride=1 112 | pad=1 113 | activation=leaky 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=512 118 | size=3 119 | stride=1 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | size=1 125 | stride=1 126 | pad=1 127 | filters=255 128 | activation=linear 129 | 130 | 131 | 132 | [yolo] 133 | mask = 3,4,5 134 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 135 | classes=80 136 | num=6 137 | jitter=.3 138 | ignore_thresh = .7 139 | truth_thresh = 1 140 | random=1 141 | 142 | [route] 143 | layers = -4 144 | 145 | [convolutional] 146 | batch_normalize=1 147 | filters=128 148 | size=1 149 | stride=1 150 | pad=1 151 | activation=leaky 152 | 153 | [upsample] 154 | stride=2 155 | 156 | [route] 157 | layers = -1, 8 158 | 159 | [convolutional] 160 | batch_normalize=1 161 | filters=256 162 | size=3 163 | stride=1 164 | pad=1 165 | activation=leaky 166 | 167 | [convolutional] 168 | size=1 169 | stride=1 170 | pad=1 171 | filters=255 172 | activation=linear 
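# filters=255 in the conv above follows the stock COCO head layout: 3 anchors x (4 box + 1 objectness + 80 classes).
# Unlike the rotated configs (filters=504, angle-aware anchors), this tiny cfg appears to be kept in its
# original axis-aligned, 80-class form for reference.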
173 | 174 | [yolo] 175 | mask = 1,2,3 176 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 177 | classes=80 178 | num=6 179 | jitter=.3 180 | ignore_thresh = .7 181 | truth_thresh = 1 182 | random=1 183 | -------------------------------------------------------------------------------- /cfg/yolov3.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 
200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 
421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 | [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 601 | stride=1 602 | pad=1 603 | filters=504 604 | activation=linear 605 | 606 | # 角度定义为x+ 0; 顺时针; 正负0.5pi 607 | [yolo] 608 | mask = 144-215 609 | anchors = 792, 2061, 3870, 6353, 9623, 15803 / 4.18, 6.48, 8.71 / -75, -60, -45, -30, -15 ,0,15, 30,45, 60,75, 90 610 | classes=1 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .7 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 | layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 
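# The route below concatenates the upsampled map from the previous layer (-1) with backbone layer 61
# along the channel dimension; a [route] with a single index simply forwards that layer's output.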
633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | size=3 680 | stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=504 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 72-143 695 | anchors = 792, 2061, 3870, 6353, 9623, 15803 / 4.18, 6.48, 8.71 / -75, -60, -45, -30, -15 ,0,15, 30,45, 60,75, 90 696 | classes=1 697 | num=9 698 | jitter=.3 699 | ignore_thresh = .7 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 | pad=1 746 | activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=504 777 | activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0-71 782 | anchors = 792, 2061, 3870, 6353, 9623, 15803 / 4.18, 6.48, 8.71 / -75, -60, -45, -30, -15 ,0,15, 30,45, 60,75, 90 783 | classes=1 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .7 787 | truth_thresh = 1 788 | random=1 789 | -------------------------------------------------------------------------------- /data/IC_eval/ic15/rrc_evaluation_funcs.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/data/IC_eval/ic15/rrc_evaluation_funcs.pyc -------------------------------------------------------------------------------- /data/coco.data: -------------------------------------------------------------------------------- 1 | classes=80 2 | train=../coco/trainvalno5k.txt 3 | valid=../coco/5k.txt 4 | names=data/coco.names 5 | backup=backup/ 6 | eval=coco 7 | -------------------------------------------------------------------------------- /data/coco.names: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | 
motorcycle 5 | airplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | stop sign 13 | parking meter 14 | bench 15 | bird 16 | cat 17 | dog 18 | horse 19 | sheep 20 | cow 21 | elephant 22 | bear 23 | zebra 24 | giraffe 25 | backpack 26 | umbrella 27 | handbag 28 | tie 29 | suitcase 30 | frisbee 31 | skis 32 | snowboard 33 | sports ball 34 | kite 35 | baseball bat 36 | baseball glove 37 | skateboard 38 | surfboard 39 | tennis racket 40 | bottle 41 | wine glass 42 | cup 43 | fork 44 | knife 45 | spoon 46 | bowl 47 | banana 48 | apple 49 | sandwich 50 | orange 51 | broccoli 52 | carrot 53 | hot dog 54 | pizza 55 | donut 56 | cake 57 | chair 58 | couch 59 | potted plant 60 | bed 61 | dining table 62 | toilet 63 | tv 64 | laptop 65 | mouse 66 | remote 67 | keyboard 68 | cell phone 69 | microwave 70 | oven 71 | toaster 72 | sink 73 | refrigerator 74 | book 75 | clock 76 | vase 77 | scissors 78 | teddy bear 79 | hair drier 80 | toothbrush 81 | -------------------------------------------------------------------------------- /data/hrsc.data: -------------------------------------------------------------------------------- 1 | classes=1 2 | train=/py/datasets/HRSC2016/yolo-dataset/single-train.txt 3 | valid=/py/datasets/HRSC2016/yolo-dataset/single-train.txt 4 | names=data/hrsc.name 5 | backup=backup/ 6 | eval=coco 7 | -------------------------------------------------------------------------------- /data/hrsc.name: -------------------------------------------------------------------------------- 1 | ship 2 | -------------------------------------------------------------------------------- /data/icdar.name: -------------------------------------------------------------------------------- 1 | text 2 | -------------------------------------------------------------------------------- /data/icdar_13+15.data: -------------------------------------------------------------------------------- 1 | classes=1 2 | train=/py/datasets/ICDAR2015/yolo/13+15/train.txt 3 | valid=/py/datasets/ICDAR2015/yolo/13+15/val.txt 4 | names=data/icdar.name 5 | backup=backup/ 6 | eval=coco 7 | -------------------------------------------------------------------------------- /data/icdar_15.data: -------------------------------------------------------------------------------- 1 | classes=1 2 | train=/py/datasets/ICDAR2015/yolo/train.txt 3 | valid=/py/datasets/ICDAR2015/yolo/val.txt 4 | names=data/icdar.name 5 | backup=backup/ 6 | eval=coco 7 | -------------------------------------------------------------------------------- /data/icdar_15_all.data: -------------------------------------------------------------------------------- 1 | classes=1 2 | train=/py/datasets/ICDAR2015/yolo/care_all/train.txt 3 | valid=/py/datasets/ICDAR2015/yolo/care_all/val.txt 4 | names=data/icdar.name 5 | backup=backup/ 6 | eval=coco 7 | -------------------------------------------------------------------------------- /demo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/demo.png -------------------------------------------------------------------------------- /experiment/HRSC/hyp.py: -------------------------------------------------------------------------------- 1 | giou: 0.1 # giou loss gain 1.582 2 | cls: 27.76 # cls loss gain (CE=~1.0, uCE=~20) 3 | cls_pw: 1.446 # cls BCELoss positive_weight 4 | obj: 20.35 # obj loss gain (*=80 for uBCE with 80 classes) 5 | obj_pw: 3.941 # obj BCELoss 
positive_weight 6 | iou_t: 0.1 # iou training threshold 7 | ang_t: 3.1415926/12 8 | reg: 1.0 9 | lr0: 0.00005 10 | multiplier:10 11 | lrf: -4. # final LambdaLR learning rate = lr0 * (10 ** lrf) 12 | momentum: 0.97 # SGD momentum 13 | weight_decay: 0.0004569 # optimizer weight decay 14 | fl_gamma: 0.5 # focal loss gamma 15 | hsv_s: 0.5 # image HSV-Saturation augmentation (fraction) 16 | hsv_v: 0.3 # image HSV-Value augmentation (fraction) 17 | degrees: 5.0 # image rotation (+/- deg) 18 | translate': 0.1 # image translation (+/- fraction) 19 | scale: 0.2 # image scale (+/- gain) 20 | shear: 0.5 21 | gamma:0.3 22 | blur:2.0 23 | noise:0.02 24 | contrast:0.3 25 | sharpen:0.3 -------------------------------------------------------------------------------- /experiment/HRSC/mul-scale/hyp.txt: -------------------------------------------------------------------------------- 1 | # ryolo 2 | # hyp = {'giou': 0.1, # giou loss gain 1.582 3 | # 'cls': 27.76, # cls loss gain (CE=~1.0, uCE=~20) 4 | # 'cls_pw': 1.446, # cls BCELoss positive_weight 5 | # 'obj': 20.35, # obj loss gain (*=80 for uBCE with 80 classes) 6 | # 'obj_pw': 3.941, # obj BCELoss positive_weight 7 | # 'iou_t': 0.5, # iou training threshold 8 | # 'ang_t': 3.1415926/6, 9 | # 'reg': 1.0, 10 | # # 'lr0': 0.002324, # initial learning rate (SGD=1E-3, Adam=9E-5) 11 | # 'lr0': 0.00005, 12 | # 'multiplier':10, 13 | # 'lrf': -4., # final LambdaLR learning rate = lr0 * (10 ** lrf) 14 | # 'momentum': 0.97, # SGD momentum 15 | # 'weight_decay': 0.0004569, # optimizer weight decay 16 | # 'fl_gamma': 0.5, # focal loss gamma 17 | # 'hsv_s': 0.5, # image HSV-Saturation augmentation (fraction) 18 | # 'hsv_v': 0.3, # image HSV-Value augmentation (fraction) 19 | # 'degrees': 5.0, # image rotation (+/- deg) 20 | # 'translate': 0.1, # image translation (+/- fraction) 21 | # 'scale': 0.2, # image scale (+/- gain) 22 | # 'shear': 0.5, 23 | # 'gamma':0.3, 24 | # 'blur':2.0, 25 | # 'noise':0.02, 26 | # 'contrast':0.3, 27 | # 'sharpen':0.3, 28 | # } 29 | 30 | -------------------------------------------------------------------------------- /experiment/IC15/ablation.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/experiment/IC15/ablation.png -------------------------------------------------------------------------------- /experiment/ga-attention(_4).png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/experiment/ga-attention(_4).png -------------------------------------------------------------------------------- /experiment/tiny_test_gax4_o8_dh.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/experiment/tiny_test_gax4_o8_dh.png -------------------------------------------------------------------------------- /make.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | cd model/layer/ORN 3 | ./make.sh 4 | 5 | cd ../../layer/DCNv2 6 | ./make.sh 7 | 8 | cd ../../../utils/nms 9 | ./make.sh 10 | -------------------------------------------------------------------------------- /model/__init__.py: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/model/__init__.py -------------------------------------------------------------------------------- /model/layer/DCNv2/.gitignore: -------------------------------------------------------------------------------- 1 | .vscode 2 | .idea 3 | *.so 4 | *.o 5 | *pyc 6 | _ext 7 | build 8 | DCNv2.egg-info 9 | dist -------------------------------------------------------------------------------- /model/layer/DCNv2/__init__.py: -------------------------------------------------------------------------------- 1 | from .dcn_v2 import DCN 2 | 3 | __all__ = ['DCN'] -------------------------------------------------------------------------------- /model/layer/DCNv2/dcn_test.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | from __future__ import absolute_import 3 | from __future__ import print_function 4 | from __future__ import division 5 | 6 | import time 7 | import torch 8 | import torch.nn as nn 9 | from torch.autograd import gradcheck 10 | 11 | from dcn_v2 import dcn_v2_conv, DCNv2, DCN 12 | from dcn_v2 import dcn_v2_pooling, DCNv2Pooling, DCNPooling 13 | 14 | deformable_groups = 1 15 | N, inC, inH, inW = 2, 2, 4, 4 16 | outC = 2 17 | kH, kW = 3, 3 18 | 19 | 20 | def conv_identify(weight, bias): 21 | weight.data.zero_() 22 | bias.data.zero_() 23 | o, i, h, w = weight.shape 24 | y = h//2 25 | x = w//2 26 | for p in range(i): 27 | for q in range(o): 28 | if p == q: 29 | weight.data[q, p, y, x] = 1.0 30 | 31 | 32 | def check_zero_offset(): 33 | conv_offset = nn.Conv2d(inC, deformable_groups * 2 * kH * kW, 34 | kernel_size=(kH, kW), 35 | stride=(1, 1), 36 | padding=(1, 1), 37 | bias=True).cuda() 38 | 39 | conv_mask = nn.Conv2d(inC, deformable_groups * 1 * kH * kW, 40 | kernel_size=(kH, kW), 41 | stride=(1, 1), 42 | padding=(1, 1), 43 | bias=True).cuda() 44 | 45 | dcn_v2 = DCNv2(inC, outC, (kH, kW), 46 | stride=1, padding=1, dilation=1, 47 | deformable_groups=deformable_groups).cuda() 48 | 49 | conv_offset.weight.data.zero_() 50 | conv_offset.bias.data.zero_() 51 | conv_mask.weight.data.zero_() 52 | conv_mask.bias.data.zero_() 53 | conv_identify(dcn_v2.weight, dcn_v2.bias) 54 | 55 | input = torch.randn(N, inC, inH, inW).cuda() 56 | offset = conv_offset(input) 57 | mask = conv_mask(input) 58 | mask = torch.sigmoid(mask) 59 | output = dcn_v2(input, offset, mask) 60 | output *= 2 61 | d = (input - output).abs().max() 62 | if d < 1e-10: 63 | print('Zero offset passed') 64 | else: 65 | print('Zero offset failed') 66 | print(input) 67 | print(output) 68 | 69 | def check_gradient_dconv(): 70 | 71 | input = torch.rand(N, inC, inH, inW).cuda() * 0.01 72 | input.requires_grad = True 73 | 74 | offset = torch.randn(N, deformable_groups * 2 * kW * kH, inH, inW).cuda() * 2 75 | # offset.data.zero_() 76 | # offset.data -= 0.5 77 | offset.requires_grad = True 78 | 79 | mask = torch.rand(N, deformable_groups * 1 * kW * kH, inH, inW).cuda() 80 | # mask.data.zero_() 81 | mask.requires_grad = True 82 | mask = torch.sigmoid(mask) 83 | 84 | weight = torch.randn(outC, inC, kH, kW).cuda() 85 | weight.requires_grad = True 86 | 87 | bias = torch.rand(outC).cuda() 88 | bias.requires_grad = True 89 | 90 | stride = 1 91 | padding = 1 92 | dilation = 1 93 | 94 | print('check_gradient_dconv: ', 95 | gradcheck(dcn_v2_conv, (input, offset, mask, weight, bias, 96 | stride, padding, dilation, deformable_groups), 97 | eps=1e-3, atol=1e-4, rtol=1e-2)) 98 | 99 | 100 | def 
check_pooling_zero_offset(): 101 | 102 | input = torch.randn(2, 16, 64, 64).cuda().zero_() 103 | input[0, :, 16:26, 16:26] = 1. 104 | input[1, :, 10:20, 20:30] = 2. 105 | rois = torch.tensor([ 106 | [0, 65, 65, 103, 103], 107 | [1, 81, 41, 119, 79], 108 | ]).cuda().float() 109 | pooling = DCNv2Pooling(spatial_scale=1.0 / 4, 110 | pooled_size=7, 111 | output_dim=16, 112 | no_trans=True, 113 | group_size=1, 114 | trans_std=0.0).cuda() 115 | 116 | out = pooling(input, rois, input.new()) 117 | s = ', '.join(['%f' % out[i, :, :, :].mean().item() 118 | for i in range(rois.shape[0])]) 119 | print(s) 120 | 121 | dpooling = DCNv2Pooling(spatial_scale=1.0 / 4, 122 | pooled_size=7, 123 | output_dim=16, 124 | no_trans=False, 125 | group_size=1, 126 | trans_std=0.0).cuda() 127 | offset = torch.randn(20, 2, 7, 7).cuda().zero_() 128 | dout = dpooling(input, rois, offset) 129 | s = ', '.join(['%f' % dout[i, :, :, :].mean().item() 130 | for i in range(rois.shape[0])]) 131 | print(s) 132 | 133 | 134 | def check_gradient_dpooling(): 135 | input = torch.randn(2, 3, 5, 5).cuda() * 0.01 136 | N = 4 137 | batch_inds = torch.randint(2, (N, 1)).cuda().float() 138 | x = torch.rand((N, 1)).cuda().float() * 15 139 | y = torch.rand((N, 1)).cuda().float() * 15 140 | w = torch.rand((N, 1)).cuda().float() * 10 141 | h = torch.rand((N, 1)).cuda().float() * 10 142 | rois = torch.cat((batch_inds, x, y, x + w, y + h), dim=1) 143 | offset = torch.randn(N, 2, 3, 3).cuda() 144 | input.requires_grad = True 145 | offset.requires_grad = True 146 | 147 | spatial_scale = 1.0 / 4 148 | pooled_size = 3 149 | output_dim = 3 150 | no_trans = 0 151 | group_size = 1 152 | trans_std = 0.0 153 | sample_per_part = 4 154 | part_size = pooled_size 155 | 156 | print('check_gradient_dpooling:', 157 | gradcheck(dcn_v2_pooling, (input, rois, offset, 158 | spatial_scale, 159 | pooled_size, 160 | output_dim, 161 | no_trans, 162 | group_size, 163 | part_size, 164 | sample_per_part, 165 | trans_std), 166 | eps=1e-4)) 167 | 168 | 169 | def example_dconv(): 170 | input = torch.randn(2, 64, 128, 128).cuda() 171 | # wrap all things (offset and mask) in DCN 172 | dcn = DCN(64, 64, kernel_size=(3, 3), stride=1, 173 | padding=1, deformable_groups=2).cuda() 174 | # print(dcn.weight.shape, input.shape) 175 | output = dcn(input) 176 | targert = output.new(*output.size()) 177 | targert.data.uniform_(-0.01, 0.01) 178 | error = (targert - output).mean() 179 | error.backward() 180 | print(output.shape) 181 | 182 | 183 | def example_dpooling(): 184 | input = torch.randn(2, 32, 64, 64).cuda() 185 | batch_inds = torch.randint(2, (20, 1)).cuda().float() 186 | x = torch.randint(256, (20, 1)).cuda().float() 187 | y = torch.randint(256, (20, 1)).cuda().float() 188 | w = torch.randint(64, (20, 1)).cuda().float() 189 | h = torch.randint(64, (20, 1)).cuda().float() 190 | rois = torch.cat((batch_inds, x, y, x + w, y + h), dim=1) 191 | offset = torch.randn(20, 2, 7, 7).cuda() 192 | input.requires_grad = True 193 | offset.requires_grad = True 194 | 195 | # normal roi_align 196 | pooling = DCNv2Pooling(spatial_scale=1.0 / 4, 197 | pooled_size=7, 198 | output_dim=32, 199 | no_trans=True, 200 | group_size=1, 201 | trans_std=0.1).cuda() 202 | 203 | # deformable pooling 204 | dpooling = DCNv2Pooling(spatial_scale=1.0 / 4, 205 | pooled_size=7, 206 | output_dim=32, 207 | no_trans=False, 208 | group_size=1, 209 | trans_std=0.1).cuda() 210 | 211 | out = pooling(input, rois, offset) 212 | dout = dpooling(input, rois, offset) 213 | print(out.shape) 214 | print(dout.shape) 215 | 216 | 
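# The remainder of example_dpooling is a gradient smoke test: random targets in
# (-0.01, 0.01) are drawn for both the plain (no_trans=True) pooling output and the
# deformable pooling output, a mean-difference pseudo-loss is formed for each, and
# backward() is called to confirm that gradients flow through both paths.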
target_out = out.new(*out.size()) 217 | target_out.data.uniform_(-0.01, 0.01) 218 | target_dout = dout.new(*dout.size()) 219 | target_dout.data.uniform_(-0.01, 0.01) 220 | e = (target_out - out).mean() 221 | e.backward() 222 | e = (target_dout - dout).mean() 223 | e.backward() 224 | 225 | 226 | def example_mdpooling(): 227 | input = torch.randn(2, 32, 64, 64).cuda() 228 | input.requires_grad = True 229 | batch_inds = torch.randint(2, (20, 1)).cuda().float() 230 | x = torch.randint(256, (20, 1)).cuda().float() 231 | y = torch.randint(256, (20, 1)).cuda().float() 232 | w = torch.randint(64, (20, 1)).cuda().float() 233 | h = torch.randint(64, (20, 1)).cuda().float() 234 | rois = torch.cat((batch_inds, x, y, x + w, y + h), dim=1) 235 | 236 | # mdformable pooling (V2) 237 | dpooling = DCNPooling(spatial_scale=1.0 / 4, 238 | pooled_size=7, 239 | output_dim=32, 240 | no_trans=False, 241 | group_size=1, 242 | trans_std=0.1, 243 | deform_fc_dim=1024).cuda() 244 | 245 | dout = dpooling(input, rois) 246 | target = dout.new(*dout.size()) 247 | target.data.uniform_(-0.1, 0.1) 248 | error = (target - dout).mean() 249 | error.backward() 250 | print(dout.shape) 251 | 252 | 253 | if __name__ == '__main__': 254 | 255 | example_dconv() 256 | # example_dpooling() 257 | # example_mdpooling() 258 | 259 | # check_pooling_zero_offset() 260 | # zero offset check 261 | # if inC == outC: 262 | # check_zero_offset() 263 | 264 | # check_gradient_dpooling() 265 | # check_gradient_dconv() 266 | # """ 267 | # ****** Note: backward is not reentrant error may not be a serious problem, 268 | # ****** since the max error is less than 1e-7, 269 | # ****** Still looking for what trigger this problem 270 | # """ 271 | -------------------------------------------------------------------------------- /model/layer/DCNv2/make.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | rm -rf build 4 | python setup.py clean && python setup.py build develop 5 | -------------------------------------------------------------------------------- /model/layer/DCNv2/setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import os 4 | import glob 5 | 6 | import torch 7 | 8 | from torch.utils.cpp_extension import CUDA_HOME 9 | from torch.utils.cpp_extension import CppExtension 10 | from torch.utils.cpp_extension import CUDAExtension 11 | 12 | from setuptools import find_packages 13 | from setuptools import setup 14 | 15 | requirements = ["torch", "torchvision"] 16 | 17 | def get_extensions(): 18 | this_dir = os.path.dirname(os.path.abspath(__file__)) 19 | extensions_dir = os.path.join(this_dir, "src") 20 | 21 | main_file = glob.glob(os.path.join(extensions_dir, "*.cpp")) 22 | source_cpu = glob.glob(os.path.join(extensions_dir, "cpu", "*.cpp")) 23 | source_cuda = glob.glob(os.path.join(extensions_dir, "cuda", "*.cu")) 24 | 25 | sources = main_file + source_cpu 26 | extension = CppExtension 27 | extra_compile_args = {"cxx": []} 28 | define_macros = [] 29 | 30 | if torch.cuda.is_available() and CUDA_HOME is not None: 31 | extension = CUDAExtension 32 | sources += source_cuda 33 | define_macros += [("WITH_CUDA", None)] 34 | extra_compile_args["nvcc"] = [ 35 | "-DCUDA_HAS_FP16=1", 36 | "-D__CUDA_NO_HALF_OPERATORS__", 37 | "-D__CUDA_NO_HALF_CONVERSIONS__", 38 | "-D__CUDA_NO_HALF2_OPERATORS__", 39 | ] 40 | else: 41 | raise NotImplementedError('Cuda is not availabel') 42 | 43 | sources = [os.path.join(extensions_dir, s) for s 
in sources] 44 | include_dirs = [extensions_dir] 45 | ext_modules = [ 46 | extension( 47 | "_ext", 48 | sources, 49 | include_dirs=include_dirs, 50 | define_macros=define_macros, 51 | extra_compile_args=extra_compile_args, 52 | ) 53 | ] 54 | return ext_modules 55 | 56 | setup( 57 | name="DCNv2", 58 | version="0.1", 59 | author="charlesshang", 60 | url="https://github.com/charlesshang/DCNv2", 61 | description="deformable convolutional networks", 62 | packages=find_packages(exclude=("configs", "tests",)), 63 | # install_requires=requirements, 64 | ext_modules=get_extensions(), 65 | cmdclass={"build_ext": torch.utils.cpp_extension.BuildExtension}, 66 | ) -------------------------------------------------------------------------------- /model/layer/DCNv2/src/cpu/dcn_v2_cpu.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | 3 | #include 4 | #include 5 | 6 | 7 | at::Tensor 8 | dcn_v2_cpu_forward(const at::Tensor &input, 9 | const at::Tensor &weight, 10 | const at::Tensor &bias, 11 | const at::Tensor &offset, 12 | const at::Tensor &mask, 13 | const int kernel_h, 14 | const int kernel_w, 15 | const int stride_h, 16 | const int stride_w, 17 | const int pad_h, 18 | const int pad_w, 19 | const int dilation_h, 20 | const int dilation_w, 21 | const int deformable_group) 22 | { 23 | AT_ERROR("Not implement on cpu"); 24 | } 25 | 26 | std::vector 27 | dcn_v2_cpu_backward(const at::Tensor &input, 28 | const at::Tensor &weight, 29 | const at::Tensor &bias, 30 | const at::Tensor &offset, 31 | const at::Tensor &mask, 32 | const at::Tensor &grad_output, 33 | int kernel_h, int kernel_w, 34 | int stride_h, int stride_w, 35 | int pad_h, int pad_w, 36 | int dilation_h, int dilation_w, 37 | int deformable_group) 38 | { 39 | AT_ERROR("Not implement on cpu"); 40 | } 41 | 42 | std::tuple 43 | dcn_v2_psroi_pooling_cpu_forward(const at::Tensor &input, 44 | const at::Tensor &bbox, 45 | const at::Tensor &trans, 46 | const int no_trans, 47 | const float spatial_scale, 48 | const int output_dim, 49 | const int group_size, 50 | const int pooled_size, 51 | const int part_size, 52 | const int sample_per_part, 53 | const float trans_std) 54 | { 55 | AT_ERROR("Not implement on cpu"); 56 | } 57 | 58 | std::tuple 59 | dcn_v2_psroi_pooling_cpu_backward(const at::Tensor &out_grad, 60 | const at::Tensor &input, 61 | const at::Tensor &bbox, 62 | const at::Tensor &trans, 63 | const at::Tensor &top_count, 64 | const int no_trans, 65 | const float spatial_scale, 66 | const int output_dim, 67 | const int group_size, 68 | const int pooled_size, 69 | const int part_size, 70 | const int sample_per_part, 71 | const float trans_std) 72 | { 73 | AT_ERROR("Not implement on cpu"); 74 | } -------------------------------------------------------------------------------- /model/layer/DCNv2/src/cpu/vision.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | #include 3 | 4 | at::Tensor 5 | dcn_v2_cpu_forward(const at::Tensor &input, 6 | const at::Tensor &weight, 7 | const at::Tensor &bias, 8 | const at::Tensor &offset, 9 | const at::Tensor &mask, 10 | const int kernel_h, 11 | const int kernel_w, 12 | const int stride_h, 13 | const int stride_w, 14 | const int pad_h, 15 | const int pad_w, 16 | const int dilation_h, 17 | const int dilation_w, 18 | const int deformable_group); 19 | 20 | std::vector 21 | dcn_v2_cpu_backward(const at::Tensor &input, 22 | const at::Tensor &weight, 23 | const at::Tensor &bias, 24 | const at::Tensor &offset, 25 | const 
at::Tensor &mask, 26 | const at::Tensor &grad_output, 27 | int kernel_h, int kernel_w, 28 | int stride_h, int stride_w, 29 | int pad_h, int pad_w, 30 | int dilation_h, int dilation_w, 31 | int deformable_group); 32 | 33 | 34 | std::tuple 35 | dcn_v2_psroi_pooling_cpu_forward(const at::Tensor &input, 36 | const at::Tensor &bbox, 37 | const at::Tensor &trans, 38 | const int no_trans, 39 | const float spatial_scale, 40 | const int output_dim, 41 | const int group_size, 42 | const int pooled_size, 43 | const int part_size, 44 | const int sample_per_part, 45 | const float trans_std); 46 | 47 | std::tuple 48 | dcn_v2_psroi_pooling_cpu_backward(const at::Tensor &out_grad, 49 | const at::Tensor &input, 50 | const at::Tensor &bbox, 51 | const at::Tensor &trans, 52 | const at::Tensor &top_count, 53 | const int no_trans, 54 | const float spatial_scale, 55 | const int output_dim, 56 | const int group_size, 57 | const int pooled_size, 58 | const int part_size, 59 | const int sample_per_part, 60 | const float trans_std); -------------------------------------------------------------------------------- /model/layer/DCNv2/src/cuda/dcn_v2_im2col_cuda.h: -------------------------------------------------------------------------------- 1 | 2 | /*! 3 | ******************* BEGIN Caffe Copyright Notice and Disclaimer **************** 4 | * 5 | * COPYRIGHT 6 | * 7 | * All contributions by the University of California: 8 | * Copyright (c) 2014-2017 The Regents of the University of California (Regents) 9 | * All rights reserved. 10 | * 11 | * All other contributions: 12 | * Copyright (c) 2014-2017, the respective contributors 13 | * All rights reserved. 14 | * 15 | * Caffe uses a shared copyright model: each contributor holds copyright over 16 | * their contributions to Caffe. The project versioning records all such 17 | * contribution and copyright details. If a contributor wants to further mark 18 | * their specific copyright on a particular contribution, they should indicate 19 | * their copyright solely in the commit message of the change when it is 20 | * committed. 21 | * 22 | * LICENSE 23 | * 24 | * Redistribution and use in source and binary forms, with or without 25 | * modification, are permitted provided that the following conditions are met: 26 | * 27 | * 1. Redistributions of source code must retain the above copyright notice, this 28 | * list of conditions and the following disclaimer. 29 | * 2. Redistributions in binary form must reproduce the above copyright notice, 30 | * this list of conditions and the following disclaimer in the documentation 31 | * and/or other materials provided with the distribution. 32 | * 33 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 34 | * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 35 | * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 36 | * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 37 | * ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 38 | * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 39 | * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND 40 | * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 41 | * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 42 | * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
43 | * 44 | * CONTRIBUTION AGREEMENT 45 | * 46 | * By contributing to the BVLC/caffe repository through pull-request, comment, 47 | * or otherwise, the contributor releases their content to the 48 | * license and copyright terms herein. 49 | * 50 | ***************** END Caffe Copyright Notice and Disclaimer ******************** 51 | * 52 | * Copyright (c) 2018 Microsoft 53 | * Licensed under The MIT License [see LICENSE for details] 54 | * \file modulated_deformable_im2col.h 55 | * \brief Function definitions of converting an image to 56 | * column matrix based on kernel, padding, dilation, and offset. 57 | * These functions are mainly used in deformable convolution operators. 58 | * \ref: https://arxiv.org/abs/1811.11168 59 | * \author Yuwen Xiong, Haozhi Qi, Jifeng Dai, Xizhou Zhu, Han Hu 60 | */ 61 | 62 | /***************** Adapted by Charles Shang *********************/ 63 | 64 | #ifndef DCN_V2_IM2COL_CUDA 65 | #define DCN_V2_IM2COL_CUDA 66 | 67 | #ifdef __cplusplus 68 | extern "C" 69 | { 70 | #endif 71 | 72 | void modulated_deformable_im2col_cuda(cudaStream_t stream, 73 | const float *data_im, const float *data_offset, const float *data_mask, 74 | const int batch_size, const int channels, const int height_im, const int width_im, 75 | const int height_col, const int width_col, const int kernel_h, const int kenerl_w, 76 | const int pad_h, const int pad_w, const int stride_h, const int stride_w, 77 | const int dilation_h, const int dilation_w, 78 | const int deformable_group, float *data_col); 79 | 80 | void modulated_deformable_col2im_cuda(cudaStream_t stream, 81 | const float *data_col, const float *data_offset, const float *data_mask, 82 | const int batch_size, const int channels, const int height_im, const int width_im, 83 | const int height_col, const int width_col, const int kernel_h, const int kenerl_w, 84 | const int pad_h, const int pad_w, const int stride_h, const int stride_w, 85 | const int dilation_h, const int dilation_w, 86 | const int deformable_group, float *grad_im); 87 | 88 | void modulated_deformable_col2im_coord_cuda(cudaStream_t stream, 89 | const float *data_col, const float *data_im, const float *data_offset, const float *data_mask, 90 | const int batch_size, const int channels, const int height_im, const int width_im, 91 | const int height_col, const int width_col, const int kernel_h, const int kenerl_w, 92 | const int pad_h, const int pad_w, const int stride_h, const int stride_w, 93 | const int dilation_h, const int dilation_w, 94 | const int deformable_group, 95 | float *grad_offset, float *grad_mask); 96 | 97 | #ifdef __cplusplus 98 | } 99 | #endif 100 | 101 | #endif -------------------------------------------------------------------------------- /model/layer/DCNv2/src/cuda/vision.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | #include 3 | 4 | at::Tensor 5 | dcn_v2_cuda_forward(const at::Tensor &input, 6 | const at::Tensor &weight, 7 | const at::Tensor &bias, 8 | const at::Tensor &offset, 9 | const at::Tensor &mask, 10 | const int kernel_h, 11 | const int kernel_w, 12 | const int stride_h, 13 | const int stride_w, 14 | const int pad_h, 15 | const int pad_w, 16 | const int dilation_h, 17 | const int dilation_w, 18 | const int deformable_group); 19 | 20 | std::vector 21 | dcn_v2_cuda_backward(const at::Tensor &input, 22 | const at::Tensor &weight, 23 | const at::Tensor &bias, 24 | const at::Tensor &offset, 25 | const at::Tensor &mask, 26 | const at::Tensor &grad_output, 27 | int kernel_h, int kernel_w, 
28 | int stride_h, int stride_w, 29 | int pad_h, int pad_w, 30 | int dilation_h, int dilation_w, 31 | int deformable_group); 32 | 33 | 34 | std::tuple 35 | dcn_v2_psroi_pooling_cuda_forward(const at::Tensor &input, 36 | const at::Tensor &bbox, 37 | const at::Tensor &trans, 38 | const int no_trans, 39 | const float spatial_scale, 40 | const int output_dim, 41 | const int group_size, 42 | const int pooled_size, 43 | const int part_size, 44 | const int sample_per_part, 45 | const float trans_std); 46 | 47 | std::tuple 48 | dcn_v2_psroi_pooling_cuda_backward(const at::Tensor &out_grad, 49 | const at::Tensor &input, 50 | const at::Tensor &bbox, 51 | const at::Tensor &trans, 52 | const at::Tensor &top_count, 53 | const int no_trans, 54 | const float spatial_scale, 55 | const int output_dim, 56 | const int group_size, 57 | const int pooled_size, 58 | const int part_size, 59 | const int sample_per_part, 60 | const float trans_std); -------------------------------------------------------------------------------- /model/layer/DCNv2/src/dcn_v2.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include "cpu/vision.h" 4 | 5 | #ifdef WITH_CUDA 6 | #include "cuda/vision.h" 7 | #endif 8 | 9 | at::Tensor 10 | dcn_v2_forward(const at::Tensor &input, 11 | const at::Tensor &weight, 12 | const at::Tensor &bias, 13 | const at::Tensor &offset, 14 | const at::Tensor &mask, 15 | const int kernel_h, 16 | const int kernel_w, 17 | const int stride_h, 18 | const int stride_w, 19 | const int pad_h, 20 | const int pad_w, 21 | const int dilation_h, 22 | const int dilation_w, 23 | const int deformable_group) 24 | { 25 | if (input.type().is_cuda()) 26 | { 27 | #ifdef WITH_CUDA 28 | return dcn_v2_cuda_forward(input, weight, bias, offset, mask, 29 | kernel_h, kernel_w, 30 | stride_h, stride_w, 31 | pad_h, pad_w, 32 | dilation_h, dilation_w, 33 | deformable_group); 34 | #else 35 | AT_ERROR("Not compiled with GPU support"); 36 | #endif 37 | } 38 | AT_ERROR("Not implemented on the CPU"); 39 | } 40 | 41 | std::vector 42 | dcn_v2_backward(const at::Tensor &input, 43 | const at::Tensor &weight, 44 | const at::Tensor &bias, 45 | const at::Tensor &offset, 46 | const at::Tensor &mask, 47 | const at::Tensor &grad_output, 48 | int kernel_h, int kernel_w, 49 | int stride_h, int stride_w, 50 | int pad_h, int pad_w, 51 | int dilation_h, int dilation_w, 52 | int deformable_group) 53 | { 54 | if (input.type().is_cuda()) 55 | { 56 | #ifdef WITH_CUDA 57 | return dcn_v2_cuda_backward(input, 58 | weight, 59 | bias, 60 | offset, 61 | mask, 62 | grad_output, 63 | kernel_h, kernel_w, 64 | stride_h, stride_w, 65 | pad_h, pad_w, 66 | dilation_h, dilation_w, 67 | deformable_group); 68 | #else 69 | AT_ERROR("Not compiled with GPU support"); 70 | #endif 71 | } 72 | AT_ERROR("Not implemented on the CPU"); 73 | } 74 | 75 | std::tuple 76 | dcn_v2_psroi_pooling_forward(const at::Tensor &input, 77 | const at::Tensor &bbox, 78 | const at::Tensor &trans, 79 | const int no_trans, 80 | const float spatial_scale, 81 | const int output_dim, 82 | const int group_size, 83 | const int pooled_size, 84 | const int part_size, 85 | const int sample_per_part, 86 | const float trans_std) 87 | { 88 | if (input.type().is_cuda()) 89 | { 90 | #ifdef WITH_CUDA 91 | return dcn_v2_psroi_pooling_cuda_forward(input, 92 | bbox, 93 | trans, 94 | no_trans, 95 | spatial_scale, 96 | output_dim, 97 | group_size, 98 | pooled_size, 99 | part_size, 100 | sample_per_part, 101 | trans_std); 102 | #else 103 | AT_ERROR("Not compiled 
with GPU support"); 104 | #endif 105 | } 106 | AT_ERROR("Not implemented on the CPU"); 107 | } 108 | 109 | std::tuple 110 | dcn_v2_psroi_pooling_backward(const at::Tensor &out_grad, 111 | const at::Tensor &input, 112 | const at::Tensor &bbox, 113 | const at::Tensor &trans, 114 | const at::Tensor &top_count, 115 | const int no_trans, 116 | const float spatial_scale, 117 | const int output_dim, 118 | const int group_size, 119 | const int pooled_size, 120 | const int part_size, 121 | const int sample_per_part, 122 | const float trans_std) 123 | { 124 | if (input.type().is_cuda()) 125 | { 126 | #ifdef WITH_CUDA 127 | return dcn_v2_psroi_pooling_cuda_backward(out_grad, 128 | input, 129 | bbox, 130 | trans, 131 | top_count, 132 | no_trans, 133 | spatial_scale, 134 | output_dim, 135 | group_size, 136 | pooled_size, 137 | part_size, 138 | sample_per_part, 139 | trans_std); 140 | #else 141 | AT_ERROR("Not compiled with GPU support"); 142 | #endif 143 | } 144 | AT_ERROR("Not implemented on the CPU"); 145 | } -------------------------------------------------------------------------------- /model/layer/DCNv2/src/vision.cpp: -------------------------------------------------------------------------------- 1 | 2 | #include "dcn_v2.h" 3 | 4 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { 5 | m.def("dcn_v2_forward", &dcn_v2_forward, "dcn_v2_forward"); 6 | m.def("dcn_v2_backward", &dcn_v2_backward, "dcn_v2_backward"); 7 | m.def("dcn_v2_psroi_pooling_forward", &dcn_v2_psroi_pooling_forward, "dcn_v2_psroi_pooling_forward"); 8 | m.def("dcn_v2_psroi_pooling_backward", &dcn_v2_psroi_pooling_backward, "dcn_v2_psroi_pooling_backward"); 9 | } 10 | -------------------------------------------------------------------------------- /model/layer/__init__.py: -------------------------------------------------------------------------------- 1 | from .DCNv2 import * 2 | from .ORN import * 3 | -------------------------------------------------------------------------------- /model/model_utils.py: -------------------------------------------------------------------------------- 1 | import torch.nn.functional as F 2 | 3 | from utils.google_utils import * 4 | from utils.parse_config import * 5 | from utils.utils import * 6 | 7 | 8 | 9 | def get_yolo_layers(model): 10 | return [i for i, x in enumerate(model.module_defs) if x['type'] == 'yolo'] # [82, 94, 106] for yolov3 11 | 12 | 13 | # 做了两件事: 14 | # - 编码grid cell的坐标 15 | # - 将anchor缩放到特征图尺度(后面在特征图上进行预测) 16 | def create_grids(self, img_size=416, ng=(13, 13), device='cpu', type=torch.float32): 17 | nx, ny = ng # x and y grid size # ng是传入的特征图宽高tuple 18 | # 计算降采样步长self.stride 32/16/8 19 | self.img_size = max(img_size) 20 | self.stride = self.img_size / max(ng) 21 | 22 | # build xy offsets 23 | # 最终结果self.grid_xy的维度为torch.Size([1, 1, 10, 13, 2]),其中10和13的维度对应的是特征图的每个点,最后的2是其上的编号 24 | # 如特征图为10*13,则构建的偏移阵列从[0,0],[1,0]...[12,0], [0,1].[1,1]...[12,1], ...[12,9] 25 | # 表示的是特征图每个像素点的位置,也就是原图的grid左上角坐标,和后面预测的cell内偏移共同表示最终预测的物体位置 26 | yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)]) 27 | self.grid_xy = torch.stack((xv, yv), 2).to(device).type(type).view((1, 1, ny, nx, 2)) 28 | 29 | # build wh gains 30 | self.anchor_vec = self.anchors.to(device) 31 | self.anchor_vec[:,:2] /= self.stride 32 | self.anchor_wh = self.anchor_vec.view(1, self.na, 1, 1, 3).to(device).type(type) # torch.Size([1, 18, 1, 1, 3]) 33 | self.ng = torch.Tensor(ng).to(device) 34 | self.nx = nx 35 | self.ny = ny 36 | 37 | 38 | def load_darknet_weights(self, weights, cutoff=-1): 39 | # Parses and loads the weights 
stored in 'weights' 40 | 41 | # Establish cutoffs (load layers between 0 and cutoff. if cutoff = -1 all are loaded) 42 | file = Path(weights).name 43 | if file == 'darknet53.conv.74': 44 | cutoff = 75 45 | elif file == 'yolov3-tiny.conv.15': 46 | cutoff = 15 47 | 48 | # Read weights file 49 | with open(weights, 'rb') as f: 50 | # Read Header https://github.com/AlexeyAB/darknet/issues/2914#issuecomment-496675346 51 | self.version = np.fromfile(f, dtype=np.int32, count=3) # (int32) version info: major, minor, revision 52 | self.seen = np.fromfile(f, dtype=np.int64, count=1) # (int64) number of images seen during training 53 | 54 | weights = np.fromfile(f, dtype=np.float32) # The rest are weights 55 | 56 | ptr = 0 57 | for i, (mdef, module) in enumerate(zip(self.module_defs[:cutoff], self.module_list[:cutoff])): 58 | if mdef['type'] == 'convolutional': 59 | conv_layer = module[0] 60 | if mdef['batch_normalize']: 61 | # Load BN bias, weights, running mean and running variance 62 | bn_layer = module[1] 63 | num_b = bn_layer.bias.numel() # Number of biases 64 | # Bias 65 | bn_b = torch.from_numpy(weights[ptr:ptr + num_b]).view_as(bn_layer.bias) 66 | bn_layer.bias.data.copy_(bn_b) 67 | ptr += num_b 68 | # Weight 69 | bn_w = torch.from_numpy(weights[ptr:ptr + num_b]).view_as(bn_layer.weight) 70 | bn_layer.weight.data.copy_(bn_w) 71 | ptr += num_b 72 | # Running Mean 73 | bn_rm = torch.from_numpy(weights[ptr:ptr + num_b]).view_as(bn_layer.running_mean) 74 | bn_layer.running_mean.data.copy_(bn_rm) 75 | ptr += num_b 76 | # Running Var 77 | bn_rv = torch.from_numpy(weights[ptr:ptr + num_b]).view_as(bn_layer.running_var) 78 | bn_layer.running_var.data.copy_(bn_rv) 79 | ptr += num_b 80 | else: 81 | # Load conv. bias 82 | num_b = conv_layer.bias.numel() 83 | conv_b = torch.from_numpy(weights[ptr:ptr + num_b]).view_as(conv_layer.bias) 84 | conv_layer.bias.data.copy_(conv_b) 85 | ptr += num_b 86 | # Load conv. weights 87 | num_w = conv_layer.weight.numel() 88 | conv_w = torch.from_numpy(weights[ptr:ptr + num_w]).view_as(conv_layer.weight) 89 | conv_layer.weight.data.copy_(conv_w) 90 | ptr += num_w 91 | 92 | return cutoff 93 | 94 | 95 | def save_weights(self, path='model.weights', cutoff=-1): 96 | # Converts a PyTorch model to Darket format (*.pt to *.weights) 97 | # Note: Does not work if model.fuse() is applied 98 | with open(path, 'wb') as f: 99 | # Write Header https://github.com/AlexeyAB/darknet/issues/2914#issuecomment-496675346 100 | self.version.tofile(f) # (int32) version info: major, minor, revision 101 | self.seen.tofile(f) # (int64) number of images seen during training 102 | 103 | # Iterate through layers 104 | for i, (mdef, module) in enumerate(zip(self.module_defs[:cutoff], self.module_list[:cutoff])): 105 | if mdef['type'] == 'convolutional': 106 | conv_layer = module[0] 107 | # If batch norm, load bn first 108 | if mdef['batch_normalize']: 109 | bn_layer = module[1] 110 | bn_layer.bias.data.cpu().numpy().tofile(f) 111 | bn_layer.weight.data.cpu().numpy().tofile(f) 112 | bn_layer.running_mean.data.cpu().numpy().tofile(f) 113 | bn_layer.running_var.data.cpu().numpy().tofile(f) 114 | # Load conv bias 115 | else: 116 | conv_layer.bias.data.cpu().numpy().tofile(f) 117 | # Load conv weights 118 | conv_layer.weight.data.cpu().numpy().tofile(f) 119 | 120 | 121 | def convert(cfg='cfg/yolov3-spp.cfg', weights='weights/yolov3-spp.weights'): 122 | # Converts between PyTorch and Darknet format per extension (i.e. 
*.weights convert to *.pt and vice versa) 123 | # from models import *; convert('cfg/yolov3-spp.cfg', 'weights/yolov3-spp.weights') 124 | 125 | # Initialize model 126 | model = Darknet(cfg) 127 | 128 | # Load weights and save 129 | if weights.endswith('.pt'): # if PyTorch format 130 | model.load_state_dict(torch.load(weights, map_location='cpu')['model']) 131 | save_weights(model, path='converted.weights', cutoff=-1) 132 | print("Success: converted '%s' to 'converted.weights'" % weights) 133 | 134 | elif weights.endswith('.weights'): # darknet format 135 | _ = load_darknet_weights(model, weights) 136 | 137 | chkpt = {'epoch': -1, 138 | 'best_fitness': None, 139 | 'training_results': None, 140 | 'model': model.state_dict(), 141 | 'optimizer': None} 142 | 143 | torch.save(chkpt, 'converted.pt') 144 | print("Success: converted '%s' to 'converted.pt'" % weights) 145 | 146 | else: 147 | print('Error: extension not supported.') 148 | 149 | # 如果weights指定的权重不存在,则下载;存在则该函数不返回直接pass 150 | def attempt_download(weights): 151 | # Attempt to download pretrained weights if not found locally 152 | msg = weights + ' missing, download from https://drive.google.com/drive/folders/1uxgUBemJVw9wZsdpboYbzUN4bcRhsuAI' 153 | if weights and not os.path.isfile(weights): # 指定路径的权值文件不存在 154 | file = Path(weights).name # 分割路径文件名 155 | 156 | if file == 'yolov3-spp.weights': 157 | gdrive_download(id='1oPCHKsM2JpM-zgyepQciGli9X0MTsJCO', name=weights) 158 | elif file == 'yolov3-spp.pt': 159 | gdrive_download(id='1vFlbJ_dXPvtwaLLOu-twnjK4exdFiQ73', name=weights) 160 | elif file == 'yolov3.pt': 161 | gdrive_download(id='11uy0ybbOXA2hc-NJkJbbbkDwNX1QZDlz', name=weights) 162 | elif file == 'yolov3-tiny.pt': 163 | gdrive_download(id='1qKSgejNeNczgNNiCn9ZF_o55GFk1DjY_', name=weights) 164 | elif file == 'darknet53.conv.74': 165 | gdrive_download(id='18xqvs_uwAqfTXp-LJCYLYNHBOcrwbrp0', name=weights) 166 | elif file == 'yolov3-tiny.conv.15': 167 | gdrive_download(id='140PnSedCsGGgu3rOD6Ez4oI6cdDzerLC', name=weights) 168 | 169 | else: 170 | try: # download from pjreddie.com 171 | url = 'https://pjreddie.com/media/files/' + file 172 | print('Downloading ' + url) 173 | os.system('curl -f ' + url + ' -o ' + weights) 174 | except IOError: 175 | print(msg) 176 | os.system('rm ' + weights) # remove partial downloads 177 | 178 | assert os.path.exists(weights), msg # download missing weights from Google Drive 179 | -------------------------------------------------------------------------------- /model/sampler_ratio.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/model/sampler_ratio.png -------------------------------------------------------------------------------- /study.txt: -------------------------------------------------------------------------------- 1 | 0.88 1 1 0.9362 0 0 0 2.87 2 | 0.9167 1 1 0.9565 0 0 0 0.821 3 | 0.9565 1 1 0.9778 0 0 0 0.8187 4 | 0.9565 1 1 0.9778 0 0 0 0.8235 5 | 1 1 1 1 0 0 0 0.8223 6 | 1 1 1 1 0 0 0 0.8243 7 | 1 1 1 1 0 0 0 0.8386 8 | 1 1 1 1 0 0 0 0.82 9 | 1 1 1 1 0 0 0 0.8239 10 | 1 1 1 1 0 0 0 0.8341 11 | -------------------------------------------------------------------------------- /test.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import json 3 | import torch 4 | 5 | from torch.utils.data import DataLoader 6 | 7 | from model.models import Darknet 8 | from model.model_utils import attempt_download, 
parse_data_cfg 9 | from utils.datasets import LoadImagesAndLabels 10 | from utils.utils import * 11 | from utils.parse_config import parse_model_cfg 12 | from utils.nms.r_nms import r_nms 13 | from model.loss import compute_loss 14 | from utils.nms.nms import non_max_suppression 15 | 16 | 17 | def test(cfg, 18 | data, 19 | weights=None, 20 | batch_size=16, 21 | img_size=416, 22 | iou_thres=0.5, 23 | conf_thres=0.001, 24 | nms_thres=0.5, 25 | save_json=False, 26 | hyp=None, 27 | model=None): 28 | # Initialize/load model and set device 29 | if model is None: 30 | device = torch_utils.select_device(opt.device) 31 | verbose = True 32 | 33 | # Initialize model 34 | model = Darknet(cfg, hyp).to(device) 35 | 36 | # Load weights 37 | attempt_download(weights) 38 | if weights.endswith('.pt'): # pytorch format 39 | model.load_state_dict(torch.load(weights, map_location=device)['model']) 40 | else: # darknet format 41 | _ = load_darknet_weights(model, weights) 42 | 43 | if torch.cuda.device_count() > 1: 44 | model = nn.DataParallel(model) 45 | else: 46 | device = next(model.parameters()).device # get model device 47 | verbose = False 48 | 49 | # Configure run 50 | data = parse_data_cfg(data) 51 | nc = int(data['classes']) # number of classes 52 | test_path = data['valid'] # path to test images 53 | names = load_classes(data['names']) # class names 54 | 55 | # Dataloader 56 | dataset = LoadImagesAndLabels(test_path, img_size, batch_size,augment=False, hyp=hyp) 57 | dataloader = DataLoader(dataset, 58 | batch_size=batch_size, 59 | num_workers=min([os.cpu_count(), batch_size, 16]), 60 | pin_memory=True, 61 | collate_fn=dataset.collate_fn) 62 | 63 | seen = 0 64 | model.eval() 65 | coco91class = coco80_to_coco91_class() 66 | s = ('%20s' + '%10s' * 6) % ('Class', 'Images', 'Targets', 'P', 'R', 'mAP', 'F1') 67 | p, r, f1, mp, mr, map, mf1 = 0., 0., 0., 0., 0., 0., 0. 68 | loss = torch.zeros(3) 69 | jdict, stats, ap, ap_class = [], [], [], [] 70 | for batch_i, (imgs, targets, paths, shapes) in enumerate(tqdm(dataloader, desc=s)): 71 | targets = targets.to(device) # [img_id, cls_id, x, y, w, h, a] 72 | imgs = imgs.to(device) 73 | _, _, height, width = imgs.shape # batch size, channels, height, width 74 | 75 | # Plot images with bounding boxes 76 | if batch_i == 0 and not os.path.exists('test_batch0.jpg'): 77 | plot_images(imgs=imgs, targets=targets, paths=paths, fname='test_batch0.jpg') 78 | 79 | # Run model 80 | inf_out, train_out = model(imgs) # inference and training outputs 81 | 82 | # # Compute loss 83 | # if hasattr(model, 'hyp'): # if model has loss hyperparameters 84 | # loss += compute_loss(train_out, targets, model,hyp)[1][:3].cpu() # GIoU, obj, cls 85 | 86 | # Run NMS 87 | output = non_max_suppression(inf_out, conf_thres=conf_thres, nms_thres=nms_thres) 88 | 89 | # Statistics per image 90 | for si, pred in enumerate(output): 91 | labels = targets[targets[:, 0] == si, 1:] # 当前图像的gt [cls_id, x, y, w, h, a] 92 | nl = len(labels) 93 | tcls = labels[:, 0].tolist() if nl else [] # target class 94 | seen += 1 95 | 96 | if pred is None: 97 | if nl: 98 | stats.append(([], torch.Tensor(), torch.Tensor(), tcls)) 99 | continue 100 | 101 | # Append to text file 102 | # with open('test.txt', 'a') as file: 103 | # [file.write('%11.5g' * 7 % tuple(x) + '\n') for x in pred] 104 | 105 | # Append to pycocotools JSON dictionary 106 | if save_json: 107 | # [{"image_id": 42, "category_id": 18, "bbox": [258.15, 41.29, 348.26, 243.78], "score": 0.236}, ... 
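# The save_json block below rescales each predicted box from the network input size
# back to the original image shape, converts it from corner (xyxy) to width/height
# form, shifts the xy coordinate to the top-left corner as pycocotools expects, and
# appends the image id, mapped COCO category id, box and score to jdict.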
108 | image_id = int(Path(paths[si]).stem.split('_')[-1]) 109 | box = pred[:, :4].clone() # xyxy 110 | scale_coords(imgs[si].shape[1:], box, shapes[si]) # to original shape 111 | box = xyxy2xywh(box) # xywh 112 | box[:, :2] -= box[:, 2:] / 2 # xy center to top-left corner 113 | for di, d in enumerate(pred): 114 | jdict.append({'image_id': image_id, 115 | 'category_id': coco91class[int(d[6])], 116 | 'bbox': [floatn(x, 3) for x in box[di]], 117 | 'score': floatn(d[4], 5)}) 118 | 119 | # Clip boxes to image bounds 120 | clip_coords(pred, (height, width)) 121 | 122 | # Assign all predictions as incorrect 123 | correct = [0] * len(pred) 124 | if nl: 125 | detected = [] 126 | tcls_tensor = labels[:, 0] 127 | 128 | # target boxes 129 | tbox = labels[:, 1:6] 130 | tbox[:, [0, 2]] *= width 131 | tbox[:, [1, 3]] *= height 132 | 133 | # Search for correct predictions遍历每个检测出的box 134 | for i, (*pbox, pconf, pcls_conf, pcls) in enumerate(pred): 135 | 136 | # Break if all targets already located in image 137 | if len(detected) == nl: 138 | break 139 | 140 | # Continue if predicted class not among image classes 141 | if pcls.item() not in tcls: 142 | continue 143 | 144 | # Best iou, index between pred and targets 145 | m = (pcls == tcls_tensor).nonzero().view(-1) 146 | iou, bi = skew_bbox_iou(pbox, tbox[m]).max(0) 147 | 148 | # If iou > threshold and class is correct mark as correct 149 | if iou > iou_thres and m[bi] not in detected: # and pcls == tcls[bi]: 150 | correct[i] = 1 151 | detected.append(m[bi]) 152 | 153 | # Append statistics (correct, conf, pcls, tcls) 154 | stats.append((correct, pred[:, 5].cpu(), pred[:, 7].cpu(), tcls)) 155 | 156 | # Compute statistics 157 | stats = [np.concatenate(x, 0) for x in list(zip(*stats))] # to numpy 158 | if len(stats): 159 | p, r, ap, f1, ap_class = ap_per_class(*stats) 160 | mp, mr, map, mf1 = p.mean(), r.mean(), ap.mean(), f1.mean() 161 | nt = np.bincount(stats[3].astype(np.int64), minlength=nc) # number of targets per class 162 | else: 163 | nt = torch.zeros(1) 164 | 165 | # Print results 166 | pf = '%20s' + '%10.3g' * 6 # print format 167 | print(pf % ('all', seen, nt.sum(), mp, mr, map, mf1)) 168 | 169 | # Print results per class 170 | if verbose and nc > 1 and len(stats): 171 | for i, c in enumerate(ap_class): 172 | print(pf % (names[c], seen, nt[c], p[i], r[i], ap[i], f1[i])) 173 | 174 | # Save JSON 175 | if save_json and map and len(jdict): 176 | try: 177 | imgIds = [int(Path(x).stem.split('_')[-1]) for x in dataset.img_files] 178 | with open('results.json', 'w') as file: 179 | json.dump(jdict, file) 180 | 181 | from pycocotools.coco import COCO 182 | from pycocotools.cocoeval import COCOeval 183 | 184 | # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb 185 | cocoGt = COCO('../coco/annotations/instances_val2014.json') # initialize COCO ground truth api 186 | cocoDt = cocoGt.loadRes('results.json') # initialize COCO pred api 187 | 188 | cocoEval = COCOeval(cocoGt, cocoDt, 'bbox') 189 | cocoEval.params.imgIds = imgIds # [:32] # only evaluate these images 190 | cocoEval.evaluate() 191 | cocoEval.accumulate() 192 | cocoEval.summarize() 193 | map = cocoEval.stats[1] # update mAP to pycocotools mAP 194 | except: 195 | print('WARNING: missing dependency pycocotools from requirements.txt. 
Can not compute official COCO mAP.') 196 | 197 | # Return results 198 | maps = np.zeros(nc) + map 199 | for i, c in enumerate(ap_class): 200 | maps[c] = ap[i] 201 | return (mp, mr, map, mf1, *(loss / len(dataloader)).tolist()), maps 202 | 203 | 204 | if __name__ == '__main__': 205 | parser = argparse.ArgumentParser(prog='test.py') 206 | parser.add_argument('--hyp', type=str, default='cfg/ICDAR/hyp.py', help='hyper-parameter path') 207 | parser.add_argument('--cfg', type=str, default='cfg/ICDAR/yolov3_608_se.cfg', help='cfg file path') 208 | parser.add_argument('--data', type=str, default='data/icdar_13+15.data', help='coco.data file path') 209 | parser.add_argument('--weights', type=str, default='weights/best.pt', help='path to weights file') 210 | parser.add_argument('--batch-size', type=int, default=1, help='size of each image batch') 211 | parser.add_argument('--img-size', type=int, default=608, help='inference size (pixels)') 212 | parser.add_argument('--iou-thres', type=float, default=0.5, help='iou threshold required to qualify as detected') 213 | parser.add_argument('--conf-thres', type=float, default=0.001, help='object confidence threshold') 214 | parser.add_argument('--nms-thres', type=float, default=0.5, help='iou threshold for non-maximum suppression') 215 | parser.add_argument('--save-json', action='store_true', help='save a cocoapi-compatible JSON results file') 216 | parser.add_argument('--device', default='', help='device id (i.e. 0 or 0,1) or cpu') 217 | opt = parser.parse_args() 218 | print(opt) 219 | 220 | hyp = hyp_parse(opt.hyp) 221 | 222 | with torch.no_grad(): 223 | test(opt.cfg, 224 | opt.data, 225 | opt.weights, 226 | opt.batch_size, 227 | opt.img_size, 228 | opt.iou_thres, 229 | opt.conf_thres, 230 | opt.nms_thres, 231 | opt.save_json, 232 | hyp) 233 | -------------------------------------------------------------------------------- /utils/ICDAR/ICDAR2yolo.py: -------------------------------------------------------------------------------- 1 | # ICDAR坐标为四点多边形 2 | # 这里将其处理成近似拟合的矩形框,并且归一化得到yolo格式 3 | # 去除do not care的label 4 | 5 | import os 6 | import sys 7 | import cv2 8 | import math 9 | import numpy as np 10 | from tqdm import tqdm 11 | from decimal import Decimal 12 | 13 | 14 | # 检查异常文件并返回 15 | # 异常类型:1. xywh数值超出1(图像范围) 2. 负值(max和min标反了的) 16 | def check_exception(txt_path): 17 | files = os.listdir(txt_path) 18 | class_id = [] 19 | exception = [] 20 | for file in files: 21 | with open(os.path.join(txt_path,file),'r') as f: 22 | contents = f.read() 23 | lines = contents.split('\n') 24 | lines = [i for i in lines if len(i)>0] 25 | for line in lines: 26 | line = line.split(' ') 27 | 28 | assert len(line) == 6 ,'wrong length!!' 29 | c,x,y,w,h,a = line 30 | if c not in class_id: 31 | class_id.append(c) 32 | if float(x)>1.0 or float(y)>1.0 or float(w)>1.0 or float(h)>1.0 or (float(eval(a))>0.5*math.pi or float(eval(a))<-0.5*math.pi): 33 | exception.append(file) 34 | elif float(x)<0 or float(y)<0 or float(w)<0 or float(h)<0: 35 | exception.append(file) 36 | 37 | assert '0' in class_id , 'Class counting from 0 rather than 1!' 38 | if len(exception) ==0: 39 | return 'No exception found.' 
40 | else: 41 | return exception 42 | 43 | 44 | 45 | def convert(src_path,img_path,dst_path): 46 | icdar_files= os.listdir(src_path) 47 | for icdar_file in tqdm(icdar_files): #每个文件名称 48 | with open(os.path.join(dst_path, os.path.splitext(icdar_file)[0]+'.txt'),'w') as f: #打开要写的文件 49 | with open(os.path.join(src_path,icdar_file),'r',encoding='utf-8-sig') as fd: #打开要读的文件 50 | objects = fd.readlines() 51 | # objects = [x[ :x.find(x.split(',')[8])-1] for x in objects] 52 | assert len(objects) > 0, 'No object found in ' + xml_path 53 | 54 | class_label = 0 # 只分前景背景 55 | height, width, _ = cv2.imread(os.path.join(img_path, os.path.splitext(icdar_file)[0][3:])+'.jpg').shape 56 | 57 | for object in objects: 58 | if '###' not in object: 59 | object = object.split(',')[:8] 60 | coors = np.array([int(x) for x in object]).reshape(4,2).astype(np.int32) 61 | ((cx, cy), (w, h), theta) = cv2.minAreaRect(coors) 62 | ### vis & debug opencv 0度起点,顺时针为+ 63 | # print(cv2.minAreaRect(coors)) 64 | # img = cv2.imread(os.path.join(img_path, os.path.splitext(icdar_file)[0][3:])+'.jpg') 65 | # points = cv2.boxPoints(cv2.minAreaRect(coors)).astype(np.int32) 66 | # img = cv2.polylines(img,[points],True,(0,0,255),2) # 后三个参数为:是否封闭/color/thickness 67 | # cv2.imshow('display box',img) 68 | # cv2.waitKey(0) 69 | 70 | # 转换为自己的标准:-0.5pi, 0.5pi 71 | a = theta / 180 * math.pi 72 | if a > 0.5*math.pi: a = math.pi - a 73 | if a < -0.5*math.pi: a = math.pi + a 74 | 75 | x = Decimal(cx/width).quantize(Decimal('0.000000')) 76 | y = Decimal(cy/height).quantize(Decimal('0.000000')) 77 | w = Decimal(w/width).quantize(Decimal('0.000000')) 78 | h = Decimal(h/height).quantize(Decimal('0.000000')) 79 | a = Decimal(a).quantize(Decimal('0.000000')) 80 | 81 | f.write(str(class_label)+' '+str(x)+' '+str(y)+' '+str(w)+' '+str(h)+' '+str(a)+'\n') 82 | 83 | 84 | 85 | 86 | if __name__ == "__main__": 87 | 88 | care_all = True 89 | src_path = "/py/datasets/ICDAR2015/ICDAR/val_labels" 90 | img_path = '/py/datasets/ICDAR2015/ICDAR/val_imgs' 91 | dst_path = "/py/datasets/ICDAR2015/yolo/separate/val_labels" 92 | 93 | convert(src_path,img_path,dst_path) 94 | 95 | exception_files = check_exception(dst_path) 96 | print(exception_files) 97 | -------------------------------------------------------------------------------- /utils/ICDAR/icdar_utils.py: -------------------------------------------------------------------------------- 1 | import os 2 | import math 3 | import cv2 4 | import numpy as np 5 | from scipy.spatial import distance as dist 6 | import zipfile 7 | 8 | def zip_dir(dirname,zipfilename): 9 | filelist = [] 10 | if os.path.isfile(dirname): 11 | filelist.append(dirname) 12 | else : 13 | for root, dirs, files in os.walk(dirname): 14 | for name in files: 15 | filelist.append(os.path.join(root, name)) 16 | 17 | zf = zipfile.ZipFile(zipfilename, "w", zipfile.zlib.DEFLATED) 18 | for tar in filelist: 19 | arcname = tar[len(dirname):] 20 | #print arcname 21 | zf.write(tar,arcname) 22 | zf.close() 23 | 24 | 25 | def cos_dist(a, b): 26 | if len(a) != len(b): 27 | return None 28 | part_up = 0.0 29 | a_sq = 0.0 30 | b_sq = 0.0 31 | # print(a, b) 32 | # print(zip(a, b)) 33 | for a1, b1 in zip(a, b): 34 | part_up += a1*b1 35 | a_sq += a1**2 36 | b_sq += b1**2 37 | part_down = math.sqrt(a_sq*b_sq) 38 | if part_down == 0.0: 39 | return None 40 | else: 41 | return part_up / part_down 42 | 43 | 44 | # this function is confined to rectangle 45 | def order_points(pts): 46 | # sort the points based on their x-coordinates 47 | xSorted = pts[np.argsort(pts[:, 0]), :] 
48 | 49 | # grab the left-most and right-most points from the sorted 50 | # x-roodinate points 51 | leftMost = xSorted[:2, :] 52 | rightMost = xSorted[2:, :] 53 | 54 | # now, sort the left-most coordinates according to their 55 | # y-coordinates so we can grab the top-left and bottom-left 56 | # points, respectively 57 | leftMost = leftMost[np.argsort(leftMost[:, 1]), :] 58 | (tl, bl) = leftMost 59 | 60 | # now that we have the top-left coordinate, use it as an 61 | # anchor to calculate the Euclidean distance between the 62 | # top-left and right-most points; by the Pythagorean 63 | # theorem, the point with the largest distance will be 64 | # our bottom-right point 65 | D = dist.cdist(tl[np.newaxis], rightMost, "euclidean")[0] 66 | (br, tr) = rightMost[np.argsort(D)[::-1], :] 67 | 68 | # return the coordinates in top-left, top-right, 69 | # bottom-right, and bottom-left order 70 | return np.array([tl, tr, br, bl], dtype="float32") 71 | 72 | 73 | def order_points_quadrangle(pts): 74 | # sort the points based on their x-coordinates 75 | xSorted = pts[np.argsort(pts[:, 0]), :] 76 | 77 | # grab the left-most and right-most points from the sorted 78 | # x-roodinate points 79 | leftMost = xSorted[:2, :] 80 | rightMost = xSorted[2:, :] 81 | 82 | # now, sort the left-most coordinates according to their 83 | # y-coordinates so we can grab the top-left and bottom-left 84 | # points, respectively 85 | leftMost = leftMost[np.argsort(leftMost[:, 1]), :] 86 | (tl, bl) = leftMost 87 | 88 | # now that we have the top-left and bottom-left coordinate, use it as an 89 | # base vector to calculate the angles between the other two vectors 90 | 91 | vector_0 = np.array(bl-tl) 92 | vector_1 = np.array(rightMost[0]-tl) 93 | vector_2 = np.array(rightMost[1]-tl) 94 | 95 | angle = [np.arccos(cos_dist(vector_0, vector_1)), np.arccos(cos_dist(vector_0, vector_2))] 96 | (br, tr) = rightMost[np.argsort(angle), :] 97 | 98 | # return the coordinates in top-left, top-right, 99 | # bottom-right, and bottom-left order 100 | return np.array([tl, tr, br, bl], dtype="float32") 101 | 102 | 103 | 104 | 105 | def xywha2points(x): 106 | # 带旋转角度,顺时针正,+-0.5pi;返回四个点坐标 107 | cx = x[0]; cy = x[1]; w = x[2]; h = x[3]; a = x[4] 108 | xmin = cx - w*0.5; xmax = cx + w*0.5; ymin = cy - h*0.5; ymax = cy + h*0.5 109 | t_x0=xmin; t_y0=ymin; t_x1=xmin; t_y1=ymax; t_x2=xmax; t_y2=ymax; t_x3=xmax; t_y3=ymin 110 | R = np.eye(3) 111 | R[:2] = cv2.getRotationMatrix2D(angle=-a*180/math.pi, center=(cx,cy), scale=1) 112 | x0 = t_x0*R[0,0] + t_y0*R[0,1] + R[0,2] 113 | y0 = t_x0*R[1,0] + t_y0*R[1,1] + R[1,2] 114 | x1 = t_x1*R[0,0] + t_y1*R[0,1] + R[0,2] 115 | y1 = t_x1*R[1,0] + t_y1*R[1,1] + R[1,2] 116 | x2 = t_x2*R[0,0] + t_y2*R[0,1] + R[0,2] 117 | y2 = t_x2*R[1,0] + t_y2*R[1,1] + R[1,2] 118 | x3 = t_x3*R[0,0] + t_y3*R[0,1] + R[0,2] 119 | y3 = t_x3*R[1,0] + t_y3*R[1,1] + R[1,2] 120 | points = np.array([[float(x0),float(y0)],[float(x1),float(y1)],[float(x2),float(y2)],[float(x3),float(y3)]]) 121 | return points 122 | 123 | def xywha2icdar(box): 124 | box = xywha2points(box) 125 | cw_box = order_points(box) 126 | cw_box = cw_box.reshape(1, 8).squeeze().astype('int').tolist() 127 | str_box = str(cw_box[0]) + ',' + \ 128 | str(cw_box[1]) + ',' + \ 129 | str(cw_box[2]) + ',' + \ 130 | str(cw_box[3]) + ',' + \ 131 | str(cw_box[4]) + ',' + \ 132 | str(cw_box[5]) + ',' + \ 133 | str(cw_box[6]) + ',' + \ 134 | str(cw_box[7]) + '\n' 135 | return str_box 136 | 137 | 138 | # if __name__ == "__main__": 139 | # pnts = 
np.array([137,340,137,351,172,351,172,340]).reshape(4,2) 140 | # trans_pnts = order_points(pnts) 141 | 142 | # img = np.zeros((1000, 1000, 3), np.uint8) 143 | # for id, point in enumerate(trans_pnts): 144 | # point = tuple(point) 145 | # cv2.circle(img, point, radius = 1, color = (0, 0, 255), thickness = 4) 146 | # cv2.putText(img, str(id), point, cv2.FONT_HERSHEY_COMPLEX, 1, (255,0,0), 2) 147 | 148 | # cv2.imshow('points', img) 149 | # cv2.waitKey (0) 150 | # cv2.destroyAllWindows() -------------------------------------------------------------------------------- /utils/adabound.py: -------------------------------------------------------------------------------- 1 | import math 2 | 3 | import torch 4 | from torch.optim import Optimizer 5 | 6 | 7 | class AdaBound(Optimizer): 8 | """Implements AdaBound algorithm. 9 | It has been proposed in `Adaptive Gradient Methods with Dynamic Bound of Learning Rate`_. 10 | Arguments: 11 | params (iterable): iterable of parameters to optimize or dicts defining 12 | parameter groups 13 | lr (float, optional): Adam learning rate (default: 1e-3) 14 | betas (Tuple[float, float], optional): coefficients used for computing 15 | running averages of gradient and its square (default: (0.9, 0.999)) 16 | final_lr (float, optional): final (SGD) learning rate (default: 0.1) 17 | gamma (float, optional): convergence speed of the bound functions (default: 1e-3) 18 | eps (float, optional): term added to the denominator to improve 19 | numerical stability (default: 1e-8) 20 | weight_decay (float, optional): weight decay (L2 penalty) (default: 0) 21 | amsbound (boolean, optional): whether to use the AMSBound variant of this algorithm 22 | .. Adaptive Gradient Methods with Dynamic Bound of Learning Rate: 23 | https://openreview.net/forum?id=Bkg3g2R9FX 24 | """ 25 | 26 | def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), final_lr=0.1, gamma=1e-3, 27 | eps=1e-8, weight_decay=0, amsbound=False): 28 | if not 0.0 <= lr: 29 | raise ValueError("Invalid learning rate: {}".format(lr)) 30 | if not 0.0 <= eps: 31 | raise ValueError("Invalid epsilon value: {}".format(eps)) 32 | if not 0.0 <= betas[0] < 1.0: 33 | raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0])) 34 | if not 0.0 <= betas[1] < 1.0: 35 | raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1])) 36 | if not 0.0 <= final_lr: 37 | raise ValueError("Invalid final learning rate: {}".format(final_lr)) 38 | if not 0.0 <= gamma < 1.0: 39 | raise ValueError("Invalid gamma parameter: {}".format(gamma)) 40 | defaults = dict(lr=lr, betas=betas, final_lr=final_lr, gamma=gamma, eps=eps, 41 | weight_decay=weight_decay, amsbound=amsbound) 42 | super(AdaBound, self).__init__(params, defaults) 43 | 44 | self.base_lrs = list(map(lambda group: group['lr'], self.param_groups)) 45 | 46 | def __setstate__(self, state): 47 | super(AdaBound, self).__setstate__(state) 48 | for group in self.param_groups: 49 | group.setdefault('amsbound', False) 50 | 51 | def step(self, closure=None): 52 | """Performs a single optimization step. 53 | Arguments: 54 | closure (callable, optional): A closure that reevaluates the model 55 | and returns the loss. 
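        Example (illustrative sketch; `model`, `criterion`, `x` and `y` are assumed
        to be defined by the caller):
            optimizer = AdaBound(model.parameters(), lr=1e-3, final_lr=0.1)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()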
56 | """ 57 | loss = None 58 | if closure is not None: 59 | loss = closure() 60 | 61 | for group, base_lr in zip(self.param_groups, self.base_lrs): 62 | for p in group['params']: 63 | if p.grad is None: 64 | continue 65 | grad = p.grad.data 66 | if grad.is_sparse: 67 | raise RuntimeError( 68 | 'Adam does not support sparse gradients, please consider SparseAdam instead') 69 | amsbound = group['amsbound'] 70 | 71 | state = self.state[p] 72 | 73 | # State initialization 74 | if len(state) == 0: 75 | state['step'] = 0 76 | # Exponential moving average of gradient values 77 | state['exp_avg'] = torch.zeros_like(p.data) 78 | # Exponential moving average of squared gradient values 79 | state['exp_avg_sq'] = torch.zeros_like(p.data) 80 | if amsbound: 81 | # Maintains max of all exp. moving avg. of sq. grad. values 82 | state['max_exp_avg_sq'] = torch.zeros_like(p.data) 83 | 84 | exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq'] 85 | if amsbound: 86 | max_exp_avg_sq = state['max_exp_avg_sq'] 87 | beta1, beta2 = group['betas'] 88 | 89 | state['step'] += 1 90 | 91 | if group['weight_decay'] != 0: 92 | grad = grad.add(group['weight_decay'], p.data) 93 | 94 | # Decay the first and second moment running average coefficient 95 | exp_avg.mul_(beta1).add_(1 - beta1, grad) 96 | exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad) 97 | if amsbound: 98 | # Maintains the maximum of all 2nd moment running avg. till now 99 | torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq) 100 | # Use the max. for normalizing running avg. of gradient 101 | denom = max_exp_avg_sq.sqrt().add_(group['eps']) 102 | else: 103 | denom = exp_avg_sq.sqrt().add_(group['eps']) 104 | 105 | bias_correction1 = 1 - beta1 ** state['step'] 106 | bias_correction2 = 1 - beta2 ** state['step'] 107 | step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1 108 | 109 | # Applies bounds on actual learning rate 110 | # lr_scheduler cannot affect final_lr, this is a workaround to apply lr decay 111 | final_lr = group['final_lr'] * group['lr'] / base_lr 112 | lower_bound = final_lr * (1 - 1 / (group['gamma'] * state['step'] + 1)) 113 | upper_bound = final_lr * (1 + 1 / (group['gamma'] * state['step'])) 114 | step_size = torch.full_like(denom, step_size) 115 | step_size.div_(denom).clamp_(lower_bound, upper_bound).mul_(exp_avg) 116 | 117 | p.data.add_(-step_size) 118 | 119 | return loss 120 | 121 | 122 | class AdaBoundW(Optimizer): 123 | """Implements AdaBound algorithm with Decoupled Weight Decay (arxiv.org/abs/1711.05101) 124 | It has been proposed in `Adaptive Gradient Methods with Dynamic Bound of Learning Rate`_. 125 | Arguments: 126 | params (iterable): iterable of parameters to optimize or dicts defining 127 | parameter groups 128 | lr (float, optional): Adam learning rate (default: 1e-3) 129 | betas (Tuple[float, float], optional): coefficients used for computing 130 | running averages of gradient and its square (default: (0.9, 0.999)) 131 | final_lr (float, optional): final (SGD) learning rate (default: 0.1) 132 | gamma (float, optional): convergence speed of the bound functions (default: 1e-3) 133 | eps (float, optional): term added to the denominator to improve 134 | numerical stability (default: 1e-8) 135 | weight_decay (float, optional): weight decay (L2 penalty) (default: 0) 136 | amsbound (boolean, optional): whether to use the AMSBound variant of this algorithm 137 | .. 
Adaptive Gradient Methods with Dynamic Bound of Learning Rate: 138 | https://openreview.net/forum?id=Bkg3g2R9FX 139 | """ 140 | 141 | def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), final_lr=0.1, gamma=1e-3, 142 | eps=1e-8, weight_decay=0, amsbound=False): 143 | if not 0.0 <= lr: 144 | raise ValueError("Invalid learning rate: {}".format(lr)) 145 | if not 0.0 <= eps: 146 | raise ValueError("Invalid epsilon value: {}".format(eps)) 147 | if not 0.0 <= betas[0] < 1.0: 148 | raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0])) 149 | if not 0.0 <= betas[1] < 1.0: 150 | raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1])) 151 | if not 0.0 <= final_lr: 152 | raise ValueError("Invalid final learning rate: {}".format(final_lr)) 153 | if not 0.0 <= gamma < 1.0: 154 | raise ValueError("Invalid gamma parameter: {}".format(gamma)) 155 | defaults = dict(lr=lr, betas=betas, final_lr=final_lr, gamma=gamma, eps=eps, 156 | weight_decay=weight_decay, amsbound=amsbound) 157 | super(AdaBoundW, self).__init__(params, defaults) 158 | 159 | self.base_lrs = list(map(lambda group: group['lr'], self.param_groups)) 160 | 161 | def __setstate__(self, state): 162 | super(AdaBoundW, self).__setstate__(state) 163 | for group in self.param_groups: 164 | group.setdefault('amsbound', False) 165 | 166 | def step(self, closure=None): 167 | """Performs a single optimization step. 168 | Arguments: 169 | closure (callable, optional): A closure that reevaluates the model 170 | and returns the loss. 171 | """ 172 | loss = None 173 | if closure is not None: 174 | loss = closure() 175 | 176 | for group, base_lr in zip(self.param_groups, self.base_lrs): 177 | for p in group['params']: 178 | if p.grad is None: 179 | continue 180 | grad = p.grad.data 181 | if grad.is_sparse: 182 | raise RuntimeError( 183 | 'Adam does not support sparse gradients, please consider SparseAdam instead') 184 | amsbound = group['amsbound'] 185 | 186 | state = self.state[p] 187 | 188 | # State initialization 189 | if len(state) == 0: 190 | state['step'] = 0 191 | # Exponential moving average of gradient values 192 | state['exp_avg'] = torch.zeros_like(p.data) 193 | # Exponential moving average of squared gradient values 194 | state['exp_avg_sq'] = torch.zeros_like(p.data) 195 | if amsbound: 196 | # Maintains max of all exp. moving avg. of sq. grad. values 197 | state['max_exp_avg_sq'] = torch.zeros_like(p.data) 198 | 199 | exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq'] 200 | if amsbound: 201 | max_exp_avg_sq = state['max_exp_avg_sq'] 202 | beta1, beta2 = group['betas'] 203 | 204 | state['step'] += 1 205 | 206 | # Decay the first and second moment running average coefficient 207 | exp_avg.mul_(beta1).add_(1 - beta1, grad) 208 | exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad) 209 | if amsbound: 210 | # Maintains the maximum of all 2nd moment running avg. till now 211 | torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq) 212 | # Use the max. for normalizing running avg. 
of gradient 213 | denom = max_exp_avg_sq.sqrt().add_(group['eps']) 214 | else: 215 | denom = exp_avg_sq.sqrt().add_(group['eps']) 216 | 217 | bias_correction1 = 1 - beta1 ** state['step'] 218 | bias_correction2 = 1 - beta2 ** state['step'] 219 | step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1 220 | 221 | # Applies bounds on actual learning rate 222 | # lr_scheduler cannot affect final_lr, this is a workaround to apply lr decay 223 | final_lr = group['final_lr'] * group['lr'] / base_lr 224 | lower_bound = final_lr * (1 - 1 / (group['gamma'] * state['step'] + 1)) 225 | upper_bound = final_lr * (1 + 1 / (group['gamma'] * state['step'])) 226 | step_size = torch.full_like(denom, step_size) 227 | step_size.div_(denom).clamp_(lower_bound, upper_bound).mul_(exp_avg) 228 | 229 | if group['weight_decay'] != 0: 230 | decayed_weights = torch.mul(p.data, group['weight_decay']) 231 | p.data.add_(-step_size) 232 | p.data.sub_(decayed_weights) 233 | else: 234 | p.data.add_(-step_size) 235 | 236 | return loss 237 | -------------------------------------------------------------------------------- /utils/gcp.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # New VM 4 | rm -rf sample_data yolov3 darknet apex coco cocoapi knife knifec 5 | git clone https://github.com/ultralytics/yolov3 6 | # git clone https://github.com/AlexeyAB/darknet && cd darknet && make GPU=1 CUDNN=1 CUDNN_HALF=1 OPENCV=0 && wget -c https://pjreddie.com/media/files/darknet53.conv.74 && cd .. 7 | git clone https://github.com/NVIDIA/apex && cd apex && pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" . --user && cd .. && rm -rf apex 8 | # git clone https://github.com/cocodataset/cocoapi && cd cocoapi/PythonAPI && make && cd ../.. 
&& cp -r cocoapi/PythonAPI/pycocotools yolov3 9 | sudo conda install -y -c conda-forge scikit-image tensorboard pycocotools 10 | python3 -c " 11 | from yolov3.utils.google_utils import gdrive_download 12 | gdrive_download('1HaXkef9z6y5l4vUnCYgdmEAj61c6bfWO','coco.zip')" 13 | sudo shutdown 14 | 15 | # Re-clone 16 | rm -rf yolov3 # Warning: remove existing 17 | git clone https://github.com/ultralytics/yolov3 && cd yolov3 # master 18 | # git clone -b test --depth 1 https://github.com/ultralytics/yolov3 test # branch 19 | python3 train.py --img-size 320 --weights weights/darknet53.conv.74 --epochs 27 --batch-size 64 --accumulate 1 20 | 21 | # Train 22 | python3 train.py 23 | 24 | # Resume 25 | python3 train.py --resume 26 | 27 | # Detect 28 | python3 detect.py 29 | 30 | # Test 31 | python3 test.py --save-json 32 | 33 | # Evolve 34 | for i in {0..500} 35 | do 36 | python3 train.py --data data/coco.data --img-size 320 --epochs 1 --batch-size 64 --accumulate 1 --evolve --bucket yolov4 37 | done 38 | 39 | # Git pull 40 | git pull https://github.com/ultralytics/yolov3 # master 41 | git pull https://github.com/ultralytics/yolov3 test # branch 42 | 43 | # Test Darknet training 44 | python3 test.py --weights ../darknet/backup/yolov3.backup 45 | 46 | # Copy last.pt TO bucket 47 | gsutil cp yolov3/weights/last1gpu.pt gs://ultralytics 48 | 49 | # Copy last.pt FROM bucket 50 | gsutil cp gs://ultralytics/last.pt yolov3/weights/last.pt 51 | wget https://storage.googleapis.com/ultralytics/yolov3/last_v1_0.pt -O weights/last_v1_0.pt 52 | wget https://storage.googleapis.com/ultralytics/yolov3/best_v1_0.pt -O weights/best_v1_0.pt 53 | 54 | # Reproduce tutorials 55 | rm results*.txt # WARNING: removes existing results 56 | python3 train.py --nosave --data data/coco_1img.data && mv results.txt results0r_1img.txt 57 | python3 train.py --nosave --data data/coco_10img.data && mv results.txt results0r_10img.txt 58 | python3 train.py --nosave --data data/coco_100img.data && mv results.txt results0r_100img.txt 59 | # python3 train.py --nosave --data data/coco_100img.data --transfer && mv results.txt results3_100imgTL.txt 60 | python3 -c "from utils import utils; utils.plot_results()" 61 | # gsutil cp results*.txt gs://ultralytics 62 | gsutil cp results.png gs://ultralytics 63 | sudo shutdown 64 | 65 | # Reproduce mAP 66 | python3 test.py --save-json --img-size 608 67 | python3 test.py --save-json --img-size 416 68 | python3 test.py --save-json --img-size 320 69 | sudo shutdown 70 | 71 | # Benchmark script 72 | git clone https://github.com/ultralytics/yolov3 # clone our repo 73 | git clone https://github.com/NVIDIA/apex && cd apex && pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" . --user && cd .. && rm -rf apex # install nvidia apex 74 | python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('1HaXkef9z6y5l4vUnCYgdmEAj61c6bfWO','coco.zip')" # download coco dataset (20GB) 75 | cd yolov3 && clear && python3 train.py --epochs 1 # run benchmark (~30 min) 76 | 77 | # Unit tests 78 | python3 detect.py # detect 2 persons, 1 tie 79 | python3 test.py --data data/coco_32img.data # test mAP = 0.8 80 | python3 train.py --data data/coco_32img.data --epochs 5 --nosave # train 5 epochs 81 | python3 train.py --data data/coco_1cls.data --epochs 5 --nosave # train 5 epochs 82 | python3 train.py --data data/coco_1img.data --epochs 5 --nosave # train 5 epochs 83 | 84 | # AlexyAB Darknet 85 | gsutil cp -r gs://sm6/supermarket2 . 
# dataset from bucket 86 | rm -rf darknet && git clone https://github.com/AlexeyAB/darknet && cd darknet && wget -c https://pjreddie.com/media/files/darknet53.conv.74 # sudo apt install libopencv-dev && make 87 | ./darknet detector calc_anchors data/coco_img64.data -num_of_clusters 9 -width 320 -height 320 # kmeans anchor calculation 88 | ./darknet detector train ../supermarket2/supermarket2.data ../yolo_v3_spp_pan_scale.cfg darknet53.conv.74 -map -dont_show # train spp 89 | ./darknet detector train ../yolov3/data/coco.data ../yolov3-spp.cfg darknet53.conv.74 -map -dont_show # train spp coco 90 | 91 | ./darknet detector train data/coco.data ../yolov3-spp.cfg darknet53.conv.74 -map -dont_show # train spp 92 | gsutil cp -r backup/*5000.weights gs://sm6/weights 93 | sudo shutdown 94 | 95 | 96 | ./darknet detector train ../supermarket2/supermarket2.data ../yolov3-tiny-sm2-1cls.cfg yolov3-tiny.conv.15 -map -dont_show # train tiny 97 | ./darknet detector train ../supermarket2/supermarket2.data cfg/yolov3-spp-sm2-1cls.cfg backup/yolov3-spp-sm2-1cls_last.weights # resume 98 | python3 train.py --data ../supermarket2/supermarket2.data --cfg ../yolov3-spp-sm2-1cls.cfg --epochs 100 --num-workers 8 --img-size 320 --nosave # train ultralytics 99 | python3 test.py --data ../supermarket2/supermarket2.data --weights ../darknet/backup/yolov3-spp-sm2-1cls_5000.weights --cfg cfg/yolov3-spp-sm2-1cls.cfg # test 100 | gsutil cp -r backup/*.weights gs://sm6/weights # weights to bucket 101 | 102 | python3 test.py --data ../supermarket2/supermarket2.data --weights weights/yolov3-spp-sm2-1cls_5000.weights --cfg ../yolov3-spp-sm2-1cls.cfg --img-size 320 --conf-thres 0.2 # test 103 | python3 test.py --data ../supermarket2/supermarket2.data --weights weights/yolov3-spp-sm2-1cls-scalexy_125_5000.weights --cfg ../yolov3-spp-sm2-1cls-scalexy_125.cfg --img-size 320 --conf-thres 0.2 # test 104 | python3 test.py --data ../supermarket2/supermarket2.data --weights weights/yolov3-spp-sm2-1cls-scalexy_150_5000.weights --cfg ../yolov3-spp-sm2-1cls-scalexy_150.cfg --img-size 320 --conf-thres 0.2 # test 105 | python3 test.py --data ../supermarket2/supermarket2.data --weights weights/yolov3-spp-sm2-1cls-scalexy_200_5000.weights --cfg ../yolov3-spp-sm2-1cls-scalexy_200.cfg --img-size 320 --conf-thres 0.2 # test 106 | python3 test.py --data ../supermarket2/supermarket2.data --weights ../darknet/backup/yolov3-spp-sm2-1cls-scalexy_variable_5000.weights --cfg ../yolov3-spp-sm2-1cls-scalexy_variable.cfg --img-size 320 --conf-thres 0.2 # test 107 | 108 | python3 train.py --img-size 320 --epochs 27 --batch-size 64 --accumulate 1 --nosave --notest && python3 test.py --weights weights/last.pt --img-size 320 --save-json && sudo shutdown 109 | 110 | # Debug/Development 111 | python3 train.py --data data/coco.data --img-size 320 --single-scale --batch-size 64 --accumulate 1 --epochs 1 --evolve --giou 112 | python3 test.py --weights weights/last.pt --cfg cfg/yolov3-spp.cfg --img-size 320 113 | 114 | gsutil cp evolve.txt gs://ultralytics 115 | sudo shutdown 116 | 117 | #Docker 118 | sudo docker kill $(sudo docker ps -q) 119 | sudo docker pull ultralytics/yolov3:v1 120 | sudo nvidia-docker run -it --ipc=host --mount type=bind,source="$(pwd)"/coco,target=/usr/src/coco ultralytics/yolov3:v1 121 | 122 | clear 123 | while true 124 | do 125 | python3 train.py --data data/coco.data --img-size 320 --batch-size 64 --accumulate 1 --evolve --epochs 1 --adam --bucket yolov4/adamdefaultpw_coco_1e --device 1 126 | done 127 | 128 | python3 train.py --data 
data/coco.data --img-size 320 --batch-size 64 --accumulate 1 --epochs 1 --adam --device 1 --prebias 129 | while true; do python3 train.py --data data/coco.data --img-size 320 --batch-size 64 --accumulate 1 --evolve --epochs 1 --adam --bucket yolov4/adamdefaultpw_coco_1e; done 130 | -------------------------------------------------------------------------------- /utils/google_utils.py: -------------------------------------------------------------------------------- 1 | # This file contains google utils: https://cloud.google.com/storage/docs/reference/libraries 2 | # pip install --upgrade google-cloud-storage 3 | 4 | import os 5 | import time 6 | 7 | 8 | # from google.cloud import storage 9 | 10 | 11 | def gdrive_download(id='1HaXkef9z6y5l4vUnCYgdmEAj61c6bfWO', name='coco.zip'): 12 | # https://gist.github.com/tanaikech/f0f2d122e05bf5f971611258c22c110f 13 | # Downloads a file from Google Drive, accepting presented query 14 | # from utils.google_utils import *; gdrive_download() 15 | t = time.time() 16 | 17 | print('Downloading https://drive.google.com/uc?export=download&id=%s as %s... ' % (id, name), end='') 18 | if os.path.exists(name): # remove existing 19 | os.remove(name) 20 | 21 | # Attempt large file download 22 | s = ["curl -c ./cookie -s -L \"https://drive.google.com/uc?export=download&id=%s\" > /dev/null" % id, 23 | "curl -Lb ./cookie -s \"https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=%s\" -o %s" % ( 24 | id, name), 25 | 'rm ./cookie'] 26 | [os.system(x) for x in s] # run commands 27 | 28 | # Attempt small file download 29 | if not os.path.exists(name): # file size < 40MB 30 | s = 'curl -f -L -o %s https://drive.google.com/uc?export=download&id=%s' % (name, id) 31 | os.system(s) 32 | 33 | # Unzip if archive 34 | if name.endswith('.zip'): 35 | print('unzipping... 
', end='') 36 | os.system('unzip -q %s' % name) # unzip 37 | os.remove(name) # remove zip to free space 38 | 39 | print('Done (%.1fs)' % (time.time() - t)) 40 | 41 | 42 | def upload_blob(bucket_name, source_file_name, destination_blob_name): 43 | # Uploads a file to a bucket 44 | # https://cloud.google.com/storage/docs/uploading-objects#storage-upload-object-python 45 | 46 | storage_client = storage.Client() 47 | bucket = storage_client.get_bucket(bucket_name) 48 | blob = bucket.blob(destination_blob_name) 49 | 50 | blob.upload_from_filename(source_file_name) 51 | 52 | print('File {} uploaded to {}.'.format( 53 | source_file_name, 54 | destination_blob_name)) 55 | 56 | 57 | def download_blob(bucket_name, source_blob_name, destination_file_name): 58 | # Downloads a blob from a bucket 59 | storage_client = storage.Client() 60 | bucket = storage_client.get_bucket(bucket_name) 61 | blob = bucket.blob(source_blob_name) 62 | 63 | blob.download_to_filename(destination_file_name) 64 | 65 | print('Blob {} downloaded to {}.'.format( 66 | source_blob_name, 67 | destination_file_name)) 68 | -------------------------------------------------------------------------------- /utils/init.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/utils/init.py -------------------------------------------------------------------------------- /utils/kmeans/416/3/anchor_clusters.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/utils/kmeans/416/3/anchor_clusters.png -------------------------------------------------------------------------------- /utils/kmeans/416/3/area_cluster.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/utils/kmeans/416/3/area_cluster.png -------------------------------------------------------------------------------- /utils/kmeans/416/3/kmeans.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/utils/kmeans/416/3/kmeans.png -------------------------------------------------------------------------------- /utils/kmeans/416/3/ratio_cluster.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/utils/kmeans/416/3/ratio_cluster.png -------------------------------------------------------------------------------- /utils/kmeans/416/6/2019-10-31 09-02-05屏幕截图.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/utils/kmeans/416/6/2019-10-31 09-02-05屏幕截图.png -------------------------------------------------------------------------------- /utils/kmeans/416/6/anchor_clusters.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/utils/kmeans/416/6/anchor_clusters.png -------------------------------------------------------------------------------- /utils/kmeans/416/6/area_cluster.png:
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/utils/kmeans/416/6/area_cluster.png -------------------------------------------------------------------------------- /utils/kmeans/416/6/ratio_cluster.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/utils/kmeans/416/6/ratio_cluster.png -------------------------------------------------------------------------------- /utils/kmeans/hrsc_512.txt: -------------------------------------------------------------------------------- 1 | 28 10 2 | 50 12 3 | 61 6 4 | 69 8 5 | 73 16 6 | 79 10 7 | 86 12 8 | 108 13 9 | 111 17 10 | 132 27 11 | 134 15 12 | 138 19 13 | 155 22 14 | 167 18 15 | 175 28 16 | 202 34 17 | 251 42 18 | 297 74 19 | -------------------------------------------------------------------------------- /utils/kmeans/icdar_608_all.txt: -------------------------------------------------------------------------------- 1 | 5 11 2 | 6 23 3 | 6 6 4 | 8 38 5 | 9 15 6 | 11 3 7 | 13 7 8 | 13 26 9 | 14 47 10 | 20 5 11 | 21 9 12 | 27 75 13 | 28 14 14 | 35 9 15 | 38 5 16 | 44 19 17 | 59 12 18 | 88 28 19 | -------------------------------------------------------------------------------- /utils/kmeans/icdar_608_care.txt: -------------------------------------------------------------------------------- 1 | 6 14 2 | 8 21 3 | 8 36 4 | 12 26 5 | 13 6 6 | 13 44 7 | 16 9 8 | 20 8 9 | 21 65 10 | 22 5 11 | 23 12 12 | 29 9 13 | 33 7 14 | 36 16 15 | 36 112 16 | 47 11 17 | 61 18 18 | 101 31 19 | -------------------------------------------------------------------------------- /utils/nms/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/utils/nms/__init__.py -------------------------------------------------------------------------------- /utils/nms/make.sh: -------------------------------------------------------------------------------- 1 | python setup.py build_ext --inplace 2 | -------------------------------------------------------------------------------- /utils/nms/nms.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from utils.nms.r_nms import r_nms 3 | 4 | def non_max_suppression(prediction, conf_thres=0.5, nms_thres=0.5): 5 | """ 6 | Removes detections with lower object confidence score than 'conf_thres' 7 | Non-Maximum Suppression to further filter detections. 
8 | Returns detections with shape: 9 | (x, y, w, h, a, object_conf, class_conf, class) 10 | """ 11 | # prediction: torch.Size([1, 8190, 8]) dim 0 is the batch size (number of images), dim 1 holds all proposals, dim 2 is xywh + conf + classes (three classes here) 12 | min_wh = 2 # (pixels) minimum box width and height 13 | output = [None] * len(prediction) 14 | for image_i, pred in enumerate(prediction): 15 | # Experiment: Prior class size rejection 16 | # x, y, w, h = pred[:, 0], pred[:, 1], pred[:, 2], pred[:, 3] 17 | # a = w * h # area 18 | # ar = w / (h + 1e-16) # aspect ratio 19 | # n = len(w) 20 | # log_w, log_h, log_a, log_ar = torch.log(w), torch.log(h), torch.log(a), torch.log(ar) 21 | # shape_likelihood = np.zeros((n, 60), dtype=np.float32) 22 | # x = np.concatenate((log_w.reshape(-1, 1), log_h.reshape(-1, 1)), 1) 23 | # from scipy.stats import multivariate_normal 24 | # for c in range(60): 25 | # shape_likelihood[:, c] = 26 | # multivariate_normal.pdf(x, mean=mat['class_mu'][c, :2], cov=mat['class_cov'][c, :2, :2]) 27 | 28 | if prediction.numel() == 0: # in case multi-scale filtering left no predictions 29 | continue 30 | 31 | # Multiply conf by class conf to get combined confidence 32 | # max(1) searches along dim 1: for each proposal take the per-class scores and keep the largest one 33 | # it returns the values class_conf and the indices class_pred; the index is the predicted class 34 | class_conf, class_pred = pred[:, 6:].max(1) # max(1) takes the row-wise maximum, i.e. the most likely class of each proposal 35 | pred[:, 5] *= class_conf # multiplying by conf gives the true score, written back to the conf position 36 | 37 | # Select only suitable predictions 38 | # first build a boolean index mask of predictions meeting the requirements, then index with it in a second step 39 | # conditions: 1. best-class conf above the preset threshold 2. predicted w/h of the anchor larger than 2 pixels 3. finite (no nan or inf) 40 | i = (pred[:, 5] > conf_thres) & (pred[:, 2:4] > min_wh).all(1) & torch.isfinite(pred).all(1) 41 | pred = pred[i] 42 | 43 | # If none are remaining => process next image 44 | if len(pred) == 0: 45 | continue 46 | 47 | # Select predicted classes 48 | class_conf = class_conf[i] # boolean mask filters out the conf entries that were False 49 | class_pred = class_pred[i].unsqueeze(1).float() # torch.Size([num_of_proposal]) --> torch.Size([num_of_proposal,1]) to ease the later concat 50 | 51 | use_cuda_nms = True 52 | # with CUDA NMS the proposals are not capped at 100, since many high-scoring proposals may exist and would be wrongly discarded 53 | if use_cuda_nms: 54 | det_max = [] 55 | pred = torch.cat((pred[:, :6], class_conf.unsqueeze(1), class_pred), 1) 56 | pred = pred[(-pred[:, 5]).argsort()] 57 | for c in pred[:, -1].unique(): 58 | dc = pred[pred[:, -1] == c] 59 | dc = dc[(-dc[:, 5]).argsort()] 60 | # if len(dc)>100: # if there are far too many proposals, keep only 100 61 | # dc = dc[:100] 62 | 63 | # Non-maximum suppression 64 | inds = r_nms(dc[:,:6], nms_thres) 65 | 66 | det_max.append(dc[inds]) 67 | if len(det_max): 68 | det_max = torch.cat(det_max) # concatenate 69 | output[image_i] = det_max[(-det_max[:, 5]).argsort()] # sort 70 | 71 | else: 72 | # Detections ordered as (x1y1x2y2, obj_conf, class_conf, class_pred) 73 | pred = torch.cat((pred[:, :6], class_conf.unsqueeze(1), class_pred), 1) 74 | 75 | # Get detections sorted by decreasing confidence scores 76 | pred = pred[(-pred[:, 5]).argsort()] 77 | 78 | det_max = [] 79 | nms_style = 'OR' # 'OR' (default), 'AND', 'MERGE' (experimental) 80 | 81 | for c in pred[:, -1].unique(): 82 | dc = pred[pred[:, -1] == c] # select class c # shape [num,7] 7 = (x1, y1, x2, y2, object_conf, class_conf) 83 | n = len(dc) 84 | if n == 1: 85 | det_max.append(dc) # No NMS required if only 1 prediction 86 | continue 87 | elif n > 100: 88 | dc = dc[:100] # limit to first 100 boxes: https://github.com/ultralytics/yolov3/issues/117 89 | 90 | # Non-maximum suppression 91 | if nms_style == 'OR': # default 92 | # METHOD1 93 | # ind = list(range(len(dc))) 94 | # while len(ind): 95 | # j = ind[0] 96 | # det_max.append(dc[j:j +
1]) # save highest conf detection 97 | # reject = (skew_bbox_iou(dc[j], dc[ind]) > nms_thres).nonzero() 98 | # [ind.pop(i) for i in reversed(reject)] 99 | 100 | # METHOD2 101 | while dc.shape[0]: 102 | det_max.append(dc[:1]) # save highest conf detection 103 | if len(dc) == 1: # Stop if we're at the last detection 104 | break 105 | iou = skew_bbox_iou(dc[0], dc[1:]) # iou with other boxes 106 | dc = dc[1:][iou < nms_thres] # remove ious > threshold 107 | 108 | elif nms_style == 'AND': # requires overlap, single boxes erased 109 | while len(dc) > 1: 110 | iou = skew_bbox_iou(dc[0], dc[1:]) # iou with other boxes 111 | if iou.max() > 0.5: 112 | det_max.append(dc[:1]) 113 | dc = dc[1:][iou < nms_thres] # remove ious > threshold 114 | 115 | elif nms_style == 'MERGE': # weighted mixture box 116 | while len(dc): 117 | if len(dc) == 1: 118 | det_max.append(dc) 119 | break 120 | # known bug: if every box in the current batch has IoU with the highest-conf one (dc[0] after sorting) below nms_thres, 121 | # then i is all False, so weights=[] and weights.sum()=0, which turns dc[0] into nan! 122 | i = skew_bbox_iou(dc[0], dc) > nms_thres # iou with other boxes, returned as booleans to ease the matrix indexing and filtering below 123 | weights = dc[i, 5:6] # proposals overlapping above the NMS threshold; take their conf 124 | assert len(weights)>0, 'Bugs on MERGE NMS!!' 125 | dc[0, :5] = (weights * dc[i, :5]).sum(0) / weights.sum() # replace the highest-conf bbox by the conf-weighted average of all bboxes above the threshold (conf itself unchanged, changing it would be meaningless) 126 | det_max.append(dc[:1]) 127 | dc = dc[i == 0] # boolean False equals 0, so this drops the pred boxes already processed from dc 128 | 129 | elif nms_style == 'SOFT': # soft-NMS https://arxiv.org/abs/1704.04503 130 | sigma = 0.5 # soft-nms sigma parameter 131 | while len(dc): 132 | if len(dc) == 1: 133 | det_max.append(dc) 134 | break 135 | det_max.append(dc[:1]) 136 | iou = skew_bbox_iou(dc[0], dc[1:]) # iou with other boxes 137 | dc = dc[1:] 138 | dc[:, 4] *= torch.exp(-iou ** 2 / sigma) # decay confidences 139 | # dc = dc[dc[:, 4] > nms_thres] # new line per https://github.com/ultralytics/yolov3/issues/362 140 | 141 | if len(det_max): 142 | det_max = torch.cat(det_max) # concatenate 143 | # import ipdb; ipdb.set_trace() # leftover debug breakpoint, disabled 144 | output[image_i] = det_max[(-det_max[:, 5]).argsort()] # sort 145 | 146 | 147 | return output 148 | 149 | 150 | -------------------------------------------------------------------------------- /utils/nms/nms_wrapper_test.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import cv2 4 | import math 5 | 6 | import r_nms 7 | 8 | 9 | def get_rotated_coors(box): 10 | assert len(box) > 0 , 'Input valid box!'
11 | cx = box[0]; cy = box[1]; w = box[2]; h = box[3]; a = box[4] 12 | xmin = cx - w*0.5; xmax = cx + w*0.5; ymin = cy - h*0.5; ymax = cy + h*0.5 13 | t_x0=xmin; t_y0=ymin; t_x1=xmin; t_y1=ymax; t_x2=xmax; t_y2=ymax; t_x3=xmax; t_y3=ymin 14 | R = np.eye(3) 15 | R[:2] = cv2.getRotationMatrix2D(angle=-a*180/math.pi, center=(cx,cy), scale=1) 16 | x0 = t_x0*R[0,0] + t_y0*R[0,1] + R[0,2] 17 | y0 = t_x0*R[1,0] + t_y0*R[1,1] + R[1,2] 18 | x1 = t_x1*R[0,0] + t_y1*R[0,1] + R[0,2] 19 | y1 = t_x1*R[1,0] + t_y1*R[1,1] + R[1,2] 20 | x2 = t_x2*R[0,0] + t_y2*R[0,1] + R[0,2] 21 | y2 = t_x2*R[1,0] + t_y2*R[1,1] + R[1,2] 22 | x3 = t_x3*R[0,0] + t_y3*R[0,1] + R[0,2] 23 | y3 = t_x3*R[1,0] + t_y3*R[1,1] + R[1,2] 24 | 25 | if isinstance(x0,torch.Tensor): 26 | r_box=torch.cat([x0.unsqueeze(0),y0.unsqueeze(0), 27 | x1.unsqueeze(0),y1.unsqueeze(0), 28 | x2.unsqueeze(0),y2.unsqueeze(0), 29 | x3.unsqueeze(0),y3.unsqueeze(0)], 0) 30 | else: 31 | r_box = np.array([x0,y0,x1,y1,x2,y2,x3,y3]) 32 | return r_box 33 | 34 | if __name__ == '__main__': 35 | boxes = np.array([[150, 150, 100, 100, 0, 0.99, 0.1], 36 | [160, 160, 100, 100, 0, 0.88, 0.1], 37 | [150, 150, 100, 100, -0.7854, 0.66, 0.1], 38 | [300, 300, 100, 100, 0., 0.77, 0.1]],dtype=np.float32) 39 | 40 | dets_th=torch.from_numpy(boxes).cuda() 41 | import ipdb; ipdb.set_trace() 42 | iou_thr = 0.1 43 | inds = r_nms.r_nms(dets_th, iou_thr) 44 | print(inds) 45 | 46 | img = np.zeros((416*2,416*2,3), np.uint8) 47 | img.fill(255) 48 | 49 | boxes = boxes[:,:-1] 50 | boxes = [get_rotated_coors(i).reshape(-1,2).astype(np.int32) for i in boxes] 51 | for box in boxes: 52 | img = cv2.polylines(img,[box],True,(0,0,255),1) 53 | cv2.imshow('anchor_show', img) 54 | cv2.waitKey(0) 55 | cv2.destroyAllWindows() -------------------------------------------------------------------------------- /utils/nms/setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup 2 | from torch.utils.cpp_extension import BuildExtension, CUDAExtension 3 | 4 | setup( 5 | name='r_nms', 6 | ext_modules=[ 7 | CUDAExtension('r_nms', [ 8 | 'src/rotate_polygon_nms.cpp', 9 | 'src/rotate_polygon_nms_kernel.cu', 10 | ]), 11 | ], 12 | cmdclass={'build_ext': BuildExtension}) 13 | 14 | -------------------------------------------------------------------------------- /utils/nms/src/rotate_polygon_nms.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | 3 | #define CHECK_CUDA(x) AT_CHECK(x.type().is_cuda(), #x, " must be a CUDAtensor ") 4 | 5 | at::Tensor nms_cuda(const at::Tensor boxes, float nms_overlap_thresh); 6 | 7 | at::Tensor r_nms(const at::Tensor& dets, const float threshold) { 8 | CHECK_CUDA(dets); 9 | if (dets.numel() == 0) 10 | return at::empty({0}, dets.options().dtype(at::kLong).device(at::kCPU)); 11 | return nms_cuda(dets, threshold); 12 | } 13 | 14 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { 15 | m.def("r_nms", &r_nms, "r_nms rnms"); 16 | } -------------------------------------------------------------------------------- /utils/nms/src/rotate_polygon_nms_kernel.cu: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | #include 5 | #include 6 | 7 | #include 8 | #include 9 | 10 | #define CUDA_CHECK(condition) \ 11 | /* Code block avoids redefinition of cudaError_t error */ \ 12 | do { \ 13 | cudaError_t error = condition; \ 14 | if (error != cudaSuccess) { \ 15 | std::cout << cudaGetErrorString(error) << std::endl; \ 16 | } \ 17 | } 
while (0) 18 | 19 | #define DIVUP(m,n) ((m) / (n) + ((m) % (n) > 0)) 20 | int const threadsPerBlock = sizeof(unsigned long long) * 8; 21 | 22 | __device__ inline float trangle_area(float * a, float * b, float * c) { 23 | return ((a[0] - c[0]) * (b[1] - c[1]) - (a[1] - c[1]) * (b[0] - c[0])) / 2.0; 24 | } 25 | 26 | __device__ inline float area(float * int_pts, int num_of_inter) { 27 | 28 | float area = 0.0; 29 | for (int i = 0; i < num_of_inter - 2; i++) { 30 | area += fabs(trangle_area(int_pts, int_pts + 2 * i + 2, int_pts + 2 * i + 4)); 31 | } 32 | return area; 33 | } 34 | 35 | __device__ inline void reorder_pts(float * int_pts, int num_of_inter) { 36 | 37 | 38 | 39 | if (num_of_inter > 0) { 40 | 41 | float center[2]; 42 | 43 | center[0] = 0.0; 44 | center[1] = 0.0; 45 | 46 | for (int i = 0; i < num_of_inter; i++) { 47 | center[0] += int_pts[2 * i]; 48 | center[1] += int_pts[2 * i + 1]; 49 | } 50 | center[0] /= num_of_inter; 51 | center[1] /= num_of_inter; 52 | 53 | float vs[16]; 54 | float v[2]; 55 | float d; 56 | for (int i = 0; i < num_of_inter; i++) { 57 | v[0] = int_pts[2 * i] - center[0]; 58 | v[1] = int_pts[2 * i + 1] - center[1]; 59 | d = sqrt(v[0] * v[0] + v[1] * v[1]); 60 | v[0] = v[0] / d; 61 | v[1] = v[1] / d; 62 | if (v[1] < 0) { 63 | v[0] = -2 - v[0]; 64 | } 65 | vs[i] = v[0]; 66 | } 67 | 68 | float temp, tx, ty; 69 | int j; 70 | for (int i = 1; ivs[i]){ 72 | temp = vs[i]; 73 | tx = int_pts[2 * i]; 74 | ty = int_pts[2 * i + 1]; 75 | j = i; 76 | while (j>0 && vs[j - 1]>temp){ 77 | vs[j] = vs[j - 1]; 78 | int_pts[j * 2] = int_pts[j * 2 - 2]; 79 | int_pts[j * 2 + 1] = int_pts[j * 2 - 1]; 80 | j--; 81 | } 82 | vs[j] = temp; 83 | int_pts[j * 2] = tx; 84 | int_pts[j * 2 + 1] = ty; 85 | } 86 | } 87 | } 88 | 89 | } 90 | __device__ inline bool inter2line(float * pts1, float *pts2, int i, int j, float * temp_pts) { 91 | 92 | float a[2]; 93 | float b[2]; 94 | float c[2]; 95 | float d[2]; 96 | 97 | float area_abc, area_abd, area_cda, area_cdb; 98 | 99 | a[0] = pts1[2 * i]; 100 | a[1] = pts1[2 * i + 1]; 101 | 102 | b[0] = pts1[2 * ((i + 1) % 4)]; 103 | b[1] = pts1[2 * ((i + 1) % 4) + 1]; 104 | 105 | c[0] = pts2[2 * j]; 106 | c[1] = pts2[2 * j + 1]; 107 | 108 | d[0] = pts2[2 * ((j + 1) % 4)]; 109 | d[1] = pts2[2 * ((j + 1) % 4) + 1]; 110 | 111 | area_abc = trangle_area(a, b, c); 112 | area_abd = trangle_area(a, b, d); 113 | 114 | if (area_abc * area_abd >= 0) { 115 | return false; 116 | } 117 | 118 | area_cda = trangle_area(c, d, a); 119 | area_cdb = area_cda + area_abc - area_abd; 120 | 121 | if (area_cda * area_cdb >= 0) { 122 | return false; 123 | } 124 | float t = area_cda / (area_abd - area_abc); 125 | 126 | float dx = t * (b[0] - a[0]); 127 | float dy = t * (b[1] - a[1]); 128 | temp_pts[0] = a[0] + dx; 129 | temp_pts[1] = a[1] + dy; 130 | 131 | return true; 132 | } 133 | 134 | __device__ inline bool in_rect(float pt_x, float pt_y, float * pts) { 135 | 136 | float ab[2]; 137 | float ad[2]; 138 | float ap[2]; 139 | 140 | float abab; 141 | float abap; 142 | float adad; 143 | float adap; 144 | 145 | ab[0] = pts[2] - pts[0]; 146 | ab[1] = pts[3] - pts[1]; 147 | 148 | ad[0] = pts[6] - pts[0]; 149 | ad[1] = pts[7] - pts[1]; 150 | 151 | ap[0] = pt_x - pts[0]; 152 | ap[1] = pt_y - pts[1]; 153 | 154 | abab = ab[0] * ab[0] + ab[1] * ab[1]; 155 | abap = ab[0] * ap[0] + ab[1] * ap[1]; 156 | adad = ad[0] * ad[0] + ad[1] * ad[1]; 157 | adap = ad[0] * ap[0] + ad[1] * ap[1]; 158 | 159 | return abab >= abap and abap >= 0 and adad >= adap and adap >= 0; 160 | } 161 | 162 | __device__ inline int 
inter_pts(float * pts1, float * pts2, float * int_pts) { 163 | 164 | int num_of_inter = 0; 165 | 166 | for (int i = 0; i < 4; i++) { 167 | if (in_rect(pts1[2 * i], pts1[2 * i + 1], pts2)) { 168 | int_pts[num_of_inter * 2] = pts1[2 * i]; 169 | int_pts[num_of_inter * 2 + 1] = pts1[2 * i + 1]; 170 | num_of_inter++; 171 | } 172 | if (in_rect(pts2[2 * i], pts2[2 * i + 1], pts1)) { 173 | int_pts[num_of_inter * 2] = pts2[2 * i]; 174 | int_pts[num_of_inter * 2 + 1] = pts2[2 * i + 1]; 175 | num_of_inter++; 176 | } 177 | } 178 | 179 | float temp_pts[2]; 180 | 181 | for (int i = 0; i < 4; i++) { 182 | for (int j = 0; j < 4; j++) { 183 | bool has_pts = inter2line(pts1, pts2, i, j, temp_pts); 184 | if (has_pts) { 185 | int_pts[num_of_inter * 2] = temp_pts[0]; 186 | int_pts[num_of_inter * 2 + 1] = temp_pts[1]; 187 | num_of_inter++; 188 | } 189 | } 190 | } 191 | 192 | 193 | return num_of_inter; 194 | } 195 | 196 | __device__ inline void convert_region(float * pts, float const * const region) { 197 | 198 | float angle = region[4]; 199 | //float a_cos = cos(angle / 180.0*3.1415926535); 200 | //float a_sin = sin(angle / 180.0*3.1415926535); 201 | float a_cos = cos(angle); 202 | float a_sin = sin(angle); 203 | 204 | float ctr_x = region[0]; 205 | float ctr_y = region[1]; 206 | 207 | float w = region[2]; 208 | float h = region[3]; 209 | 210 | float pts_x[4]; 211 | float pts_y[4]; 212 | 213 | pts_x[0] = -w / 2; 214 | pts_x[1] = w / 2; 215 | pts_x[2] = w / 2; 216 | pts_x[3] = -w / 2; 217 | 218 | pts_y[0] = -h / 2; 219 | pts_y[1] = -h / 2; 220 | pts_y[2] = h / 2; 221 | pts_y[3] = h / 2; 222 | 223 | for (int i = 0; i < 4; i++) { 224 | pts[7 - 2 * i - 1] = a_cos * pts_x[i] - a_sin * pts_y[i] + ctr_x; 225 | pts[7 - 2 * i] = a_sin * pts_x[i] + a_cos * pts_y[i] + ctr_y; 226 | 227 | } 228 | 229 | } 230 | 231 | 232 | __device__ inline float inter(float const * const region1, float const * const region2) { 233 | 234 | float pts1[8]; 235 | float pts2[8]; 236 | float int_pts[16]; 237 | int num_of_inter; 238 | 239 | convert_region(pts1, region1); 240 | convert_region(pts2, region2); 241 | 242 | num_of_inter = inter_pts(pts1, pts2, int_pts); 243 | 244 | reorder_pts(int_pts, num_of_inter); 245 | 246 | return area(int_pts, num_of_inter); 247 | 248 | 249 | } 250 | 251 | __device__ inline float devRotateIoU(float const * const region1, float const * const region2) { 252 | 253 | float area1 = region1[2] * region1[3]; 254 | float area2 = region2[2] * region2[3]; 255 | float area_inter = inter(region1, region2); 256 | 257 | return area_inter / (area1 + area2 - area_inter); 258 | 259 | 260 | } 261 | 262 | __global__ void rotate_nms_kernel(const int n_boxes, const float nms_overlap_thresh, 263 | const float *dev_boxes, unsigned long long *dev_mask) { 264 | const int row_start = blockIdx.y; 265 | const int col_start = blockIdx.x; 266 | 267 | // if (row_start > col_start) return; 268 | 269 | const int row_size = 270 | min(n_boxes - row_start * threadsPerBlock, threadsPerBlock); 271 | const int col_size = 272 | min(n_boxes - col_start * threadsPerBlock, threadsPerBlock); 273 | 274 | __shared__ float block_boxes[threadsPerBlock * 6]; 275 | if (threadIdx.x < col_size) { 276 | block_boxes[threadIdx.x * 6 + 0] = 277 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 6 + 0]; 278 | block_boxes[threadIdx.x * 6 + 1] = 279 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 6 + 1]; 280 | block_boxes[threadIdx.x * 6 + 2] = 281 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 6 + 2]; 282 | block_boxes[threadIdx.x * 6 + 3] = 
283 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 6 + 3]; 284 | block_boxes[threadIdx.x * 6 + 4] = 285 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 6 + 4]; 286 | block_boxes[threadIdx.x * 6 + 5] = 287 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 6 + 5]; 288 | } 289 | __syncthreads(); 290 | 291 | if (threadIdx.x < row_size) { 292 | const int cur_box_idx = threadsPerBlock * row_start + threadIdx.x; 293 | const float *cur_box = dev_boxes + cur_box_idx * 6; 294 | int i = 0; 295 | unsigned long long t = 0; 296 | int start = 0; 297 | if (row_start == col_start) { 298 | start = threadIdx.x + 1; 299 | } 300 | for (i = start; i < col_size; i++) { 301 | if (devRotateIoU(cur_box, block_boxes + i * 6) > nms_overlap_thresh) { 302 | t |= 1ULL << i; 303 | } 304 | } 305 | const int col_blocks = DIVUP(n_boxes, threadsPerBlock); 306 | dev_mask[cur_box_idx * col_blocks + col_start] = t; 307 | } 308 | } 309 | 310 | void _set_device(int device_id) { 311 | int current_device; 312 | CUDA_CHECK(cudaGetDevice(&current_device)); 313 | if (current_device == device_id) { 314 | return; 315 | } 316 | // The call to cudaSetDevice must come before any calls to Get, which 317 | // may perform initialization using the GPU. 318 | CUDA_CHECK(cudaSetDevice(device_id)); 319 | } 320 | 321 | 322 | // boxes is a N x 5 tensor 323 | at::Tensor nms_cuda(const at::Tensor boxes, float nms_overlap_thresh) { 324 | using scalar_t = float; 325 | AT_ASSERTM(boxes.type().is_cuda(), "boxes must be a CUDA tensor"); 326 | auto scores = boxes.select(1, 5); //dim=1, select the conf_score 327 | auto order_t = std::get<1>(scores.sort(0, /* descending=*/true)); //conf from high to low 328 | auto boxes_sorted = boxes.index_select(0, order_t); // re-rank the boxes via conf 329 | 330 | int boxes_num = boxes.size(0); 331 | 332 | const int col_blocks = THCCeilDiv(boxes_num, threadsPerBlock); 333 | 334 | scalar_t* boxes_dev = boxes_sorted.data<scalar_t>(); 335 | 336 | THCState *state = at::globalContext().lazyInitCUDA(); // TODO replace with getTHCState 337 | 338 | unsigned long long* mask_dev = NULL; 339 | //THCudaCheck(THCudaMalloc(state, (void**) &mask_dev, 340 | // boxes_num * col_blocks * sizeof(unsigned long long))); 341 | 342 | mask_dev = (unsigned long long*) THCudaMalloc(state, boxes_num * col_blocks * sizeof(unsigned long long)); 343 | 344 | dim3 blocks(THCCeilDiv(boxes_num, threadsPerBlock), 345 | THCCeilDiv(boxes_num, threadsPerBlock)); 346 | dim3 threads(threadsPerBlock); 347 | rotate_nms_kernel<<<blocks, threads>>>(boxes_num, 348 | nms_overlap_thresh, 349 | boxes_dev, 350 | mask_dev); 351 | 352 | std::vector<unsigned long long> mask_host(boxes_num * col_blocks); 353 | THCudaCheck(cudaMemcpy(&mask_host[0], 354 | mask_dev, 355 | sizeof(unsigned long long) * boxes_num * col_blocks, 356 | cudaMemcpyDeviceToHost)); 357 | 358 | std::vector<unsigned long long> remv(col_blocks); 359 | memset(&remv[0], 0, sizeof(unsigned long long) * col_blocks); 360 | 361 | at::Tensor keep = at::empty({ boxes_num }, boxes.options().dtype(at::kLong).device(at::kCPU)); 362 | int64_t* keep_out = keep.data<int64_t>(); 363 | 364 | int num_to_keep = 0; 365 | for (int i = 0; i < boxes_num; i++) { 366 | int nblock = i / threadsPerBlock; 367 | int inblock = i % threadsPerBlock; 368 | 369 | if (!(remv[nblock] & (1ULL << inblock))) { 370 | keep_out[num_to_keep++] = i; 371 | unsigned long long *p = &mask_host[0] + i * col_blocks; 372 | for (int j = nblock; j < col_blocks; j++) { 373 | remv[j] |= p[j]; 374 | } 375 | } 376 | } 377 | 378 | THCudaFree(state, mask_dev); 379 | // TODO improve this part 380 | return
std::get<0>(order_t.index({ 381 | keep.narrow(/*dim=*/0, /*start=*/0, /*length=*/num_to_keep).to( 382 | order_t.device(), keep.scalar_type()) 383 | }).sort(0, false)); 384 | } -------------------------------------------------------------------------------- /utils/parse_config.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import math 3 | 4 | 5 | 6 | def cfg2anchors(val): 7 | if 'ara' in val: # area, ratio, angle respectively 8 | val = val[val.index('ara')+3:] 9 | val = [i for i in val.split('/') if len(i)!=0] # ['12130, 42951, 113378 ', ' 4.18, 6.50, 8.75 ', '-60,-30,0,30,60,90'] 10 | areas = [float(i) for i in val[0].split(',')] 11 | ratios = [float(i) for i in val[1].split(',')] # w/h 12 | angles = [float(i) for i in val[2].split(',')] 13 | anchors = [] 14 | for area in areas: 15 | for ratio in ratios: 16 | for angle in angles: 17 | anchor_w = math.sqrt(area*ratio) 18 | anchor_h = math.sqrt(area/ratio) 19 | angle = angle*math.pi/180 20 | anchor = [anchor_w, anchor_h, angle] 21 | anchors.append(anchor) 22 | assert len(anchors) == len(areas)*len(ratios)*len(angles),'Something wrong in anchor settings.' 23 | # print(np.array(anchors)) 24 | return np.array(anchors) 25 | else: # anchors generated via k-means, input anchor.txt 26 | # by default one anchor every 15 degrees 27 | anchors_setting = val.strip(' ') 28 | anchors = np.loadtxt(anchors_setting) 29 | angle = np.array([i for i in range(-6,6)])*math.pi/12 30 | anchors = np.concatenate([np.column_stack((np.expand_dims(i,0).repeat(len(angle),0),angle.T)) for i in anchors],0) 31 | return anchors 32 | 33 | 34 | # cfg parsing function: 35 | # parses the layers, settings, etc. of the cfg into dicts and returns a list of these dicts; 36 | # each list element (a dict) corresponds to one block of the cfg file starting with [] (e.g. net); its first entry is the block type, e.g. {'type': 'net'...} 37 | def parse_model_cfg(path): 38 | # Parses the yolo-v3 layer configuration file and returns module definitions 39 | file = open(path, 'r') 40 | lines = file.read().split('\n') 41 | lines = [x for x in lines if x and not x.startswith('#')] 42 | lines = [x.rstrip().lstrip() for x in lines] # get rid of fringe whitespaces 43 | mdefs = [] # module definitions 44 | for line in lines: 45 | if line.startswith('['): # This marks the start of a new block 46 | mdefs.append({}) 47 | mdefs[-1]['type'] = line[1:-1].rstrip() 48 | if mdefs[-1]['type'] == 'convolutional': 49 | mdefs[-1]['batch_normalize'] = 0 # pre-populate with zeros (may be overwritten later) 50 | else: 51 | key, val = line.split("=") 52 | key = key.rstrip() 53 | 54 | if 'anchors' in key: 55 | mdefs[-1][key] = cfg2anchors(val) # np anchors 56 | else: 57 | mdefs[-1][key] = val.strip() 58 | 59 | return mdefs 60 | 61 | # like mmdetection, parse the data config file into key-value pairs of a dict for easy lookup 62 | def parse_data_cfg(path): 63 | # Parses the data configuration file 64 | options = dict() 65 | with open(path, 'r') as fp: 66 | lines = fp.readlines() 67 | 68 | for line in lines: 69 | line = line.strip() 70 | if line == '' or line.startswith('#'): 71 | continue 72 | key, val = line.split('=') 73 | options[key.strip()] = val.strip() 74 | 75 | return options 76 | -------------------------------------------------------------------------------- /utils/torch_utils.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | import torch 4 | 5 | 6 | def init_seeds(seed=0): 7 | torch.manual_seed(seed) 8 | torch.cuda.manual_seed(seed) 9 | torch.cuda.manual_seed_all(seed) 10 | 11 | # Remove randomness (may be slower on Tesla GPUs) # https://pytorch.org/docs/stable/notes/randomness.html 12
| if seed == 0: 13 | torch.backends.cudnn.deterministic = True 14 | torch.backends.cudnn.benchmark = False 15 | 16 | 17 | def select_device(device=None, apex=False): 18 | if device == 'cpu': 19 | pass 20 | elif device: # Set environment variable if device is specified 21 | os.environ['CUDA_VISIBLE_DEVICES'] = device 22 | 23 | # apex if mixed precision training https://github.com/NVIDIA/apex 24 | cuda = False if device == 'cpu' else torch.cuda.is_available() 25 | device = torch.device('cuda:0' if cuda else 'cpu') 26 | 27 | if not cuda: 28 | print('Using CPU') 29 | if cuda: 30 | c = 1024 ** 2 # bytes to MB 31 | ng = torch.cuda.device_count() 32 | x = [torch.cuda.get_device_properties(i) for i in range(ng)] 33 | cuda_str = 'Using CUDA ' + ('Apex ' if apex else '') 34 | for i in range(0, ng): 35 | if i == 1: 36 | # torch.cuda.set_device(0) # OPTIONAL: Set GPU ID 37 | cuda_str = ' ' * len(cuda_str) 38 | print("%sdevice%g _CudaDeviceProperties(name='%s', total_memory=%dMB)" % 39 | (cuda_str, i, x[i].name, x[i].total_memory / c)) 40 | 41 | print('') # skip a line 42 | return device 43 | 44 | 45 | def fuse_conv_and_bn(conv, bn): 46 | # https://tehnokv.com/posts/fusing-batchnorm-and-conv/ 47 | with torch.no_grad(): 48 | # init 49 | fusedconv = torch.nn.Conv2d(conv.in_channels, 50 | conv.out_channels, 51 | kernel_size=conv.kernel_size, 52 | stride=conv.stride, 53 | padding=conv.padding, 54 | bias=True) 55 | 56 | # prepare filters 57 | w_conv = conv.weight.clone().view(conv.out_channels, -1) 58 | w_bn = torch.diag(bn.weight.div(torch.sqrt(bn.eps + bn.running_var))) 59 | fusedconv.weight.copy_(torch.mm(w_bn, w_conv).view(fusedconv.weight.size())) 60 | 61 | # prepare spatial bias 62 | if conv.bias is not None: 63 | b_conv = conv.bias 64 | else: 65 | b_conv = torch.zeros(conv.weight.size(0)) 66 | b_bn = bn.bias - bn.weight.mul(bn.running_mean).div(torch.sqrt(bn.running_var + bn.eps)) 67 | fusedconv.bias.copy_(b_conv + b_bn) 68 | 69 | return fusedconv 70 | --------------------------------------------------------------------------------
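A minimal usage sketch for fuse_conv_and_bn from utils/torch_utils.py: it folds a BatchNorm2d layer into the preceding Conv2d so inference runs one layer instead of two. The fuse_sequential helper and the demo model below are illustrative assumptions, not part of this repository.

import torch
import torch.nn as nn
from utils.torch_utils import fuse_conv_and_bn

def fuse_sequential(seq: nn.Sequential) -> nn.Sequential:
    # Walk the children and replace each Conv2d followed by BatchNorm2d with one fused Conv2d.
    children = list(seq.children())
    fused, skip = [], False
    for m, nxt in zip(children, children[1:] + [None]):
        if skip:  # this BatchNorm2d was already folded into the previous conv
            skip = False
            continue
        if isinstance(m, nn.Conv2d) and isinstance(nxt, nn.BatchNorm2d):
            fused.append(fuse_conv_and_bn(m, nxt))
            skip = True
        else:
            fused.append(m)
    return nn.Sequential(*fused)

if __name__ == '__main__':
    model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                          nn.BatchNorm2d(16),
                          nn.LeakyReLU(0.1)).eval()  # eval() so BN uses running stats
    fused = fuse_sequential(model).eval()
    x = torch.randn(1, 3, 64, 64)
    print(torch.allclose(model(x), fused(x), atol=1e-5))  # expected: True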