├── .gitignore ├── HRSC2016 └── train │ ├── 100001675.jpg │ └── 100001675.txt ├── README.md ├── cfg ├── HRSC+ │ ├── hyp.py │ ├── yolov3_512.cfg │ ├── yolov3_512_ma.cfg │ ├── yolov3_512_matrix.cfg │ └── yolov3_512_se.cfg ├── HRSC │ ├── hyp.py │ ├── yolov3-416.cfg │ └── yolov3-m.cfg ├── ICDAR │ ├── hyp.py │ ├── yolov3_608.cfg │ └── yolov3_608_se.cfg ├── hyp_template.py ├── yolov3-m.cfg ├── yolov3-tiny.cfg └── yolov3.cfg ├── data ├── IC_eval │ └── ic15 │ │ └── rrc_evaluation_funcs.pyc ├── coco.data ├── coco.names ├── hrsc.data ├── hrsc.name ├── icdar.name ├── icdar_13+15.data ├── icdar_15.data └── icdar_15_all.data ├── demo.png ├── detect.py ├── experiment ├── HRSC+ │ ├── 1000_SE_nosample.txt │ └── 1000_normal.txt ├── HRSC │ ├── context │ │ ├── context-0.8.txt │ │ ├── context-1.25.txt │ │ └── context-1.6.txt │ ├── hyp.py │ ├── hyper │ │ ├── iou_0.05_ang_12.txt │ │ ├── iou_0.1-ang_12.txt │ │ ├── iou_0.1-ang_24.txt │ │ └── iou_0.3-ang_12.txt │ └── mul-scale │ │ ├── hyp.txt │ │ └── results.txt ├── IC15 │ ├── 0.1_12.txt │ ├── 0.3_12.txt │ ├── 0.5_12.txt │ ├── 0.5_6.txt │ ├── 0.7_12.txt │ └── ablation.png ├── ga-attention(_4).png └── tiny_test_gax4_o8_dh.png ├── make.sh ├── model ├── __init__.py ├── layer │ ├── DCNv2 │ │ ├── .gitignore │ │ ├── __init__.py │ │ ├── dcn_test.py │ │ ├── dcn_v2.py │ │ ├── make.sh │ │ ├── setup.py │ │ └── src │ │ │ ├── cpu │ │ │ ├── dcn_v2_cpu.cpp │ │ │ └── vision.h │ │ │ ├── cuda │ │ │ ├── dcn_v2_cuda.cu │ │ │ ├── dcn_v2_im2col_cuda.cu │ │ │ ├── dcn_v2_im2col_cuda.h │ │ │ ├── dcn_v2_psroi_pooling_cuda.cu │ │ │ └── vision.h │ │ │ ├── dcn_v2.h │ │ │ └── vision.cpp │ └── __init__.py ├── loss.py ├── model_utils.py ├── models.py └── sampler_ratio.png ├── study.txt ├── test.py ├── train.py └── utils ├── ICDAR ├── ICDAR2yolo.py └── icdar_utils.py ├── adabound.py ├── augment.py ├── datasets.py ├── gcp.sh ├── google_utils.py ├── init.py ├── kmeans ├── 416 │ ├── 3 │ │ ├── anchor_clusters.png │ │ ├── area_cluster.png │ │ ├── kmeans.png │ │ └── ratio_cluster.png │ └── 6 │ │ ├── 2019-10-31 09-02-05屏幕截图.png │ │ ├── anchor_clusters.png │ │ ├── area_cluster.png │ │ └── ratio_cluster.png ├── hrsc_512.txt ├── icdar_608_all.txt ├── icdar_608_care.txt └── kmeans.py ├── nms ├── __init__.py ├── make.sh ├── nms.py ├── nms_wrapper_test.py ├── setup.py └── src │ ├── rotate_polygon_nms.cpp │ └── rotate_polygon_nms_kernel.cu ├── parse_config.py ├── torch_utils.py └── utils.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.pt 2 | *.pth 3 | 4 | 5 | -------------------------------------------------------------------------------- /HRSC2016/train/100001675.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/HRSC2016/train/100001675.jpg -------------------------------------------------------------------------------- /HRSC2016/train/100001675.txt: -------------------------------------------------------------------------------- 1 | 0 0.748569 0.577177 0.488319 0.121866 1.415746 2 | 0 0.844500 0.501634 0.343307 0.049861 1.398959 3 | 0 0.221619 0.527416 0.332149 0.103440 -1.19584 4 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## Rotated-Yolov3 2 | 3 | Rotaion object detection implemented with yolov3. 
4 | 5 | --- 6 | 7 | Hello, the plain [ryolov3](https://github.com/ming71/yolov3-polygon) is available now. Although it does not come with as many tricks as this repo, it still achieves good results and is friendly for beginners to learn from. Good luck. 8 | 9 | ## Update 10 | 11 | The latest code has been uploaded. Unfortunately, due to my negligence, I incorrectly modified some parts of the code last year and did not keep the historical version, which makes it hard to reproduce the previous high performance. The problem tentatively appears to lie in the loss calculation part. 12 | 13 | But the experimental results left over from last year show that yolov3 is well suited to rotation detection. With several tricks (attention, ORN, Mish, etc.) it achieved good performance. More of the previous experiment results can be found [here](https://github.com/ming71/rotate-yolo/blob/master/experiment). 14 | 15 | ## Support 16 | * SEBlock 17 | * CUDA RNMS 18 | * riou loss 19 | * Inception module 20 | * DCNv2 21 | * ORN 22 | * SeparableConv 23 | * Mish/Swish 24 | * GlobalAttention 25 | 26 | ## Detection Results 27 | 28 | The detection results from rotated yolov3 left over from last year: 29 | 30 | 31 | 32 | ## Q&A 33 | 34 | The following questions are frequently asked. If anything is still unclear, don't hesitate to contact me by opening an issue. 35 | 36 | * Q: How can I obtain `icdar_608_care.txt`? 37 | 38 | A: `icdar_608_care.txt` stores the initial anchors generated via k-means; you need to run `kmeans.py`, referring to my implementation [here](https://github.com/ming71/toolbox/blob/master/kmeans.py). You can also check `utils/parse_config.py` for more details. A minimal sketch of the idea follows below.
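For reference, here is a minimal, self-contained sketch of the anchor-clustering idea behind the answer above: plain k-means over ground-truth box sizes with an IoU-based assignment. It is only an illustration under those assumptions, not the linked `kmeans.py` itself; the function names are illustrative and the input data is random so the snippet stays runnable.

```python
import numpy as np

def iou_wh(wh, anchors):
    # IoU between (w, h) pairs and anchor (w, h) pairs, assuming aligned top-left corners.
    inter = np.minimum(wh[:, None, 0], anchors[None, :, 0]) * np.minimum(wh[:, None, 1], anchors[None, :, 1])
    union = wh[:, 0:1] * wh[:, 1:2] + anchors[None, :, 0] * anchors[None, :, 1] - inter
    return inter / union

def kmeans_anchors(wh, k=9, iters=300, seed=0):
    # Plain k-means on box sizes, assigning each box to its best-IoU anchor.
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), size=k, replace=False)].astype(float)
    assign = np.full(len(wh), -1)
    for _ in range(iters):
        new_assign = np.argmax(iou_wh(wh, anchors), axis=1)   # best-matching anchor per box
        if (new_assign == assign).all():
            break                                             # converged
        assign = new_assign
        for j in range(k):
            members = wh[assign == j]
            if len(members):
                anchors[j] = members.mean(axis=0)             # move centre to cluster mean
    return anchors[np.argsort(anchors.prod(axis=1))]          # sorted by area, small to large

if __name__ == "__main__":
    # In practice wh would be the ground-truth box sizes scaled to the network input
    # (e.g. 512 for HRSC in this repo); random values just keep the sketch self-contained.
    wh = np.abs(np.random.randn(2000, 2)) * 80 + 10
    print("\n".join(f"{w:.1f},{h:.1f}" for w, h in kmeans_anchors(wh, k=9)))
```

The resulting `w,h` pairs are the kind of values that the `anchors =` entries in the cfg files point to (e.g. `utils/kmeans/hrsc_512.txt`).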
39 | 40 | * Q: How do I train the model on my own dataset? 41 | 42 | A: This ryolo implementation is based on this [repo](https://github.com/ultralytics/yolov3); the training and evaluation pipelines are the same as in that repo. 43 | 44 | * Q: Where is the ORN code? 45 | 46 | A: I'll release the whole codebase when I return to school; in the meantime, this [repo](https://github.com/ming71/CUDA/tree/master/ORN) may help. 47 | 48 | * Q: I cannot reproduce the results you reported (80 mAP for HRSC and 0.7 F1 for IC15). 49 | * A: Refer to my reply [here](https://github.com/ming71/rotate-yolov3/issues/14#issuecomment-663328130). This is only a backup repo; the overall model is fine, but **directly running it does not necessarily guarantee good results**, because it is not the latest version and some parameters may be problematic, so you need to adjust some details and parameter settings yourself. 50 | I will upload the complete executable code as soon as I return to school in September (if I am lucky). 51 | 52 | ## In the end 53 | There is neither the need nor the time to maintain this codebase to reproduce the previous performance. If you are interested in this work, you are welcome to fix the bugs in this codebase; the trained models are available [here](https://pan.baidu.com/s/1EXhyGSiuUIPnkZ7cwpfCbQ) with extraction code `5noq`. I'll reimplement a rotated yolov4 or yolov5 if time permits in the future. -------------------------------------------------------------------------------- /cfg/HRSC+/hyp.py: -------------------------------------------------------------------------------- 1 | giou: 0.1 # giou loss gain 1.582 2 | cls: 27.76 # cls loss gain (CE=~1.0, uCE=~20) 3 | cls_pw: 1.446 # cls BCELoss positive_weight 4 | obj: 20.35 # obj loss gain (*=80 for uBCE with 80 classes) 5 | obj_pw: 3.941 # obj BCELoss positive_weight 6 | iou_t: 0.5 # iou training threshold 7 | ang_t: 3.1415926/12 8 | reg: 1.0 9 | fl_gamma: 0.5 # focal loss gamma 10 | context_factor: 1.0 # set according to the short side h; w and h are enlarged by the same factor; set it to the reciprocal during debugging to detect directly 11 | 12 | 13 | # lr 14 | lr0: 0.0001 15 | multiplier:10 16 | warm_epoch:5 17 | lrf: -4.
# final LambdaLR learning rate = lr0 * (10 ** lrf) 18 | momentum: 0.97 # SGD momentum 19 | weight_decay: 0.0004569 # optimizer weight decay 20 | 21 | 22 | # aug 23 | hsv_s: 0.5 # image HSV-Saturation augmentation (fraction) 24 | hsv_v: 0.3 # image HSV-Value augmentation (fraction) 25 | degrees: 5.0 # image rotation (+/- deg) 26 | translate: 0.1 # image translation (+/- fraction) 27 | scale: 0.1 # image scale (+/- gain) 28 | shear: 0.0 29 | gamma: 0.2 30 | blur: 1.3 31 | noise: 0.01 32 | contrast: 0.15 33 | sharpen: 0.15 34 | copypaste: 0.1 # 船身 h 的 3sigma 段位以内 35 | grayscale: 0.3 # 灰度强度为0.3-1.0 36 | 37 | 38 | # training 39 | epochs: 1000 40 | batch_size: 4 41 | save_interval: 300 42 | test_interval: 5 43 | -------------------------------------------------------------------------------- /cfg/HRSC+/yolov3_512.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=512 9 | height=512 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 
| pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 
388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 | [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 601 | stride=1 602 | pad=1 603 | filters=504 604 | activation=linear 605 | 606 | # 
角度定义为x+ 0; 顺时针; 正负0.5pi 607 | [yolo] 608 | mask = 144-215 609 | anchors = utils/kmeans/hrsc_512.txt 610 | classes=1 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .7 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 | layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | size=3 680 | stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=504 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 72-143 695 | anchors = utils/kmeans/hrsc_512.txt 696 | classes=1 697 | num=9 698 | jitter=.3 699 | ignore_thresh = .7 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 | pad=1 746 | activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=504 777 | activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0-71 782 | anchors = utils/kmeans/hrsc_512.txt 783 | classes=1 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .7 787 | truth_thresh = 1 788 | random=1 789 | -------------------------------------------------------------------------------- /cfg/HRSC+/yolov3_512_ma.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=512 9 | height=512 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | 
[convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | 
stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | 
[convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 | [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 601 | stride=1 602 | pad=1 603 | filters=504 604 | activation=linear 605 | 606 | # 角度定义为x+ 0; 顺时针; 正负0.5pi 607 | [yolo] 608 | mask = 144-215 609 | anchors = ara 879, 3170, 6813, 11599, 20813, 28065 / 4.0, 6.4, 9.2 / -75, -60, -45, -30, -15 ,0,15, 30,45, 60,75, 90 610 | classes=1 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .7 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 | layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | size=3 680 | 
stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=504 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 72-143 695 | anchors = ara 879, 3170, 6813, 11599, 20813, 28065 / 4.0, 6.4, 9.2 / -75, -60, -45, -30, -15 ,0,15, 30,45, 60,75, 90 696 | classes=1 697 | num=9 698 | jitter=.3 699 | ignore_thresh = .7 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 | pad=1 746 | activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=504 777 | activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0-71 782 | anchors = ara 879, 3170, 6813, 11599, 20813, 28065 / 4.0, 6.4, 9.2 / -75, -60, -45, -30, -15 ,0,15, 30,45, 60,75, 90 783 | classes=1 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .7 787 | truth_thresh = 1 788 | random=1 789 | -------------------------------------------------------------------------------- /cfg/HRSC+/yolov3_512_se.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=512 9 | height=512 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | 
batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | # s=8 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | # --SE block--- 124 | [se] 125 | channels=256 126 | 127 | [convolutional] 128 | batch_normalize=1 129 | filters=128 130 | size=1 131 | stride=1 132 | pad=1 133 | activation=leaky 134 | 135 | [convolutional] 136 | batch_normalize=1 137 | filters=256 138 | size=3 139 | stride=1 140 | pad=1 141 | activation=leaky 142 | 143 | [shortcut] 144 | from=-4 145 | activation=linear 146 | 147 | [se] 148 | channels=256 149 | 150 | [convolutional] 151 | batch_normalize=1 152 | filters=128 153 | size=1 154 | stride=1 155 | pad=1 156 | activation=leaky 157 | 158 | [convolutional] 159 | batch_normalize=1 160 | filters=256 161 | size=3 162 | stride=1 163 | pad=1 164 | activation=leaky 165 | 166 | [shortcut] 167 | from=-4 168 | activation=linear 169 | 170 | [se] 171 | channels=256 172 | 173 | [convolutional] 174 | batch_normalize=1 175 | filters=128 176 | size=1 177 | stride=1 178 | pad=1 179 | activation=leaky 180 | 181 | [convolutional] 182 | batch_normalize=1 183 | filters=256 184 | size=3 185 | stride=1 186 | pad=1 187 | activation=leaky 188 | 189 | [shortcut] 190 | from=-4 191 | activation=linear 192 | 193 | [se] 194 | channels=256 195 | 196 | [convolutional] 197 | batch_normalize=1 198 | filters=128 199 | size=1 200 | stride=1 201 | pad=1 202 | activation=leaky 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=256 207 | size=3 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [shortcut] 213 | from=-4 214 | activation=linear 215 | 216 | [se] 217 | channels=256 218 | 219 | [convolutional] 220 | batch_normalize=1 221 | filters=128 222 | size=1 223 | stride=1 224 | pad=1 225 | activation=leaky 226 | 227 | [convolutional] 228 | batch_normalize=1 229 | filters=256 230 | size=3 231 | stride=1 232 | pad=1 233 | activation=leaky 234 | 235 | [shortcut] 236 | from=-4 237 | activation=linear 238 | 239 | [se] 240 | channels=256 241 | 242 | [convolutional] 243 | batch_normalize=1 244 | filters=128 245 | size=1 246 | stride=1 247 | pad=1 248 | activation=leaky 249 | 250 | [convolutional] 251 | batch_normalize=1 252 | filters=256 253 | size=3 254 | stride=1 255 | pad=1 256 | activation=leaky 257 | 258 | [shortcut] 259 | from=-4 260 | activation=linear 261 | 262 | [se] 263 | channels=256 264 | 265 | [convolutional] 266 | batch_normalize=1 267 | filters=128 268 | size=1 269 | stride=1 270 | pad=1 271 | activation=leaky 272 | 273 | [convolutional] 274 | batch_normalize=1 275 | filters=256 276 | size=3 277 | stride=1 278 | pad=1 279 | activation=leaky 280 | 281 | [shortcut] 282 | from=-4 283 | activation=linear 284 | 285 | [se] 286 | channels=256 287 | 288 | [convolutional] 289 | batch_normalize=1 290 | filters=128 291 | size=1 292 | stride=1 293 | pad=1 294 | activation=leaky 295 | 296 | [convolutional] 297 | batch_normalize=1 298 | filters=256 299 | size=3 300 | stride=1 301 | pad=1 302 | activation=leaky 303 | 304 | [shortcut] 305 | from=-4 306 | activation=linear 307 | 308 | # Downsample 309 | # s=16 310 | [convolutional] 311 | batch_normalize=1 312 | filters=512 313 | size=3 314 | stride=2 315 | pad=1 316 | activation=leaky 317 | 318 | 
[se] 319 | channels=512 320 | 321 | [convolutional] 322 | batch_normalize=1 323 | filters=256 324 | size=1 325 | stride=1 326 | pad=1 327 | activation=leaky 328 | 329 | [convolutional] 330 | batch_normalize=1 331 | filters=512 332 | size=3 333 | stride=1 334 | pad=1 335 | activation=leaky 336 | 337 | [shortcut] 338 | from=-4 339 | activation=linear 340 | 341 | [se] 342 | channels=512 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=256 347 | size=1 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [convolutional] 353 | batch_normalize=1 354 | filters=512 355 | size=3 356 | stride=1 357 | pad=1 358 | activation=leaky 359 | 360 | [shortcut] 361 | from=-4 362 | activation=linear 363 | 364 | [se] 365 | channels=512 366 | 367 | [convolutional] 368 | batch_normalize=1 369 | filters=256 370 | size=1 371 | stride=1 372 | pad=1 373 | activation=leaky 374 | 375 | [convolutional] 376 | batch_normalize=1 377 | filters=512 378 | size=3 379 | stride=1 380 | pad=1 381 | activation=leaky 382 | 383 | [shortcut] 384 | from=-4 385 | activation=linear 386 | 387 | [se] 388 | channels=512 389 | 390 | [convolutional] 391 | batch_normalize=1 392 | filters=256 393 | size=1 394 | stride=1 395 | pad=1 396 | activation=leaky 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=512 401 | size=3 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [shortcut] 407 | from=-4 408 | activation=linear 409 | 410 | [se] 411 | channels=512 412 | 413 | [convolutional] 414 | batch_normalize=1 415 | filters=256 416 | size=1 417 | stride=1 418 | pad=1 419 | activation=leaky 420 | 421 | [convolutional] 422 | batch_normalize=1 423 | filters=512 424 | size=3 425 | stride=1 426 | pad=1 427 | activation=leaky 428 | 429 | [shortcut] 430 | from=-4 431 | activation=linear 432 | 433 | [se] 434 | channels=512 435 | 436 | [convolutional] 437 | batch_normalize=1 438 | filters=256 439 | size=1 440 | stride=1 441 | pad=1 442 | activation=leaky 443 | 444 | [convolutional] 445 | batch_normalize=1 446 | filters=512 447 | size=3 448 | stride=1 449 | pad=1 450 | activation=leaky 451 | 452 | [shortcut] 453 | from=-4 454 | activation=linear 455 | 456 | [se] 457 | channels=512 458 | 459 | [convolutional] 460 | batch_normalize=1 461 | filters=256 462 | size=1 463 | stride=1 464 | pad=1 465 | activation=leaky 466 | 467 | [convolutional] 468 | batch_normalize=1 469 | filters=512 470 | size=3 471 | stride=1 472 | pad=1 473 | activation=leaky 474 | 475 | [shortcut] 476 | from=-4 477 | activation=linear 478 | 479 | [se] 480 | channels=512 481 | 482 | [convolutional] 483 | batch_normalize=1 484 | filters=256 485 | size=1 486 | stride=1 487 | pad=1 488 | activation=leaky 489 | 490 | [convolutional] 491 | batch_normalize=1 492 | filters=512 493 | size=3 494 | stride=1 495 | pad=1 496 | activation=leaky 497 | 498 | [shortcut] 499 | from=-4 500 | activation=linear 501 | 502 | # Downsample 503 | # s=32 504 | [convolutional] 505 | batch_normalize=1 506 | filters=1024 507 | size=3 508 | stride=2 509 | pad=1 510 | activation=leaky 511 | 512 | [se] 513 | channels=1024 514 | 515 | [convolutional] 516 | batch_normalize=1 517 | filters=512 518 | size=1 519 | stride=1 520 | pad=1 521 | activation=leaky 522 | 523 | [convolutional] 524 | batch_normalize=1 525 | filters=1024 526 | size=3 527 | stride=1 528 | pad=1 529 | activation=leaky 530 | 531 | [shortcut] 532 | from=-4 533 | activation=linear 534 | 535 | [se] 536 | channels=1024 537 | 538 | [convolutional] 539 | batch_normalize=1 540 | filters=512 541 | size=1 542 | 
stride=1 543 | pad=1 544 | activation=leaky 545 | 546 | [convolutional] 547 | batch_normalize=1 548 | filters=1024 549 | size=3 550 | stride=1 551 | pad=1 552 | activation=leaky 553 | 554 | [shortcut] 555 | from=-4 556 | activation=linear 557 | 558 | [se] 559 | channels=1024 560 | 561 | [convolutional] 562 | batch_normalize=1 563 | filters=512 564 | size=1 565 | stride=1 566 | pad=1 567 | activation=leaky 568 | 569 | [convolutional] 570 | batch_normalize=1 571 | filters=1024 572 | size=3 573 | stride=1 574 | pad=1 575 | activation=leaky 576 | 577 | [shortcut] 578 | from=-4 579 | activation=linear 580 | 581 | 582 | [se] 583 | channels=1024 584 | 585 | [convolutional] 586 | batch_normalize=1 587 | filters=512 588 | size=1 589 | stride=1 590 | pad=1 591 | activation=leaky 592 | 593 | [convolutional] 594 | batch_normalize=1 595 | filters=1024 596 | size=3 597 | stride=1 598 | pad=1 599 | activation=leaky 600 | 601 | [shortcut] 602 | from=-4 603 | activation=linear 604 | 605 | ######## backbone到此为止 ############## 606 | 607 | [convolutional] 608 | batch_normalize=1 609 | filters=512 610 | size=1 611 | stride=1 612 | pad=1 613 | activation=leaky 614 | 615 | [convolutional] 616 | batch_normalize=1 617 | size=3 618 | stride=1 619 | pad=1 620 | filters=1024 621 | activation=leaky 622 | 623 | [convolutional] 624 | batch_normalize=1 625 | filters=512 626 | size=1 627 | stride=1 628 | pad=1 629 | activation=leaky 630 | 631 | [convolutional] 632 | batch_normalize=1 633 | size=3 634 | stride=1 635 | pad=1 636 | filters=1024 637 | activation=leaky 638 | 639 | [convolutional] 640 | batch_normalize=1 641 | filters=512 642 | size=1 643 | stride=1 644 | pad=1 645 | activation=leaky 646 | 647 | [convolutional] 648 | batch_normalize=1 649 | size=3 650 | stride=1 651 | pad=1 652 | filters=1024 653 | activation=leaky 654 | 655 | [convolutional] 656 | size=1 657 | stride=1 658 | pad=1 659 | filters=504 660 | activation=linear 661 | 662 | # 角度定义为x+ 0; 顺时针; 正负0.5pi 663 | [yolo] 664 | mask = 144-215 665 | anchors = utils/kmeans/hrsc_512.txt 666 | classes=1 667 | num=9 668 | jitter=.3 669 | ignore_thresh = .7 670 | truth_thresh = 1 671 | random=1 672 | 673 | 674 | [route] 675 | layers = -4 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | filters=256 680 | size=1 681 | stride=1 682 | pad=1 683 | activation=leaky 684 | 685 | [upsample] 686 | stride=2 687 | 688 | [route] 689 | layers = -1, 61 690 | 691 | 692 | 693 | [convolutional] 694 | batch_normalize=1 695 | filters=256 696 | size=1 697 | stride=1 698 | pad=1 699 | activation=leaky 700 | 701 | [convolutional] 702 | batch_normalize=1 703 | size=3 704 | stride=1 705 | pad=1 706 | filters=512 707 | activation=leaky 708 | 709 | [convolutional] 710 | batch_normalize=1 711 | filters=256 712 | size=1 713 | stride=1 714 | pad=1 715 | activation=leaky 716 | 717 | [convolutional] 718 | batch_normalize=1 719 | size=3 720 | stride=1 721 | pad=1 722 | filters=512 723 | activation=leaky 724 | 725 | [convolutional] 726 | batch_normalize=1 727 | filters=256 728 | size=1 729 | stride=1 730 | pad=1 731 | activation=leaky 732 | 733 | [convolutional] 734 | batch_normalize=1 735 | size=3 736 | stride=1 737 | pad=1 738 | filters=512 739 | activation=leaky 740 | 741 | [convolutional] 742 | size=1 743 | stride=1 744 | pad=1 745 | filters=504 746 | activation=linear 747 | 748 | 749 | [yolo] 750 | mask = 72-143 751 | anchors = utils/kmeans/hrsc_512.txt 752 | classes=1 753 | num=9 754 | jitter=.3 755 | ignore_thresh = .7 756 | truth_thresh = 1 757 | random=1 758 | 759 | 760 | 761 | 
[route] 762 | layers = -4 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | filters=128 767 | size=1 768 | stride=1 769 | pad=1 770 | activation=leaky 771 | 772 | [upsample] 773 | stride=2 774 | 775 | [route] 776 | layers = -1, 36 777 | 778 | 779 | 780 | [convolutional] 781 | batch_normalize=1 782 | filters=128 783 | size=1 784 | stride=1 785 | pad=1 786 | activation=leaky 787 | 788 | [convolutional] 789 | batch_normalize=1 790 | size=3 791 | stride=1 792 | pad=1 793 | filters=256 794 | activation=leaky 795 | 796 | [convolutional] 797 | batch_normalize=1 798 | filters=128 799 | size=1 800 | stride=1 801 | pad=1 802 | activation=leaky 803 | 804 | [convolutional] 805 | batch_normalize=1 806 | size=3 807 | stride=1 808 | pad=1 809 | filters=256 810 | activation=leaky 811 | 812 | [convolutional] 813 | batch_normalize=1 814 | filters=128 815 | size=1 816 | stride=1 817 | pad=1 818 | activation=leaky 819 | 820 | [convolutional] 821 | batch_normalize=1 822 | size=3 823 | stride=1 824 | pad=1 825 | filters=256 826 | activation=leaky 827 | 828 | [convolutional] 829 | size=1 830 | stride=1 831 | pad=1 832 | filters=504 833 | activation=linear 834 | 835 | 836 | [yolo] 837 | mask = 0-71 838 | anchors = utils/kmeans/hrsc_512.txt 839 | classes=1 840 | num=9 841 | jitter=.3 842 | ignore_thresh = .7 843 | truth_thresh = 1 844 | random=1 845 | -------------------------------------------------------------------------------- /cfg/HRSC/hyp.py: -------------------------------------------------------------------------------- 1 | giou: 0.1 # giou loss gain 1.582 2 | cls: 27.76 # cls loss gain (CE=~1.0, uCE=~20) 3 | cls_pw: 1.446 # cls BCELoss positive_weight 4 | obj: 20.35 # obj loss gain (*=80 for uBCE with 80 classes) 5 | obj_pw: 3.941 # obj BCELoss positive_weight 6 | iou_t: 0.4 # iou training threshold 7 | ang_t: 3.1415926/6 8 | reg: 1.0 9 | fl_gamma: 0.5 # focal loss gamma 10 | context_factor: 1.0 # 按照短边h来设置的,wh的增幅相同; 调试时设为倒数直接检测 11 | grayscale: 0.3 # 灰度强度为0.3-1.0 12 | 13 | 14 | # lr 15 | lr0: 0.00001 16 | multiplier:10 17 | warm_epoch:1 18 | lrf: -4. 
# final LambdaLR learning rate = lr0 * (10 ** lrf) 19 | momentum: 0.97 # SGD momentum 20 | weight_decay: 0.0004569 # optimizer weight decay 21 | 22 | 23 | # aug 24 | hsv_s: 0.5 # image HSV-Saturation augmentation (fraction) 25 | hsv_v: 0.3 # image HSV-Value augmentation (fraction) 26 | degrees: 5.0 # image rotation (+/- deg) 27 | translate: 0.1 # image translation (+/- fraction) 28 | scale: 0.15 # image scale (+/- gain) 29 | shear: 0.0 30 | gamma: 0.2 31 | blur: 1.3 32 | noise: 0.01 33 | contrast: 0.15 34 | sharpen: 0.15 35 | # copypaste: 0.3 # 船身 h 的 3sigma 段位以内 36 | 37 | 38 | # training 39 | epochs: 1000 40 | batch_size: 1 41 | save_interval: 100 42 | test_interval: 10 43 | -------------------------------------------------------------------------------- /cfg/HRSC/yolov3-416.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 
170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 
| pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 | [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 601 | stride=1 602 | pad=1 603 | filters=504 604 | activation=linear 605 | 606 | # 角度定义为x+ 0; 顺时针; 正负0.5pi 607 | [yolo] 
608 | mask = 144-215 609 | anchors = /py/rotated-yolo/utils/kmeans/hrsc_512.txt 610 | classes=1 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .7 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 | layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | size=3 680 | stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=504 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 72-143 695 | anchors = /py/rotated-yolo/utils/kmeans/hrsc_512.txt 696 | classes=1 697 | num=18 698 | jitter=.3 699 | ignore_thresh = .7 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 | pad=1 746 | activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=504 777 | activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0-71 782 | anchors = /py/rotated-yolo/utils/kmeans/hrsc_512.txt 783 | classes=1 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .7 787 | truth_thresh = 1 788 | random=1 789 | -------------------------------------------------------------------------------- /cfg/ICDAR/hyp.py: -------------------------------------------------------------------------------- 1 | giou: 0.1 # giou loss gain 1.582 2 | cls: 27.76 # cls loss gain (CE=~1.0, uCE=~20) 3 | cls_pw: 1.446 # cls BCELoss positive_weight 4 | obj: 20.35 # obj loss gain (*=80 for uBCE with 80 classes) 5 | obj_pw: 3.941 # obj BCELoss positive_weight 6 | iou_t: 0.3 # iou training threshold 7 | ang_t: 3.1415926/12 8 | reg: 1.0 9 | fl_gamma: 0.5 # focal loss gamma 10 | context_factor: 1.0 # 
按照短边h来设置的,wh的增幅相同; 调试时设为倒数直接检测 11 | 12 | 13 | # lr 14 | lr0: 0.00008 15 | multiplier:10 16 | warm_epoch:5 17 | lrf: -4. # final LambdaLR learning rate = lr0 * (10 ** lrf) 18 | momentum: 0.97 # SGD momentum 19 | weight_decay: 0.0004569 # optimizer weight decay 20 | 21 | 22 | # aug 23 | hsv_s: 0.5 # image HSV-Saturation augmentation (fraction) 24 | hsv_v: 0.3 # image HSV-Value augmentation (fraction) 25 | degrees: 50.0 # image rotation (+/- deg) 26 | translate: 0.2 # image translation (+/- fraction) 27 | scale: 0.2 # image scale (+/- gain) 28 | shear: 0.0 29 | gamma: 0.2 30 | blur: 1.2 31 | noise: 0.005 32 | contrast: 0.0 33 | sharpen: 0.0 34 | copypaste: 0.0 35 | grayscale: 0.05 36 | 37 | 38 | # training 39 | epochs: 1500 40 | batch_size: 4 41 | save_interval: 50 42 | test_interval: 1 43 | -------------------------------------------------------------------------------- /cfg/ICDAR/yolov3_608.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=512 9 | height=512 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | 
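# The backbone here keeps the standard Darknet-53 residual pattern: a 1x1 "reduce" convolution,
# a 3x3 convolution, then [shortcut] from=-3, which adds the block input back as a residual
# connection (activation=linear leaves the sum untouched).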
[convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 
| activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 | [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 
601 | stride=1 602 | pad=1 603 | filters=504 604 | activation=linear 605 | 606 | # 角度定义为x+ 0; 顺时针; 正负0.5pi 607 | [yolo] 608 | mask = 144-215 609 | anchors = utils/kmeans/icdar_608_care.txt 610 | classes=1 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .7 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 | layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | size=3 680 | stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=504 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 72-143 695 | anchors = utils/kmeans/icdar_608_care.txt 696 | classes=1 697 | num=9 698 | jitter=.3 699 | ignore_thresh = .7 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 | pad=1 746 | activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=504 777 | activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0-71 782 | anchors = utils/kmeans/icdar_608_care.txt 783 | classes=1 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .7 787 | truth_thresh = 1 788 | random=1 789 | -------------------------------------------------------------------------------- /cfg/ICDAR/yolov3_608_se.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=512 9 | height=512 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 
| max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | # s=8 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | # --SE block--- 124 | [se] 125 | channels=256 126 | 127 | [convolutional] 128 | batch_normalize=1 129 | filters=128 130 | size=1 131 | stride=1 132 | pad=1 133 | activation=leaky 134 | 135 | [convolutional] 136 | batch_normalize=1 137 | filters=256 138 | size=3 139 | stride=1 140 | pad=1 141 | activation=leaky 142 | 143 | [shortcut] 144 | from=-4 145 | activation=linear 146 | 147 | [se] 148 | channels=256 149 | 150 | [convolutional] 151 | batch_normalize=1 152 | filters=128 153 | size=1 154 | stride=1 155 | pad=1 156 | activation=leaky 157 | 158 | [convolutional] 159 | batch_normalize=1 160 | filters=256 161 | size=3 162 | stride=1 163 | pad=1 164 | activation=leaky 165 | 166 | [shortcut] 167 | from=-4 168 | activation=linear 169 | 170 | [se] 171 | channels=256 172 | 173 | [convolutional] 174 | batch_normalize=1 175 | filters=128 176 | size=1 177 | stride=1 178 | pad=1 179 | activation=leaky 180 | 181 | [convolutional] 182 | batch_normalize=1 183 | filters=256 184 | size=3 185 | stride=1 186 | pad=1 187 | activation=leaky 188 | 189 | [shortcut] 190 | from=-4 191 | activation=linear 192 | 193 | [se] 194 | channels=256 195 | 196 | [convolutional] 197 | batch_normalize=1 198 | filters=128 199 | size=1 200 | stride=1 201 | pad=1 202 | activation=leaky 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=256 207 | size=3 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [shortcut] 213 | from=-4 214 | activation=linear 215 | 216 | [se] 217 | channels=256 218 | 219 | [convolutional] 220 | batch_normalize=1 221 | filters=128 222 | size=1 223 | stride=1 224 | pad=1 225 | activation=leaky 226 | 227 | [convolutional] 228 | batch_normalize=1 229 | filters=256 230 | size=3 231 | stride=1 232 | pad=1 233 | activation=leaky 234 | 235 | [shortcut] 236 | from=-4 237 | activation=linear 238 | 239 | [se] 240 | channels=256 241 | 242 | [convolutional] 243 | batch_normalize=1 244 | filters=128 245 | size=1 
246 | stride=1 247 | pad=1 248 | activation=leaky 249 | 250 | [convolutional] 251 | batch_normalize=1 252 | filters=256 253 | size=3 254 | stride=1 255 | pad=1 256 | activation=leaky 257 | 258 | [shortcut] 259 | from=-4 260 | activation=linear 261 | 262 | [se] 263 | channels=256 264 | 265 | [convolutional] 266 | batch_normalize=1 267 | filters=128 268 | size=1 269 | stride=1 270 | pad=1 271 | activation=leaky 272 | 273 | [convolutional] 274 | batch_normalize=1 275 | filters=256 276 | size=3 277 | stride=1 278 | pad=1 279 | activation=leaky 280 | 281 | [shortcut] 282 | from=-4 283 | activation=linear 284 | 285 | [se] 286 | channels=256 287 | 288 | [convolutional] 289 | batch_normalize=1 290 | filters=128 291 | size=1 292 | stride=1 293 | pad=1 294 | activation=leaky 295 | 296 | [convolutional] 297 | batch_normalize=1 298 | filters=256 299 | size=3 300 | stride=1 301 | pad=1 302 | activation=leaky 303 | 304 | [shortcut] 305 | from=-4 306 | activation=linear 307 | 308 | # Downsample 309 | # s=16 310 | [convolutional] 311 | batch_normalize=1 312 | filters=512 313 | size=3 314 | stride=2 315 | pad=1 316 | activation=leaky 317 | 318 | [se] 319 | channels=512 320 | 321 | [convolutional] 322 | batch_normalize=1 323 | filters=256 324 | size=1 325 | stride=1 326 | pad=1 327 | activation=leaky 328 | 329 | [convolutional] 330 | batch_normalize=1 331 | filters=512 332 | size=3 333 | stride=1 334 | pad=1 335 | activation=leaky 336 | 337 | [shortcut] 338 | from=-4 339 | activation=linear 340 | 341 | [se] 342 | channels=512 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=256 347 | size=1 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [convolutional] 353 | batch_normalize=1 354 | filters=512 355 | size=3 356 | stride=1 357 | pad=1 358 | activation=leaky 359 | 360 | [shortcut] 361 | from=-4 362 | activation=linear 363 | 364 | [se] 365 | channels=512 366 | 367 | [convolutional] 368 | batch_normalize=1 369 | filters=256 370 | size=1 371 | stride=1 372 | pad=1 373 | activation=leaky 374 | 375 | [convolutional] 376 | batch_normalize=1 377 | filters=512 378 | size=3 379 | stride=1 380 | pad=1 381 | activation=leaky 382 | 383 | [shortcut] 384 | from=-4 385 | activation=linear 386 | 387 | [se] 388 | channels=512 389 | 390 | [convolutional] 391 | batch_normalize=1 392 | filters=256 393 | size=1 394 | stride=1 395 | pad=1 396 | activation=leaky 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=512 401 | size=3 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [shortcut] 407 | from=-4 408 | activation=linear 409 | 410 | [se] 411 | channels=512 412 | 413 | [convolutional] 414 | batch_normalize=1 415 | filters=256 416 | size=1 417 | stride=1 418 | pad=1 419 | activation=leaky 420 | 421 | [convolutional] 422 | batch_normalize=1 423 | filters=512 424 | size=3 425 | stride=1 426 | pad=1 427 | activation=leaky 428 | 429 | [shortcut] 430 | from=-4 431 | activation=linear 432 | 433 | [se] 434 | channels=512 435 | 436 | [convolutional] 437 | batch_normalize=1 438 | filters=256 439 | size=1 440 | stride=1 441 | pad=1 442 | activation=leaky 443 | 444 | [convolutional] 445 | batch_normalize=1 446 | filters=512 447 | size=3 448 | stride=1 449 | pad=1 450 | activation=leaky 451 | 452 | [shortcut] 453 | from=-4 454 | activation=linear 455 | 456 | [se] 457 | channels=512 458 | 459 | [convolutional] 460 | batch_normalize=1 461 | filters=256 462 | size=1 463 | stride=1 464 | pad=1 465 | activation=leaky 466 | 467 | [convolutional] 468 | batch_normalize=1 469 | filters=512 
470 | size=3 471 | stride=1 472 | pad=1 473 | activation=leaky 474 | 475 | [shortcut] 476 | from=-4 477 | activation=linear 478 | 479 | [se] 480 | channels=512 481 | 482 | [convolutional] 483 | batch_normalize=1 484 | filters=256 485 | size=1 486 | stride=1 487 | pad=1 488 | activation=leaky 489 | 490 | [convolutional] 491 | batch_normalize=1 492 | filters=512 493 | size=3 494 | stride=1 495 | pad=1 496 | activation=leaky 497 | 498 | [shortcut] 499 | from=-4 500 | activation=linear 501 | 502 | # Downsample 503 | # s=32 504 | [convolutional] 505 | batch_normalize=1 506 | filters=1024 507 | size=3 508 | stride=2 509 | pad=1 510 | activation=leaky 511 | 512 | [se] 513 | channels=1024 514 | 515 | [convolutional] 516 | batch_normalize=1 517 | filters=512 518 | size=1 519 | stride=1 520 | pad=1 521 | activation=leaky 522 | 523 | [convolutional] 524 | batch_normalize=1 525 | filters=1024 526 | size=3 527 | stride=1 528 | pad=1 529 | activation=leaky 530 | 531 | [shortcut] 532 | from=-4 533 | activation=linear 534 | 535 | [se] 536 | channels=1024 537 | 538 | [convolutional] 539 | batch_normalize=1 540 | filters=512 541 | size=1 542 | stride=1 543 | pad=1 544 | activation=leaky 545 | 546 | [convolutional] 547 | batch_normalize=1 548 | filters=1024 549 | size=3 550 | stride=1 551 | pad=1 552 | activation=leaky 553 | 554 | [shortcut] 555 | from=-4 556 | activation=linear 557 | 558 | [se] 559 | channels=1024 560 | 561 | [convolutional] 562 | batch_normalize=1 563 | filters=512 564 | size=1 565 | stride=1 566 | pad=1 567 | activation=leaky 568 | 569 | [convolutional] 570 | batch_normalize=1 571 | filters=1024 572 | size=3 573 | stride=1 574 | pad=1 575 | activation=leaky 576 | 577 | [shortcut] 578 | from=-4 579 | activation=linear 580 | 581 | 582 | [se] 583 | channels=1024 584 | 585 | [convolutional] 586 | batch_normalize=1 587 | filters=512 588 | size=1 589 | stride=1 590 | pad=1 591 | activation=leaky 592 | 593 | [convolutional] 594 | batch_normalize=1 595 | filters=1024 596 | size=3 597 | stride=1 598 | pad=1 599 | activation=leaky 600 | 601 | [shortcut] 602 | from=-4 603 | activation=linear 604 | 605 | ######## backbone到此为止 ############## 606 | 607 | [convolutional] 608 | batch_normalize=1 609 | filters=512 610 | size=1 611 | stride=1 612 | pad=1 613 | activation=leaky 614 | 615 | [convolutional] 616 | batch_normalize=1 617 | size=3 618 | stride=1 619 | pad=1 620 | filters=1024 621 | activation=leaky 622 | 623 | [convolutional] 624 | batch_normalize=1 625 | filters=512 626 | size=1 627 | stride=1 628 | pad=1 629 | activation=leaky 630 | 631 | [convolutional] 632 | batch_normalize=1 633 | size=3 634 | stride=1 635 | pad=1 636 | filters=1024 637 | activation=leaky 638 | 639 | [convolutional] 640 | batch_normalize=1 641 | filters=512 642 | size=1 643 | stride=1 644 | pad=1 645 | activation=leaky 646 | 647 | [convolutional] 648 | batch_normalize=1 649 | size=3 650 | stride=1 651 | pad=1 652 | filters=1024 653 | activation=leaky 654 | 655 | [convolutional] 656 | size=1 657 | stride=1 658 | pad=1 659 | filters=504 660 | activation=linear 661 | 662 | # 角度定义为x+ 0; 顺时针; 正负0.5pi 663 | [yolo] 664 | mask = 144-215 665 | anchors = utils/kmeans/icdar_608_care.txt 666 | classes=1 667 | num=9 668 | jitter=.3 669 | ignore_thresh = .7 670 | truth_thresh = 1 671 | random=1 672 | 673 | 674 | [route] 675 | layers = -4 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | filters=256 680 | size=1 681 | stride=1 682 | pad=1 683 | activation=leaky 684 | 685 | [upsample] 686 | stride=2 687 | 688 | [route] 689 | layers 
= -1, 61 690 | 691 | 692 | 693 | [convolutional] 694 | batch_normalize=1 695 | filters=256 696 | size=1 697 | stride=1 698 | pad=1 699 | activation=leaky 700 | 701 | [convolutional] 702 | batch_normalize=1 703 | size=3 704 | stride=1 705 | pad=1 706 | filters=512 707 | activation=leaky 708 | 709 | [convolutional] 710 | batch_normalize=1 711 | filters=256 712 | size=1 713 | stride=1 714 | pad=1 715 | activation=leaky 716 | 717 | [convolutional] 718 | batch_normalize=1 719 | size=3 720 | stride=1 721 | pad=1 722 | filters=512 723 | activation=leaky 724 | 725 | [convolutional] 726 | batch_normalize=1 727 | filters=256 728 | size=1 729 | stride=1 730 | pad=1 731 | activation=leaky 732 | 733 | [convolutional] 734 | batch_normalize=1 735 | size=3 736 | stride=1 737 | pad=1 738 | filters=512 739 | activation=leaky 740 | 741 | [convolutional] 742 | size=1 743 | stride=1 744 | pad=1 745 | filters=504 746 | activation=linear 747 | 748 | 749 | [yolo] 750 | mask = 72-143 751 | anchors = utils/kmeans/icdar_608_care.txt 752 | classes=1 753 | num=9 754 | jitter=.3 755 | ignore_thresh = .7 756 | truth_thresh = 1 757 | random=1 758 | 759 | 760 | 761 | [route] 762 | layers = -4 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | filters=128 767 | size=1 768 | stride=1 769 | pad=1 770 | activation=leaky 771 | 772 | [upsample] 773 | stride=2 774 | 775 | [route] 776 | layers = -1, 36 777 | 778 | 779 | 780 | [convolutional] 781 | batch_normalize=1 782 | filters=128 783 | size=1 784 | stride=1 785 | pad=1 786 | activation=leaky 787 | 788 | [convolutional] 789 | batch_normalize=1 790 | size=3 791 | stride=1 792 | pad=1 793 | filters=256 794 | activation=leaky 795 | 796 | [convolutional] 797 | batch_normalize=1 798 | filters=128 799 | size=1 800 | stride=1 801 | pad=1 802 | activation=leaky 803 | 804 | [convolutional] 805 | batch_normalize=1 806 | size=3 807 | stride=1 808 | pad=1 809 | filters=256 810 | activation=leaky 811 | 812 | [convolutional] 813 | batch_normalize=1 814 | filters=128 815 | size=1 816 | stride=1 817 | pad=1 818 | activation=leaky 819 | 820 | [convolutional] 821 | batch_normalize=1 822 | size=3 823 | stride=1 824 | pad=1 825 | filters=256 826 | activation=leaky 827 | 828 | [convolutional] 829 | size=1 830 | stride=1 831 | pad=1 832 | filters=504 833 | activation=linear 834 | 835 | 836 | [yolo] 837 | mask = 0-71 838 | anchors = utils/kmeans/icdar_608_care.txt 839 | classes=1 840 | num=9 841 | jitter=.3 842 | ignore_thresh = .7 843 | truth_thresh = 1 844 | random=1 845 | -------------------------------------------------------------------------------- /cfg/hyp_template.py: -------------------------------------------------------------------------------- 1 | giou: 0.1 # giou loss gain 1.582 2 | cls: 27.76 # cls loss gain (CE=~1.0, uCE=~20) 3 | cls_pw: 1.446 # cls BCELoss positive_weight 4 | obj: 20.35 # obj loss gain (*=80 for uBCE with 80 classes) 5 | obj_pw: 3.941 # obj BCELoss positive_weight 6 | iou_t: 0.5 # iou training threshold 7 | ang_t: 3.1415926/12 8 | reg: 1.0 9 | # fl_gamma: 0.5 # focal loss gamma 10 | context_factor: 1.0 # 按照短边h来设置的,wh的增幅相同; 调试时设为倒数直接检测 11 | 12 | 13 | # lr 14 | lr0: 0.0001 15 | multiplier:10 16 | warm_epoch:5 17 | lrf: -4. 
# final LambdaLR learning rate = lr0 * (10 ** lrf) 18 | momentum: 0.97 # SGD momentum 19 | weight_decay: 0.0004569 # optimizer weight decay 20 | 21 | 22 | # aug 23 | hsv_s: 0.5 # image HSV-Saturation augmentation (fraction) 24 | hsv_v: 0.3 # image HSV-Value augmentation (fraction) 25 | degrees: 5.0 # image rotation (+/- deg) 26 | translate: 0.1 # image translation (+/- fraction) 27 | scale: 0.1 # image scale (+/- gain) 28 | shear: 0.0 29 | gamma: 0.2 30 | blur: 1.3 31 | noise: 0.01 32 | contrast: 0.15 33 | sharpen: 0.15 34 | copypaste: 0.1 # 船身 h 的 3sigma 段位以内 35 | grayscale: 0.3 # 灰度强度为0.3-1.0 36 | 37 | 38 | # training 39 | epochs: 100 40 | batch_size: 8 41 | save_interval: 300 42 | test_interval: 5 43 | -------------------------------------------------------------------------------- /cfg/yolov3-tiny.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | # batch=64 7 | # subdivisions=2 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=16 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | [maxpool] 34 | size=2 35 | stride=2 36 | 37 | [convolutional] 38 | batch_normalize=1 39 | filters=32 40 | size=3 41 | stride=1 42 | pad=1 43 | activation=leaky 44 | 45 | [maxpool] 46 | size=2 47 | stride=2 48 | 49 | [convolutional] 50 | batch_normalize=1 51 | filters=64 52 | size=3 53 | stride=1 54 | pad=1 55 | activation=leaky 56 | 57 | [maxpool] 58 | size=2 59 | stride=2 60 | 61 | [convolutional] 62 | batch_normalize=1 63 | filters=128 64 | size=3 65 | stride=1 66 | pad=1 67 | activation=leaky 68 | 69 | [maxpool] 70 | size=2 71 | stride=2 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=256 76 | size=3 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [maxpool] 82 | size=2 83 | stride=2 84 | 85 | [convolutional] 86 | batch_normalize=1 87 | filters=512 88 | size=3 89 | stride=1 90 | pad=1 91 | activation=leaky 92 | 93 | [maxpool] 94 | size=2 95 | stride=1 96 | 97 | [convolutional] 98 | batch_normalize=1 99 | filters=1024 100 | size=3 101 | stride=1 102 | pad=1 103 | activation=leaky 104 | 105 | ########### 106 | 107 | [convolutional] 108 | batch_normalize=1 109 | filters=256 110 | size=1 111 | stride=1 112 | pad=1 113 | activation=leaky 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=512 118 | size=3 119 | stride=1 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | size=1 125 | stride=1 126 | pad=1 127 | filters=255 128 | activation=linear 129 | 130 | 131 | 132 | [yolo] 133 | mask = 3,4,5 134 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 135 | classes=80 136 | num=6 137 | jitter=.3 138 | ignore_thresh = .7 139 | truth_thresh = 1 140 | random=1 141 | 142 | [route] 143 | layers = -4 144 | 145 | [convolutional] 146 | batch_normalize=1 147 | filters=128 148 | size=1 149 | stride=1 150 | pad=1 151 | activation=leaky 152 | 153 | [upsample] 154 | stride=2 155 | 156 | [route] 157 | layers = -1, 8 158 | 159 | [convolutional] 160 | batch_normalize=1 161 | filters=256 162 | size=3 163 | stride=1 164 | pad=1 165 | activation=leaky 166 | 167 | [convolutional] 168 | size=1 169 | stride=1 170 | pad=1 171 | filters=255 172 | activation=linear 
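# filters=255 in the conv above follows the stock COCO head layout: 3 anchors x (4 box + 1 objectness + 80 classes).
# Unlike the rotated configs (filters=504, angle-aware anchors), this tiny cfg appears to be kept in its
# original axis-aligned, 80-class form for reference.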
173 | 174 | [yolo] 175 | mask = 1,2,3 176 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 177 | classes=80 178 | num=6 179 | jitter=.3 180 | ignore_thresh = .7 181 | truth_thresh = 1 182 | random=1 183 | -------------------------------------------------------------------------------- /cfg/yolov3.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=16 7 | subdivisions=1 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 
200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 
421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 | [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 601 | stride=1 602 | pad=1 603 | filters=504 604 | activation=linear 605 | 606 | # 角度定义为x+ 0; 顺时针; 正负0.5pi 607 | [yolo] 608 | mask = 144-215 609 | anchors = 792, 2061, 3870, 6353, 9623, 15803 / 4.18, 6.48, 8.71 / -75, -60, -45, -30, -15 ,0,15, 30,45, 60,75, 90 610 | classes=1 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .7 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 | layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 
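# The route below concatenates the upsampled map from the previous layer (-1) with backbone layer 61
# along the channel dimension; a [route] with a single index simply forwards that layer's output.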
633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | size=3 680 | stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=504 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 72-143 695 | anchors = 792, 2061, 3870, 6353, 9623, 15803 / 4.18, 6.48, 8.71 / -75, -60, -45, -30, -15 ,0,15, 30,45, 60,75, 90 696 | classes=1 697 | num=9 698 | jitter=.3 699 | ignore_thresh = .7 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 | pad=1 746 | activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=504 777 | activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0-71 782 | anchors = 792, 2061, 3870, 6353, 9623, 15803 / 4.18, 6.48, 8.71 / -75, -60, -45, -30, -15 ,0,15, 30,45, 60,75, 90 783 | classes=1 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .7 787 | truth_thresh = 1 788 | random=1 789 | -------------------------------------------------------------------------------- /data/IC_eval/ic15/rrc_evaluation_funcs.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/data/IC_eval/ic15/rrc_evaluation_funcs.pyc -------------------------------------------------------------------------------- /data/coco.data: -------------------------------------------------------------------------------- 1 | classes=80 2 | train=../coco/trainvalno5k.txt 3 | valid=../coco/5k.txt 4 | names=data/coco.names 5 | backup=backup/ 6 | eval=coco 7 | -------------------------------------------------------------------------------- /data/coco.names: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | 
motorcycle 5 | airplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | stop sign 13 | parking meter 14 | bench 15 | bird 16 | cat 17 | dog 18 | horse 19 | sheep 20 | cow 21 | elephant 22 | bear 23 | zebra 24 | giraffe 25 | backpack 26 | umbrella 27 | handbag 28 | tie 29 | suitcase 30 | frisbee 31 | skis 32 | snowboard 33 | sports ball 34 | kite 35 | baseball bat 36 | baseball glove 37 | skateboard 38 | surfboard 39 | tennis racket 40 | bottle 41 | wine glass 42 | cup 43 | fork 44 | knife 45 | spoon 46 | bowl 47 | banana 48 | apple 49 | sandwich 50 | orange 51 | broccoli 52 | carrot 53 | hot dog 54 | pizza 55 | donut 56 | cake 57 | chair 58 | couch 59 | potted plant 60 | bed 61 | dining table 62 | toilet 63 | tv 64 | laptop 65 | mouse 66 | remote 67 | keyboard 68 | cell phone 69 | microwave 70 | oven 71 | toaster 72 | sink 73 | refrigerator 74 | book 75 | clock 76 | vase 77 | scissors 78 | teddy bear 79 | hair drier 80 | toothbrush 81 | -------------------------------------------------------------------------------- /data/hrsc.data: -------------------------------------------------------------------------------- 1 | classes=1 2 | train=/py/datasets/HRSC2016/yolo-dataset/single-train.txt 3 | valid=/py/datasets/HRSC2016/yolo-dataset/single-train.txt 4 | names=data/hrsc.name 5 | backup=backup/ 6 | eval=coco 7 | -------------------------------------------------------------------------------- /data/hrsc.name: -------------------------------------------------------------------------------- 1 | ship 2 | -------------------------------------------------------------------------------- /data/icdar.name: -------------------------------------------------------------------------------- 1 | text 2 | -------------------------------------------------------------------------------- /data/icdar_13+15.data: -------------------------------------------------------------------------------- 1 | classes=1 2 | train=/py/datasets/ICDAR2015/yolo/13+15/train.txt 3 | valid=/py/datasets/ICDAR2015/yolo/13+15/val.txt 4 | names=data/icdar.name 5 | backup=backup/ 6 | eval=coco 7 | -------------------------------------------------------------------------------- /data/icdar_15.data: -------------------------------------------------------------------------------- 1 | classes=1 2 | train=/py/datasets/ICDAR2015/yolo/train.txt 3 | valid=/py/datasets/ICDAR2015/yolo/val.txt 4 | names=data/icdar.name 5 | backup=backup/ 6 | eval=coco 7 | -------------------------------------------------------------------------------- /data/icdar_15_all.data: -------------------------------------------------------------------------------- 1 | classes=1 2 | train=/py/datasets/ICDAR2015/yolo/care_all/train.txt 3 | valid=/py/datasets/ICDAR2015/yolo/care_all/val.txt 4 | names=data/icdar.name 5 | backup=backup/ 6 | eval=coco 7 | -------------------------------------------------------------------------------- /demo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/demo.png -------------------------------------------------------------------------------- /experiment/HRSC/hyp.py: -------------------------------------------------------------------------------- 1 | giou: 0.1 # giou loss gain 1.582 2 | cls: 27.76 # cls loss gain (CE=~1.0, uCE=~20) 3 | cls_pw: 1.446 # cls BCELoss positive_weight 4 | obj: 20.35 # obj loss gain (*=80 for uBCE with 80 classes) 5 | obj_pw: 3.941 # obj BCELoss 
positive_weight 6 | iou_t: 0.1 # iou training threshold 7 | ang_t: 3.1415926/12 8 | reg: 1.0 9 | lr0: 0.00005 10 | multiplier:10 11 | lrf: -4. # final LambdaLR learning rate = lr0 * (10 ** lrf) 12 | momentum: 0.97 # SGD momentum 13 | weight_decay: 0.0004569 # optimizer weight decay 14 | fl_gamma: 0.5 # focal loss gamma 15 | hsv_s: 0.5 # image HSV-Saturation augmentation (fraction) 16 | hsv_v: 0.3 # image HSV-Value augmentation (fraction) 17 | degrees: 5.0 # image rotation (+/- deg) 18 | translate': 0.1 # image translation (+/- fraction) 19 | scale: 0.2 # image scale (+/- gain) 20 | shear: 0.5 21 | gamma:0.3 22 | blur:2.0 23 | noise:0.02 24 | contrast:0.3 25 | sharpen:0.3 -------------------------------------------------------------------------------- /experiment/HRSC/mul-scale/hyp.txt: -------------------------------------------------------------------------------- 1 | # ryolo 2 | # hyp = {'giou': 0.1, # giou loss gain 1.582 3 | # 'cls': 27.76, # cls loss gain (CE=~1.0, uCE=~20) 4 | # 'cls_pw': 1.446, # cls BCELoss positive_weight 5 | # 'obj': 20.35, # obj loss gain (*=80 for uBCE with 80 classes) 6 | # 'obj_pw': 3.941, # obj BCELoss positive_weight 7 | # 'iou_t': 0.5, # iou training threshold 8 | # 'ang_t': 3.1415926/6, 9 | # 'reg': 1.0, 10 | # # 'lr0': 0.002324, # initial learning rate (SGD=1E-3, Adam=9E-5) 11 | # 'lr0': 0.00005, 12 | # 'multiplier':10, 13 | # 'lrf': -4., # final LambdaLR learning rate = lr0 * (10 ** lrf) 14 | # 'momentum': 0.97, # SGD momentum 15 | # 'weight_decay': 0.0004569, # optimizer weight decay 16 | # 'fl_gamma': 0.5, # focal loss gamma 17 | # 'hsv_s': 0.5, # image HSV-Saturation augmentation (fraction) 18 | # 'hsv_v': 0.3, # image HSV-Value augmentation (fraction) 19 | # 'degrees': 5.0, # image rotation (+/- deg) 20 | # 'translate': 0.1, # image translation (+/- fraction) 21 | # 'scale': 0.2, # image scale (+/- gain) 22 | # 'shear': 0.5, 23 | # 'gamma':0.3, 24 | # 'blur':2.0, 25 | # 'noise':0.02, 26 | # 'contrast':0.3, 27 | # 'sharpen':0.3, 28 | # } 29 | 30 | -------------------------------------------------------------------------------- /experiment/IC15/ablation.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/experiment/IC15/ablation.png -------------------------------------------------------------------------------- /experiment/ga-attention(_4).png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/experiment/ga-attention(_4).png -------------------------------------------------------------------------------- /experiment/tiny_test_gax4_o8_dh.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/experiment/tiny_test_gax4_o8_dh.png -------------------------------------------------------------------------------- /make.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | cd model/layer/ORN 3 | ./make.sh 4 | 5 | cd ../../layer/DCNv2 6 | ./make.sh 7 | 8 | cd ../../../utils/nms 9 | ./make.sh 10 | -------------------------------------------------------------------------------- /model/__init__.py: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/model/__init__.py -------------------------------------------------------------------------------- /model/layer/DCNv2/.gitignore: -------------------------------------------------------------------------------- 1 | .vscode 2 | .idea 3 | *.so 4 | *.o 5 | *pyc 6 | _ext 7 | build 8 | DCNv2.egg-info 9 | dist -------------------------------------------------------------------------------- /model/layer/DCNv2/__init__.py: -------------------------------------------------------------------------------- 1 | from .dcn_v2 import DCN 2 | 3 | __all__ = ['DCN'] -------------------------------------------------------------------------------- /model/layer/DCNv2/dcn_test.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | from __future__ import absolute_import 3 | from __future__ import print_function 4 | from __future__ import division 5 | 6 | import time 7 | import torch 8 | import torch.nn as nn 9 | from torch.autograd import gradcheck 10 | 11 | from dcn_v2 import dcn_v2_conv, DCNv2, DCN 12 | from dcn_v2 import dcn_v2_pooling, DCNv2Pooling, DCNPooling 13 | 14 | deformable_groups = 1 15 | N, inC, inH, inW = 2, 2, 4, 4 16 | outC = 2 17 | kH, kW = 3, 3 18 | 19 | 20 | def conv_identify(weight, bias): 21 | weight.data.zero_() 22 | bias.data.zero_() 23 | o, i, h, w = weight.shape 24 | y = h//2 25 | x = w//2 26 | for p in range(i): 27 | for q in range(o): 28 | if p == q: 29 | weight.data[q, p, y, x] = 1.0 30 | 31 | 32 | def check_zero_offset(): 33 | conv_offset = nn.Conv2d(inC, deformable_groups * 2 * kH * kW, 34 | kernel_size=(kH, kW), 35 | stride=(1, 1), 36 | padding=(1, 1), 37 | bias=True).cuda() 38 | 39 | conv_mask = nn.Conv2d(inC, deformable_groups * 1 * kH * kW, 40 | kernel_size=(kH, kW), 41 | stride=(1, 1), 42 | padding=(1, 1), 43 | bias=True).cuda() 44 | 45 | dcn_v2 = DCNv2(inC, outC, (kH, kW), 46 | stride=1, padding=1, dilation=1, 47 | deformable_groups=deformable_groups).cuda() 48 | 49 | conv_offset.weight.data.zero_() 50 | conv_offset.bias.data.zero_() 51 | conv_mask.weight.data.zero_() 52 | conv_mask.bias.data.zero_() 53 | conv_identify(dcn_v2.weight, dcn_v2.bias) 54 | 55 | input = torch.randn(N, inC, inH, inW).cuda() 56 | offset = conv_offset(input) 57 | mask = conv_mask(input) 58 | mask = torch.sigmoid(mask) 59 | output = dcn_v2(input, offset, mask) 60 | output *= 2 61 | d = (input - output).abs().max() 62 | if d < 1e-10: 63 | print('Zero offset passed') 64 | else: 65 | print('Zero offset failed') 66 | print(input) 67 | print(output) 68 | 69 | def check_gradient_dconv(): 70 | 71 | input = torch.rand(N, inC, inH, inW).cuda() * 0.01 72 | input.requires_grad = True 73 | 74 | offset = torch.randn(N, deformable_groups * 2 * kW * kH, inH, inW).cuda() * 2 75 | # offset.data.zero_() 76 | # offset.data -= 0.5 77 | offset.requires_grad = True 78 | 79 | mask = torch.rand(N, deformable_groups * 1 * kW * kH, inH, inW).cuda() 80 | # mask.data.zero_() 81 | mask.requires_grad = True 82 | mask = torch.sigmoid(mask) 83 | 84 | weight = torch.randn(outC, inC, kH, kW).cuda() 85 | weight.requires_grad = True 86 | 87 | bias = torch.rand(outC).cuda() 88 | bias.requires_grad = True 89 | 90 | stride = 1 91 | padding = 1 92 | dilation = 1 93 | 94 | print('check_gradient_dconv: ', 95 | gradcheck(dcn_v2_conv, (input, offset, mask, weight, bias, 96 | stride, padding, dilation, deformable_groups), 97 | eps=1e-3, atol=1e-4, rtol=1e-2)) 98 | 99 | 100 | def 
check_pooling_zero_offset(): 101 | 102 | input = torch.randn(2, 16, 64, 64).cuda().zero_() 103 | input[0, :, 16:26, 16:26] = 1. 104 | input[1, :, 10:20, 20:30] = 2. 105 | rois = torch.tensor([ 106 | [0, 65, 65, 103, 103], 107 | [1, 81, 41, 119, 79], 108 | ]).cuda().float() 109 | pooling = DCNv2Pooling(spatial_scale=1.0 / 4, 110 | pooled_size=7, 111 | output_dim=16, 112 | no_trans=True, 113 | group_size=1, 114 | trans_std=0.0).cuda() 115 | 116 | out = pooling(input, rois, input.new()) 117 | s = ', '.join(['%f' % out[i, :, :, :].mean().item() 118 | for i in range(rois.shape[0])]) 119 | print(s) 120 | 121 | dpooling = DCNv2Pooling(spatial_scale=1.0 / 4, 122 | pooled_size=7, 123 | output_dim=16, 124 | no_trans=False, 125 | group_size=1, 126 | trans_std=0.0).cuda() 127 | offset = torch.randn(20, 2, 7, 7).cuda().zero_() 128 | dout = dpooling(input, rois, offset) 129 | s = ', '.join(['%f' % dout[i, :, :, :].mean().item() 130 | for i in range(rois.shape[0])]) 131 | print(s) 132 | 133 | 134 | def check_gradient_dpooling(): 135 | input = torch.randn(2, 3, 5, 5).cuda() * 0.01 136 | N = 4 137 | batch_inds = torch.randint(2, (N, 1)).cuda().float() 138 | x = torch.rand((N, 1)).cuda().float() * 15 139 | y = torch.rand((N, 1)).cuda().float() * 15 140 | w = torch.rand((N, 1)).cuda().float() * 10 141 | h = torch.rand((N, 1)).cuda().float() * 10 142 | rois = torch.cat((batch_inds, x, y, x + w, y + h), dim=1) 143 | offset = torch.randn(N, 2, 3, 3).cuda() 144 | input.requires_grad = True 145 | offset.requires_grad = True 146 | 147 | spatial_scale = 1.0 / 4 148 | pooled_size = 3 149 | output_dim = 3 150 | no_trans = 0 151 | group_size = 1 152 | trans_std = 0.0 153 | sample_per_part = 4 154 | part_size = pooled_size 155 | 156 | print('check_gradient_dpooling:', 157 | gradcheck(dcn_v2_pooling, (input, rois, offset, 158 | spatial_scale, 159 | pooled_size, 160 | output_dim, 161 | no_trans, 162 | group_size, 163 | part_size, 164 | sample_per_part, 165 | trans_std), 166 | eps=1e-4)) 167 | 168 | 169 | def example_dconv(): 170 | input = torch.randn(2, 64, 128, 128).cuda() 171 | # wrap all things (offset and mask) in DCN 172 | dcn = DCN(64, 64, kernel_size=(3, 3), stride=1, 173 | padding=1, deformable_groups=2).cuda() 174 | # print(dcn.weight.shape, input.shape) 175 | output = dcn(input) 176 | targert = output.new(*output.size()) 177 | targert.data.uniform_(-0.01, 0.01) 178 | error = (targert - output).mean() 179 | error.backward() 180 | print(output.shape) 181 | 182 | 183 | def example_dpooling(): 184 | input = torch.randn(2, 32, 64, 64).cuda() 185 | batch_inds = torch.randint(2, (20, 1)).cuda().float() 186 | x = torch.randint(256, (20, 1)).cuda().float() 187 | y = torch.randint(256, (20, 1)).cuda().float() 188 | w = torch.randint(64, (20, 1)).cuda().float() 189 | h = torch.randint(64, (20, 1)).cuda().float() 190 | rois = torch.cat((batch_inds, x, y, x + w, y + h), dim=1) 191 | offset = torch.randn(20, 2, 7, 7).cuda() 192 | input.requires_grad = True 193 | offset.requires_grad = True 194 | 195 | # normal roi_align 196 | pooling = DCNv2Pooling(spatial_scale=1.0 / 4, 197 | pooled_size=7, 198 | output_dim=32, 199 | no_trans=True, 200 | group_size=1, 201 | trans_std=0.1).cuda() 202 | 203 | # deformable pooling 204 | dpooling = DCNv2Pooling(spatial_scale=1.0 / 4, 205 | pooled_size=7, 206 | output_dim=32, 207 | no_trans=False, 208 | group_size=1, 209 | trans_std=0.1).cuda() 210 | 211 | out = pooling(input, rois, offset) 212 | dout = dpooling(input, rois, offset) 213 | print(out.shape) 214 | print(dout.shape) 215 | 216 | 
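# The remainder of example_dpooling is a gradient smoke test: random targets in
# (-0.01, 0.01) are drawn for both the plain (no_trans=True) pooling output and the
# deformable pooling output, a mean-difference pseudo-loss is formed for each, and
# backward() is called to confirm that gradients flow through both paths.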
target_out = out.new(*out.size()) 217 | target_out.data.uniform_(-0.01, 0.01) 218 | target_dout = dout.new(*dout.size()) 219 | target_dout.data.uniform_(-0.01, 0.01) 220 | e = (target_out - out).mean() 221 | e.backward() 222 | e = (target_dout - dout).mean() 223 | e.backward() 224 | 225 | 226 | def example_mdpooling(): 227 | input = torch.randn(2, 32, 64, 64).cuda() 228 | input.requires_grad = True 229 | batch_inds = torch.randint(2, (20, 1)).cuda().float() 230 | x = torch.randint(256, (20, 1)).cuda().float() 231 | y = torch.randint(256, (20, 1)).cuda().float() 232 | w = torch.randint(64, (20, 1)).cuda().float() 233 | h = torch.randint(64, (20, 1)).cuda().float() 234 | rois = torch.cat((batch_inds, x, y, x + w, y + h), dim=1) 235 | 236 | # mdformable pooling (V2) 237 | dpooling = DCNPooling(spatial_scale=1.0 / 4, 238 | pooled_size=7, 239 | output_dim=32, 240 | no_trans=False, 241 | group_size=1, 242 | trans_std=0.1, 243 | deform_fc_dim=1024).cuda() 244 | 245 | dout = dpooling(input, rois) 246 | target = dout.new(*dout.size()) 247 | target.data.uniform_(-0.1, 0.1) 248 | error = (target - dout).mean() 249 | error.backward() 250 | print(dout.shape) 251 | 252 | 253 | if __name__ == '__main__': 254 | 255 | example_dconv() 256 | # example_dpooling() 257 | # example_mdpooling() 258 | 259 | # check_pooling_zero_offset() 260 | # zero offset check 261 | # if inC == outC: 262 | # check_zero_offset() 263 | 264 | # check_gradient_dpooling() 265 | # check_gradient_dconv() 266 | # """ 267 | # ****** Note: backward is not reentrant error may not be a serious problem, 268 | # ****** since the max error is less than 1e-7, 269 | # ****** Still looking for what trigger this problem 270 | # """ 271 | -------------------------------------------------------------------------------- /model/layer/DCNv2/make.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | rm -rf build 4 | python setup.py clean && python setup.py build develop 5 | -------------------------------------------------------------------------------- /model/layer/DCNv2/setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import os 4 | import glob 5 | 6 | import torch 7 | 8 | from torch.utils.cpp_extension import CUDA_HOME 9 | from torch.utils.cpp_extension import CppExtension 10 | from torch.utils.cpp_extension import CUDAExtension 11 | 12 | from setuptools import find_packages 13 | from setuptools import setup 14 | 15 | requirements = ["torch", "torchvision"] 16 | 17 | def get_extensions(): 18 | this_dir = os.path.dirname(os.path.abspath(__file__)) 19 | extensions_dir = os.path.join(this_dir, "src") 20 | 21 | main_file = glob.glob(os.path.join(extensions_dir, "*.cpp")) 22 | source_cpu = glob.glob(os.path.join(extensions_dir, "cpu", "*.cpp")) 23 | source_cuda = glob.glob(os.path.join(extensions_dir, "cuda", "*.cu")) 24 | 25 | sources = main_file + source_cpu 26 | extension = CppExtension 27 | extra_compile_args = {"cxx": []} 28 | define_macros = [] 29 | 30 | if torch.cuda.is_available() and CUDA_HOME is not None: 31 | extension = CUDAExtension 32 | sources += source_cuda 33 | define_macros += [("WITH_CUDA", None)] 34 | extra_compile_args["nvcc"] = [ 35 | "-DCUDA_HAS_FP16=1", 36 | "-D__CUDA_NO_HALF_OPERATORS__", 37 | "-D__CUDA_NO_HALF_CONVERSIONS__", 38 | "-D__CUDA_NO_HALF2_OPERATORS__", 39 | ] 40 | else: 41 | raise NotImplementedError('Cuda is not availabel') 42 | 43 | sources = [os.path.join(extensions_dir, s) for s 
in sources] 44 | include_dirs = [extensions_dir] 45 | ext_modules = [ 46 | extension( 47 | "_ext", 48 | sources, 49 | include_dirs=include_dirs, 50 | define_macros=define_macros, 51 | extra_compile_args=extra_compile_args, 52 | ) 53 | ] 54 | return ext_modules 55 | 56 | setup( 57 | name="DCNv2", 58 | version="0.1", 59 | author="charlesshang", 60 | url="https://github.com/charlesshang/DCNv2", 61 | description="deformable convolutional networks", 62 | packages=find_packages(exclude=("configs", "tests",)), 63 | # install_requires=requirements, 64 | ext_modules=get_extensions(), 65 | cmdclass={"build_ext": torch.utils.cpp_extension.BuildExtension}, 66 | ) -------------------------------------------------------------------------------- /model/layer/DCNv2/src/cpu/dcn_v2_cpu.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | 3 | #include 4 | #include 5 | 6 | 7 | at::Tensor 8 | dcn_v2_cpu_forward(const at::Tensor &input, 9 | const at::Tensor &weight, 10 | const at::Tensor &bias, 11 | const at::Tensor &offset, 12 | const at::Tensor &mask, 13 | const int kernel_h, 14 | const int kernel_w, 15 | const int stride_h, 16 | const int stride_w, 17 | const int pad_h, 18 | const int pad_w, 19 | const int dilation_h, 20 | const int dilation_w, 21 | const int deformable_group) 22 | { 23 | AT_ERROR("Not implement on cpu"); 24 | } 25 | 26 | std::vector 27 | dcn_v2_cpu_backward(const at::Tensor &input, 28 | const at::Tensor &weight, 29 | const at::Tensor &bias, 30 | const at::Tensor &offset, 31 | const at::Tensor &mask, 32 | const at::Tensor &grad_output, 33 | int kernel_h, int kernel_w, 34 | int stride_h, int stride_w, 35 | int pad_h, int pad_w, 36 | int dilation_h, int dilation_w, 37 | int deformable_group) 38 | { 39 | AT_ERROR("Not implement on cpu"); 40 | } 41 | 42 | std::tuple 43 | dcn_v2_psroi_pooling_cpu_forward(const at::Tensor &input, 44 | const at::Tensor &bbox, 45 | const at::Tensor &trans, 46 | const int no_trans, 47 | const float spatial_scale, 48 | const int output_dim, 49 | const int group_size, 50 | const int pooled_size, 51 | const int part_size, 52 | const int sample_per_part, 53 | const float trans_std) 54 | { 55 | AT_ERROR("Not implement on cpu"); 56 | } 57 | 58 | std::tuple 59 | dcn_v2_psroi_pooling_cpu_backward(const at::Tensor &out_grad, 60 | const at::Tensor &input, 61 | const at::Tensor &bbox, 62 | const at::Tensor &trans, 63 | const at::Tensor &top_count, 64 | const int no_trans, 65 | const float spatial_scale, 66 | const int output_dim, 67 | const int group_size, 68 | const int pooled_size, 69 | const int part_size, 70 | const int sample_per_part, 71 | const float trans_std) 72 | { 73 | AT_ERROR("Not implement on cpu"); 74 | } -------------------------------------------------------------------------------- /model/layer/DCNv2/src/cpu/vision.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | #include 3 | 4 | at::Tensor 5 | dcn_v2_cpu_forward(const at::Tensor &input, 6 | const at::Tensor &weight, 7 | const at::Tensor &bias, 8 | const at::Tensor &offset, 9 | const at::Tensor &mask, 10 | const int kernel_h, 11 | const int kernel_w, 12 | const int stride_h, 13 | const int stride_w, 14 | const int pad_h, 15 | const int pad_w, 16 | const int dilation_h, 17 | const int dilation_w, 18 | const int deformable_group); 19 | 20 | std::vector 21 | dcn_v2_cpu_backward(const at::Tensor &input, 22 | const at::Tensor &weight, 23 | const at::Tensor &bias, 24 | const at::Tensor &offset, 25 | const 
at::Tensor &mask, 26 | const at::Tensor &grad_output, 27 | int kernel_h, int kernel_w, 28 | int stride_h, int stride_w, 29 | int pad_h, int pad_w, 30 | int dilation_h, int dilation_w, 31 | int deformable_group); 32 | 33 | 34 | std::tuple 35 | dcn_v2_psroi_pooling_cpu_forward(const at::Tensor &input, 36 | const at::Tensor &bbox, 37 | const at::Tensor &trans, 38 | const int no_trans, 39 | const float spatial_scale, 40 | const int output_dim, 41 | const int group_size, 42 | const int pooled_size, 43 | const int part_size, 44 | const int sample_per_part, 45 | const float trans_std); 46 | 47 | std::tuple 48 | dcn_v2_psroi_pooling_cpu_backward(const at::Tensor &out_grad, 49 | const at::Tensor &input, 50 | const at::Tensor &bbox, 51 | const at::Tensor &trans, 52 | const at::Tensor &top_count, 53 | const int no_trans, 54 | const float spatial_scale, 55 | const int output_dim, 56 | const int group_size, 57 | const int pooled_size, 58 | const int part_size, 59 | const int sample_per_part, 60 | const float trans_std); -------------------------------------------------------------------------------- /model/layer/DCNv2/src/cuda/dcn_v2_im2col_cuda.h: -------------------------------------------------------------------------------- 1 | 2 | /*! 3 | ******************* BEGIN Caffe Copyright Notice and Disclaimer **************** 4 | * 5 | * COPYRIGHT 6 | * 7 | * All contributions by the University of California: 8 | * Copyright (c) 2014-2017 The Regents of the University of California (Regents) 9 | * All rights reserved. 10 | * 11 | * All other contributions: 12 | * Copyright (c) 2014-2017, the respective contributors 13 | * All rights reserved. 14 | * 15 | * Caffe uses a shared copyright model: each contributor holds copyright over 16 | * their contributions to Caffe. The project versioning records all such 17 | * contribution and copyright details. If a contributor wants to further mark 18 | * their specific copyright on a particular contribution, they should indicate 19 | * their copyright solely in the commit message of the change when it is 20 | * committed. 21 | * 22 | * LICENSE 23 | * 24 | * Redistribution and use in source and binary forms, with or without 25 | * modification, are permitted provided that the following conditions are met: 26 | * 27 | * 1. Redistributions of source code must retain the above copyright notice, this 28 | * list of conditions and the following disclaimer. 29 | * 2. Redistributions in binary form must reproduce the above copyright notice, 30 | * this list of conditions and the following disclaimer in the documentation 31 | * and/or other materials provided with the distribution. 32 | * 33 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 34 | * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 35 | * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 36 | * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 37 | * ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 38 | * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 39 | * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND 40 | * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 41 | * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 42 | * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
43 | * 44 | * CONTRIBUTION AGREEMENT 45 | * 46 | * By contributing to the BVLC/caffe repository through pull-request, comment, 47 | * or otherwise, the contributor releases their content to the 48 | * license and copyright terms herein. 49 | * 50 | ***************** END Caffe Copyright Notice and Disclaimer ******************** 51 | * 52 | * Copyright (c) 2018 Microsoft 53 | * Licensed under The MIT License [see LICENSE for details] 54 | * \file modulated_deformable_im2col.h 55 | * \brief Function definitions of converting an image to 56 | * column matrix based on kernel, padding, dilation, and offset. 57 | * These functions are mainly used in deformable convolution operators. 58 | * \ref: https://arxiv.org/abs/1811.11168 59 | * \author Yuwen Xiong, Haozhi Qi, Jifeng Dai, Xizhou Zhu, Han Hu 60 | */ 61 | 62 | /***************** Adapted by Charles Shang *********************/ 63 | 64 | #ifndef DCN_V2_IM2COL_CUDA 65 | #define DCN_V2_IM2COL_CUDA 66 | 67 | #ifdef __cplusplus 68 | extern "C" 69 | { 70 | #endif 71 | 72 | void modulated_deformable_im2col_cuda(cudaStream_t stream, 73 | const float *data_im, const float *data_offset, const float *data_mask, 74 | const int batch_size, const int channels, const int height_im, const int width_im, 75 | const int height_col, const int width_col, const int kernel_h, const int kenerl_w, 76 | const int pad_h, const int pad_w, const int stride_h, const int stride_w, 77 | const int dilation_h, const int dilation_w, 78 | const int deformable_group, float *data_col); 79 | 80 | void modulated_deformable_col2im_cuda(cudaStream_t stream, 81 | const float *data_col, const float *data_offset, const float *data_mask, 82 | const int batch_size, const int channels, const int height_im, const int width_im, 83 | const int height_col, const int width_col, const int kernel_h, const int kenerl_w, 84 | const int pad_h, const int pad_w, const int stride_h, const int stride_w, 85 | const int dilation_h, const int dilation_w, 86 | const int deformable_group, float *grad_im); 87 | 88 | void modulated_deformable_col2im_coord_cuda(cudaStream_t stream, 89 | const float *data_col, const float *data_im, const float *data_offset, const float *data_mask, 90 | const int batch_size, const int channels, const int height_im, const int width_im, 91 | const int height_col, const int width_col, const int kernel_h, const int kenerl_w, 92 | const int pad_h, const int pad_w, const int stride_h, const int stride_w, 93 | const int dilation_h, const int dilation_w, 94 | const int deformable_group, 95 | float *grad_offset, float *grad_mask); 96 | 97 | #ifdef __cplusplus 98 | } 99 | #endif 100 | 101 | #endif -------------------------------------------------------------------------------- /model/layer/DCNv2/src/cuda/vision.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | #include 3 | 4 | at::Tensor 5 | dcn_v2_cuda_forward(const at::Tensor &input, 6 | const at::Tensor &weight, 7 | const at::Tensor &bias, 8 | const at::Tensor &offset, 9 | const at::Tensor &mask, 10 | const int kernel_h, 11 | const int kernel_w, 12 | const int stride_h, 13 | const int stride_w, 14 | const int pad_h, 15 | const int pad_w, 16 | const int dilation_h, 17 | const int dilation_w, 18 | const int deformable_group); 19 | 20 | std::vector 21 | dcn_v2_cuda_backward(const at::Tensor &input, 22 | const at::Tensor &weight, 23 | const at::Tensor &bias, 24 | const at::Tensor &offset, 25 | const at::Tensor &mask, 26 | const at::Tensor &grad_output, 27 | int kernel_h, int kernel_w, 
28 | int stride_h, int stride_w, 29 | int pad_h, int pad_w, 30 | int dilation_h, int dilation_w, 31 | int deformable_group); 32 | 33 | 34 | std::tuple 35 | dcn_v2_psroi_pooling_cuda_forward(const at::Tensor &input, 36 | const at::Tensor &bbox, 37 | const at::Tensor &trans, 38 | const int no_trans, 39 | const float spatial_scale, 40 | const int output_dim, 41 | const int group_size, 42 | const int pooled_size, 43 | const int part_size, 44 | const int sample_per_part, 45 | const float trans_std); 46 | 47 | std::tuple 48 | dcn_v2_psroi_pooling_cuda_backward(const at::Tensor &out_grad, 49 | const at::Tensor &input, 50 | const at::Tensor &bbox, 51 | const at::Tensor &trans, 52 | const at::Tensor &top_count, 53 | const int no_trans, 54 | const float spatial_scale, 55 | const int output_dim, 56 | const int group_size, 57 | const int pooled_size, 58 | const int part_size, 59 | const int sample_per_part, 60 | const float trans_std); -------------------------------------------------------------------------------- /model/layer/DCNv2/src/dcn_v2.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include "cpu/vision.h" 4 | 5 | #ifdef WITH_CUDA 6 | #include "cuda/vision.h" 7 | #endif 8 | 9 | at::Tensor 10 | dcn_v2_forward(const at::Tensor &input, 11 | const at::Tensor &weight, 12 | const at::Tensor &bias, 13 | const at::Tensor &offset, 14 | const at::Tensor &mask, 15 | const int kernel_h, 16 | const int kernel_w, 17 | const int stride_h, 18 | const int stride_w, 19 | const int pad_h, 20 | const int pad_w, 21 | const int dilation_h, 22 | const int dilation_w, 23 | const int deformable_group) 24 | { 25 | if (input.type().is_cuda()) 26 | { 27 | #ifdef WITH_CUDA 28 | return dcn_v2_cuda_forward(input, weight, bias, offset, mask, 29 | kernel_h, kernel_w, 30 | stride_h, stride_w, 31 | pad_h, pad_w, 32 | dilation_h, dilation_w, 33 | deformable_group); 34 | #else 35 | AT_ERROR("Not compiled with GPU support"); 36 | #endif 37 | } 38 | AT_ERROR("Not implemented on the CPU"); 39 | } 40 | 41 | std::vector 42 | dcn_v2_backward(const at::Tensor &input, 43 | const at::Tensor &weight, 44 | const at::Tensor &bias, 45 | const at::Tensor &offset, 46 | const at::Tensor &mask, 47 | const at::Tensor &grad_output, 48 | int kernel_h, int kernel_w, 49 | int stride_h, int stride_w, 50 | int pad_h, int pad_w, 51 | int dilation_h, int dilation_w, 52 | int deformable_group) 53 | { 54 | if (input.type().is_cuda()) 55 | { 56 | #ifdef WITH_CUDA 57 | return dcn_v2_cuda_backward(input, 58 | weight, 59 | bias, 60 | offset, 61 | mask, 62 | grad_output, 63 | kernel_h, kernel_w, 64 | stride_h, stride_w, 65 | pad_h, pad_w, 66 | dilation_h, dilation_w, 67 | deformable_group); 68 | #else 69 | AT_ERROR("Not compiled with GPU support"); 70 | #endif 71 | } 72 | AT_ERROR("Not implemented on the CPU"); 73 | } 74 | 75 | std::tuple 76 | dcn_v2_psroi_pooling_forward(const at::Tensor &input, 77 | const at::Tensor &bbox, 78 | const at::Tensor &trans, 79 | const int no_trans, 80 | const float spatial_scale, 81 | const int output_dim, 82 | const int group_size, 83 | const int pooled_size, 84 | const int part_size, 85 | const int sample_per_part, 86 | const float trans_std) 87 | { 88 | if (input.type().is_cuda()) 89 | { 90 | #ifdef WITH_CUDA 91 | return dcn_v2_psroi_pooling_cuda_forward(input, 92 | bbox, 93 | trans, 94 | no_trans, 95 | spatial_scale, 96 | output_dim, 97 | group_size, 98 | pooled_size, 99 | part_size, 100 | sample_per_part, 101 | trans_std); 102 | #else 103 | AT_ERROR("Not compiled 
with GPU support"); 104 | #endif 105 | } 106 | AT_ERROR("Not implemented on the CPU"); 107 | } 108 | 109 | std::tuple 110 | dcn_v2_psroi_pooling_backward(const at::Tensor &out_grad, 111 | const at::Tensor &input, 112 | const at::Tensor &bbox, 113 | const at::Tensor &trans, 114 | const at::Tensor &top_count, 115 | const int no_trans, 116 | const float spatial_scale, 117 | const int output_dim, 118 | const int group_size, 119 | const int pooled_size, 120 | const int part_size, 121 | const int sample_per_part, 122 | const float trans_std) 123 | { 124 | if (input.type().is_cuda()) 125 | { 126 | #ifdef WITH_CUDA 127 | return dcn_v2_psroi_pooling_cuda_backward(out_grad, 128 | input, 129 | bbox, 130 | trans, 131 | top_count, 132 | no_trans, 133 | spatial_scale, 134 | output_dim, 135 | group_size, 136 | pooled_size, 137 | part_size, 138 | sample_per_part, 139 | trans_std); 140 | #else 141 | AT_ERROR("Not compiled with GPU support"); 142 | #endif 143 | } 144 | AT_ERROR("Not implemented on the CPU"); 145 | } -------------------------------------------------------------------------------- /model/layer/DCNv2/src/vision.cpp: -------------------------------------------------------------------------------- 1 | 2 | #include "dcn_v2.h" 3 | 4 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { 5 | m.def("dcn_v2_forward", &dcn_v2_forward, "dcn_v2_forward"); 6 | m.def("dcn_v2_backward", &dcn_v2_backward, "dcn_v2_backward"); 7 | m.def("dcn_v2_psroi_pooling_forward", &dcn_v2_psroi_pooling_forward, "dcn_v2_psroi_pooling_forward"); 8 | m.def("dcn_v2_psroi_pooling_backward", &dcn_v2_psroi_pooling_backward, "dcn_v2_psroi_pooling_backward"); 9 | } 10 | -------------------------------------------------------------------------------- /model/layer/__init__.py: -------------------------------------------------------------------------------- 1 | from .DCNv2 import * 2 | from .ORN import * 3 | -------------------------------------------------------------------------------- /model/model_utils.py: -------------------------------------------------------------------------------- 1 | import torch.nn.functional as F 2 | 3 | from utils.google_utils import * 4 | from utils.parse_config import * 5 | from utils.utils import * 6 | 7 | 8 | 9 | def get_yolo_layers(model): 10 | return [i for i, x in enumerate(model.module_defs) if x['type'] == 'yolo'] # [82, 94, 106] for yolov3 11 | 12 | 13 | # 做了两件事: 14 | # - 编码grid cell的坐标 15 | # - 将anchor缩放到特征图尺度(后面在特征图上进行预测) 16 | def create_grids(self, img_size=416, ng=(13, 13), device='cpu', type=torch.float32): 17 | nx, ny = ng # x and y grid size # ng是传入的特征图宽高tuple 18 | # 计算降采样步长self.stride 32/16/8 19 | self.img_size = max(img_size) 20 | self.stride = self.img_size / max(ng) 21 | 22 | # build xy offsets 23 | # 最终结果self.grid_xy的维度为torch.Size([1, 1, 10, 13, 2]),其中10和13的维度对应的是特征图的每个点,最后的2是其上的编号 24 | # 如特征图为10*13,则构建的偏移阵列从[0,0],[1,0]...[12,0], [0,1].[1,1]...[12,1], ...[12,9] 25 | # 表示的是特征图每个像素点的位置,也就是原图的grid左上角坐标,和后面预测的cell内偏移共同表示最终预测的物体位置 26 | yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)]) 27 | self.grid_xy = torch.stack((xv, yv), 2).to(device).type(type).view((1, 1, ny, nx, 2)) 28 | 29 | # build wh gains 30 | self.anchor_vec = self.anchors.to(device) 31 | self.anchor_vec[:,:2] /= self.stride 32 | self.anchor_wh = self.anchor_vec.view(1, self.na, 1, 1, 3).to(device).type(type) # torch.Size([1, 18, 1, 1, 3]) 33 | self.ng = torch.Tensor(ng).to(device) 34 | self.nx = nx 35 | self.ny = ny 36 | 37 | 38 | def load_darknet_weights(self, weights, cutoff=-1): 39 | # Parses and loads the weights 
stored in 'weights' 40 | 41 | # Establish cutoffs (load layers between 0 and cutoff. if cutoff = -1 all are loaded) 42 | file = Path(weights).name 43 | if file == 'darknet53.conv.74': 44 | cutoff = 75 45 | elif file == 'yolov3-tiny.conv.15': 46 | cutoff = 15 47 | 48 | # Read weights file 49 | with open(weights, 'rb') as f: 50 | # Read Header https://github.com/AlexeyAB/darknet/issues/2914#issuecomment-496675346 51 | self.version = np.fromfile(f, dtype=np.int32, count=3) # (int32) version info: major, minor, revision 52 | self.seen = np.fromfile(f, dtype=np.int64, count=1) # (int64) number of images seen during training 53 | 54 | weights = np.fromfile(f, dtype=np.float32) # The rest are weights 55 | 56 | ptr = 0 57 | for i, (mdef, module) in enumerate(zip(self.module_defs[:cutoff], self.module_list[:cutoff])): 58 | if mdef['type'] == 'convolutional': 59 | conv_layer = module[0] 60 | if mdef['batch_normalize']: 61 | # Load BN bias, weights, running mean and running variance 62 | bn_layer = module[1] 63 | num_b = bn_layer.bias.numel() # Number of biases 64 | # Bias 65 | bn_b = torch.from_numpy(weights[ptr:ptr + num_b]).view_as(bn_layer.bias) 66 | bn_layer.bias.data.copy_(bn_b) 67 | ptr += num_b 68 | # Weight 69 | bn_w = torch.from_numpy(weights[ptr:ptr + num_b]).view_as(bn_layer.weight) 70 | bn_layer.weight.data.copy_(bn_w) 71 | ptr += num_b 72 | # Running Mean 73 | bn_rm = torch.from_numpy(weights[ptr:ptr + num_b]).view_as(bn_layer.running_mean) 74 | bn_layer.running_mean.data.copy_(bn_rm) 75 | ptr += num_b 76 | # Running Var 77 | bn_rv = torch.from_numpy(weights[ptr:ptr + num_b]).view_as(bn_layer.running_var) 78 | bn_layer.running_var.data.copy_(bn_rv) 79 | ptr += num_b 80 | else: 81 | # Load conv. bias 82 | num_b = conv_layer.bias.numel() 83 | conv_b = torch.from_numpy(weights[ptr:ptr + num_b]).view_as(conv_layer.bias) 84 | conv_layer.bias.data.copy_(conv_b) 85 | ptr += num_b 86 | # Load conv. weights 87 | num_w = conv_layer.weight.numel() 88 | conv_w = torch.from_numpy(weights[ptr:ptr + num_w]).view_as(conv_layer.weight) 89 | conv_layer.weight.data.copy_(conv_w) 90 | ptr += num_w 91 | 92 | return cutoff 93 | 94 | 95 | def save_weights(self, path='model.weights', cutoff=-1): 96 | # Converts a PyTorch model to Darket format (*.pt to *.weights) 97 | # Note: Does not work if model.fuse() is applied 98 | with open(path, 'wb') as f: 99 | # Write Header https://github.com/AlexeyAB/darknet/issues/2914#issuecomment-496675346 100 | self.version.tofile(f) # (int32) version info: major, minor, revision 101 | self.seen.tofile(f) # (int64) number of images seen during training 102 | 103 | # Iterate through layers 104 | for i, (mdef, module) in enumerate(zip(self.module_defs[:cutoff], self.module_list[:cutoff])): 105 | if mdef['type'] == 'convolutional': 106 | conv_layer = module[0] 107 | # If batch norm, load bn first 108 | if mdef['batch_normalize']: 109 | bn_layer = module[1] 110 | bn_layer.bias.data.cpu().numpy().tofile(f) 111 | bn_layer.weight.data.cpu().numpy().tofile(f) 112 | bn_layer.running_mean.data.cpu().numpy().tofile(f) 113 | bn_layer.running_var.data.cpu().numpy().tofile(f) 114 | # Load conv bias 115 | else: 116 | conv_layer.bias.data.cpu().numpy().tofile(f) 117 | # Load conv weights 118 | conv_layer.weight.data.cpu().numpy().tofile(f) 119 | 120 | 121 | def convert(cfg='cfg/yolov3-spp.cfg', weights='weights/yolov3-spp.weights'): 122 | # Converts between PyTorch and Darknet format per extension (i.e. 
*.weights convert to *.pt and vice versa) 123 | # from models import *; convert('cfg/yolov3-spp.cfg', 'weights/yolov3-spp.weights') 124 | 125 | # Initialize model 126 | model = Darknet(cfg) 127 | 128 | # Load weights and save 129 | if weights.endswith('.pt'): # if PyTorch format 130 | model.load_state_dict(torch.load(weights, map_location='cpu')['model']) 131 | save_weights(model, path='converted.weights', cutoff=-1) 132 | print("Success: converted '%s' to 'converted.weights'" % weights) 133 | 134 | elif weights.endswith('.weights'): # darknet format 135 | _ = load_darknet_weights(model, weights) 136 | 137 | chkpt = {'epoch': -1, 138 | 'best_fitness': None, 139 | 'training_results': None, 140 | 'model': model.state_dict(), 141 | 'optimizer': None} 142 | 143 | torch.save(chkpt, 'converted.pt') 144 | print("Success: converted '%s' to 'converted.pt'" % weights) 145 | 146 | else: 147 | print('Error: extension not supported.') 148 | 149 | # 如果weights指定的权重不存在,则下载;存在则该函数不返回直接pass 150 | def attempt_download(weights): 151 | # Attempt to download pretrained weights if not found locally 152 | msg = weights + ' missing, download from https://drive.google.com/drive/folders/1uxgUBemJVw9wZsdpboYbzUN4bcRhsuAI' 153 | if weights and not os.path.isfile(weights): # 指定路径的权值文件不存在 154 | file = Path(weights).name # 分割路径文件名 155 | 156 | if file == 'yolov3-spp.weights': 157 | gdrive_download(id='1oPCHKsM2JpM-zgyepQciGli9X0MTsJCO', name=weights) 158 | elif file == 'yolov3-spp.pt': 159 | gdrive_download(id='1vFlbJ_dXPvtwaLLOu-twnjK4exdFiQ73', name=weights) 160 | elif file == 'yolov3.pt': 161 | gdrive_download(id='11uy0ybbOXA2hc-NJkJbbbkDwNX1QZDlz', name=weights) 162 | elif file == 'yolov3-tiny.pt': 163 | gdrive_download(id='1qKSgejNeNczgNNiCn9ZF_o55GFk1DjY_', name=weights) 164 | elif file == 'darknet53.conv.74': 165 | gdrive_download(id='18xqvs_uwAqfTXp-LJCYLYNHBOcrwbrp0', name=weights) 166 | elif file == 'yolov3-tiny.conv.15': 167 | gdrive_download(id='140PnSedCsGGgu3rOD6Ez4oI6cdDzerLC', name=weights) 168 | 169 | else: 170 | try: # download from pjreddie.com 171 | url = 'https://pjreddie.com/media/files/' + file 172 | print('Downloading ' + url) 173 | os.system('curl -f ' + url + ' -o ' + weights) 174 | except IOError: 175 | print(msg) 176 | os.system('rm ' + weights) # remove partial downloads 177 | 178 | assert os.path.exists(weights), msg # download missing weights from Google Drive 179 | -------------------------------------------------------------------------------- /model/sampler_ratio.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/model/sampler_ratio.png -------------------------------------------------------------------------------- /study.txt: -------------------------------------------------------------------------------- 1 | 0.88 1 1 0.9362 0 0 0 2.87 2 | 0.9167 1 1 0.9565 0 0 0 0.821 3 | 0.9565 1 1 0.9778 0 0 0 0.8187 4 | 0.9565 1 1 0.9778 0 0 0 0.8235 5 | 1 1 1 1 0 0 0 0.8223 6 | 1 1 1 1 0 0 0 0.8243 7 | 1 1 1 1 0 0 0 0.8386 8 | 1 1 1 1 0 0 0 0.82 9 | 1 1 1 1 0 0 0 0.8239 10 | 1 1 1 1 0 0 0 0.8341 11 | -------------------------------------------------------------------------------- /test.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import json 3 | import torch 4 | 5 | from torch.utils.data import DataLoader 6 | 7 | from model.models import Darknet 8 | from model.model_utils import attempt_download, 
parse_data_cfg 9 | from utils.datasets import LoadImagesAndLabels 10 | from utils.utils import * 11 | from utils.parse_config import parse_model_cfg 12 | from utils.nms.r_nms import r_nms 13 | from model.loss import compute_loss 14 | from utils.nms.nms import non_max_suppression 15 | 16 | 17 | def test(cfg, 18 | data, 19 | weights=None, 20 | batch_size=16, 21 | img_size=416, 22 | iou_thres=0.5, 23 | conf_thres=0.001, 24 | nms_thres=0.5, 25 | save_json=False, 26 | hyp=None, 27 | model=None): 28 | # Initialize/load model and set device 29 | if model is None: 30 | device = torch_utils.select_device(opt.device) 31 | verbose = True 32 | 33 | # Initialize model 34 | model = Darknet(cfg, hyp).to(device) 35 | 36 | # Load weights 37 | attempt_download(weights) 38 | if weights.endswith('.pt'): # pytorch format 39 | model.load_state_dict(torch.load(weights, map_location=device)['model']) 40 | else: # darknet format 41 | _ = load_darknet_weights(model, weights) 42 | 43 | if torch.cuda.device_count() > 1: 44 | model = nn.DataParallel(model) 45 | else: 46 | device = next(model.parameters()).device # get model device 47 | verbose = False 48 | 49 | # Configure run 50 | data = parse_data_cfg(data) 51 | nc = int(data['classes']) # number of classes 52 | test_path = data['valid'] # path to test images 53 | names = load_classes(data['names']) # class names 54 | 55 | # Dataloader 56 | dataset = LoadImagesAndLabels(test_path, img_size, batch_size,augment=False, hyp=hyp) 57 | dataloader = DataLoader(dataset, 58 | batch_size=batch_size, 59 | num_workers=min([os.cpu_count(), batch_size, 16]), 60 | pin_memory=True, 61 | collate_fn=dataset.collate_fn) 62 | 63 | seen = 0 64 | model.eval() 65 | coco91class = coco80_to_coco91_class() 66 | s = ('%20s' + '%10s' * 6) % ('Class', 'Images', 'Targets', 'P', 'R', 'mAP', 'F1') 67 | p, r, f1, mp, mr, map, mf1 = 0., 0., 0., 0., 0., 0., 0. 68 | loss = torch.zeros(3) 69 | jdict, stats, ap, ap_class = [], [], [], [] 70 | for batch_i, (imgs, targets, paths, shapes) in enumerate(tqdm(dataloader, desc=s)): 71 | targets = targets.to(device) # [img_id, cls_id, x, y, w, h, a] 72 | imgs = imgs.to(device) 73 | _, _, height, width = imgs.shape # batch size, channels, height, width 74 | 75 | # Plot images with bounding boxes 76 | if batch_i == 0 and not os.path.exists('test_batch0.jpg'): 77 | plot_images(imgs=imgs, targets=targets, paths=paths, fname='test_batch0.jpg') 78 | 79 | # Run model 80 | inf_out, train_out = model(imgs) # inference and training outputs 81 | 82 | # # Compute loss 83 | # if hasattr(model, 'hyp'): # if model has loss hyperparameters 84 | # loss += compute_loss(train_out, targets, model,hyp)[1][:3].cpu() # GIoU, obj, cls 85 | 86 | # Run NMS 87 | output = non_max_suppression(inf_out, conf_thres=conf_thres, nms_thres=nms_thres) 88 | 89 | # Statistics per image 90 | for si, pred in enumerate(output): 91 | labels = targets[targets[:, 0] == si, 1:] # 当前图像的gt [cls_id, x, y, w, h, a] 92 | nl = len(labels) 93 | tcls = labels[:, 0].tolist() if nl else [] # target class 94 | seen += 1 95 | 96 | if pred is None: 97 | if nl: 98 | stats.append(([], torch.Tensor(), torch.Tensor(), tcls)) 99 | continue 100 | 101 | # Append to text file 102 | # with open('test.txt', 'a') as file: 103 | # [file.write('%11.5g' * 7 % tuple(x) + '\n') for x in pred] 104 | 105 | # Append to pycocotools JSON dictionary 106 | if save_json: 107 | # [{"image_id": 42, "category_id": 18, "bbox": [258.15, 41.29, 348.26, 243.78], "score": 0.236}, ... 
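# The save_json block below rescales each predicted box from the network input size
# back to the original image shape, converts it from corner (xyxy) to width/height
# form, shifts the xy coordinate to the top-left corner as pycocotools expects, and
# appends the image id, mapped COCO category id, box and score to jdict.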
108 | image_id = int(Path(paths[si]).stem.split('_')[-1]) 109 | box = pred[:, :4].clone() # xyxy 110 | scale_coords(imgs[si].shape[1:], box, shapes[si]) # to original shape 111 | box = xyxy2xywh(box) # xywh 112 | box[:, :2] -= box[:, 2:] / 2 # xy center to top-left corner 113 | for di, d in enumerate(pred): 114 | jdict.append({'image_id': image_id, 115 | 'category_id': coco91class[int(d[6])], 116 | 'bbox': [floatn(x, 3) for x in box[di]], 117 | 'score': floatn(d[4], 5)}) 118 | 119 | # Clip boxes to image bounds 120 | clip_coords(pred, (height, width)) 121 | 122 | # Assign all predictions as incorrect 123 | correct = [0] * len(pred) 124 | if nl: 125 | detected = [] 126 | tcls_tensor = labels[:, 0] 127 | 128 | # target boxes 129 | tbox = labels[:, 1:6] 130 | tbox[:, [0, 2]] *= width 131 | tbox[:, [1, 3]] *= height 132 | 133 | # Search for correct predictions遍历每个检测出的box 134 | for i, (*pbox, pconf, pcls_conf, pcls) in enumerate(pred): 135 | 136 | # Break if all targets already located in image 137 | if len(detected) == nl: 138 | break 139 | 140 | # Continue if predicted class not among image classes 141 | if pcls.item() not in tcls: 142 | continue 143 | 144 | # Best iou, index between pred and targets 145 | m = (pcls == tcls_tensor).nonzero().view(-1) 146 | iou, bi = skew_bbox_iou(pbox, tbox[m]).max(0) 147 | 148 | # If iou > threshold and class is correct mark as correct 149 | if iou > iou_thres and m[bi] not in detected: # and pcls == tcls[bi]: 150 | correct[i] = 1 151 | detected.append(m[bi]) 152 | 153 | # Append statistics (correct, conf, pcls, tcls) 154 | stats.append((correct, pred[:, 5].cpu(), pred[:, 7].cpu(), tcls)) 155 | 156 | # Compute statistics 157 | stats = [np.concatenate(x, 0) for x in list(zip(*stats))] # to numpy 158 | if len(stats): 159 | p, r, ap, f1, ap_class = ap_per_class(*stats) 160 | mp, mr, map, mf1 = p.mean(), r.mean(), ap.mean(), f1.mean() 161 | nt = np.bincount(stats[3].astype(np.int64), minlength=nc) # number of targets per class 162 | else: 163 | nt = torch.zeros(1) 164 | 165 | # Print results 166 | pf = '%20s' + '%10.3g' * 6 # print format 167 | print(pf % ('all', seen, nt.sum(), mp, mr, map, mf1)) 168 | 169 | # Print results per class 170 | if verbose and nc > 1 and len(stats): 171 | for i, c in enumerate(ap_class): 172 | print(pf % (names[c], seen, nt[c], p[i], r[i], ap[i], f1[i])) 173 | 174 | # Save JSON 175 | if save_json and map and len(jdict): 176 | try: 177 | imgIds = [int(Path(x).stem.split('_')[-1]) for x in dataset.img_files] 178 | with open('results.json', 'w') as file: 179 | json.dump(jdict, file) 180 | 181 | from pycocotools.coco import COCO 182 | from pycocotools.cocoeval import COCOeval 183 | 184 | # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb 185 | cocoGt = COCO('../coco/annotations/instances_val2014.json') # initialize COCO ground truth api 186 | cocoDt = cocoGt.loadRes('results.json') # initialize COCO pred api 187 | 188 | cocoEval = COCOeval(cocoGt, cocoDt, 'bbox') 189 | cocoEval.params.imgIds = imgIds # [:32] # only evaluate these images 190 | cocoEval.evaluate() 191 | cocoEval.accumulate() 192 | cocoEval.summarize() 193 | map = cocoEval.stats[1] # update mAP to pycocotools mAP 194 | except: 195 | print('WARNING: missing dependency pycocotools from requirements.txt. 
Can not compute official COCO mAP.') 196 | 197 | # Return results 198 | maps = np.zeros(nc) + map 199 | for i, c in enumerate(ap_class): 200 | maps[c] = ap[i] 201 | return (mp, mr, map, mf1, *(loss / len(dataloader)).tolist()), maps 202 | 203 | 204 | if __name__ == '__main__': 205 | parser = argparse.ArgumentParser(prog='test.py') 206 | parser.add_argument('--hyp', type=str, default='cfg/ICDAR/hyp.py', help='hyper-parameter path') 207 | parser.add_argument('--cfg', type=str, default='cfg/ICDAR/yolov3_608_se.cfg', help='cfg file path') 208 | parser.add_argument('--data', type=str, default='data/icdar_13+15.data', help='coco.data file path') 209 | parser.add_argument('--weights', type=str, default='weights/best.pt', help='path to weights file') 210 | parser.add_argument('--batch-size', type=int, default=1, help='size of each image batch') 211 | parser.add_argument('--img-size', type=int, default=608, help='inference size (pixels)') 212 | parser.add_argument('--iou-thres', type=float, default=0.5, help='iou threshold required to qualify as detected') 213 | parser.add_argument('--conf-thres', type=float, default=0.001, help='object confidence threshold') 214 | parser.add_argument('--nms-thres', type=float, default=0.5, help='iou threshold for non-maximum suppression') 215 | parser.add_argument('--save-json', action='store_true', help='save a cocoapi-compatible JSON results file') 216 | parser.add_argument('--device', default='', help='device id (i.e. 0 or 0,1) or cpu') 217 | opt = parser.parse_args() 218 | print(opt) 219 | 220 | hyp = hyp_parse(opt.hyp) 221 | 222 | with torch.no_grad(): 223 | test(opt.cfg, 224 | opt.data, 225 | opt.weights, 226 | opt.batch_size, 227 | opt.img_size, 228 | opt.iou_thres, 229 | opt.conf_thres, 230 | opt.nms_thres, 231 | opt.save_json, 232 | hyp) 233 | -------------------------------------------------------------------------------- /utils/ICDAR/ICDAR2yolo.py: -------------------------------------------------------------------------------- 1 | # ICDAR坐标为四点多边形 2 | # 这里将其处理成近似拟合的矩形框,并且归一化得到yolo格式 3 | # 去除do not care的label 4 | 5 | import os 6 | import sys 7 | import cv2 8 | import math 9 | import numpy as np 10 | from tqdm import tqdm 11 | from decimal import Decimal 12 | 13 | 14 | # 检查异常文件并返回 15 | # 异常类型:1. xywh数值超出1(图像范围) 2. 负值(max和min标反了的) 16 | def check_exception(txt_path): 17 | files = os.listdir(txt_path) 18 | class_id = [] 19 | exception = [] 20 | for file in files: 21 | with open(os.path.join(txt_path,file),'r') as f: 22 | contents = f.read() 23 | lines = contents.split('\n') 24 | lines = [i for i in lines if len(i)>0] 25 | for line in lines: 26 | line = line.split(' ') 27 | 28 | assert len(line) == 6 ,'wrong length!!' 29 | c,x,y,w,h,a = line 30 | if c not in class_id: 31 | class_id.append(c) 32 | if float(x)>1.0 or float(y)>1.0 or float(w)>1.0 or float(h)>1.0 or (float(eval(a))>0.5*math.pi or float(eval(a))<-0.5*math.pi): 33 | exception.append(file) 34 | elif float(x)<0 or float(y)<0 or float(w)<0 or float(h)<0: 35 | exception.append(file) 36 | 37 | assert '0' in class_id , 'Class counting from 0 rather than 1!' 38 | if len(exception) ==0: 39 | return 'No exception found.' 
40 | else: 41 | return exception 42 | 43 | 44 | 45 | def convert(src_path,img_path,dst_path): 46 | icdar_files= os.listdir(src_path) 47 | for icdar_file in tqdm(icdar_files): #每个文件名称 48 | with open(os.path.join(dst_path, os.path.splitext(icdar_file)[0]+'.txt'),'w') as f: #打开要写的文件 49 | with open(os.path.join(src_path,icdar_file),'r',encoding='utf-8-sig') as fd: #打开要读的文件 50 | objects = fd.readlines() 51 | # objects = [x[ :x.find(x.split(',')[8])-1] for x in objects] 52 | assert len(objects) > 0, 'No object found in ' + xml_path 53 | 54 | class_label = 0 # 只分前景背景 55 | height, width, _ = cv2.imread(os.path.join(img_path, os.path.splitext(icdar_file)[0][3:])+'.jpg').shape 56 | 57 | for object in objects: 58 | if '###' not in object: 59 | object = object.split(',')[:8] 60 | coors = np.array([int(x) for x in object]).reshape(4,2).astype(np.int32) 61 | ((cx, cy), (w, h), theta) = cv2.minAreaRect(coors) 62 | ### vis & debug opencv 0度起点,顺时针为+ 63 | # print(cv2.minAreaRect(coors)) 64 | # img = cv2.imread(os.path.join(img_path, os.path.splitext(icdar_file)[0][3:])+'.jpg') 65 | # points = cv2.boxPoints(cv2.minAreaRect(coors)).astype(np.int32) 66 | # img = cv2.polylines(img,[points],True,(0,0,255),2) # 后三个参数为:是否封闭/color/thickness 67 | # cv2.imshow('display box',img) 68 | # cv2.waitKey(0) 69 | 70 | # 转换为自己的标准:-0.5pi, 0.5pi 71 | a = theta / 180 * math.pi 72 | if a > 0.5*math.pi: a = math.pi - a 73 | if a < -0.5*math.pi: a = math.pi + a 74 | 75 | x = Decimal(cx/width).quantize(Decimal('0.000000')) 76 | y = Decimal(cy/height).quantize(Decimal('0.000000')) 77 | w = Decimal(w/width).quantize(Decimal('0.000000')) 78 | h = Decimal(h/height).quantize(Decimal('0.000000')) 79 | a = Decimal(a).quantize(Decimal('0.000000')) 80 | 81 | f.write(str(class_label)+' '+str(x)+' '+str(y)+' '+str(w)+' '+str(h)+' '+str(a)+'\n') 82 | 83 | 84 | 85 | 86 | if __name__ == "__main__": 87 | 88 | care_all = True 89 | src_path = "/py/datasets/ICDAR2015/ICDAR/val_labels" 90 | img_path = '/py/datasets/ICDAR2015/ICDAR/val_imgs' 91 | dst_path = "/py/datasets/ICDAR2015/yolo/separate/val_labels" 92 | 93 | convert(src_path,img_path,dst_path) 94 | 95 | exception_files = check_exception(dst_path) 96 | print(exception_files) 97 | -------------------------------------------------------------------------------- /utils/ICDAR/icdar_utils.py: -------------------------------------------------------------------------------- 1 | import os 2 | import math 3 | import cv2 4 | import numpy as np 5 | from scipy.spatial import distance as dist 6 | import zipfile 7 | 8 | def zip_dir(dirname,zipfilename): 9 | filelist = [] 10 | if os.path.isfile(dirname): 11 | filelist.append(dirname) 12 | else : 13 | for root, dirs, files in os.walk(dirname): 14 | for name in files: 15 | filelist.append(os.path.join(root, name)) 16 | 17 | zf = zipfile.ZipFile(zipfilename, "w", zipfile.zlib.DEFLATED) 18 | for tar in filelist: 19 | arcname = tar[len(dirname):] 20 | #print arcname 21 | zf.write(tar,arcname) 22 | zf.close() 23 | 24 | 25 | def cos_dist(a, b): 26 | if len(a) != len(b): 27 | return None 28 | part_up = 0.0 29 | a_sq = 0.0 30 | b_sq = 0.0 31 | # print(a, b) 32 | # print(zip(a, b)) 33 | for a1, b1 in zip(a, b): 34 | part_up += a1*b1 35 | a_sq += a1**2 36 | b_sq += b1**2 37 | part_down = math.sqrt(a_sq*b_sq) 38 | if part_down == 0.0: 39 | return None 40 | else: 41 | return part_up / part_down 42 | 43 | 44 | # this function is confined to rectangle 45 | def order_points(pts): 46 | # sort the points based on their x-coordinates 47 | xSorted = pts[np.argsort(pts[:, 0]), :] 
48 | 49 | # grab the left-most and right-most points from the sorted 50 | # x-roodinate points 51 | leftMost = xSorted[:2, :] 52 | rightMost = xSorted[2:, :] 53 | 54 | # now, sort the left-most coordinates according to their 55 | # y-coordinates so we can grab the top-left and bottom-left 56 | # points, respectively 57 | leftMost = leftMost[np.argsort(leftMost[:, 1]), :] 58 | (tl, bl) = leftMost 59 | 60 | # now that we have the top-left coordinate, use it as an 61 | # anchor to calculate the Euclidean distance between the 62 | # top-left and right-most points; by the Pythagorean 63 | # theorem, the point with the largest distance will be 64 | # our bottom-right point 65 | D = dist.cdist(tl[np.newaxis], rightMost, "euclidean")[0] 66 | (br, tr) = rightMost[np.argsort(D)[::-1], :] 67 | 68 | # return the coordinates in top-left, top-right, 69 | # bottom-right, and bottom-left order 70 | return np.array([tl, tr, br, bl], dtype="float32") 71 | 72 | 73 | def order_points_quadrangle(pts): 74 | # sort the points based on their x-coordinates 75 | xSorted = pts[np.argsort(pts[:, 0]), :] 76 | 77 | # grab the left-most and right-most points from the sorted 78 | # x-roodinate points 79 | leftMost = xSorted[:2, :] 80 | rightMost = xSorted[2:, :] 81 | 82 | # now, sort the left-most coordinates according to their 83 | # y-coordinates so we can grab the top-left and bottom-left 84 | # points, respectively 85 | leftMost = leftMost[np.argsort(leftMost[:, 1]), :] 86 | (tl, bl) = leftMost 87 | 88 | # now that we have the top-left and bottom-left coordinate, use it as an 89 | # base vector to calculate the angles between the other two vectors 90 | 91 | vector_0 = np.array(bl-tl) 92 | vector_1 = np.array(rightMost[0]-tl) 93 | vector_2 = np.array(rightMost[1]-tl) 94 | 95 | angle = [np.arccos(cos_dist(vector_0, vector_1)), np.arccos(cos_dist(vector_0, vector_2))] 96 | (br, tr) = rightMost[np.argsort(angle), :] 97 | 98 | # return the coordinates in top-left, top-right, 99 | # bottom-right, and bottom-left order 100 | return np.array([tl, tr, br, bl], dtype="float32") 101 | 102 | 103 | 104 | 105 | def xywha2points(x): 106 | # 带旋转角度,顺时针正,+-0.5pi;返回四个点坐标 107 | cx = x[0]; cy = x[1]; w = x[2]; h = x[3]; a = x[4] 108 | xmin = cx - w*0.5; xmax = cx + w*0.5; ymin = cy - h*0.5; ymax = cy + h*0.5 109 | t_x0=xmin; t_y0=ymin; t_x1=xmin; t_y1=ymax; t_x2=xmax; t_y2=ymax; t_x3=xmax; t_y3=ymin 110 | R = np.eye(3) 111 | R[:2] = cv2.getRotationMatrix2D(angle=-a*180/math.pi, center=(cx,cy), scale=1) 112 | x0 = t_x0*R[0,0] + t_y0*R[0,1] + R[0,2] 113 | y0 = t_x0*R[1,0] + t_y0*R[1,1] + R[1,2] 114 | x1 = t_x1*R[0,0] + t_y1*R[0,1] + R[0,2] 115 | y1 = t_x1*R[1,0] + t_y1*R[1,1] + R[1,2] 116 | x2 = t_x2*R[0,0] + t_y2*R[0,1] + R[0,2] 117 | y2 = t_x2*R[1,0] + t_y2*R[1,1] + R[1,2] 118 | x3 = t_x3*R[0,0] + t_y3*R[0,1] + R[0,2] 119 | y3 = t_x3*R[1,0] + t_y3*R[1,1] + R[1,2] 120 | points = np.array([[float(x0),float(y0)],[float(x1),float(y1)],[float(x2),float(y2)],[float(x3),float(y3)]]) 121 | return points 122 | 123 | def xywha2icdar(box): 124 | box = xywha2points(box) 125 | cw_box = order_points(box) 126 | cw_box = cw_box.reshape(1, 8).squeeze().astype('int').tolist() 127 | str_box = str(cw_box[0]) + ',' + \ 128 | str(cw_box[1]) + ',' + \ 129 | str(cw_box[2]) + ',' + \ 130 | str(cw_box[3]) + ',' + \ 131 | str(cw_box[4]) + ',' + \ 132 | str(cw_box[5]) + ',' + \ 133 | str(cw_box[6]) + ',' + \ 134 | str(cw_box[7]) + '\n' 135 | return str_box 136 | 137 | 138 | # if __name__ == "__main__": 139 | # pnts = 
np.array([137,340,137,351,172,351,172,340]).reshape(4,2) 140 | # trans_pnts = order_points(pnts) 141 | 142 | # img = np.zeros((1000, 1000, 3), np.uint8) 143 | # for id, point in enumerate(trans_pnts): 144 | # point = tuple(point) 145 | # cv2.circle(img, point, radius = 1, color = (0, 0, 255), thickness = 4) 146 | # cv2.putText(img, str(id), point, cv2.FONT_HERSHEY_COMPLEX, 1, (255,0,0), 2) 147 | 148 | # cv2.imshow('points', img) 149 | # cv2.waitKey (0) 150 | # cv2.destroyAllWindows() -------------------------------------------------------------------------------- /utils/adabound.py: -------------------------------------------------------------------------------- 1 | import math 2 | 3 | import torch 4 | from torch.optim import Optimizer 5 | 6 | 7 | class AdaBound(Optimizer): 8 | """Implements AdaBound algorithm. 9 | It has been proposed in `Adaptive Gradient Methods with Dynamic Bound of Learning Rate`_. 10 | Arguments: 11 | params (iterable): iterable of parameters to optimize or dicts defining 12 | parameter groups 13 | lr (float, optional): Adam learning rate (default: 1e-3) 14 | betas (Tuple[float, float], optional): coefficients used for computing 15 | running averages of gradient and its square (default: (0.9, 0.999)) 16 | final_lr (float, optional): final (SGD) learning rate (default: 0.1) 17 | gamma (float, optional): convergence speed of the bound functions (default: 1e-3) 18 | eps (float, optional): term added to the denominator to improve 19 | numerical stability (default: 1e-8) 20 | weight_decay (float, optional): weight decay (L2 penalty) (default: 0) 21 | amsbound (boolean, optional): whether to use the AMSBound variant of this algorithm 22 | .. Adaptive Gradient Methods with Dynamic Bound of Learning Rate: 23 | https://openreview.net/forum?id=Bkg3g2R9FX 24 | """ 25 | 26 | def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), final_lr=0.1, gamma=1e-3, 27 | eps=1e-8, weight_decay=0, amsbound=False): 28 | if not 0.0 <= lr: 29 | raise ValueError("Invalid learning rate: {}".format(lr)) 30 | if not 0.0 <= eps: 31 | raise ValueError("Invalid epsilon value: {}".format(eps)) 32 | if not 0.0 <= betas[0] < 1.0: 33 | raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0])) 34 | if not 0.0 <= betas[1] < 1.0: 35 | raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1])) 36 | if not 0.0 <= final_lr: 37 | raise ValueError("Invalid final learning rate: {}".format(final_lr)) 38 | if not 0.0 <= gamma < 1.0: 39 | raise ValueError("Invalid gamma parameter: {}".format(gamma)) 40 | defaults = dict(lr=lr, betas=betas, final_lr=final_lr, gamma=gamma, eps=eps, 41 | weight_decay=weight_decay, amsbound=amsbound) 42 | super(AdaBound, self).__init__(params, defaults) 43 | 44 | self.base_lrs = list(map(lambda group: group['lr'], self.param_groups)) 45 | 46 | def __setstate__(self, state): 47 | super(AdaBound, self).__setstate__(state) 48 | for group in self.param_groups: 49 | group.setdefault('amsbound', False) 50 | 51 | def step(self, closure=None): 52 | """Performs a single optimization step. 53 | Arguments: 54 | closure (callable, optional): A closure that reevaluates the model 55 | and returns the loss. 
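        Example (illustrative sketch; `model`, `criterion`, `x` and `y` are assumed
        to be defined by the caller):
            optimizer = AdaBound(model.parameters(), lr=1e-3, final_lr=0.1)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()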
56 | """ 57 | loss = None 58 | if closure is not None: 59 | loss = closure() 60 | 61 | for group, base_lr in zip(self.param_groups, self.base_lrs): 62 | for p in group['params']: 63 | if p.grad is None: 64 | continue 65 | grad = p.grad.data 66 | if grad.is_sparse: 67 | raise RuntimeError( 68 | 'Adam does not support sparse gradients, please consider SparseAdam instead') 69 | amsbound = group['amsbound'] 70 | 71 | state = self.state[p] 72 | 73 | # State initialization 74 | if len(state) == 0: 75 | state['step'] = 0 76 | # Exponential moving average of gradient values 77 | state['exp_avg'] = torch.zeros_like(p.data) 78 | # Exponential moving average of squared gradient values 79 | state['exp_avg_sq'] = torch.zeros_like(p.data) 80 | if amsbound: 81 | # Maintains max of all exp. moving avg. of sq. grad. values 82 | state['max_exp_avg_sq'] = torch.zeros_like(p.data) 83 | 84 | exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq'] 85 | if amsbound: 86 | max_exp_avg_sq = state['max_exp_avg_sq'] 87 | beta1, beta2 = group['betas'] 88 | 89 | state['step'] += 1 90 | 91 | if group['weight_decay'] != 0: 92 | grad = grad.add(group['weight_decay'], p.data) 93 | 94 | # Decay the first and second moment running average coefficient 95 | exp_avg.mul_(beta1).add_(1 - beta1, grad) 96 | exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad) 97 | if amsbound: 98 | # Maintains the maximum of all 2nd moment running avg. till now 99 | torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq) 100 | # Use the max. for normalizing running avg. of gradient 101 | denom = max_exp_avg_sq.sqrt().add_(group['eps']) 102 | else: 103 | denom = exp_avg_sq.sqrt().add_(group['eps']) 104 | 105 | bias_correction1 = 1 - beta1 ** state['step'] 106 | bias_correction2 = 1 - beta2 ** state['step'] 107 | step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1 108 | 109 | # Applies bounds on actual learning rate 110 | # lr_scheduler cannot affect final_lr, this is a workaround to apply lr decay 111 | final_lr = group['final_lr'] * group['lr'] / base_lr 112 | lower_bound = final_lr * (1 - 1 / (group['gamma'] * state['step'] + 1)) 113 | upper_bound = final_lr * (1 + 1 / (group['gamma'] * state['step'])) 114 | step_size = torch.full_like(denom, step_size) 115 | step_size.div_(denom).clamp_(lower_bound, upper_bound).mul_(exp_avg) 116 | 117 | p.data.add_(-step_size) 118 | 119 | return loss 120 | 121 | 122 | class AdaBoundW(Optimizer): 123 | """Implements AdaBound algorithm with Decoupled Weight Decay (arxiv.org/abs/1711.05101) 124 | It has been proposed in `Adaptive Gradient Methods with Dynamic Bound of Learning Rate`_. 125 | Arguments: 126 | params (iterable): iterable of parameters to optimize or dicts defining 127 | parameter groups 128 | lr (float, optional): Adam learning rate (default: 1e-3) 129 | betas (Tuple[float, float], optional): coefficients used for computing 130 | running averages of gradient and its square (default: (0.9, 0.999)) 131 | final_lr (float, optional): final (SGD) learning rate (default: 0.1) 132 | gamma (float, optional): convergence speed of the bound functions (default: 1e-3) 133 | eps (float, optional): term added to the denominator to improve 134 | numerical stability (default: 1e-8) 135 | weight_decay (float, optional): weight decay (L2 penalty) (default: 0) 136 | amsbound (boolean, optional): whether to use the AMSBound variant of this algorithm 137 | .. 
Adaptive Gradient Methods with Dynamic Bound of Learning Rate: 138 | https://openreview.net/forum?id=Bkg3g2R9FX 139 | """ 140 | 141 | def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), final_lr=0.1, gamma=1e-3, 142 | eps=1e-8, weight_decay=0, amsbound=False): 143 | if not 0.0 <= lr: 144 | raise ValueError("Invalid learning rate: {}".format(lr)) 145 | if not 0.0 <= eps: 146 | raise ValueError("Invalid epsilon value: {}".format(eps)) 147 | if not 0.0 <= betas[0] < 1.0: 148 | raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0])) 149 | if not 0.0 <= betas[1] < 1.0: 150 | raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1])) 151 | if not 0.0 <= final_lr: 152 | raise ValueError("Invalid final learning rate: {}".format(final_lr)) 153 | if not 0.0 <= gamma < 1.0: 154 | raise ValueError("Invalid gamma parameter: {}".format(gamma)) 155 | defaults = dict(lr=lr, betas=betas, final_lr=final_lr, gamma=gamma, eps=eps, 156 | weight_decay=weight_decay, amsbound=amsbound) 157 | super(AdaBoundW, self).__init__(params, defaults) 158 | 159 | self.base_lrs = list(map(lambda group: group['lr'], self.param_groups)) 160 | 161 | def __setstate__(self, state): 162 | super(AdaBoundW, self).__setstate__(state) 163 | for group in self.param_groups: 164 | group.setdefault('amsbound', False) 165 | 166 | def step(self, closure=None): 167 | """Performs a single optimization step. 168 | Arguments: 169 | closure (callable, optional): A closure that reevaluates the model 170 | and returns the loss. 171 | """ 172 | loss = None 173 | if closure is not None: 174 | loss = closure() 175 | 176 | for group, base_lr in zip(self.param_groups, self.base_lrs): 177 | for p in group['params']: 178 | if p.grad is None: 179 | continue 180 | grad = p.grad.data 181 | if grad.is_sparse: 182 | raise RuntimeError( 183 | 'Adam does not support sparse gradients, please consider SparseAdam instead') 184 | amsbound = group['amsbound'] 185 | 186 | state = self.state[p] 187 | 188 | # State initialization 189 | if len(state) == 0: 190 | state['step'] = 0 191 | # Exponential moving average of gradient values 192 | state['exp_avg'] = torch.zeros_like(p.data) 193 | # Exponential moving average of squared gradient values 194 | state['exp_avg_sq'] = torch.zeros_like(p.data) 195 | if amsbound: 196 | # Maintains max of all exp. moving avg. of sq. grad. values 197 | state['max_exp_avg_sq'] = torch.zeros_like(p.data) 198 | 199 | exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq'] 200 | if amsbound: 201 | max_exp_avg_sq = state['max_exp_avg_sq'] 202 | beta1, beta2 = group['betas'] 203 | 204 | state['step'] += 1 205 | 206 | # Decay the first and second moment running average coefficient 207 | exp_avg.mul_(beta1).add_(1 - beta1, grad) 208 | exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad) 209 | if amsbound: 210 | # Maintains the maximum of all 2nd moment running avg. till now 211 | torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq) 212 | # Use the max. for normalizing running avg. 
of gradient 213 | denom = max_exp_avg_sq.sqrt().add_(group['eps']) 214 | else: 215 | denom = exp_avg_sq.sqrt().add_(group['eps']) 216 | 217 | bias_correction1 = 1 - beta1 ** state['step'] 218 | bias_correction2 = 1 - beta2 ** state['step'] 219 | step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1 220 | 221 | # Applies bounds on actual learning rate 222 | # lr_scheduler cannot affect final_lr, this is a workaround to apply lr decay 223 | final_lr = group['final_lr'] * group['lr'] / base_lr 224 | lower_bound = final_lr * (1 - 1 / (group['gamma'] * state['step'] + 1)) 225 | upper_bound = final_lr * (1 + 1 / (group['gamma'] * state['step'])) 226 | step_size = torch.full_like(denom, step_size) 227 | step_size.div_(denom).clamp_(lower_bound, upper_bound).mul_(exp_avg) 228 | 229 | if group['weight_decay'] != 0: 230 | decayed_weights = torch.mul(p.data, group['weight_decay']) 231 | p.data.add_(-step_size) 232 | p.data.sub_(decayed_weights) 233 | else: 234 | p.data.add_(-step_size) 235 | 236 | return loss 237 | -------------------------------------------------------------------------------- /utils/gcp.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # New VM 4 | rm -rf sample_data yolov3 darknet apex coco cocoapi knife knifec 5 | git clone https://github.com/ultralytics/yolov3 6 | # git clone https://github.com/AlexeyAB/darknet && cd darknet && make GPU=1 CUDNN=1 CUDNN_HALF=1 OPENCV=0 && wget -c https://pjreddie.com/media/files/darknet53.conv.74 && cd .. 7 | git clone https://github.com/NVIDIA/apex && cd apex && pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" . --user && cd .. && rm -rf apex 8 | # git clone https://github.com/cocodataset/cocoapi && cd cocoapi/PythonAPI && make && cd ../.. 
&& cp -r cocoapi/PythonAPI/pycocotools yolov3 9 | sudo conda install -y -c conda-forge scikit-image tensorboard pycocotools 10 | python3 -c " 11 | from yolov3.utils.google_utils import gdrive_download 12 | gdrive_download('1HaXkef9z6y5l4vUnCYgdmEAj61c6bfWO','coco.zip')" 13 | sudo shutdown 14 | 15 | # Re-clone 16 | rm -rf yolov3 # Warning: remove existing 17 | git clone https://github.com/ultralytics/yolov3 && cd yolov3 # master 18 | # git clone -b test --depth 1 https://github.com/ultralytics/yolov3 test # branch 19 | python3 train.py --img-size 320 --weights weights/darknet53.conv.74 --epochs 27 --batch-size 64 --accumulate 1 20 | 21 | # Train 22 | python3 train.py 23 | 24 | # Resume 25 | python3 train.py --resume 26 | 27 | # Detect 28 | python3 detect.py 29 | 30 | # Test 31 | python3 test.py --save-json 32 | 33 | # Evolve 34 | for i in {0..500} 35 | do 36 | python3 train.py --data data/coco.data --img-size 320 --epochs 1 --batch-size 64 --accumulate 1 --evolve --bucket yolov4 37 | done 38 | 39 | # Git pull 40 | git pull https://github.com/ultralytics/yolov3 # master 41 | git pull https://github.com/ultralytics/yolov3 test # branch 42 | 43 | # Test Darknet training 44 | python3 test.py --weights ../darknet/backup/yolov3.backup 45 | 46 | # Copy last.pt TO bucket 47 | gsutil cp yolov3/weights/last1gpu.pt gs://ultralytics 48 | 49 | # Copy last.pt FROM bucket 50 | gsutil cp gs://ultralytics/last.pt yolov3/weights/last.pt 51 | wget https://storage.googleapis.com/ultralytics/yolov3/last_v1_0.pt -O weights/last_v1_0.pt 52 | wget https://storage.googleapis.com/ultralytics/yolov3/best_v1_0.pt -O weights/best_v1_0.pt 53 | 54 | # Reproduce tutorials 55 | rm results*.txt # WARNING: removes existing results 56 | python3 train.py --nosave --data data/coco_1img.data && mv results.txt results0r_1img.txt 57 | python3 train.py --nosave --data data/coco_10img.data && mv results.txt results0r_10img.txt 58 | python3 train.py --nosave --data data/coco_100img.data && mv results.txt results0r_100img.txt 59 | # python3 train.py --nosave --data data/coco_100img.data --transfer && mv results.txt results3_100imgTL.txt 60 | python3 -c "from utils import utils; utils.plot_results()" 61 | # gsutil cp results*.txt gs://ultralytics 62 | gsutil cp results.png gs://ultralytics 63 | sudo shutdown 64 | 65 | # Reproduce mAP 66 | python3 test.py --save-json --img-size 608 67 | python3 test.py --save-json --img-size 416 68 | python3 test.py --save-json --img-size 320 69 | sudo shutdown 70 | 71 | # Benchmark script 72 | git clone https://github.com/ultralytics/yolov3 # clone our repo 73 | git clone https://github.com/NVIDIA/apex && cd apex && pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" . --user && cd .. && rm -rf apex # install nvidia apex 74 | python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('1HaXkef9z6y5l4vUnCYgdmEAj61c6bfWO','coco.zip')" # download coco dataset (20GB) 75 | cd yolov3 && clear && python3 train.py --epochs 1 # run benchmark (~30 min) 76 | 77 | # Unit tests 78 | python3 detect.py # detect 2 persons, 1 tie 79 | python3 test.py --data data/coco_32img.data # test mAP = 0.8 80 | python3 train.py --data data/coco_32img.data --epochs 5 --nosave # train 5 epochs 81 | python3 train.py --data data/coco_1cls.data --epochs 5 --nosave # train 5 epochs 82 | python3 train.py --data data/coco_1img.data --epochs 5 --nosave # train 5 epochs 83 | 84 | # AlexyAB Darknet 85 | gsutil cp -r gs://sm6/supermarket2 . 
# dataset from bucket 86 | rm -rf darknet && git clone https://github.com/AlexeyAB/darknet && cd darknet && wget -c https://pjreddie.com/media/files/darknet53.conv.74 # sudo apt install libopencv-dev && make 87 | ./darknet detector calc_anchors data/coco_img64.data -num_of_clusters 9 -width 320 -height 320 # kmeans anchor calculation 88 | ./darknet detector train ../supermarket2/supermarket2.data ../yolo_v3_spp_pan_scale.cfg darknet53.conv.74 -map -dont_show # train spp 89 | ./darknet detector train ../yolov3/data/coco.data ../yolov3-spp.cfg darknet53.conv.74 -map -dont_show # train spp coco 90 | 91 | ./darknet detector train data/coco.data ../yolov3-spp.cfg darknet53.conv.74 -map -dont_show # train spp 92 | gsutil cp -r backup/*5000.weights gs://sm6/weights 93 | sudo shutdown 94 | 95 | 96 | ./darknet detector train ../supermarket2/supermarket2.data ../yolov3-tiny-sm2-1cls.cfg yolov3-tiny.conv.15 -map -dont_show # train tiny 97 | ./darknet detector train ../supermarket2/supermarket2.data cfg/yolov3-spp-sm2-1cls.cfg backup/yolov3-spp-sm2-1cls_last.weights # resume 98 | python3 train.py --data ../supermarket2/supermarket2.data --cfg ../yolov3-spp-sm2-1cls.cfg --epochs 100 --num-workers 8 --img-size 320 --nosave # train ultralytics 99 | python3 test.py --data ../supermarket2/supermarket2.data --weights ../darknet/backup/yolov3-spp-sm2-1cls_5000.weights --cfg cfg/yolov3-spp-sm2-1cls.cfg # test 100 | gsutil cp -r backup/*.weights gs://sm6/weights # weights to bucket 101 | 102 | python3 test.py --data ../supermarket2/supermarket2.data --weights weights/yolov3-spp-sm2-1cls_5000.weights --cfg ../yolov3-spp-sm2-1cls.cfg --img-size 320 --conf-thres 0.2 # test 103 | python3 test.py --data ../supermarket2/supermarket2.data --weights weights/yolov3-spp-sm2-1cls-scalexy_125_5000.weights --cfg ../yolov3-spp-sm2-1cls-scalexy_125.cfg --img-size 320 --conf-thres 0.2 # test 104 | python3 test.py --data ../supermarket2/supermarket2.data --weights weights/yolov3-spp-sm2-1cls-scalexy_150_5000.weights --cfg ../yolov3-spp-sm2-1cls-scalexy_150.cfg --img-size 320 --conf-thres 0.2 # test 105 | python3 test.py --data ../supermarket2/supermarket2.data --weights weights/yolov3-spp-sm2-1cls-scalexy_200_5000.weights --cfg ../yolov3-spp-sm2-1cls-scalexy_200.cfg --img-size 320 --conf-thres 0.2 # test 106 | python3 test.py --data ../supermarket2/supermarket2.data --weights ../darknet/backup/yolov3-spp-sm2-1cls-scalexy_variable_5000.weights --cfg ../yolov3-spp-sm2-1cls-scalexy_variable.cfg --img-size 320 --conf-thres 0.2 # test 107 | 108 | python3 train.py --img-size 320 --epochs 27 --batch-size 64 --accumulate 1 --nosave --notest && python3 test.py --weights weights/last.pt --img-size 320 --save-json && sudo shutdown 109 | 110 | # Debug/Development 111 | python3 train.py --data data/coco.data --img-size 320 --single-scale --batch-size 64 --accumulate 1 --epochs 1 --evolve --giou 112 | python3 test.py --weights weights/last.pt --cfg cfg/yolov3-spp.cfg --img-size 320 113 | 114 | gsutil cp evolve.txt gs://ultralytics 115 | sudo shutdown 116 | 117 | #Docker 118 | sudo docker kill $(sudo docker ps -q) 119 | sudo docker pull ultralytics/yolov3:v1 120 | sudo nvidia-docker run -it --ipc=host --mount type=bind,source="$(pwd)"/coco,target=/usr/src/coco ultralytics/yolov3:v1 121 | 122 | clear 123 | while true 124 | do 125 | python3 train.py --data data/coco.data --img-size 320 --batch-size 64 --accumulate 1 --evolve --epochs 1 --adam --bucket yolov4/adamdefaultpw_coco_1e --device 1 126 | done 127 | 128 | python3 train.py --data 
data/coco.data --img-size 320 --batch-size 64 --accumulate 1 --epochs 1 --adam --device 1 --prebias 129 | while true; do python3 train.py --data data/coco.data --img-size 320 --batch-size 64 --accumulate 1 --evolve --epochs 1 --adam --bucket yolov4/adamdefaultpw_coco_1e; done 130 | -------------------------------------------------------------------------------- /utils/google_utils.py: -------------------------------------------------------------------------------- 1 | # This file contains google utils: https://cloud.google.com/storage/docs/reference/libraries 2 | # pip install --upgrade google-cloud-storage 3 | 4 | import os 5 | import time 6 | 7 | 8 | # from google.cloud import storage 9 | 10 | 11 | def gdrive_download(id='1HaXkef9z6y5l4vUnCYgdmEAj61c6bfWO', name='coco.zip'): 12 | # https://gist.github.com/tanaikech/f0f2d122e05bf5f971611258c22c110f 13 | # Downloads a file from Google Drive, accepting presented query 14 | # from utils.google_utils import *; gdrive_download() 15 | t = time.time() 16 | 17 | print('Downloading https://drive.google.com/uc?export=download&id=%s as %s... ' % (id, name), end='') 18 | if os.path.exists(name): # remove existing 19 | os.remove(name) 20 | 21 | # Attempt large file download 22 | s = ["curl -c ./cookie -s -L \"https://drive.google.com/uc?export=download&id=%s\" > /dev/null" % id, 23 | "curl -Lb ./cookie -s \"https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=%s\" -o %s" % ( 24 | id, name), 25 | 'rm ./cookie'] 26 | [os.system(x) for x in s] # run commands 27 | 28 | # Attempt small file download 29 | if not os.path.exists(name): # file size < 40MB 30 | s = 'curl -f -L -o %s https://drive.google.com/uc?export=download&id=%s' % (name, id) 31 | os.system(s) 32 | 33 | # Unzip if archive 34 | if name.endswith('.zip'): 35 | print('unzipping... 
', end='') 36 | os.system('unzip -q %s' % name) # unzip 37 | os.remove(name) # remove zip to free space 38 | 39 | print('Done (%.1fs)' % (time.time() - t)) 40 | 41 | 42 | def upload_blob(bucket_name, source_file_name, destination_blob_name): 43 | # Uploads a file to a bucket 44 | # https://cloud.google.com/storage/docs/uploading-objects#storage-upload-object-python 45 | 46 | storage_client = storage.Client() 47 | bucket = storage_client.get_bucket(bucket_name) 48 | blob = bucket.blob(destination_blob_name) 49 | 50 | blob.upload_from_filename(source_file_name) 51 | 52 | print('File {} uploaded to {}.'.format( 53 | source_file_name, 54 | destination_blob_name)) 55 | 56 | 57 | def download_blob(bucket_name, source_blob_name, destination_file_name): 58 | # Downloads a blob from a bucket 59 | storage_client = storage.Client() 60 | bucket = storage_client.get_bucket(bucket_name) 61 | blob = bucket.blob(source_blob_name) 62 | 63 | blob.download_to_filename(destination_file_name) 64 | 65 | print('Blob {} downloaded to {}.'.format( 66 | source_blob_name, 67 | destination_file_name)) 68 | -------------------------------------------------------------------------------- /utils/init.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/utils/init.py -------------------------------------------------------------------------------- /utils/kmeans/416/3/anchor_clusters.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/utils/kmeans/416/3/anchor_clusters.png -------------------------------------------------------------------------------- /utils/kmeans/416/3/area_cluster.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/utils/kmeans/416/3/area_cluster.png -------------------------------------------------------------------------------- /utils/kmeans/416/3/kmeans.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/utils/kmeans/416/3/kmeans.png -------------------------------------------------------------------------------- /utils/kmeans/416/3/ratio_cluster.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/utils/kmeans/416/3/ratio_cluster.png -------------------------------------------------------------------------------- /utils/kmeans/416/6/2019-10-31 09-02-05屏幕截图.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/utils/kmeans/416/6/2019-10-31 09-02-05屏幕截图.png -------------------------------------------------------------------------------- /utils/kmeans/416/6/anchor_clusters.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/utils/kmeans/416/6/anchor_clusters.png -------------------------------------------------------------------------------- /utils/kmeans/416/6/area_cluster.png:
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/utils/kmeans/416/6/area_cluster.png -------------------------------------------------------------------------------- /utils/kmeans/416/6/ratio_cluster.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/utils/kmeans/416/6/ratio_cluster.png -------------------------------------------------------------------------------- /utils/kmeans/hrsc_512.txt: -------------------------------------------------------------------------------- 1 | 28 10 2 | 50 12 3 | 61 6 4 | 69 8 5 | 73 16 6 | 79 10 7 | 86 12 8 | 108 13 9 | 111 17 10 | 132 27 11 | 134 15 12 | 138 19 13 | 155 22 14 | 167 18 15 | 175 28 16 | 202 34 17 | 251 42 18 | 297 74 19 | -------------------------------------------------------------------------------- /utils/kmeans/icdar_608_all.txt: -------------------------------------------------------------------------------- 1 | 5 11 2 | 6 23 3 | 6 6 4 | 8 38 5 | 9 15 6 | 11 3 7 | 13 7 8 | 13 26 9 | 14 47 10 | 20 5 11 | 21 9 12 | 27 75 13 | 28 14 14 | 35 9 15 | 38 5 16 | 44 19 17 | 59 12 18 | 88 28 19 | -------------------------------------------------------------------------------- /utils/kmeans/icdar_608_care.txt: -------------------------------------------------------------------------------- 1 | 6 14 2 | 8 21 3 | 8 36 4 | 12 26 5 | 13 6 6 | 13 44 7 | 16 9 8 | 20 8 9 | 21 65 10 | 22 5 11 | 23 12 12 | 29 9 13 | 33 7 14 | 36 16 15 | 36 112 16 | 47 11 17 | 61 18 18 | 101 31 19 | -------------------------------------------------------------------------------- /utils/nms/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ming71/rotate-yolov3/2341159a5ca29487107065966fa67c2946c0309e/utils/nms/__init__.py -------------------------------------------------------------------------------- /utils/nms/make.sh: -------------------------------------------------------------------------------- 1 | python setup.py build_ext --inplace 2 | -------------------------------------------------------------------------------- /utils/nms/nms.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from utils.nms.r_nms import r_nms 3 | 4 | def non_max_suppression(prediction, conf_thres=0.5, nms_thres=0.5): 5 | """ 6 | Removes detections with lower object confidence score than 'conf_thres' 7 | Non-Maximum Suppression to further filter detections. 
8 | Returns detections with shape: 9 | (x, y, w, h, a, object_conf, class_conf, class) 10 | """ 11 | # prediction: torch.Size([1, 8190, 8]) dim 0 is the batch size (number of images), dim 1 holds all proposals, dim 2 is xywh + conf + classes (three classes here) 12 | min_wh = 2 # (pixels) minimum box width and height 13 | output = [None] * len(prediction) 14 | for image_i, pred in enumerate(prediction): 15 | # Experiment: Prior class size rejection 16 | # x, y, w, h = pred[:, 0], pred[:, 1], pred[:, 2], pred[:, 3] 17 | # a = w * h # area 18 | # ar = w / (h + 1e-16) # aspect ratio 19 | # n = len(w) 20 | # log_w, log_h, log_a, log_ar = torch.log(w), torch.log(h), torch.log(a), torch.log(ar) 21 | # shape_likelihood = np.zeros((n, 60), dtype=np.float32) 22 | # x = np.concatenate((log_w.reshape(-1, 1), log_h.reshape(-1, 1)), 1) 23 | # from scipy.stats import multivariate_normal 24 | # for c in range(60): 25 | # shape_likelihood[:, c] = 26 | # multivariate_normal.pdf(x, mean=mat['class_mu'][c, :2], cov=mat['class_cov'][c, :2, :2]) 27 | 28 | if prediction.numel() == 0: # in case multi-scale filtering left no predictions 29 | continue 30 | 31 | # Multiply conf by class conf to get combined confidence 32 | # max(1) searches along dim 1: for each proposal take the per-class scores and keep the largest one 33 | # it returns the values class_conf and the indices class_pred; the index is the predicted class 34 | class_conf, class_pred = pred[:, 6:].max(1) # max(1) takes the row-wise maximum, i.e. the most likely class of each proposal 35 | pred[:, 5] *= class_conf # multiplying by conf gives the true score, written back to the conf position 36 | 37 | # Select only suitable predictions 38 | # first build a boolean index mask of predictions meeting the requirements, then index with it in a second step 39 | # conditions: 1. best-class conf above the preset threshold 2. predicted w/h of the anchor larger than 2 pixels 3. finite (no nan or inf) 40 | i = (pred[:, 5] > conf_thres) & (pred[:, 2:4] > min_wh).all(1) & torch.isfinite(pred).all(1) 41 | pred = pred[i] 42 | 43 | # If none are remaining => process next image 44 | if len(pred) == 0: 45 | continue 46 | 47 | # Select predicted classes 48 | class_conf = class_conf[i] # boolean mask filters out the conf entries that were False 49 | class_pred = class_pred[i].unsqueeze(1).float() # torch.Size([num_of_proposal]) --> torch.Size([num_of_proposal,1]) to ease the later concat 50 | 51 | use_cuda_nms = True 52 | # with CUDA NMS the proposals are not capped at 100, since many high-scoring proposals may exist and would be wrongly discarded 53 | if use_cuda_nms: 54 | det_max = [] 55 | pred = torch.cat((pred[:, :6], class_conf.unsqueeze(1), class_pred), 1) 56 | pred = pred[(-pred[:, 5]).argsort()] 57 | for c in pred[:, -1].unique(): 58 | dc = pred[pred[:, -1] == c] 59 | dc = dc[(-dc[:, 5]).argsort()] 60 | # if len(dc)>100: # if there are far too many proposals, keep only 100 61 | # dc = dc[:100] 62 | 63 | # Non-maximum suppression 64 | inds = r_nms(dc[:,:6], nms_thres) 65 | 66 | det_max.append(dc[inds]) 67 | if len(det_max): 68 | det_max = torch.cat(det_max) # concatenate 69 | output[image_i] = det_max[(-det_max[:, 5]).argsort()] # sort 70 | 71 | else: 72 | # Detections ordered as (x1y1x2y2, obj_conf, class_conf, class_pred) 73 | pred = torch.cat((pred[:, :6], class_conf.unsqueeze(1), class_pred), 1) 74 | 75 | # Get detections sorted by decreasing confidence scores 76 | pred = pred[(-pred[:, 5]).argsort()] 77 | 78 | det_max = [] 79 | nms_style = 'OR' # 'OR' (default), 'AND', 'MERGE' (experimental) 80 | 81 | for c in pred[:, -1].unique(): 82 | dc = pred[pred[:, -1] == c] # select class c # shape [num,7] 7 = (x1, y1, x2, y2, object_conf, class_conf) 83 | n = len(dc) 84 | if n == 1: 85 | det_max.append(dc) # No NMS required if only 1 prediction 86 | continue 87 | elif n > 100: 88 | dc = dc[:100] # limit to first 100 boxes: https://github.com/ultralytics/yolov3/issues/117 89 | 90 | # Non-maximum suppression 91 | if nms_style == 'OR': # default 92 | # METHOD1 93 | # ind = list(range(len(dc))) 94 | # while len(ind): 95 | # j = ind[0] 96 | # det_max.append(dc[j:j +
1]) # save highest conf detection 97 | # reject = (skew_bbox_iou(dc[j], dc[ind]) > nms_thres).nonzero() 98 | # [ind.pop(i) for i in reversed(reject)] 99 | 100 | # METHOD2 101 | while dc.shape[0]: 102 | det_max.append(dc[:1]) # save highest conf detection 103 | if len(dc) == 1: # Stop if we're at the last detection 104 | break 105 | iou = skew_bbox_iou(dc[0], dc[1:]) # iou with other boxes 106 | dc = dc[1:][iou < nms_thres] # remove ious > threshold 107 | 108 | elif nms_style == 'AND': # requires overlap, single boxes erased 109 | while len(dc) > 1: 110 | iou = skew_bbox_iou(dc[0], dc[1:]) # iou with other boxes 111 | if iou.max() > 0.5: 112 | det_max.append(dc[:1]) 113 | dc = dc[1:][iou < nms_thres] # remove ious > threshold 114 | 115 | elif nms_style == 'MERGE': # weighted mixture box 116 | while len(dc): 117 | if len(dc) == 1: 118 | det_max.append(dc) 119 | break 120 | # known bug: if every box in the current batch has IoU with the highest-conf one (dc[0] after sorting) below nms_thres, 121 | # then i is all False, so weights=[] and weights.sum()=0, which turns dc[0] into nan! 122 | i = skew_bbox_iou(dc[0], dc) > nms_thres # iou with other boxes, returned as booleans to ease the matrix indexing and filtering below 123 | weights = dc[i, 5:6] # proposals overlapping above the NMS threshold; take their conf 124 | assert len(weights)>0, 'Bugs on MERGE NMS!!' 125 | dc[0, :5] = (weights * dc[i, :5]).sum(0) / weights.sum() # replace the highest-conf bbox by the conf-weighted average of all bboxes above the threshold (conf itself unchanged, changing it would be meaningless) 126 | det_max.append(dc[:1]) 127 | dc = dc[i == 0] # boolean False equals 0, so this drops the pred boxes already processed from dc 128 | 129 | elif nms_style == 'SOFT': # soft-NMS https://arxiv.org/abs/1704.04503 130 | sigma = 0.5 # soft-nms sigma parameter 131 | while len(dc): 132 | if len(dc) == 1: 133 | det_max.append(dc) 134 | break 135 | det_max.append(dc[:1]) 136 | iou = skew_bbox_iou(dc[0], dc[1:]) # iou with other boxes 137 | dc = dc[1:] 138 | dc[:, 4] *= torch.exp(-iou ** 2 / sigma) # decay confidences 139 | # dc = dc[dc[:, 4] > nms_thres] # new line per https://github.com/ultralytics/yolov3/issues/362 140 | 141 | if len(det_max): 142 | det_max = torch.cat(det_max) # concatenate 143 | # import ipdb; ipdb.set_trace() # leftover debug breakpoint, disabled 144 | output[image_i] = det_max[(-det_max[:, 5]).argsort()] # sort 145 | 146 | 147 | return output 148 | 149 | 150 | -------------------------------------------------------------------------------- /utils/nms/nms_wrapper_test.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import cv2 4 | import math 5 | 6 | import r_nms 7 | 8 | 9 | def get_rotated_coors(box): 10 | assert len(box) > 0 , 'Input valid box!'
11 | cx = box[0]; cy = box[1]; w = box[2]; h = box[3]; a = box[4] 12 | xmin = cx - w*0.5; xmax = cx + w*0.5; ymin = cy - h*0.5; ymax = cy + h*0.5 13 | t_x0=xmin; t_y0=ymin; t_x1=xmin; t_y1=ymax; t_x2=xmax; t_y2=ymax; t_x3=xmax; t_y3=ymin 14 | R = np.eye(3) 15 | R[:2] = cv2.getRotationMatrix2D(angle=-a*180/math.pi, center=(cx,cy), scale=1) 16 | x0 = t_x0*R[0,0] + t_y0*R[0,1] + R[0,2] 17 | y0 = t_x0*R[1,0] + t_y0*R[1,1] + R[1,2] 18 | x1 = t_x1*R[0,0] + t_y1*R[0,1] + R[0,2] 19 | y1 = t_x1*R[1,0] + t_y1*R[1,1] + R[1,2] 20 | x2 = t_x2*R[0,0] + t_y2*R[0,1] + R[0,2] 21 | y2 = t_x2*R[1,0] + t_y2*R[1,1] + R[1,2] 22 | x3 = t_x3*R[0,0] + t_y3*R[0,1] + R[0,2] 23 | y3 = t_x3*R[1,0] + t_y3*R[1,1] + R[1,2] 24 | 25 | if isinstance(x0,torch.Tensor): 26 | r_box=torch.cat([x0.unsqueeze(0),y0.unsqueeze(0), 27 | x1.unsqueeze(0),y1.unsqueeze(0), 28 | x2.unsqueeze(0),y2.unsqueeze(0), 29 | x3.unsqueeze(0),y3.unsqueeze(0)], 0) 30 | else: 31 | r_box = np.array([x0,y0,x1,y1,x2,y2,x3,y3]) 32 | return r_box 33 | 34 | if __name__ == '__main__': 35 | boxes = np.array([[150, 150, 100, 100, 0, 0.99, 0.1], 36 | [160, 160, 100, 100, 0, 0.88, 0.1], 37 | [150, 150, 100, 100, -0.7854, 0.66, 0.1], 38 | [300, 300, 100, 100, 0., 0.77, 0.1]],dtype=np.float32) 39 | 40 | dets_th=torch.from_numpy(boxes).cuda() 41 | import ipdb; ipdb.set_trace() 42 | iou_thr = 0.1 43 | inds = r_nms.r_nms(dets_th, iou_thr) 44 | print(inds) 45 | 46 | img = np.zeros((416*2,416*2,3), np.uint8) 47 | img.fill(255) 48 | 49 | boxes = boxes[:,:-1] 50 | boxes = [get_rotated_coors(i).reshape(-1,2).astype(np.int32) for i in boxes] 51 | for box in boxes: 52 | img = cv2.polylines(img,[box],True,(0,0,255),1) 53 | cv2.imshow('anchor_show', img) 54 | cv2.waitKey(0) 55 | cv2.destroyAllWindows() -------------------------------------------------------------------------------- /utils/nms/setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup 2 | from torch.utils.cpp_extension import BuildExtension, CUDAExtension 3 | 4 | setup( 5 | name='r_nms', 6 | ext_modules=[ 7 | CUDAExtension('r_nms', [ 8 | 'src/rotate_polygon_nms.cpp', 9 | 'src/rotate_polygon_nms_kernel.cu', 10 | ]), 11 | ], 12 | cmdclass={'build_ext': BuildExtension}) 13 | 14 | -------------------------------------------------------------------------------- /utils/nms/src/rotate_polygon_nms.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | 3 | #define CHECK_CUDA(x) AT_CHECK(x.type().is_cuda(), #x, " must be a CUDAtensor ") 4 | 5 | at::Tensor nms_cuda(const at::Tensor boxes, float nms_overlap_thresh); 6 | 7 | at::Tensor r_nms(const at::Tensor& dets, const float threshold) { 8 | CHECK_CUDA(dets); 9 | if (dets.numel() == 0) 10 | return at::empty({0}, dets.options().dtype(at::kLong).device(at::kCPU)); 11 | return nms_cuda(dets, threshold); 12 | } 13 | 14 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { 15 | m.def("r_nms", &r_nms, "r_nms rnms"); 16 | } -------------------------------------------------------------------------------- /utils/nms/src/rotate_polygon_nms_kernel.cu: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | #include 5 | #include 6 | 7 | #include 8 | #include 9 | 10 | #define CUDA_CHECK(condition) \ 11 | /* Code block avoids redefinition of cudaError_t error */ \ 12 | do { \ 13 | cudaError_t error = condition; \ 14 | if (error != cudaSuccess) { \ 15 | std::cout << cudaGetErrorString(error) << std::endl; \ 16 | } \ 17 | } 
while (0) 18 | 19 | #define DIVUP(m,n) ((m) / (n) + ((m) % (n) > 0)) 20 | int const threadsPerBlock = sizeof(unsigned long long) * 8; 21 | 22 | __device__ inline float trangle_area(float * a, float * b, float * c) { 23 | return ((a[0] - c[0]) * (b[1] - c[1]) - (a[1] - c[1]) * (b[0] - c[0])) / 2.0; 24 | } 25 | 26 | __device__ inline float area(float * int_pts, int num_of_inter) { 27 | 28 | float area = 0.0; 29 | for (int i = 0; i < num_of_inter - 2; i++) { 30 | area += fabs(trangle_area(int_pts, int_pts + 2 * i + 2, int_pts + 2 * i + 4)); 31 | } 32 | return area; 33 | } 34 | 35 | __device__ inline void reorder_pts(float * int_pts, int num_of_inter) { 36 | 37 | 38 | 39 | if (num_of_inter > 0) { 40 | 41 | float center[2]; 42 | 43 | center[0] = 0.0; 44 | center[1] = 0.0; 45 | 46 | for (int i = 0; i < num_of_inter; i++) { 47 | center[0] += int_pts[2 * i]; 48 | center[1] += int_pts[2 * i + 1]; 49 | } 50 | center[0] /= num_of_inter; 51 | center[1] /= num_of_inter; 52 | 53 | float vs[16]; 54 | float v[2]; 55 | float d; 56 | for (int i = 0; i < num_of_inter; i++) { 57 | v[0] = int_pts[2 * i] - center[0]; 58 | v[1] = int_pts[2 * i + 1] - center[1]; 59 | d = sqrt(v[0] * v[0] + v[1] * v[1]); 60 | v[0] = v[0] / d; 61 | v[1] = v[1] / d; 62 | if (v[1] < 0) { 63 | v[0] = -2 - v[0]; 64 | } 65 | vs[i] = v[0]; 66 | } 67 | 68 | float temp, tx, ty; 69 | int j; 70 | for (int i = 1; ivs[i]){ 72 | temp = vs[i]; 73 | tx = int_pts[2 * i]; 74 | ty = int_pts[2 * i + 1]; 75 | j = i; 76 | while (j>0 && vs[j - 1]>temp){ 77 | vs[j] = vs[j - 1]; 78 | int_pts[j * 2] = int_pts[j * 2 - 2]; 79 | int_pts[j * 2 + 1] = int_pts[j * 2 - 1]; 80 | j--; 81 | } 82 | vs[j] = temp; 83 | int_pts[j * 2] = tx; 84 | int_pts[j * 2 + 1] = ty; 85 | } 86 | } 87 | } 88 | 89 | } 90 | __device__ inline bool inter2line(float * pts1, float *pts2, int i, int j, float * temp_pts) { 91 | 92 | float a[2]; 93 | float b[2]; 94 | float c[2]; 95 | float d[2]; 96 | 97 | float area_abc, area_abd, area_cda, area_cdb; 98 | 99 | a[0] = pts1[2 * i]; 100 | a[1] = pts1[2 * i + 1]; 101 | 102 | b[0] = pts1[2 * ((i + 1) % 4)]; 103 | b[1] = pts1[2 * ((i + 1) % 4) + 1]; 104 | 105 | c[0] = pts2[2 * j]; 106 | c[1] = pts2[2 * j + 1]; 107 | 108 | d[0] = pts2[2 * ((j + 1) % 4)]; 109 | d[1] = pts2[2 * ((j + 1) % 4) + 1]; 110 | 111 | area_abc = trangle_area(a, b, c); 112 | area_abd = trangle_area(a, b, d); 113 | 114 | if (area_abc * area_abd >= 0) { 115 | return false; 116 | } 117 | 118 | area_cda = trangle_area(c, d, a); 119 | area_cdb = area_cda + area_abc - area_abd; 120 | 121 | if (area_cda * area_cdb >= 0) { 122 | return false; 123 | } 124 | float t = area_cda / (area_abd - area_abc); 125 | 126 | float dx = t * (b[0] - a[0]); 127 | float dy = t * (b[1] - a[1]); 128 | temp_pts[0] = a[0] + dx; 129 | temp_pts[1] = a[1] + dy; 130 | 131 | return true; 132 | } 133 | 134 | __device__ inline bool in_rect(float pt_x, float pt_y, float * pts) { 135 | 136 | float ab[2]; 137 | float ad[2]; 138 | float ap[2]; 139 | 140 | float abab; 141 | float abap; 142 | float adad; 143 | float adap; 144 | 145 | ab[0] = pts[2] - pts[0]; 146 | ab[1] = pts[3] - pts[1]; 147 | 148 | ad[0] = pts[6] - pts[0]; 149 | ad[1] = pts[7] - pts[1]; 150 | 151 | ap[0] = pt_x - pts[0]; 152 | ap[1] = pt_y - pts[1]; 153 | 154 | abab = ab[0] * ab[0] + ab[1] * ab[1]; 155 | abap = ab[0] * ap[0] + ab[1] * ap[1]; 156 | adad = ad[0] * ad[0] + ad[1] * ad[1]; 157 | adap = ad[0] * ap[0] + ad[1] * ap[1]; 158 | 159 | return abab >= abap and abap >= 0 and adad >= adap and adap >= 0; 160 | } 161 | 162 | __device__ inline int 
inter_pts(float * pts1, float * pts2, float * int_pts) { 163 | 164 | int num_of_inter = 0; 165 | 166 | for (int i = 0; i < 4; i++) { 167 | if (in_rect(pts1[2 * i], pts1[2 * i + 1], pts2)) { 168 | int_pts[num_of_inter * 2] = pts1[2 * i]; 169 | int_pts[num_of_inter * 2 + 1] = pts1[2 * i + 1]; 170 | num_of_inter++; 171 | } 172 | if (in_rect(pts2[2 * i], pts2[2 * i + 1], pts1)) { 173 | int_pts[num_of_inter * 2] = pts2[2 * i]; 174 | int_pts[num_of_inter * 2 + 1] = pts2[2 * i + 1]; 175 | num_of_inter++; 176 | } 177 | } 178 | 179 | float temp_pts[2]; 180 | 181 | for (int i = 0; i < 4; i++) { 182 | for (int j = 0; j < 4; j++) { 183 | bool has_pts = inter2line(pts1, pts2, i, j, temp_pts); 184 | if (has_pts) { 185 | int_pts[num_of_inter * 2] = temp_pts[0]; 186 | int_pts[num_of_inter * 2 + 1] = temp_pts[1]; 187 | num_of_inter++; 188 | } 189 | } 190 | } 191 | 192 | 193 | return num_of_inter; 194 | } 195 | 196 | __device__ inline void convert_region(float * pts, float const * const region) { 197 | 198 | float angle = region[4]; 199 | //float a_cos = cos(angle / 180.0*3.1415926535); 200 | //float a_sin = sin(angle / 180.0*3.1415926535); 201 | float a_cos = cos(angle); 202 | float a_sin = sin(angle); 203 | 204 | float ctr_x = region[0]; 205 | float ctr_y = region[1]; 206 | 207 | float w = region[2]; 208 | float h = region[3]; 209 | 210 | float pts_x[4]; 211 | float pts_y[4]; 212 | 213 | pts_x[0] = -w / 2; 214 | pts_x[1] = w / 2; 215 | pts_x[2] = w / 2; 216 | pts_x[3] = -w / 2; 217 | 218 | pts_y[0] = -h / 2; 219 | pts_y[1] = -h / 2; 220 | pts_y[2] = h / 2; 221 | pts_y[3] = h / 2; 222 | 223 | for (int i = 0; i < 4; i++) { 224 | pts[7 - 2 * i - 1] = a_cos * pts_x[i] - a_sin * pts_y[i] + ctr_x; 225 | pts[7 - 2 * i] = a_sin * pts_x[i] + a_cos * pts_y[i] + ctr_y; 226 | 227 | } 228 | 229 | } 230 | 231 | 232 | __device__ inline float inter(float const * const region1, float const * const region2) { 233 | 234 | float pts1[8]; 235 | float pts2[8]; 236 | float int_pts[16]; 237 | int num_of_inter; 238 | 239 | convert_region(pts1, region1); 240 | convert_region(pts2, region2); 241 | 242 | num_of_inter = inter_pts(pts1, pts2, int_pts); 243 | 244 | reorder_pts(int_pts, num_of_inter); 245 | 246 | return area(int_pts, num_of_inter); 247 | 248 | 249 | } 250 | 251 | __device__ inline float devRotateIoU(float const * const region1, float const * const region2) { 252 | 253 | float area1 = region1[2] * region1[3]; 254 | float area2 = region2[2] * region2[3]; 255 | float area_inter = inter(region1, region2); 256 | 257 | return area_inter / (area1 + area2 - area_inter); 258 | 259 | 260 | } 261 | 262 | __global__ void rotate_nms_kernel(const int n_boxes, const float nms_overlap_thresh, 263 | const float *dev_boxes, unsigned long long *dev_mask) { 264 | const int row_start = blockIdx.y; 265 | const int col_start = blockIdx.x; 266 | 267 | // if (row_start > col_start) return; 268 | 269 | const int row_size = 270 | min(n_boxes - row_start * threadsPerBlock, threadsPerBlock); 271 | const int col_size = 272 | min(n_boxes - col_start * threadsPerBlock, threadsPerBlock); 273 | 274 | __shared__ float block_boxes[threadsPerBlock * 6]; 275 | if (threadIdx.x < col_size) { 276 | block_boxes[threadIdx.x * 6 + 0] = 277 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 6 + 0]; 278 | block_boxes[threadIdx.x * 6 + 1] = 279 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 6 + 1]; 280 | block_boxes[threadIdx.x * 6 + 2] = 281 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 6 + 2]; 282 | block_boxes[threadIdx.x * 6 + 3] = 
283 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 6 + 3]; 284 | block_boxes[threadIdx.x * 6 + 4] = 285 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 6 + 4]; 286 | block_boxes[threadIdx.x * 6 + 5] = 287 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 6 + 5]; 288 | } 289 | __syncthreads(); 290 | 291 | if (threadIdx.x < row_size) { 292 | const int cur_box_idx = threadsPerBlock * row_start + threadIdx.x; 293 | const float *cur_box = dev_boxes + cur_box_idx * 6; 294 | int i = 0; 295 | unsigned long long t = 0; 296 | int start = 0; 297 | if (row_start == col_start) { 298 | start = threadIdx.x + 1; 299 | } 300 | for (i = start; i < col_size; i++) { 301 | if (devRotateIoU(cur_box, block_boxes + i * 6) > nms_overlap_thresh) { 302 | t |= 1ULL << i; 303 | } 304 | } 305 | const int col_blocks = DIVUP(n_boxes, threadsPerBlock); 306 | dev_mask[cur_box_idx * col_blocks + col_start] = t; 307 | } 308 | } 309 | 310 | void _set_device(int device_id) { 311 | int current_device; 312 | CUDA_CHECK(cudaGetDevice(&current_device)); 313 | if (current_device == device_id) { 314 | return; 315 | } 316 | // The call to cudaSetDevice must come before any calls to Get, which 317 | // may perform initialization using the GPU. 318 | CUDA_CHECK(cudaSetDevice(device_id)); 319 | } 320 | 321 | 322 | // boxes is a N x 5 tensor 323 | at::Tensor nms_cuda(const at::Tensor boxes, float nms_overlap_thresh) { 324 | using scalar_t = float; 325 | AT_ASSERTM(boxes.type().is_cuda(), "boxes must be a CUDA tensor"); 326 | auto scores = boxes.select(1, 5); //dim=1, select the conf_score 327 | auto order_t = std::get<1>(scores.sort(0, /* descending=*/true)); //conf from high to low 328 | auto boxes_sorted = boxes.index_select(0, order_t); // re-rank the boxes via conf 329 | 330 | int boxes_num = boxes.size(0); 331 | 332 | const int col_blocks = THCCeilDiv(boxes_num, threadsPerBlock); 333 | 334 | scalar_t* boxes_dev = boxes_sorted.data<scalar_t>(); 335 | 336 | THCState *state = at::globalContext().lazyInitCUDA(); // TODO replace with getTHCState 337 | 338 | unsigned long long* mask_dev = NULL; 339 | //THCudaCheck(THCudaMalloc(state, (void**) &mask_dev, 340 | // boxes_num * col_blocks * sizeof(unsigned long long))); 341 | 342 | mask_dev = (unsigned long long*) THCudaMalloc(state, boxes_num * col_blocks * sizeof(unsigned long long)); 343 | 344 | dim3 blocks(THCCeilDiv(boxes_num, threadsPerBlock), 345 | THCCeilDiv(boxes_num, threadsPerBlock)); 346 | dim3 threads(threadsPerBlock); 347 | rotate_nms_kernel<<<blocks, threads>>>(boxes_num, 348 | nms_overlap_thresh, 349 | boxes_dev, 350 | mask_dev); 351 | 352 | std::vector<unsigned long long> mask_host(boxes_num * col_blocks); 353 | THCudaCheck(cudaMemcpy(&mask_host[0], 354 | mask_dev, 355 | sizeof(unsigned long long) * boxes_num * col_blocks, 356 | cudaMemcpyDeviceToHost)); 357 | 358 | std::vector<unsigned long long> remv(col_blocks); 359 | memset(&remv[0], 0, sizeof(unsigned long long) * col_blocks); 360 | 361 | at::Tensor keep = at::empty({ boxes_num }, boxes.options().dtype(at::kLong).device(at::kCPU)); 362 | int64_t* keep_out = keep.data<int64_t>(); 363 | 364 | int num_to_keep = 0; 365 | for (int i = 0; i < boxes_num; i++) { 366 | int nblock = i / threadsPerBlock; 367 | int inblock = i % threadsPerBlock; 368 | 369 | if (!(remv[nblock] & (1ULL << inblock))) { 370 | keep_out[num_to_keep++] = i; 371 | unsigned long long *p = &mask_host[0] + i * col_blocks; 372 | for (int j = nblock; j < col_blocks; j++) { 373 | remv[j] |= p[j]; 374 | } 375 | } 376 | } 377 | 378 | THCudaFree(state, mask_dev); 379 | // TODO improve this part 380 | return
std::get<0>(order_t.index({ 381 | keep.narrow(/*dim=*/0, /*start=*/0, /*length=*/num_to_keep).to( 382 | order_t.device(), keep.scalar_type()) 383 | }).sort(0, false)); 384 | } -------------------------------------------------------------------------------- /utils/parse_config.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import math 3 | 4 | 5 | 6 | def cfg2anchors(val): 7 | if 'ara' in val: # area, ratio, angle respectively 8 | val = val[val.index('ara')+3:] 9 | val = [i for i in val.split('/') if len(i)!=0] # ['12130, 42951, 113378 ', ' 4.18, 6.50, 8.75 ', '-60,-30,0,30,60,90'] 10 | areas = [float(i) for i in val[0].split(',')] 11 | ratios = [float(i) for i in val[1].split(',')] # w/h 12 | angles = [float(i) for i in val[2].split(',')] 13 | anchors = [] 14 | for area in areas: 15 | for ratio in ratios: 16 | for angle in angles: 17 | anchor_w = math.sqrt(area*ratio) 18 | anchor_h = math.sqrt(area/ratio) 19 | angle = angle*math.pi/180 20 | anchor = [anchor_w, anchor_h, angle] 21 | anchors.append(anchor) 22 | assert len(anchors) == len(areas)*len(ratios)*len(angles),'Something wrong in anchor settings.' 23 | # print(np.array(anchors)) 24 | return np.array(anchors) 25 | else: # anchors generated via k-means, input anchor.txt 26 | # by default one anchor every 15 degrees 27 | anchors_setting = val.strip(' ') 28 | anchors = np.loadtxt(anchors_setting) 29 | angle = np.array([i for i in range(-6,6)])*math.pi/12 30 | anchors = np.concatenate([np.column_stack((np.expand_dims(i,0).repeat(len(angle),0),angle.T)) for i in anchors],0) 31 | return anchors 32 | 33 | 34 | # cfg parsing function: 35 | # parses the layers, settings, etc. of the cfg into dicts and returns a list of these dicts; 36 | # each list element (a dict) corresponds to one block of the cfg file starting with [] (e.g. net); its first entry is the block type, e.g. {'type': 'net'...} 37 | def parse_model_cfg(path): 38 | # Parses the yolo-v3 layer configuration file and returns module definitions 39 | file = open(path, 'r') 40 | lines = file.read().split('\n') 41 | lines = [x for x in lines if x and not x.startswith('#')] 42 | lines = [x.rstrip().lstrip() for x in lines] # get rid of fringe whitespaces 43 | mdefs = [] # module definitions 44 | for line in lines: 45 | if line.startswith('['): # This marks the start of a new block 46 | mdefs.append({}) 47 | mdefs[-1]['type'] = line[1:-1].rstrip() 48 | if mdefs[-1]['type'] == 'convolutional': 49 | mdefs[-1]['batch_normalize'] = 0 # pre-populate with zeros (may be overwritten later) 50 | else: 51 | key, val = line.split("=") 52 | key = key.rstrip() 53 | 54 | if 'anchors' in key: 55 | mdefs[-1][key] = cfg2anchors(val) # np anchors 56 | else: 57 | mdefs[-1][key] = val.strip() 58 | 59 | return mdefs 60 | 61 | # like mmdetection, parse the data config file into key-value pairs of a dict for easy lookup 62 | def parse_data_cfg(path): 63 | # Parses the data configuration file 64 | options = dict() 65 | with open(path, 'r') as fp: 66 | lines = fp.readlines() 67 | 68 | for line in lines: 69 | line = line.strip() 70 | if line == '' or line.startswith('#'): 71 | continue 72 | key, val = line.split('=') 73 | options[key.strip()] = val.strip() 74 | 75 | return options 76 | -------------------------------------------------------------------------------- /utils/torch_utils.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | import torch 4 | 5 | 6 | def init_seeds(seed=0): 7 | torch.manual_seed(seed) 8 | torch.cuda.manual_seed(seed) 9 | torch.cuda.manual_seed_all(seed) 10 | 11 | # Remove randomness (may be slower on Tesla GPUs) # https://pytorch.org/docs/stable/notes/randomness.html 12
| if seed == 0: 13 | torch.backends.cudnn.deterministic = True 14 | torch.backends.cudnn.benchmark = False 15 | 16 | 17 | def select_device(device=None, apex=False): 18 | if device == 'cpu': 19 | pass 20 | elif device: # Set environment variable if device is specified 21 | os.environ['CUDA_VISIBLE_DEVICES'] = device 22 | 23 | # apex if mixed precision training https://github.com/NVIDIA/apex 24 | cuda = False if device == 'cpu' else torch.cuda.is_available() 25 | device = torch.device('cuda:0' if cuda else 'cpu') 26 | 27 | if not cuda: 28 | print('Using CPU') 29 | if cuda: 30 | c = 1024 ** 2 # bytes to MB 31 | ng = torch.cuda.device_count() 32 | x = [torch.cuda.get_device_properties(i) for i in range(ng)] 33 | cuda_str = 'Using CUDA ' + ('Apex ' if apex else '') 34 | for i in range(0, ng): 35 | if i == 1: 36 | # torch.cuda.set_device(0) # OPTIONAL: Set GPU ID 37 | cuda_str = ' ' * len(cuda_str) 38 | print("%sdevice%g _CudaDeviceProperties(name='%s', total_memory=%dMB)" % 39 | (cuda_str, i, x[i].name, x[i].total_memory / c)) 40 | 41 | print('') # skip a line 42 | return device 43 | 44 | 45 | def fuse_conv_and_bn(conv, bn): 46 | # https://tehnokv.com/posts/fusing-batchnorm-and-conv/ 47 | with torch.no_grad(): 48 | # init 49 | fusedconv = torch.nn.Conv2d(conv.in_channels, 50 | conv.out_channels, 51 | kernel_size=conv.kernel_size, 52 | stride=conv.stride, 53 | padding=conv.padding, 54 | bias=True) 55 | 56 | # prepare filters 57 | w_conv = conv.weight.clone().view(conv.out_channels, -1) 58 | w_bn = torch.diag(bn.weight.div(torch.sqrt(bn.eps + bn.running_var))) 59 | fusedconv.weight.copy_(torch.mm(w_bn, w_conv).view(fusedconv.weight.size())) 60 | 61 | # prepare spatial bias 62 | if conv.bias is not None: 63 | b_conv = conv.bias 64 | else: 65 | b_conv = torch.zeros(conv.weight.size(0)) 66 | b_bn = bn.bias - bn.weight.mul(bn.running_mean).div(torch.sqrt(bn.running_var + bn.eps)) 67 | fusedconv.bias.copy_(b_conv + b_bn) 68 | 69 | return fusedconv 70 | --------------------------------------------------------------------------------
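A minimal usage sketch for fuse_conv_and_bn from utils/torch_utils.py: it folds a BatchNorm2d layer into the preceding Conv2d so inference runs one layer instead of two. The fuse_sequential helper and the demo model below are illustrative assumptions, not part of this repository.

import torch
import torch.nn as nn
from utils.torch_utils import fuse_conv_and_bn

def fuse_sequential(seq: nn.Sequential) -> nn.Sequential:
    # Walk the children and replace each Conv2d followed by BatchNorm2d with one fused Conv2d.
    children = list(seq.children())
    fused, skip = [], False
    for m, nxt in zip(children, children[1:] + [None]):
        if skip:  # this BatchNorm2d was already folded into the previous conv
            skip = False
            continue
        if isinstance(m, nn.Conv2d) and isinstance(nxt, nn.BatchNorm2d):
            fused.append(fuse_conv_and_bn(m, nxt))
            skip = True
        else:
            fused.append(m)
    return nn.Sequential(*fused)

if __name__ == '__main__':
    model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                          nn.BatchNorm2d(16),
                          nn.LeakyReLU(0.1)).eval()  # eval() so BN uses running stats
    fused = fuse_sequential(model).eval()
    x = torch.randn(1, 3, 64, 64)
    print(torch.allclose(model(x), fused(x), atol=1e-5))  # expected: True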