├── .gitignore ├── README.md ├── app.py ├── cfg └── yolov3.cfg ├── darknet.py ├── data └── coco.names ├── images ├── bar.jpeg ├── city_scene.jpg ├── class.jpg ├── dog.jpg ├── home.jpeg ├── meeting.jpeg └── snack.jpg ├── instance └── README.md ├── iti ├── Title Background.gif ├── image └── postman.png ├── requirements.txt ├── sample_output ├── 20200521_233133_570.jpg ├── 20200521_233208_33.jpg ├── 20200521_233222_914.jpg └── 20200521_233233_695.jpg ├── utils.py ├── weights └── README.md └── yolo.py /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | .env 3 | .flaskenv 4 | *.pyc 5 | *.pyo 6 | env/ 7 | env* 8 | dist/ 9 | build/ 10 | *.egg 11 | *.egg-info/ 12 | _mailinglist 13 | .tox/ 14 | .cache/ 15 | .pytest_cache/ 16 | .idea/ 17 | docs/_build/ 18 | .vscode 19 | *.weights 20 | instance/output/* 21 | instance/uploads/* -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # A simple YOLOv3 Object Detection API in Python (Flask) 2 | 3 | 4 | This repository provides a simple implementation of object detection in Python, served as an API using Flask. It is based on the YOLOv3 object detection system and we will be using the pre-trained weights on the COCO dataset. 5 | 6 | 7 | ## Installation 8 | 9 | ### 1. Clone repository and install requirements 10 | 11 | ##### NOTE: I am using Windows OS and Pip for package installation, and I have to install pytorch separately else I will run into issues. The command for installation varies, so do check out the PyTorch website and see which command you should run under "Quick Start Locally". For me, I run this: 12 | ``` 13 | pip install torch==1.5.0+cpu torchvision==0.6.0+cpu -f https://download.pytorch.org/whl/torch_stable.html 14 | ``` 15 | ``` 16 | git clone git@github.com:yankai364/Object-Detection-Flask-API.git 17 | cd Object-Detection-Flask-API 18 | pip install -r requirements.txt 19 | ``` 20 | 21 | 22 | ### 2. Download pre-trained weights 23 | You can download the YOLOv3 pre-trained weights on the COCO dataset here: 24 | 25 | https://pjreddie.com/media/files/yolov3.weights 26 | 27 | Once downloaded, place the .weights file in the weights folder. 28 | 29 | 30 | ## API Documentation 31 | There is only 1 endpoint in this API. 32 | 33 | ### Request 34 | 35 | Method: POST
36 | Endpoint: /upload/
37 | Body: 38 | ``` 39 | { 40 | "file": <image file> 41 | } 42 | ``` 43 | 44 | ### Response 45 | ``` 46 | { 47 | "data": { 48 | "objects_count": { 49 | <class name>: <count>, 50 | <class name>: <count> 51 | }, 52 | "objects_confidence": [ 53 | {<class name>: <confidence>}, 54 | {<class name>: <confidence>}, 55 | {<class name>: <confidence>}, 56 | ... 57 | ], 58 | "filename": <output filename> 59 | } 60 | } 61 | ``` 62 | 63 | ## Usage 64 | 65 | ### 1. Start the application 66 | ``` 67 | cd Object-Detection-Flask-API 68 | python app.py 69 | ``` 70 | 71 | If the application runs successfully, you should see the following: 72 | ``` 73 | * Serving Flask app "app" (lazy loading) 74 | * Environment: production 75 | WARNING: This is a development server. Do not use it in a production deployment. 76 | Use a production WSGI server instead. 77 | * Debug mode: off 78 | * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit) 79 | ``` 80 | 81 | ### 2. Test the API 82 | You can test the API using Postman. Let's test it with the image "bar.jpeg" in the images folder: 83 | 84 | 85 | 86 | Open Postman and configure the request [according to the documentation above](#api-documentation). Remember to set the "file" key to the "File" type and attach the image. Your request should look like this: 87 | 88 | 89 | 90 | Click the Send button. The request may take a few seconds to complete, after which you should receive the following response: 91 | 92 | #### Response: 93 | ``` 94 | { 95 | "data": { 96 | "filename": "20200523_120754_313.jpg", 97 | "objects_confidence": [ 98 | { 99 | "cell phone": 1.0 100 | }, 101 | { 102 | "wine glass": 0.999997 103 | }, 104 | { 105 | "wine glass": 0.999972 106 | }, 107 | { 108 | "cup": 0.990166 109 | }, 110 | { 111 | "person": 0.999974 112 | }, 113 | { 114 | "bottle": 0.824177 115 | }, 116 | { 117 | "person": 1.0 118 | } 119 | ], 120 | "objects_count": { 121 | "bottle": 1, 122 | "cell phone": 1, 123 | "cup": 1, 124 | "person": 2, 125 | "wine glass": 2 126 | } 127 | } 128 | } 129 | ``` 130 | 131 | 132 | The application also draws the bounding boxes for each detected object and saves the result. The output image is named according to the filename in the response; you can find the annotated image in the /instance/output folder. 133 | 134 | 135 | 136 | 137 | And that's it! The pre-trained weights are decent at detecting everyday objects, so you can also test the API with your own photos (instead of stock images). 138 |
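If you prefer to script the request instead of using Postman, the snippet below is a minimal sketch using the `requests` library (not listed in requirements.txt, so install it separately). It assumes the server is running locally on port 5000 and that you run it from the repository root:

```
import requests

# Post an image to the /upload/ endpoint as multipart/form-data.
url = "http://localhost:5000/upload/"
with open("images/bar.jpeg", "rb") as f:
    response = requests.post(url, files={"file": f})

data = response.json()["data"]
print(data["objects_count"])   # e.g. counts per detected class
print(data["filename"])        # name of the annotated image in instance/output
```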
139 | 140 | 141 | 142 | ## Acknowledgements 143 | YOLOv3 -------------------------------------------------------------------------------- /app.py: -------------------------------------------------------------------------------- 1 | from flask import Flask, request, jsonify 2 | import os 3 | from werkzeug.utils import secure_filename 4 | from flask_cors import CORS 5 | from yolo import process 6 | from datetime import datetime 7 | from random import randint 8 | 9 | 10 | app = Flask(__name__) 11 | CORS(app) 12 | uploads_dir = os.path.join(app.instance_path, 'uploads') 13 | output_dir = os.path.join(app.instance_path, 'output') 14 | 15 | 16 | @app.route('/upload/', methods=['POST']) 17 | def upload_image(): 18 | try: 19 | os.mkdir(uploads_dir) 20 | os.mkdir(output_dir) 21 | except: 22 | pass 23 | 24 | file = request.files['file'] 25 | if not file: 26 | return {'error': 'Missing file'}, 400 27 | 28 | now = datetime.now() 29 | filename = now.strftime("%Y%m%d_%H%M%S") + "_" + str(randint(000, 999)) 30 | file.save(os.path.join(uploads_dir, secure_filename(filename + '.jpg'))) 31 | objects_count, objects_confidence = process(uploads_dir, output_dir, filename) 32 | 33 | response = { 34 | 'objects_count': objects_count, 35 | 'objects_confidence': objects_confidence, 36 | 'filename': filename + '.jpg' 37 | } 38 | 39 | return jsonify({"data": response}), 200 40 | 41 | 42 | if __name__ == '__main__': 43 | app.run(host="0.0.0.0", port=5000) 44 | -------------------------------------------------------------------------------- /cfg/yolov3.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | # batch=64 7 | # subdivisions=16 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | 
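# The blocks below repeat the Darknet-53 residual pattern: a 1x1 convolution that
# halves the channels, a 3x3 convolution that restores them, and a [shortcut] that
# adds the block's input back in.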
[convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | 
activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 
561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 | [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 601 | stride=1 602 | pad=1 603 | filters=255 604 | activation=linear 605 | 606 | 607 | [yolo] 608 | mask = 6,7,8 609 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 610 | classes=80 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .5 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 | layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | size=3 680 | stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=255 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 3,4,5 695 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 696 | classes=80 697 | num=9 698 | jitter=.3 699 | ignore_thresh = .5 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 | pad=1 746 | activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=255 777 | 
activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0,1,2 782 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 783 | classes=80 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .5 787 | truth_thresh = 1 788 | random=1 789 | 790 | -------------------------------------------------------------------------------- /darknet.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import numpy as np 4 | 5 | 6 | class YoloLayer(nn.Module): 7 | def __init__(self, anchor_mask=[], num_classes=0, anchors=[], num_anchors=1): 8 | super(YoloLayer, self).__init__() 9 | self.anchor_mask = anchor_mask 10 | self.num_classes = num_classes 11 | self.anchors = anchors 12 | self.num_anchors = num_anchors 13 | self.anchor_step = len(anchors)/num_anchors 14 | self.coord_scale = 1 15 | self.noobject_scale = 1 16 | self.object_scale = 5 17 | self.class_scale = 1 18 | self.thresh = 0.6 19 | self.stride = 32 20 | self.seen = 0 21 | 22 | def forward(self, output, nms_thresh): 23 | self.thresh = nms_thresh 24 | masked_anchors = [] 25 | 26 | for m in self.anchor_mask: 27 | masked_anchors += self.anchors[m*self.anchor_step:(m+1)*self.anchor_step] 28 | 29 | masked_anchors = [anchor/self.stride for anchor in masked_anchors] 30 | boxes = get_region_boxes(output.data, self.thresh, self.num_classes, masked_anchors, len(self.anchor_mask)) 31 | 32 | return boxes 33 | 34 | 35 | class Upsample(nn.Module): 36 | def __init__(self, stride=2): 37 | super(Upsample, self).__init__() 38 | self.stride = stride 39 | def forward(self, x): 40 | stride = self.stride 41 | assert(x.data.dim() == 4) 42 | B = x.data.size(0) 43 | C = x.data.size(1) 44 | H = x.data.size(2) 45 | W = x.data.size(3) 46 | ws = stride 47 | hs = stride 48 | x = x.view(B, C, H, 1, W, 1).expand(B, C, H, stride, W, stride).contiguous().view(B, C, H*stride, W*stride) 49 | return x 50 | 51 | 52 | #for route and shortcut 53 | class EmptyModule(nn.Module): 54 | def __init__(self): 55 | super(EmptyModule, self).__init__() 56 | 57 | def forward(self, x): 58 | return x 59 | 60 | # support route shortcut 61 | class Darknet(nn.Module): 62 | def __init__(self, cfgfile): 63 | super(Darknet, self).__init__() 64 | self.blocks = parse_cfg(cfgfile) 65 | self.models = self.create_network(self.blocks) # merge conv, bn,leaky 66 | self.loss = self.models[len(self.models)-1] 67 | 68 | self.width = int(self.blocks[0]['width']) 69 | self.height = int(self.blocks[0]['height']) 70 | 71 | self.header = torch.IntTensor([0,0,0,0]) 72 | self.seen = 0 73 | 74 | def forward(self, x, nms_thresh): 75 | ind = -2 76 | self.loss = None 77 | outputs = dict() 78 | out_boxes = [] 79 | 80 | for block in self.blocks: 81 | ind = ind + 1 82 | if block['type'] == 'net': 83 | continue 84 | elif block['type'] in ['convolutional', 'upsample']: 85 | x = self.models[ind](x) 86 | outputs[ind] = x 87 | elif block['type'] == 'route': 88 | layers = block['layers'].split(',') 89 | layers = [int(i) if int(i) > 0 else int(i)+ind for i in layers] 90 | if len(layers) == 1: 91 | x = outputs[layers[0]] 92 | outputs[ind] = x 93 | elif len(layers) == 2: 94 | x1 = outputs[layers[0]] 95 | x2 = outputs[layers[1]] 96 | x = torch.cat((x1,x2),1) 97 | outputs[ind] = x 98 | elif block['type'] == 'shortcut': 99 | from_layer = int(block['from']) 100 | activation = block['activation'] 101 | from_layer = from_layer if from_layer > 0 else from_layer + ind 102 | x1 = outputs[from_layer] 103 | x2 = outputs[ind-1] 104 | x = x1 + x2 105 | 
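# cache the residual sum so later [route]/[shortcut] blocks can reference this layer's output by index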
outputs[ind] = x 106 | elif block['type'] == 'yolo': 107 | boxes = self.models[ind](x, nms_thresh) 108 | out_boxes.append(boxes) 109 | else: 110 | print('unknown type %s' % (block['type'])) 111 | 112 | return out_boxes 113 | 114 | 115 | def print_network(self): 116 | print_cfg(self.blocks) 117 | 118 | def create_network(self, blocks): 119 | models = nn.ModuleList() 120 | 121 | prev_filters = 3 122 | out_filters =[] 123 | prev_stride = 1 124 | out_strides = [] 125 | conv_id = 0 126 | for block in blocks: 127 | if block['type'] == 'net': 128 | prev_filters = int(block['channels']) 129 | continue 130 | elif block['type'] == 'convolutional': 131 | conv_id = conv_id + 1 132 | batch_normalize = int(block['batch_normalize']) 133 | filters = int(block['filters']) 134 | kernel_size = int(block['size']) 135 | stride = int(block['stride']) 136 | is_pad = int(block['pad']) 137 | pad = (kernel_size-1)//2 if is_pad else 0 138 | activation = block['activation'] 139 | model = nn.Sequential() 140 | if batch_normalize: 141 | model.add_module('conv{0}'.format(conv_id), nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias=False)) 142 | model.add_module('bn{0}'.format(conv_id), nn.BatchNorm2d(filters)) 143 | else: 144 | model.add_module('conv{0}'.format(conv_id), nn.Conv2d(prev_filters, filters, kernel_size, stride, pad)) 145 | if activation == 'leaky': 146 | model.add_module('leaky{0}'.format(conv_id), nn.LeakyReLU(0.1, inplace=True)) 147 | prev_filters = filters 148 | out_filters.append(prev_filters) 149 | prev_stride = stride * prev_stride 150 | out_strides.append(prev_stride) 151 | models.append(model) 152 | elif block['type'] == 'upsample': 153 | stride = int(block['stride']) 154 | out_filters.append(prev_filters) 155 | prev_stride = prev_stride // stride 156 | out_strides.append(prev_stride) 157 | models.append(Upsample(stride)) 158 | elif block['type'] == 'route': 159 | layers = block['layers'].split(',') 160 | ind = len(models) 161 | layers = [int(i) if int(i) > 0 else int(i)+ind for i in layers] 162 | if len(layers) == 1: 163 | prev_filters = out_filters[layers[0]] 164 | prev_stride = out_strides[layers[0]] 165 | elif len(layers) == 2: 166 | assert(layers[0] == ind - 1) 167 | prev_filters = out_filters[layers[0]] + out_filters[layers[1]] 168 | prev_stride = out_strides[layers[0]] 169 | out_filters.append(prev_filters) 170 | out_strides.append(prev_stride) 171 | models.append(EmptyModule()) 172 | elif block['type'] == 'shortcut': 173 | ind = len(models) 174 | prev_filters = out_filters[ind-1] 175 | out_filters.append(prev_filters) 176 | prev_stride = out_strides[ind-1] 177 | out_strides.append(prev_stride) 178 | models.append(EmptyModule()) 179 | elif block['type'] == 'yolo': 180 | yolo_layer = YoloLayer() 181 | anchors = block['anchors'].split(',') 182 | anchor_mask = block['mask'].split(',') 183 | yolo_layer.anchor_mask = [int(i) for i in anchor_mask] 184 | yolo_layer.anchors = [float(i) for i in anchors] 185 | yolo_layer.num_classes = int(block['classes']) 186 | yolo_layer.num_anchors = int(block['num']) 187 | yolo_layer.anchor_step = len(yolo_layer.anchors)//yolo_layer.num_anchors 188 | yolo_layer.stride = prev_stride 189 | out_filters.append(prev_filters) 190 | out_strides.append(prev_stride) 191 | models.append(yolo_layer) 192 | else: 193 | print('unknown type %s' % (block['type'])) 194 | 195 | return models 196 | 197 | def load_weights(self, weightfile): 198 | print() 199 | fp = open(weightfile, 'rb') 200 | header = np.fromfile(fp, count=5, dtype=np.int32) 201 | self.header = 
torch.from_numpy(header) 202 | self.seen = self.header[3] 203 | buf = np.fromfile(fp, dtype = np.float32) 204 | fp.close() 205 | 206 | start = 0 207 | ind = -2 208 | counter = 3 209 | for block in self.blocks: 210 | if start >= buf.size: 211 | break 212 | ind = ind + 1 213 | if block['type'] == 'net': 214 | continue 215 | elif block['type'] == 'convolutional': 216 | model = self.models[ind] 217 | batch_normalize = int(block['batch_normalize']) 218 | if batch_normalize: 219 | start = load_conv_bn(buf, start, model[0], model[1]) 220 | else: 221 | start = load_conv(buf, start, model[0]) 222 | elif block['type'] == 'upsample': 223 | pass 224 | elif block['type'] == 'route': 225 | pass 226 | elif block['type'] == 'shortcut': 227 | pass 228 | elif block['type'] == 'yolo': 229 | pass 230 | else: 231 | print('unknown type %s' % (block['type'])) 232 | 233 | percent_comp = (counter / len(self.blocks)) * 100 234 | 235 | print('Loading weights. Please Wait...{:.2f}% Complete'.format(percent_comp), end = '\r', flush = True) 236 | 237 | counter += 1 238 | 239 | 240 | 241 | def convert2cpu(gpu_matrix): 242 | return torch.FloatTensor(gpu_matrix.size()).copy_(gpu_matrix) 243 | 244 | 245 | def convert2cpu_long(gpu_matrix): 246 | return torch.LongTensor(gpu_matrix.size()).copy_(gpu_matrix) 247 | 248 | 249 | def get_region_boxes(output, conf_thresh, num_classes, anchors, num_anchors, only_objectness = 1, validation = False): 250 | anchor_step = len(anchors)//num_anchors 251 | if output.dim() == 3: 252 | output = output.unsqueeze(0) 253 | batch = output.size(0) 254 | assert(output.size(1) == (5+num_classes)*num_anchors) 255 | h = output.size(2) 256 | w = output.size(3) 257 | 258 | all_boxes = [] 259 | output = output.view(batch*num_anchors, 5+num_classes, h*w).transpose(0,1).contiguous().view(5+num_classes, batch*num_anchors*h*w) 260 | 261 | grid_x = torch.linspace(0, w-1, w).repeat(h,1).repeat(batch*num_anchors, 1, 1).view(batch*num_anchors*h*w).type_as(output) #cuda() 262 | grid_y = torch.linspace(0, h-1, h).repeat(w,1).t().repeat(batch*num_anchors, 1, 1).view(batch*num_anchors*h*w).type_as(output) #cuda() 263 | xs = torch.sigmoid(output[0]) + grid_x 264 | ys = torch.sigmoid(output[1]) + grid_y 265 | 266 | anchor_w = torch.Tensor(anchors).view(num_anchors, anchor_step).index_select(1, torch.LongTensor([0])) 267 | anchor_h = torch.Tensor(anchors).view(num_anchors, anchor_step).index_select(1, torch.LongTensor([1])) 268 | anchor_w = anchor_w.repeat(batch, 1).repeat(1, 1, h*w).view(batch*num_anchors*h*w).type_as(output) #cuda() 269 | anchor_h = anchor_h.repeat(batch, 1).repeat(1, 1, h*w).view(batch*num_anchors*h*w).type_as(output) #cuda() 270 | ws = torch.exp(output[2]) * anchor_w 271 | hs = torch.exp(output[3]) * anchor_h 272 | 273 | det_confs = torch.sigmoid(output[4]) 274 | cls_confs = torch.nn.Softmax(dim=1)(output[5:5+num_classes].transpose(0,1)).detach() 275 | cls_max_confs, cls_max_ids = torch.max(cls_confs, 1) 276 | cls_max_confs = cls_max_confs.view(-1) 277 | cls_max_ids = cls_max_ids.view(-1) 278 | 279 | 280 | sz_hw = h*w 281 | sz_hwa = sz_hw*num_anchors 282 | det_confs = convert2cpu(det_confs) 283 | cls_max_confs = convert2cpu(cls_max_confs) 284 | cls_max_ids = convert2cpu_long(cls_max_ids) 285 | xs = convert2cpu(xs) 286 | ys = convert2cpu(ys) 287 | ws = convert2cpu(ws) 288 | hs = convert2cpu(hs) 289 | if validation: 290 | cls_confs = convert2cpu(cls_confs.view(-1, num_classes)) 291 | 292 | for b in range(batch): 293 | boxes = [] 294 | for cy in range(h): 295 | for cx in range(w): 296 | for i in 
range(num_anchors): 297 | ind = b*sz_hwa + i*sz_hw + cy*w + cx 298 | det_conf = det_confs[ind] 299 | if only_objectness: 300 | conf = det_confs[ind] 301 | else: 302 | conf = det_confs[ind] * cls_max_confs[ind] 303 | 304 | if conf > conf_thresh: 305 | bcx = xs[ind] 306 | bcy = ys[ind] 307 | bw = ws[ind] 308 | bh = hs[ind] 309 | cls_max_conf = cls_max_confs[ind] 310 | cls_max_id = cls_max_ids[ind] 311 | box = [bcx/w, bcy/h, bw/w, bh/h, det_conf, cls_max_conf, cls_max_id] 312 | if (not only_objectness) and validation: 313 | for c in range(num_classes): 314 | tmp_conf = cls_confs[ind][c] 315 | if c != cls_max_id and det_confs[ind]*tmp_conf > conf_thresh: 316 | box.append(tmp_conf) 317 | box.append(c) 318 | boxes.append(box) 319 | all_boxes.append(boxes) 320 | 321 | return all_boxes 322 | 323 | 324 | def parse_cfg(cfgfile): 325 | blocks = [] 326 | fp = open(cfgfile, 'r') 327 | block = None 328 | line = fp.readline() 329 | while line != '': 330 | line = line.rstrip() 331 | if line == '' or line[0] == '#': 332 | line = fp.readline() 333 | continue 334 | elif line[0] == '[': 335 | if block: 336 | blocks.append(block) 337 | block = dict() 338 | block['type'] = line.lstrip('[').rstrip(']') 339 | # set default value 340 | if block['type'] == 'convolutional': 341 | block['batch_normalize'] = 0 342 | else: 343 | key,value = line.split('=') 344 | key = key.strip() 345 | if key == 'type': 346 | key = '_type' 347 | value = value.strip() 348 | block[key] = value 349 | line = fp.readline() 350 | 351 | if block: 352 | blocks.append(block) 353 | fp.close() 354 | return blocks 355 | 356 | 357 | def print_cfg(blocks): 358 | print('layer filters size input output'); 359 | prev_width = 416 360 | prev_height = 416 361 | prev_filters = 3 362 | out_filters =[] 363 | out_widths =[] 364 | out_heights =[] 365 | ind = -2 366 | for block in blocks: 367 | ind = ind + 1 368 | if block['type'] == 'net': 369 | prev_width = int(block['width']) 370 | prev_height = int(block['height']) 371 | continue 372 | elif block['type'] == 'convolutional': 373 | filters = int(block['filters']) 374 | kernel_size = int(block['size']) 375 | stride = int(block['stride']) 376 | is_pad = int(block['pad']) 377 | pad = (kernel_size-1)//2 if is_pad else 0 378 | width = (prev_width + 2*pad - kernel_size)//stride + 1 379 | height = (prev_height + 2*pad - kernel_size)//stride + 1 380 | print('%5d %-6s %4d %d x %d / %d %3d x %3d x%4d -> %3d x %3d x%4d' % (ind, 'conv', filters, kernel_size, kernel_size, stride, prev_width, prev_height, prev_filters, width, height, filters)) 381 | prev_width = width 382 | prev_height = height 383 | prev_filters = filters 384 | out_widths.append(prev_width) 385 | out_heights.append(prev_height) 386 | out_filters.append(prev_filters) 387 | elif block['type'] == 'upsample': 388 | stride = int(block['stride']) 389 | filters = prev_filters 390 | width = prev_width*stride 391 | height = prev_height*stride 392 | print('%5d %-6s * %d %3d x %3d x%4d -> %3d x %3d x%4d' % (ind, 'upsample', stride, prev_width, prev_height, prev_filters, width, height, filters)) 393 | prev_width = width 394 | prev_height = height 395 | prev_filters = filters 396 | out_widths.append(prev_width) 397 | out_heights.append(prev_height) 398 | out_filters.append(prev_filters) 399 | elif block['type'] == 'route': 400 | layers = block['layers'].split(',') 401 | layers = [int(i) if int(i) > 0 else int(i)+ind for i in layers] 402 | if len(layers) == 1: 403 | print('%5d %-6s %d' % (ind, 'route', layers[0])) 404 | prev_width = out_widths[layers[0]] 405 | 
prev_height = out_heights[layers[0]] 406 | prev_filters = out_filters[layers[0]] 407 | elif len(layers) == 2: 408 | print('%5d %-6s %d %d' % (ind, 'route', layers[0], layers[1])) 409 | prev_width = out_widths[layers[0]] 410 | prev_height = out_heights[layers[0]] 411 | assert(prev_width == out_widths[layers[1]]) 412 | assert(prev_height == out_heights[layers[1]]) 413 | prev_filters = out_filters[layers[0]] + out_filters[layers[1]] 414 | out_widths.append(prev_width) 415 | out_heights.append(prev_height) 416 | out_filters.append(prev_filters) 417 | elif block['type'] in ['region', 'yolo']: 418 | print('%5d %-6s' % (ind, 'detection')) 419 | out_widths.append(prev_width) 420 | out_heights.append(prev_height) 421 | out_filters.append(prev_filters) 422 | elif block['type'] == 'shortcut': 423 | from_id = int(block['from']) 424 | from_id = from_id if from_id > 0 else from_id+ind 425 | print('%5d %-6s %d' % (ind, 'shortcut', from_id)) 426 | prev_width = out_widths[from_id] 427 | prev_height = out_heights[from_id] 428 | prev_filters = out_filters[from_id] 429 | out_widths.append(prev_width) 430 | out_heights.append(prev_height) 431 | out_filters.append(prev_filters) 432 | else: 433 | print('unknown type %s' % (block['type'])) 434 | 435 | 436 | def load_conv(buf, start, conv_model): 437 | num_w = conv_model.weight.numel() 438 | num_b = conv_model.bias.numel() 439 | conv_model.bias.data.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b 440 | conv_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_w]).view_as(conv_model.weight.data)); start = start + num_w 441 | return start 442 | 443 | 444 | def load_conv_bn(buf, start, conv_model, bn_model): 445 | num_w = conv_model.weight.numel() 446 | num_b = bn_model.bias.numel() 447 | bn_model.bias.data.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b 448 | bn_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b 449 | bn_model.running_mean.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b 450 | bn_model.running_var.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b 451 | conv_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_w]).view_as(conv_model.weight.data)); start = start + num_w 452 | return start 453 | -------------------------------------------------------------------------------- /data/coco.names: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorbike 5 | aeroplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | stop sign 13 | parking meter 14 | bench 15 | bird 16 | cat 17 | dog 18 | horse 19 | sheep 20 | cow 21 | elephant 22 | bear 23 | zebra 24 | giraffe 25 | backpack 26 | umbrella 27 | handbag 28 | tie 29 | suitcase 30 | frisbee 31 | skis 32 | snowboard 33 | sports ball 34 | kite 35 | baseball bat 36 | baseball glove 37 | skateboard 38 | surfboard 39 | tennis racket 40 | bottle 41 | wine glass 42 | cup 43 | fork 44 | knife 45 | spoon 46 | bowl 47 | banana 48 | apple 49 | sandwich 50 | orange 51 | broccoli 52 | carrot 53 | hot dog 54 | pizza 55 | donut 56 | cake 57 | chair 58 | sofa 59 | pottedplant 60 | bed 61 | diningtable 62 | toilet 63 | tvmonitor 64 | laptop 65 | mouse 66 | remote 67 | keyboard 68 | cell phone 69 | microwave 70 | oven 71 | toaster 72 | sink 73 | refrigerator 74 | book 75 | clock 76 | vase 77 | scissors 78 | teddy bear 79 | hair drier 80 | toothbrush 81 | 
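The class IDs predicted by the YOLO layers are simply indices into this list (0 is "person", 79 is "toothbrush"); `load_class_names` in utils.py reads it line by line. A quick sketch of the lookup, assuming it is run from the repository root:

```
from utils import load_class_names

# coco.names holds one class per line; the line order defines the class ID.
class_names = load_class_names("data/coco.names")
print(class_names[0])    # person
print(class_names[67])   # cell phone
```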
-------------------------------------------------------------------------------- /images/bar.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/images/bar.jpeg -------------------------------------------------------------------------------- /images/city_scene.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/images/city_scene.jpg -------------------------------------------------------------------------------- /images/class.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/images/class.jpg -------------------------------------------------------------------------------- /images/dog.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/images/dog.jpg -------------------------------------------------------------------------------- /images/home.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/images/home.jpeg -------------------------------------------------------------------------------- /images/meeting.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/images/meeting.jpeg -------------------------------------------------------------------------------- /images/snack.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/images/snack.jpg -------------------------------------------------------------------------------- /instance/README.md: -------------------------------------------------------------------------------- 1 | Two folders will be created here: uploads and output. 2 | 3 | Every time an image is submitted to the server, it will be stored in the uploads folder. After processing, the bounding boxes of the recognised classes and their respective confidence levels are plotted on the image and stored in the output folder. 
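For reference, a minimal sketch of how these two folders can be created relative to the Flask instance path (app.py wraps `os.mkdir` in a try/except on every request; `os.makedirs` with `exist_ok=True` is an equivalent, idempotent alternative):

```
import os
from flask import Flask

app = Flask(__name__)

# Create instance/uploads and instance/output if they do not already exist.
for folder in ("uploads", "output"):
    os.makedirs(os.path.join(app.instance_path, folder), exist_ok=True)
```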
-------------------------------------------------------------------------------- /iti/Title Background.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/iti/Title Background.gif -------------------------------------------------------------------------------- /iti/image: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /iti/postman.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/iti/postman.png -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | click==7.1.2 2 | cycler==0.10.0 3 | Flask==1.1.2 4 | Flask-Cors==3.0.8 5 | itsdangerous==1.1.0 6 | Jinja2==2.11.2 7 | kiwisolver==1.2.0 8 | MarkupSafe==1.1.1 9 | matplotlib==3.2.1 10 | opencv-python==4.2.0.34 11 | pyparsing==2.4.7 12 | python-dateutil==2.8.1 13 | six==1.14.0 14 | Werkzeug==1.0.1 -------------------------------------------------------------------------------- /sample_output/20200521_233133_570.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/sample_output/20200521_233133_570.jpg -------------------------------------------------------------------------------- /sample_output/20200521_233208_33.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/sample_output/20200521_233208_33.jpg -------------------------------------------------------------------------------- /sample_output/20200521_233222_914.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/sample_output/20200521_233222_914.jpg -------------------------------------------------------------------------------- /sample_output/20200521_233233_695.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/sample_output/20200521_233233_695.jpg -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | import time 2 | import torch 3 | import numpy as np 4 | import matplotlib.pyplot as plt 5 | import matplotlib.patches as patches 6 | import os 7 | 8 | 9 | def boxes_iou(box1, box2): 10 | 11 | # Get the Width and Height of each bounding box 12 | width_box1 = box1[2] 13 | height_box1 = box1[3] 14 | width_box2 = box2[2] 15 | height_box2 = box2[3] 16 | 17 | # Calculate the area of the each bounding box 18 | area_box1 = width_box1 * height_box1 19 | area_box2 = width_box2 * height_box2 20 | 21 | # Find the vertical edges of the union of the two bounding boxes 22 | mx = min(box1[0] - width_box1/2.0, box2[0] - width_box2/2.0) 23 | Mx = max(box1[0] + width_box1/2.0, 
box2[0] + width_box2/2.0) 24 | 25 | # Calculate the width of the union of the two bounding boxes 26 | union_width = Mx - mx 27 | 28 | # Find the horizontal edges of the union of the two bounding boxes 29 | my = min(box1[1] - height_box1/2.0, box2[1] - height_box2/2.0) 30 | My = max(box1[1] + height_box1/2.0, box2[1] + height_box2/2.0) 31 | 32 | # Calculate the height of the union of the two bounding boxes 33 | union_height = My - my 34 | 35 | # Calculate the width and height of the area of intersection of the two bounding boxes 36 | intersection_width = width_box1 + width_box2 - union_width 37 | intersection_height = height_box1 + height_box2 - union_height 38 | 39 | # If the the boxes don't overlap then their IOU is zero 40 | if intersection_width <= 0 or intersection_height <= 0: 41 | return 0.0 42 | 43 | # Calculate the area of intersection of the two bounding boxes 44 | intersection_area = intersection_width * intersection_height 45 | 46 | # Calculate the area of the union of the two bounding boxes 47 | union_area = area_box1 + area_box2 - intersection_area 48 | 49 | # Calculate the IOU 50 | iou = intersection_area/union_area 51 | 52 | return iou 53 | 54 | 55 | def nms(boxes, iou_thresh): 56 | 57 | # If there are no bounding boxes do nothing 58 | if len(boxes) == 0: 59 | return boxes 60 | 61 | # Create a PyTorch Tensor to keep track of the detection confidence 62 | # of each predicted bounding box 63 | det_confs = torch.zeros(len(boxes)) 64 | 65 | # Get the detection confidence of each predicted bounding box 66 | for i in range(len(boxes)): 67 | det_confs[i] = boxes[i][4] 68 | 69 | # Sort the indices of the bounding boxes by detection confidence value in descending order. 70 | # We ignore the first returned element since we are only interested in the sorted indices 71 | _,sortIds = torch.sort(det_confs, descending = True) 72 | 73 | # Create an empty list to hold the best bounding boxes after 74 | # Non-Maximal Suppression (NMS) is performed 75 | best_boxes = [] 76 | 77 | # Perform Non-Maximal Suppression 78 | for i in range(len(boxes)): 79 | 80 | # Get the bounding box with the highest detection confidence first 81 | box_i = boxes[sortIds[i]] 82 | 83 | # Check that the detection confidence is not zero 84 | if box_i[4] > 0: 85 | 86 | # Save the bounding box 87 | best_boxes.append(box_i) 88 | 89 | # Go through the rest of the bounding boxes in the list and calculate their IOU with 90 | # respect to the previous selected box_i. 91 | for j in range(i + 1, len(boxes)): 92 | box_j = boxes[sortIds[j]] 93 | 94 | # If the IOU of box_i and box_j is higher than the given IOU threshold set 95 | # box_j's detection confidence to zero. 96 | if boxes_iou(box_i, box_j) > iou_thresh: 97 | box_j[4] = 0 98 | 99 | return best_boxes 100 | 101 | 102 | def detect_objects(model, img, iou_thresh, nms_thresh): 103 | 104 | # Start the time. This is done to calculate how long the detection takes. 105 | start = time.time() 106 | 107 | # Set the model to evaluation mode. 108 | model.eval() 109 | 110 | # Convert the image from a NumPy ndarray to a PyTorch Tensor of the correct shape. 111 | # The image is transposed, then converted to a FloatTensor of dtype float32, then 112 | # Normalized to values between 0 and 1, and finally unsqueezed to have the correct 113 | # shape of 1 x 3 x 416 x 416 114 | img = torch.from_numpy(img.transpose(2,0,1)).float().div(255.0).unsqueeze(0) 115 | 116 | # Feed the image to the neural network with the corresponding NMS threshold. 
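# Note: despite its name, nms_thresh is applied inside the YOLO layers as a per-box confidence threshold (see YoloLayer.forward and get_region_boxes in darknet.py); the iou_thresh argument drives the actual non-maximal suppression performed further down.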
117 | # The first step in NMS is to remove all bounding boxes that have a very low 118 | # probability of detection. All predicted bounding boxes with a value less than 119 | # the given NMS threshold will be removed. 120 | list_boxes = model(img, nms_thresh) 121 | 122 | # Make a new list with all the bounding boxes returned by the neural network 123 | boxes = list_boxes[0][0] + list_boxes[1][0] + list_boxes[2][0] 124 | 125 | # Perform the second step of NMS on the bounding boxes returned by the neural network. 126 | # In this step, we only keep the best bounding boxes by eliminating all the bounding boxes 127 | # whose IOU value is higher than the given IOU threshold 128 | boxes = nms(boxes, iou_thresh) 129 | 130 | # Stop the time. 131 | finish = time.time() 132 | 133 | # Print the time it took to detect objects 134 | print('\n\nIt took {:.3f}'.format(finish - start), 'seconds to detect the objects in the image.\n') 135 | 136 | # Print the number of objects detected 137 | print('Number of Objects Detected:', len(boxes), '\n') 138 | 139 | return boxes 140 | 141 | 142 | def load_class_names(namesfile): 143 | 144 | # Create an empty list to hold the object classes 145 | class_names = [] 146 | 147 | # Open the file containing the COCO object classes in read-only mode 148 | with open(namesfile, 'r') as fp: 149 | 150 | # The coco.names file contains only one object class per line. 151 | # Read the file line by line and save all the lines in a list. 152 | lines = fp.readlines() 153 | 154 | # Get the object class names 155 | for line in lines: 156 | 157 | # Make a copy of each line with any trailing whitespace removed 158 | line = line.rstrip() 159 | 160 | # Save the object class name into class_names 161 | class_names.append(line) 162 | 163 | return class_names 164 | 165 | 166 | def print_objects(boxes, class_names): 167 | print('Objects Found and Confidence Level:\n') 168 | objects_count = {} 169 | objects_confidence = [] 170 | for i in range(len(boxes)): 171 | box = boxes[i] 172 | if len(box) >= 7 and class_names: 173 | cls_conf = box[5] 174 | cls_id = box[6] 175 | print('%i. 
%s: %f' % (i + 1, class_names[cls_id], cls_conf)) 176 | if class_names[cls_id] in objects_count: 177 | objects_count[class_names[cls_id]] += 1 178 | else: 179 | objects_count[class_names[cls_id]] = 1 180 | objects_confidence.append({class_names[cls_id]: round(float(cls_conf), 6)}) 181 | 182 | return objects_count, objects_confidence 183 | 184 | 185 | def plot_boxes(img, boxes, class_names, output_dir, filename, plot_labels = True, color = None): 186 | 187 | # Define a tensor used to set the colors of the bounding boxes 188 | colors = torch.FloatTensor([[1,0,1],[0,0,1],[0,1,1],[0,1,0],[1,1,0],[1,0,0]]) 189 | 190 | # Define a function to set the colors of the bounding boxes 191 | def get_color(c, x, max_val): 192 | ratio = float(x) / max_val * 5 193 | i = int(np.floor(ratio)) 194 | j = int(np.ceil(ratio)) 195 | 196 | ratio = ratio - i 197 | r = (1 - ratio) * colors[i][c] + ratio * colors[j][c] 198 | 199 | return int(r * 255) 200 | 201 | # Get the width and height of the image 202 | width = img.shape[1] 203 | height = img.shape[0] 204 | 205 | # Create a figure and plot the image 206 | fig, a = plt.subplots(1,1) 207 | a.imshow(img) 208 | 209 | # Plot the bounding boxes and corresponding labels on top of the image 210 | for i in range(len(boxes)): 211 | 212 | # Get the ith bounding box 213 | box = boxes[i] 214 | 215 | # Get the (x,y) pixel coordinates of the lower-left and lower-right corners 216 | # of the bounding box relative to the size of the image. 217 | x1 = int(np.around((box[0] - box[2]/2.0) * width)) 218 | y1 = int(np.around((box[1] - box[3]/2.0) * height)) 219 | x2 = int(np.around((box[0] + box[2]/2.0) * width)) 220 | y2 = int(np.around((box[1] + box[3]/2.0) * height)) 221 | 222 | # Set the default rgb value to red 223 | rgb = (1, 0, 0) 224 | 225 | # Use the same color to plot the bounding boxes of the same object class 226 | if len(box) >= 7 and class_names: 227 | cls_conf = box[5] 228 | cls_id = box[6] 229 | classes = len(class_names) 230 | offset = cls_id * 123457 % classes 231 | red = get_color(2, offset, classes) / 255 232 | green = get_color(1, offset, classes) / 255 233 | blue = get_color(0, offset, classes) / 255 234 | 235 | # If a color is given then set rgb to the given color instead 236 | if color is None: 237 | rgb = (red, green, blue) 238 | else: 239 | rgb = color 240 | 241 | # Calculate the width and height of the bounding box relative to the size of the image. 242 | width_x = x2 - x1 243 | width_y = y1 - y2 244 | 245 | # Set the postion and size of the bounding box. (x1, y2) is the pixel coordinate of the 246 | # lower-left corner of the bounding box relative to the size of the image. 
247 | rect = patches.Rectangle((x1, y2), 248 | width_x, width_y, 249 | linewidth = 2, 250 | edgecolor = rgb, 251 | facecolor = 'none') 252 | 253 | # Draw the bounding box on top of the image 254 | a.add_patch(rect) 255 | 256 | # If plot_labels = True then plot the corresponding label 257 | if plot_labels: 258 | 259 | # Create a string with the object class name and the corresponding object class probability 260 | conf_tx = class_names[cls_id] + ': {0}%'.format(int(cls_conf * 100)) 261 | 262 | # Define x and y offsets for the labels 263 | lxc = (img.shape[1] * 0.266) / 100 264 | lyc = (img.shape[0] * 1.180) / 100 265 | 266 | # Draw the labels on top of the image 267 | a.text(x1 + lxc, y1 - lyc, conf_tx, fontsize = 24, color = 'k', 268 | bbox = dict(facecolor = rgb, edgecolor = rgb, alpha = 0.8)) 269 | 270 | plt.axis('off') 271 | plt.savefig(os.path.join(output_dir, filename + '.jpg'), bbox_inches='tight', pad_inches = 0) -------------------------------------------------------------------------------- /weights/README.md: -------------------------------------------------------------------------------- 1 | YOLOv3 .weights files are to be placed here. 2 | 3 | If you do not have your own trained model, you can download the YOLOv3 pre-trained weight file by Darknet here: 4 | https://pjreddie.com/media/files/yolov3.weights -------------------------------------------------------------------------------- /yolo.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import matplotlib.pyplot as plt 3 | 4 | from utils import * 5 | from darknet import Darknet 6 | 7 | 8 | def process(uploads_dir, output_dir, filename): 9 | 10 | # Set the location and name of the cfg file 11 | cfg_file = './cfg/yolov3.cfg' 12 | 13 | # Set the location and name of the pre-trained weights file 14 | weight_file = './weights/yolov3.weights' 15 | 16 | # Set the location and name of the COCO object classes file 17 | namesfile = 'data/coco.names' 18 | 19 | # Load the network architecture 20 | m = Darknet(cfg_file) 21 | 22 | # Load the pre-trained weights 23 | m.load_weights(weight_file) 24 | 25 | # Load the COCO object classes 26 | class_names = load_class_names(namesfile) 27 | 28 | # Set the default figure size 29 | plt.rcParams['figure.figsize'] = [24.0, 14.0] 30 | 31 | # Set the NMS threshold 32 | nms_thresh = 0.6 33 | 34 | # Set the IOU threshold 35 | iou_thresh = 0.4 36 | 37 | # Set the default figure size 38 | plt.rcParams['figure.figsize'] = [24.0, 14.0] 39 | 40 | # Load the image 41 | img = cv2.imread(uploads_dir + '/' + filename + '.jpg') 42 | 43 | # Convert the image to RGB 44 | original_image = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) 45 | 46 | # We resize the image to the input width and height of the first layer of the network. 47 | resized_image = cv2.resize(original_image, (m.width, m.height)) 48 | 49 | # Set the IOU threshold. Default value is 0.4 50 | iou_thresh = 0.4 51 | 52 | # Set the NMS threshold. 
Default value is 0.6 53 | nms_thresh = 0.6 54 | 55 | # Detect objects in the image 56 | boxes = detect_objects(m, resized_image, iou_thresh, nms_thresh) 57 | 58 | # Print and save the objects found and their confidence levels 59 | objects_count, objects_confidence = print_objects(boxes, class_names) 60 | 61 | # Plot the image with bounding boxes and corresponding object class labels 62 | plot_boxes(original_image, boxes, class_names, output_dir, filename) 63 | 64 | return objects_count, objects_confidence --------------------------------------------------------------------------------
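For completeness, `process` can also be run outside of Flask. A minimal sketch, assuming the pre-trained weights are in weights/ and the script is run from the repository root (the filename is passed without its .jpg extension, matching how app.py calls it):

```
from yolo import process

# Detect objects in images/dog.jpg and write the annotated copy to sample_output/dog.jpg.
objects_count, objects_confidence = process("images", "sample_output", "dog")

print(objects_count)         # dict mapping each detected class to its count
print(objects_confidence)    # list of {class name: confidence} entries, one per box
```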