├── .gitignore
├── README.md
├── app.py
├── cfg
│   └── yolov3.cfg
├── darknet.py
├── data
│   └── coco.names
├── images
│   ├── bar.jpeg
│   ├── city_scene.jpg
│   ├── class.jpg
│   ├── dog.jpg
│   ├── home.jpeg
│   ├── meeting.jpeg
│   └── snack.jpg
├── instance
│   └── README.md
├── iti
│   ├── Title Background.gif
│   ├── image
│   └── postman.png
├── requirements.txt
├── sample_output
│   ├── 20200521_233133_570.jpg
│   ├── 20200521_233208_33.jpg
│   ├── 20200521_233222_914.jpg
│   └── 20200521_233233_695.jpg
├── utils.py
├── weights
│   └── README.md
└── yolo.py
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | .env
3 | .flaskenv
4 | *.pyc
5 | *.pyo
6 | env/
7 | env*
8 | dist/
9 | build/
10 | *.egg
11 | *.egg-info/
12 | _mailinglist
13 | .tox/
14 | .cache/
15 | .pytest_cache/
16 | .idea/
17 | docs/_build/
18 | .vscode
19 | *.weights
20 | instance/output/*
21 | instance/uploads/*
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # A simple YOLOv3 Object Detection API in Python (Flask)
2 |
3 |
4 | This repository provides a simple implementation of object detection in Python, served as an API using Flask. It is based on the YOLOv3 object detection system and uses weights pre-trained on the COCO dataset.
5 |
6 |
7 | ## Installation
8 |
9 | ### 1. Clone repository and install requirements
10 |
11 | ##### NOTE: I am using Windows with pip for package installation, and PyTorch has to be installed separately or the remaining requirements will not install cleanly. The exact command varies by platform, so check the "Quick Start Locally" section on the PyTorch website. For a CPU-only install, I run:
12 | ```
13 | pip install torch==1.5.0+cpu torchvision==0.6.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
14 | ```
15 | ```
16 | git clone git@github.com:yankai364/Object-Detection-Flask-API.git
17 | cd Object-Detection-Flask-API
18 | pip install -r requirements.txt
19 | ```
20 |
21 |
22 | ### 2. Download pre-trained weights
23 | You can download the YOLOv3 pre-trained weights on the COCO dataset here:
24 |
25 | https://pjreddie.com/media/files/yolov3.weights
26 |
27 | Once downloaded, place the .weights file in the weights folder.
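
If you prefer to script the download, a minimal sketch using only the Python standard library (the file is large, roughly 240 MB) could look like this:

```
import urllib.request

# Download the pre-trained COCO weights into the weights folder
# (run this from the repository root)
urllib.request.urlretrieve(
    "https://pjreddie.com/media/files/yolov3.weights",
    "weights/yolov3.weights",
)
```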
28 |
29 |
30 | ## API Documentation
31 | The API exposes a single endpoint.
32 |
33 | ### Request
34 |
35 | - Method: POST
36 | - Endpoint: /upload/
37 | - Body (multipart/form-data):
38 | ```
39 | {
40 |     "file": <image file>
41 | }
42 | ```
43 |
44 | ### Response
45 | ```
46 | {
47 |     "data": {
48 |         "objects_count": {
49 |             <class name>: <count>,
50 |             <class name>: <count>
51 |         },
52 |         "objects_confidence": [
53 |             {<class name>: <confidence>},
54 |             {<class name>: <confidence>},
55 |             {<class name>: <confidence>},
56 |             ...
57 |         ],
58 |         "filename": <output image filename>
59 |     }
60 | }
61 | ```
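
For reference, you can also call the endpoint from Python instead of Postman. This is only a sketch: it assumes the `requests` package is installed (it is not listed in requirements.txt) and that the server is running locally on port 5000:

```
import requests

# Send images/bar.jpeg to the /upload/ endpoint as multipart/form-data
with open("images/bar.jpeg", "rb") as f:
    response = requests.post("http://localhost:5000/upload/", files={"file": f})

print(response.status_code)
print(response.json()["data"]["objects_count"])
```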
62 |
63 | ## Usage
64 |
65 | ### 1. Start the application
66 | ```
67 | cd Object-Detection-Flask-API
68 | python app.py
69 | ```
70 |
71 | If the application runs successfully, you should see the following:
72 | ```
73 | * Serving Flask app "app" (lazy loading)
74 | * Environment: production
75 | WARNING: This is a development server. Do not use it in a production deployment.
76 | Use a production WSGI server instead.
77 | * Debug mode: off
78 | * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
79 | ```
80 |
81 | ### 2. Test the API
82 | You can test the API using Postman. Let's test it with the image "bar.jpeg" in the images folder:
83 |
84 | ![bar.jpeg](images/bar.jpeg)
85 |
86 | Open Postman and configure the request [according to the documentation above](#api-documentation). Remember to set the "file" key to the "File" type and attach the image. Your request should look like this:
87 |
88 | ![Postman request](iti/postman.png)
89 |
90 | Click Send. The request may take a few seconds to complete, and you should receive a response similar to the following:
91 |
92 | #### Response:
93 | ```
94 | {
95 |     "data": {
96 |         "filename": "20200523_120754_313.jpg",
97 |         "objects_confidence": [
98 |             {
99 |                 "cell phone": 1.0
100 |             },
101 |             {
102 |                 "wine glass": 0.999997
103 |             },
104 |             {
105 |                 "wine glass": 0.999972
106 |             },
107 |             {
108 |                 "cup": 0.990166
109 |             },
110 |             {
111 |                 "person": 0.999974
112 |             },
113 |             {
114 |                 "bottle": 0.824177
115 |             },
116 |             {
117 |                 "person": 1.0
118 |             }
119 |         ],
120 |         "objects_count": {
121 |             "bottle": 1,
122 |             "cell phone": 1,
123 |             "cup": 1,
124 |             "person": 2,
125 |             "wine glass": 2
126 |         }
127 |     }
128 | }
129 | ```
130 |
131 |
132 | The application also draws the bounding boxes for each detected object on the image and saves the result. The output image is named after the "filename" value in the response, and you can find it in the /instance/output folder.
133 |
134 |
135 |
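If you want to inspect the saved output from Python, here is a quick sketch that uses only packages already in requirements.txt (replace the filename with the "filename" value from your own response):

```
import os
import cv2
import matplotlib.pyplot as plt

# Example filename; use the one returned in your response
filename = "20200523_120754_313.jpg"

img = cv2.imread(os.path.join("instance", "output", filename))
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.axis("off")
plt.show()
```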
136 |
137 | And that's it! The pre-trained weights are decent at detecting everyday objects, so you can also test the API with your own photos (instead of stock images):
138 |
139 |
140 | 
141 |
142 | ## Acknowledgements
143 | [YOLOv3](https://pjreddie.com/darknet/yolo/) by Joseph Redmon and Ali Farhadi
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
1 | from flask import Flask, request, jsonify
2 | import os
3 | from werkzeug.utils import secure_filename
4 | from flask_cors import CORS
5 | from yolo import process
6 | from datetime import datetime
7 | from random import randint
8 |
9 |
10 | app = Flask(__name__)
11 | CORS(app)
12 | uploads_dir = os.path.join(app.instance_path, 'uploads')
13 | output_dir = os.path.join(app.instance_path, 'output')
14 |
15 |
16 | @app.route('/upload/', methods=['POST'])
17 | def upload_image():
18 |     # Create the upload and output directories if they do not already exist
19 |     os.makedirs(uploads_dir, exist_ok=True)
20 |     os.makedirs(output_dir, exist_ok=True)
21 |
22 |     # Use .get() so a missing "file" field returns None instead of raising a
23 |     # KeyError, letting the check below return the intended 400 JSON response
24 |     file = request.files.get('file')
25 | if not file:
26 | return {'error': 'Missing file'}, 400
27 |
28 | now = datetime.now()
29 |     filename = now.strftime("%Y%m%d_%H%M%S") + "_" + str(randint(0, 999))
30 | file.save(os.path.join(uploads_dir, secure_filename(filename + '.jpg')))
31 | objects_count, objects_confidence = process(uploads_dir, output_dir, filename)
32 |
33 | response = {
34 | 'objects_count': objects_count,
35 | 'objects_confidence': objects_confidence,
36 | 'filename': filename + '.jpg'
37 | }
38 |
39 | return jsonify({"data": response}), 200
40 |
41 |
42 | if __name__ == '__main__':
43 | app.run(host="0.0.0.0", port=5000)
44 |
--------------------------------------------------------------------------------
/cfg/yolov3.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | batch=1
4 | subdivisions=1
5 | # Training
6 | # batch=64
7 | # subdivisions=16
8 | width=416
9 | height=416
10 | channels=3
11 | momentum=0.9
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.001
19 | burn_in=1000
20 | max_batches = 500200
21 | policy=steps
22 | steps=400000,450000
23 | scales=.1,.1
24 |
25 | [convolutional]
26 | batch_normalize=1
27 | filters=32
28 | size=3
29 | stride=1
30 | pad=1
31 | activation=leaky
32 |
33 | # Downsample
34 |
35 | [convolutional]
36 | batch_normalize=1
37 | filters=64
38 | size=3
39 | stride=2
40 | pad=1
41 | activation=leaky
42 |
43 | [convolutional]
44 | batch_normalize=1
45 | filters=32
46 | size=1
47 | stride=1
48 | pad=1
49 | activation=leaky
50 |
51 | [convolutional]
52 | batch_normalize=1
53 | filters=64
54 | size=3
55 | stride=1
56 | pad=1
57 | activation=leaky
58 |
59 | [shortcut]
60 | from=-3
61 | activation=linear
62 |
63 | # Downsample
64 |
65 | [convolutional]
66 | batch_normalize=1
67 | filters=128
68 | size=3
69 | stride=2
70 | pad=1
71 | activation=leaky
72 |
73 | [convolutional]
74 | batch_normalize=1
75 | filters=64
76 | size=1
77 | stride=1
78 | pad=1
79 | activation=leaky
80 |
81 | [convolutional]
82 | batch_normalize=1
83 | filters=128
84 | size=3
85 | stride=1
86 | pad=1
87 | activation=leaky
88 |
89 | [shortcut]
90 | from=-3
91 | activation=linear
92 |
93 | [convolutional]
94 | batch_normalize=1
95 | filters=64
96 | size=1
97 | stride=1
98 | pad=1
99 | activation=leaky
100 |
101 | [convolutional]
102 | batch_normalize=1
103 | filters=128
104 | size=3
105 | stride=1
106 | pad=1
107 | activation=leaky
108 |
109 | [shortcut]
110 | from=-3
111 | activation=linear
112 |
113 | # Downsample
114 |
115 | [convolutional]
116 | batch_normalize=1
117 | filters=256
118 | size=3
119 | stride=2
120 | pad=1
121 | activation=leaky
122 |
123 | [convolutional]
124 | batch_normalize=1
125 | filters=128
126 | size=1
127 | stride=1
128 | pad=1
129 | activation=leaky
130 |
131 | [convolutional]
132 | batch_normalize=1
133 | filters=256
134 | size=3
135 | stride=1
136 | pad=1
137 | activation=leaky
138 |
139 | [shortcut]
140 | from=-3
141 | activation=linear
142 |
143 | [convolutional]
144 | batch_normalize=1
145 | filters=128
146 | size=1
147 | stride=1
148 | pad=1
149 | activation=leaky
150 |
151 | [convolutional]
152 | batch_normalize=1
153 | filters=256
154 | size=3
155 | stride=1
156 | pad=1
157 | activation=leaky
158 |
159 | [shortcut]
160 | from=-3
161 | activation=linear
162 |
163 | [convolutional]
164 | batch_normalize=1
165 | filters=128
166 | size=1
167 | stride=1
168 | pad=1
169 | activation=leaky
170 |
171 | [convolutional]
172 | batch_normalize=1
173 | filters=256
174 | size=3
175 | stride=1
176 | pad=1
177 | activation=leaky
178 |
179 | [shortcut]
180 | from=-3
181 | activation=linear
182 |
183 | [convolutional]
184 | batch_normalize=1
185 | filters=128
186 | size=1
187 | stride=1
188 | pad=1
189 | activation=leaky
190 |
191 | [convolutional]
192 | batch_normalize=1
193 | filters=256
194 | size=3
195 | stride=1
196 | pad=1
197 | activation=leaky
198 |
199 | [shortcut]
200 | from=-3
201 | activation=linear
202 |
203 |
204 | [convolutional]
205 | batch_normalize=1
206 | filters=128
207 | size=1
208 | stride=1
209 | pad=1
210 | activation=leaky
211 |
212 | [convolutional]
213 | batch_normalize=1
214 | filters=256
215 | size=3
216 | stride=1
217 | pad=1
218 | activation=leaky
219 |
220 | [shortcut]
221 | from=-3
222 | activation=linear
223 |
224 | [convolutional]
225 | batch_normalize=1
226 | filters=128
227 | size=1
228 | stride=1
229 | pad=1
230 | activation=leaky
231 |
232 | [convolutional]
233 | batch_normalize=1
234 | filters=256
235 | size=3
236 | stride=1
237 | pad=1
238 | activation=leaky
239 |
240 | [shortcut]
241 | from=-3
242 | activation=linear
243 |
244 | [convolutional]
245 | batch_normalize=1
246 | filters=128
247 | size=1
248 | stride=1
249 | pad=1
250 | activation=leaky
251 |
252 | [convolutional]
253 | batch_normalize=1
254 | filters=256
255 | size=3
256 | stride=1
257 | pad=1
258 | activation=leaky
259 |
260 | [shortcut]
261 | from=-3
262 | activation=linear
263 |
264 | [convolutional]
265 | batch_normalize=1
266 | filters=128
267 | size=1
268 | stride=1
269 | pad=1
270 | activation=leaky
271 |
272 | [convolutional]
273 | batch_normalize=1
274 | filters=256
275 | size=3
276 | stride=1
277 | pad=1
278 | activation=leaky
279 |
280 | [shortcut]
281 | from=-3
282 | activation=linear
283 |
284 | # Downsample
285 |
286 | [convolutional]
287 | batch_normalize=1
288 | filters=512
289 | size=3
290 | stride=2
291 | pad=1
292 | activation=leaky
293 |
294 | [convolutional]
295 | batch_normalize=1
296 | filters=256
297 | size=1
298 | stride=1
299 | pad=1
300 | activation=leaky
301 |
302 | [convolutional]
303 | batch_normalize=1
304 | filters=512
305 | size=3
306 | stride=1
307 | pad=1
308 | activation=leaky
309 |
310 | [shortcut]
311 | from=-3
312 | activation=linear
313 |
314 |
315 | [convolutional]
316 | batch_normalize=1
317 | filters=256
318 | size=1
319 | stride=1
320 | pad=1
321 | activation=leaky
322 |
323 | [convolutional]
324 | batch_normalize=1
325 | filters=512
326 | size=3
327 | stride=1
328 | pad=1
329 | activation=leaky
330 |
331 | [shortcut]
332 | from=-3
333 | activation=linear
334 |
335 |
336 | [convolutional]
337 | batch_normalize=1
338 | filters=256
339 | size=1
340 | stride=1
341 | pad=1
342 | activation=leaky
343 |
344 | [convolutional]
345 | batch_normalize=1
346 | filters=512
347 | size=3
348 | stride=1
349 | pad=1
350 | activation=leaky
351 |
352 | [shortcut]
353 | from=-3
354 | activation=linear
355 |
356 |
357 | [convolutional]
358 | batch_normalize=1
359 | filters=256
360 | size=1
361 | stride=1
362 | pad=1
363 | activation=leaky
364 |
365 | [convolutional]
366 | batch_normalize=1
367 | filters=512
368 | size=3
369 | stride=1
370 | pad=1
371 | activation=leaky
372 |
373 | [shortcut]
374 | from=-3
375 | activation=linear
376 |
377 | [convolutional]
378 | batch_normalize=1
379 | filters=256
380 | size=1
381 | stride=1
382 | pad=1
383 | activation=leaky
384 |
385 | [convolutional]
386 | batch_normalize=1
387 | filters=512
388 | size=3
389 | stride=1
390 | pad=1
391 | activation=leaky
392 |
393 | [shortcut]
394 | from=-3
395 | activation=linear
396 |
397 |
398 | [convolutional]
399 | batch_normalize=1
400 | filters=256
401 | size=1
402 | stride=1
403 | pad=1
404 | activation=leaky
405 |
406 | [convolutional]
407 | batch_normalize=1
408 | filters=512
409 | size=3
410 | stride=1
411 | pad=1
412 | activation=leaky
413 |
414 | [shortcut]
415 | from=-3
416 | activation=linear
417 |
418 |
419 | [convolutional]
420 | batch_normalize=1
421 | filters=256
422 | size=1
423 | stride=1
424 | pad=1
425 | activation=leaky
426 |
427 | [convolutional]
428 | batch_normalize=1
429 | filters=512
430 | size=3
431 | stride=1
432 | pad=1
433 | activation=leaky
434 |
435 | [shortcut]
436 | from=-3
437 | activation=linear
438 |
439 | [convolutional]
440 | batch_normalize=1
441 | filters=256
442 | size=1
443 | stride=1
444 | pad=1
445 | activation=leaky
446 |
447 | [convolutional]
448 | batch_normalize=1
449 | filters=512
450 | size=3
451 | stride=1
452 | pad=1
453 | activation=leaky
454 |
455 | [shortcut]
456 | from=-3
457 | activation=linear
458 |
459 | # Downsample
460 |
461 | [convolutional]
462 | batch_normalize=1
463 | filters=1024
464 | size=3
465 | stride=2
466 | pad=1
467 | activation=leaky
468 |
469 | [convolutional]
470 | batch_normalize=1
471 | filters=512
472 | size=1
473 | stride=1
474 | pad=1
475 | activation=leaky
476 |
477 | [convolutional]
478 | batch_normalize=1
479 | filters=1024
480 | size=3
481 | stride=1
482 | pad=1
483 | activation=leaky
484 |
485 | [shortcut]
486 | from=-3
487 | activation=linear
488 |
489 | [convolutional]
490 | batch_normalize=1
491 | filters=512
492 | size=1
493 | stride=1
494 | pad=1
495 | activation=leaky
496 |
497 | [convolutional]
498 | batch_normalize=1
499 | filters=1024
500 | size=3
501 | stride=1
502 | pad=1
503 | activation=leaky
504 |
505 | [shortcut]
506 | from=-3
507 | activation=linear
508 |
509 | [convolutional]
510 | batch_normalize=1
511 | filters=512
512 | size=1
513 | stride=1
514 | pad=1
515 | activation=leaky
516 |
517 | [convolutional]
518 | batch_normalize=1
519 | filters=1024
520 | size=3
521 | stride=1
522 | pad=1
523 | activation=leaky
524 |
525 | [shortcut]
526 | from=-3
527 | activation=linear
528 |
529 | [convolutional]
530 | batch_normalize=1
531 | filters=512
532 | size=1
533 | stride=1
534 | pad=1
535 | activation=leaky
536 |
537 | [convolutional]
538 | batch_normalize=1
539 | filters=1024
540 | size=3
541 | stride=1
542 | pad=1
543 | activation=leaky
544 |
545 | [shortcut]
546 | from=-3
547 | activation=linear
548 |
549 | ######################
550 |
551 | [convolutional]
552 | batch_normalize=1
553 | filters=512
554 | size=1
555 | stride=1
556 | pad=1
557 | activation=leaky
558 |
559 | [convolutional]
560 | batch_normalize=1
561 | size=3
562 | stride=1
563 | pad=1
564 | filters=1024
565 | activation=leaky
566 |
567 | [convolutional]
568 | batch_normalize=1
569 | filters=512
570 | size=1
571 | stride=1
572 | pad=1
573 | activation=leaky
574 |
575 | [convolutional]
576 | batch_normalize=1
577 | size=3
578 | stride=1
579 | pad=1
580 | filters=1024
581 | activation=leaky
582 |
583 | [convolutional]
584 | batch_normalize=1
585 | filters=512
586 | size=1
587 | stride=1
588 | pad=1
589 | activation=leaky
590 |
591 | [convolutional]
592 | batch_normalize=1
593 | size=3
594 | stride=1
595 | pad=1
596 | filters=1024
597 | activation=leaky
598 |
599 | [convolutional]
600 | size=1
601 | stride=1
602 | pad=1
603 | filters=255
604 | activation=linear
605 |
606 |
607 | [yolo]
608 | mask = 6,7,8
609 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
610 | classes=80
611 | num=9
612 | jitter=.3
613 | ignore_thresh = .5
614 | truth_thresh = 1
615 | random=1
616 |
617 |
618 | [route]
619 | layers = -4
620 |
621 | [convolutional]
622 | batch_normalize=1
623 | filters=256
624 | size=1
625 | stride=1
626 | pad=1
627 | activation=leaky
628 |
629 | [upsample]
630 | stride=2
631 |
632 | [route]
633 | layers = -1, 61
634 |
635 |
636 |
637 | [convolutional]
638 | batch_normalize=1
639 | filters=256
640 | size=1
641 | stride=1
642 | pad=1
643 | activation=leaky
644 |
645 | [convolutional]
646 | batch_normalize=1
647 | size=3
648 | stride=1
649 | pad=1
650 | filters=512
651 | activation=leaky
652 |
653 | [convolutional]
654 | batch_normalize=1
655 | filters=256
656 | size=1
657 | stride=1
658 | pad=1
659 | activation=leaky
660 |
661 | [convolutional]
662 | batch_normalize=1
663 | size=3
664 | stride=1
665 | pad=1
666 | filters=512
667 | activation=leaky
668 |
669 | [convolutional]
670 | batch_normalize=1
671 | filters=256
672 | size=1
673 | stride=1
674 | pad=1
675 | activation=leaky
676 |
677 | [convolutional]
678 | batch_normalize=1
679 | size=3
680 | stride=1
681 | pad=1
682 | filters=512
683 | activation=leaky
684 |
685 | [convolutional]
686 | size=1
687 | stride=1
688 | pad=1
689 | filters=255
690 | activation=linear
691 |
692 |
693 | [yolo]
694 | mask = 3,4,5
695 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
696 | classes=80
697 | num=9
698 | jitter=.3
699 | ignore_thresh = .5
700 | truth_thresh = 1
701 | random=1
702 |
703 |
704 |
705 | [route]
706 | layers = -4
707 |
708 | [convolutional]
709 | batch_normalize=1
710 | filters=128
711 | size=1
712 | stride=1
713 | pad=1
714 | activation=leaky
715 |
716 | [upsample]
717 | stride=2
718 |
719 | [route]
720 | layers = -1, 36
721 |
722 |
723 |
724 | [convolutional]
725 | batch_normalize=1
726 | filters=128
727 | size=1
728 | stride=1
729 | pad=1
730 | activation=leaky
731 |
732 | [convolutional]
733 | batch_normalize=1
734 | size=3
735 | stride=1
736 | pad=1
737 | filters=256
738 | activation=leaky
739 |
740 | [convolutional]
741 | batch_normalize=1
742 | filters=128
743 | size=1
744 | stride=1
745 | pad=1
746 | activation=leaky
747 |
748 | [convolutional]
749 | batch_normalize=1
750 | size=3
751 | stride=1
752 | pad=1
753 | filters=256
754 | activation=leaky
755 |
756 | [convolutional]
757 | batch_normalize=1
758 | filters=128
759 | size=1
760 | stride=1
761 | pad=1
762 | activation=leaky
763 |
764 | [convolutional]
765 | batch_normalize=1
766 | size=3
767 | stride=1
768 | pad=1
769 | filters=256
770 | activation=leaky
771 |
772 | [convolutional]
773 | size=1
774 | stride=1
775 | pad=1
776 | filters=255
777 | activation=linear
778 |
779 |
780 | [yolo]
781 | mask = 0,1,2
782 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
783 | classes=80
784 | num=9
785 | jitter=.3
786 | ignore_thresh = .5
787 | truth_thresh = 1
788 | random=1
789 |
790 |
--------------------------------------------------------------------------------
/darknet.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import numpy as np
4 |
5 |
6 | class YoloLayer(nn.Module):
7 | def __init__(self, anchor_mask=[], num_classes=0, anchors=[], num_anchors=1):
8 | super(YoloLayer, self).__init__()
9 | self.anchor_mask = anchor_mask
10 | self.num_classes = num_classes
11 | self.anchors = anchors
12 | self.num_anchors = num_anchors
13 |         self.anchor_step = len(anchors)//num_anchors
14 | self.coord_scale = 1
15 | self.noobject_scale = 1
16 | self.object_scale = 5
17 | self.class_scale = 1
18 | self.thresh = 0.6
19 | self.stride = 32
20 | self.seen = 0
21 |
22 | def forward(self, output, nms_thresh):
23 | self.thresh = nms_thresh
24 | masked_anchors = []
25 |
26 | for m in self.anchor_mask:
27 | masked_anchors += self.anchors[m*self.anchor_step:(m+1)*self.anchor_step]
28 |
29 | masked_anchors = [anchor/self.stride for anchor in masked_anchors]
30 | boxes = get_region_boxes(output.data, self.thresh, self.num_classes, masked_anchors, len(self.anchor_mask))
31 |
32 | return boxes
33 |
34 |
35 | class Upsample(nn.Module):
36 | def __init__(self, stride=2):
37 | super(Upsample, self).__init__()
38 | self.stride = stride
39 | def forward(self, x):
40 | stride = self.stride
41 | assert(x.data.dim() == 4)
42 | B = x.data.size(0)
43 | C = x.data.size(1)
44 | H = x.data.size(2)
45 | W = x.data.size(3)
46 | ws = stride
47 | hs = stride
48 | x = x.view(B, C, H, 1, W, 1).expand(B, C, H, stride, W, stride).contiguous().view(B, C, H*stride, W*stride)
49 | return x
50 |
51 |
52 | #for route and shortcut
53 | class EmptyModule(nn.Module):
54 | def __init__(self):
55 | super(EmptyModule, self).__init__()
56 |
57 | def forward(self, x):
58 | return x
59 |
60 | # support route shortcut
61 | class Darknet(nn.Module):
62 | def __init__(self, cfgfile):
63 | super(Darknet, self).__init__()
64 | self.blocks = parse_cfg(cfgfile)
65 | self.models = self.create_network(self.blocks) # merge conv, bn,leaky
66 | self.loss = self.models[len(self.models)-1]
67 |
68 | self.width = int(self.blocks[0]['width'])
69 | self.height = int(self.blocks[0]['height'])
70 |
71 | self.header = torch.IntTensor([0,0,0,0])
72 | self.seen = 0
73 |
74 | def forward(self, x, nms_thresh):
75 | ind = -2
76 | self.loss = None
77 | outputs = dict()
78 | out_boxes = []
79 |
80 | for block in self.blocks:
81 | ind = ind + 1
82 | if block['type'] == 'net':
83 | continue
84 | elif block['type'] in ['convolutional', 'upsample']:
85 | x = self.models[ind](x)
86 | outputs[ind] = x
87 | elif block['type'] == 'route':
88 | layers = block['layers'].split(',')
89 | layers = [int(i) if int(i) > 0 else int(i)+ind for i in layers]
90 | if len(layers) == 1:
91 | x = outputs[layers[0]]
92 | outputs[ind] = x
93 | elif len(layers) == 2:
94 | x1 = outputs[layers[0]]
95 | x2 = outputs[layers[1]]
96 | x = torch.cat((x1,x2),1)
97 | outputs[ind] = x
98 | elif block['type'] == 'shortcut':
99 | from_layer = int(block['from'])
100 | activation = block['activation']
101 | from_layer = from_layer if from_layer > 0 else from_layer + ind
102 | x1 = outputs[from_layer]
103 | x2 = outputs[ind-1]
104 | x = x1 + x2
105 | outputs[ind] = x
106 | elif block['type'] == 'yolo':
107 | boxes = self.models[ind](x, nms_thresh)
108 | out_boxes.append(boxes)
109 | else:
110 | print('unknown type %s' % (block['type']))
111 |
112 | return out_boxes
113 |
114 |
115 | def print_network(self):
116 | print_cfg(self.blocks)
117 |
118 | def create_network(self, blocks):
119 | models = nn.ModuleList()
120 |
121 | prev_filters = 3
122 | out_filters =[]
123 | prev_stride = 1
124 | out_strides = []
125 | conv_id = 0
126 | for block in blocks:
127 | if block['type'] == 'net':
128 | prev_filters = int(block['channels'])
129 | continue
130 | elif block['type'] == 'convolutional':
131 | conv_id = conv_id + 1
132 | batch_normalize = int(block['batch_normalize'])
133 | filters = int(block['filters'])
134 | kernel_size = int(block['size'])
135 | stride = int(block['stride'])
136 | is_pad = int(block['pad'])
137 | pad = (kernel_size-1)//2 if is_pad else 0
138 | activation = block['activation']
139 | model = nn.Sequential()
140 | if batch_normalize:
141 | model.add_module('conv{0}'.format(conv_id), nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias=False))
142 | model.add_module('bn{0}'.format(conv_id), nn.BatchNorm2d(filters))
143 | else:
144 | model.add_module('conv{0}'.format(conv_id), nn.Conv2d(prev_filters, filters, kernel_size, stride, pad))
145 | if activation == 'leaky':
146 | model.add_module('leaky{0}'.format(conv_id), nn.LeakyReLU(0.1, inplace=True))
147 | prev_filters = filters
148 | out_filters.append(prev_filters)
149 | prev_stride = stride * prev_stride
150 | out_strides.append(prev_stride)
151 | models.append(model)
152 | elif block['type'] == 'upsample':
153 | stride = int(block['stride'])
154 | out_filters.append(prev_filters)
155 | prev_stride = prev_stride // stride
156 | out_strides.append(prev_stride)
157 | models.append(Upsample(stride))
158 | elif block['type'] == 'route':
159 | layers = block['layers'].split(',')
160 | ind = len(models)
161 | layers = [int(i) if int(i) > 0 else int(i)+ind for i in layers]
162 | if len(layers) == 1:
163 | prev_filters = out_filters[layers[0]]
164 | prev_stride = out_strides[layers[0]]
165 | elif len(layers) == 2:
166 | assert(layers[0] == ind - 1)
167 | prev_filters = out_filters[layers[0]] + out_filters[layers[1]]
168 | prev_stride = out_strides[layers[0]]
169 | out_filters.append(prev_filters)
170 | out_strides.append(prev_stride)
171 | models.append(EmptyModule())
172 | elif block['type'] == 'shortcut':
173 | ind = len(models)
174 | prev_filters = out_filters[ind-1]
175 | out_filters.append(prev_filters)
176 | prev_stride = out_strides[ind-1]
177 | out_strides.append(prev_stride)
178 | models.append(EmptyModule())
179 | elif block['type'] == 'yolo':
180 | yolo_layer = YoloLayer()
181 | anchors = block['anchors'].split(',')
182 | anchor_mask = block['mask'].split(',')
183 | yolo_layer.anchor_mask = [int(i) for i in anchor_mask]
184 | yolo_layer.anchors = [float(i) for i in anchors]
185 | yolo_layer.num_classes = int(block['classes'])
186 | yolo_layer.num_anchors = int(block['num'])
187 | yolo_layer.anchor_step = len(yolo_layer.anchors)//yolo_layer.num_anchors
188 | yolo_layer.stride = prev_stride
189 | out_filters.append(prev_filters)
190 | out_strides.append(prev_stride)
191 | models.append(yolo_layer)
192 | else:
193 | print('unknown type %s' % (block['type']))
194 |
195 | return models
196 |
197 | def load_weights(self, weightfile):
198 | print()
199 | fp = open(weightfile, 'rb')
200 | header = np.fromfile(fp, count=5, dtype=np.int32)
201 | self.header = torch.from_numpy(header)
202 | self.seen = self.header[3]
203 | buf = np.fromfile(fp, dtype = np.float32)
204 | fp.close()
205 |
206 | start = 0
207 | ind = -2
208 | counter = 3
209 | for block in self.blocks:
210 | if start >= buf.size:
211 | break
212 | ind = ind + 1
213 | if block['type'] == 'net':
214 | continue
215 | elif block['type'] == 'convolutional':
216 | model = self.models[ind]
217 | batch_normalize = int(block['batch_normalize'])
218 | if batch_normalize:
219 | start = load_conv_bn(buf, start, model[0], model[1])
220 | else:
221 | start = load_conv(buf, start, model[0])
222 | elif block['type'] == 'upsample':
223 | pass
224 | elif block['type'] == 'route':
225 | pass
226 | elif block['type'] == 'shortcut':
227 | pass
228 | elif block['type'] == 'yolo':
229 | pass
230 | else:
231 | print('unknown type %s' % (block['type']))
232 |
233 | percent_comp = (counter / len(self.blocks)) * 100
234 |
235 | print('Loading weights. Please Wait...{:.2f}% Complete'.format(percent_comp), end = '\r', flush = True)
236 |
237 | counter += 1
238 |
239 |
240 |
241 | def convert2cpu(gpu_matrix):
242 | return torch.FloatTensor(gpu_matrix.size()).copy_(gpu_matrix)
243 |
244 |
245 | def convert2cpu_long(gpu_matrix):
246 | return torch.LongTensor(gpu_matrix.size()).copy_(gpu_matrix)
247 |
248 |
249 | def get_region_boxes(output, conf_thresh, num_classes, anchors, num_anchors, only_objectness = 1, validation = False):
250 | anchor_step = len(anchors)//num_anchors
251 | if output.dim() == 3:
252 | output = output.unsqueeze(0)
253 | batch = output.size(0)
254 | assert(output.size(1) == (5+num_classes)*num_anchors)
255 | h = output.size(2)
256 | w = output.size(3)
257 |
258 | all_boxes = []
259 | output = output.view(batch*num_anchors, 5+num_classes, h*w).transpose(0,1).contiguous().view(5+num_classes, batch*num_anchors*h*w)
260 |
261 | grid_x = torch.linspace(0, w-1, w).repeat(h,1).repeat(batch*num_anchors, 1, 1).view(batch*num_anchors*h*w).type_as(output) #cuda()
262 | grid_y = torch.linspace(0, h-1, h).repeat(w,1).t().repeat(batch*num_anchors, 1, 1).view(batch*num_anchors*h*w).type_as(output) #cuda()
263 | xs = torch.sigmoid(output[0]) + grid_x
264 | ys = torch.sigmoid(output[1]) + grid_y
265 |
266 | anchor_w = torch.Tensor(anchors).view(num_anchors, anchor_step).index_select(1, torch.LongTensor([0]))
267 | anchor_h = torch.Tensor(anchors).view(num_anchors, anchor_step).index_select(1, torch.LongTensor([1]))
268 | anchor_w = anchor_w.repeat(batch, 1).repeat(1, 1, h*w).view(batch*num_anchors*h*w).type_as(output) #cuda()
269 | anchor_h = anchor_h.repeat(batch, 1).repeat(1, 1, h*w).view(batch*num_anchors*h*w).type_as(output) #cuda()
270 | ws = torch.exp(output[2]) * anchor_w
271 | hs = torch.exp(output[3]) * anchor_h
272 |
273 | det_confs = torch.sigmoid(output[4])
274 | cls_confs = torch.nn.Softmax(dim=1)(output[5:5+num_classes].transpose(0,1)).detach()
275 | cls_max_confs, cls_max_ids = torch.max(cls_confs, 1)
276 | cls_max_confs = cls_max_confs.view(-1)
277 | cls_max_ids = cls_max_ids.view(-1)
278 |
279 |
280 | sz_hw = h*w
281 | sz_hwa = sz_hw*num_anchors
282 | det_confs = convert2cpu(det_confs)
283 | cls_max_confs = convert2cpu(cls_max_confs)
284 | cls_max_ids = convert2cpu_long(cls_max_ids)
285 | xs = convert2cpu(xs)
286 | ys = convert2cpu(ys)
287 | ws = convert2cpu(ws)
288 | hs = convert2cpu(hs)
289 | if validation:
290 | cls_confs = convert2cpu(cls_confs.view(-1, num_classes))
291 |
292 | for b in range(batch):
293 | boxes = []
294 | for cy in range(h):
295 | for cx in range(w):
296 | for i in range(num_anchors):
297 | ind = b*sz_hwa + i*sz_hw + cy*w + cx
298 | det_conf = det_confs[ind]
299 | if only_objectness:
300 | conf = det_confs[ind]
301 | else:
302 | conf = det_confs[ind] * cls_max_confs[ind]
303 |
304 | if conf > conf_thresh:
305 | bcx = xs[ind]
306 | bcy = ys[ind]
307 | bw = ws[ind]
308 | bh = hs[ind]
309 | cls_max_conf = cls_max_confs[ind]
310 | cls_max_id = cls_max_ids[ind]
311 | box = [bcx/w, bcy/h, bw/w, bh/h, det_conf, cls_max_conf, cls_max_id]
312 | if (not only_objectness) and validation:
313 | for c in range(num_classes):
314 | tmp_conf = cls_confs[ind][c]
315 | if c != cls_max_id and det_confs[ind]*tmp_conf > conf_thresh:
316 | box.append(tmp_conf)
317 | box.append(c)
318 | boxes.append(box)
319 | all_boxes.append(boxes)
320 |
321 | return all_boxes
322 |
323 |
324 | def parse_cfg(cfgfile):
325 | blocks = []
326 | fp = open(cfgfile, 'r')
327 | block = None
328 | line = fp.readline()
329 | while line != '':
330 | line = line.rstrip()
331 | if line == '' or line[0] == '#':
332 | line = fp.readline()
333 | continue
334 | elif line[0] == '[':
335 | if block:
336 | blocks.append(block)
337 | block = dict()
338 | block['type'] = line.lstrip('[').rstrip(']')
339 | # set default value
340 | if block['type'] == 'convolutional':
341 | block['batch_normalize'] = 0
342 | else:
343 | key,value = line.split('=')
344 | key = key.strip()
345 | if key == 'type':
346 | key = '_type'
347 | value = value.strip()
348 | block[key] = value
349 | line = fp.readline()
350 |
351 | if block:
352 | blocks.append(block)
353 | fp.close()
354 | return blocks
355 |
356 |
357 | def print_cfg(blocks):
358 | print('layer filters size input output');
359 | prev_width = 416
360 | prev_height = 416
361 | prev_filters = 3
362 | out_filters =[]
363 | out_widths =[]
364 | out_heights =[]
365 | ind = -2
366 | for block in blocks:
367 | ind = ind + 1
368 | if block['type'] == 'net':
369 | prev_width = int(block['width'])
370 | prev_height = int(block['height'])
371 | continue
372 | elif block['type'] == 'convolutional':
373 | filters = int(block['filters'])
374 | kernel_size = int(block['size'])
375 | stride = int(block['stride'])
376 | is_pad = int(block['pad'])
377 | pad = (kernel_size-1)//2 if is_pad else 0
378 | width = (prev_width + 2*pad - kernel_size)//stride + 1
379 | height = (prev_height + 2*pad - kernel_size)//stride + 1
380 | print('%5d %-6s %4d %d x %d / %d %3d x %3d x%4d -> %3d x %3d x%4d' % (ind, 'conv', filters, kernel_size, kernel_size, stride, prev_width, prev_height, prev_filters, width, height, filters))
381 | prev_width = width
382 | prev_height = height
383 | prev_filters = filters
384 | out_widths.append(prev_width)
385 | out_heights.append(prev_height)
386 | out_filters.append(prev_filters)
387 | elif block['type'] == 'upsample':
388 | stride = int(block['stride'])
389 | filters = prev_filters
390 | width = prev_width*stride
391 | height = prev_height*stride
392 | print('%5d %-6s * %d %3d x %3d x%4d -> %3d x %3d x%4d' % (ind, 'upsample', stride, prev_width, prev_height, prev_filters, width, height, filters))
393 | prev_width = width
394 | prev_height = height
395 | prev_filters = filters
396 | out_widths.append(prev_width)
397 | out_heights.append(prev_height)
398 | out_filters.append(prev_filters)
399 | elif block['type'] == 'route':
400 | layers = block['layers'].split(',')
401 | layers = [int(i) if int(i) > 0 else int(i)+ind for i in layers]
402 | if len(layers) == 1:
403 | print('%5d %-6s %d' % (ind, 'route', layers[0]))
404 | prev_width = out_widths[layers[0]]
405 | prev_height = out_heights[layers[0]]
406 | prev_filters = out_filters[layers[0]]
407 | elif len(layers) == 2:
408 | print('%5d %-6s %d %d' % (ind, 'route', layers[0], layers[1]))
409 | prev_width = out_widths[layers[0]]
410 | prev_height = out_heights[layers[0]]
411 | assert(prev_width == out_widths[layers[1]])
412 | assert(prev_height == out_heights[layers[1]])
413 | prev_filters = out_filters[layers[0]] + out_filters[layers[1]]
414 | out_widths.append(prev_width)
415 | out_heights.append(prev_height)
416 | out_filters.append(prev_filters)
417 | elif block['type'] in ['region', 'yolo']:
418 | print('%5d %-6s' % (ind, 'detection'))
419 | out_widths.append(prev_width)
420 | out_heights.append(prev_height)
421 | out_filters.append(prev_filters)
422 | elif block['type'] == 'shortcut':
423 | from_id = int(block['from'])
424 | from_id = from_id if from_id > 0 else from_id+ind
425 | print('%5d %-6s %d' % (ind, 'shortcut', from_id))
426 | prev_width = out_widths[from_id]
427 | prev_height = out_heights[from_id]
428 | prev_filters = out_filters[from_id]
429 | out_widths.append(prev_width)
430 | out_heights.append(prev_height)
431 | out_filters.append(prev_filters)
432 | else:
433 | print('unknown type %s' % (block['type']))
434 |
435 |
436 | def load_conv(buf, start, conv_model):
437 | num_w = conv_model.weight.numel()
438 | num_b = conv_model.bias.numel()
439 | conv_model.bias.data.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b
440 | conv_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_w]).view_as(conv_model.weight.data)); start = start + num_w
441 | return start
442 |
443 |
444 | def load_conv_bn(buf, start, conv_model, bn_model):
445 | num_w = conv_model.weight.numel()
446 | num_b = bn_model.bias.numel()
447 | bn_model.bias.data.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b
448 | bn_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b
449 | bn_model.running_mean.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b
450 | bn_model.running_var.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b
451 | conv_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_w]).view_as(conv_model.weight.data)); start = start + num_w
452 | return start
453 |
--------------------------------------------------------------------------------
/data/coco.names:
--------------------------------------------------------------------------------
1 | person
2 | bicycle
3 | car
4 | motorbike
5 | aeroplane
6 | bus
7 | train
8 | truck
9 | boat
10 | traffic light
11 | fire hydrant
12 | stop sign
13 | parking meter
14 | bench
15 | bird
16 | cat
17 | dog
18 | horse
19 | sheep
20 | cow
21 | elephant
22 | bear
23 | zebra
24 | giraffe
25 | backpack
26 | umbrella
27 | handbag
28 | tie
29 | suitcase
30 | frisbee
31 | skis
32 | snowboard
33 | sports ball
34 | kite
35 | baseball bat
36 | baseball glove
37 | skateboard
38 | surfboard
39 | tennis racket
40 | bottle
41 | wine glass
42 | cup
43 | fork
44 | knife
45 | spoon
46 | bowl
47 | banana
48 | apple
49 | sandwich
50 | orange
51 | broccoli
52 | carrot
53 | hot dog
54 | pizza
55 | donut
56 | cake
57 | chair
58 | sofa
59 | pottedplant
60 | bed
61 | diningtable
62 | toilet
63 | tvmonitor
64 | laptop
65 | mouse
66 | remote
67 | keyboard
68 | cell phone
69 | microwave
70 | oven
71 | toaster
72 | sink
73 | refrigerator
74 | book
75 | clock
76 | vase
77 | scissors
78 | teddy bear
79 | hair drier
80 | toothbrush
81 |
--------------------------------------------------------------------------------
/images/bar.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/images/bar.jpeg
--------------------------------------------------------------------------------
/images/city_scene.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/images/city_scene.jpg
--------------------------------------------------------------------------------
/images/class.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/images/class.jpg
--------------------------------------------------------------------------------
/images/dog.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/images/dog.jpg
--------------------------------------------------------------------------------
/images/home.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/images/home.jpeg
--------------------------------------------------------------------------------
/images/meeting.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/images/meeting.jpeg
--------------------------------------------------------------------------------
/images/snack.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/images/snack.jpg
--------------------------------------------------------------------------------
/instance/README.md:
--------------------------------------------------------------------------------
1 | Two folders will be created here: uploads and output.
2 |
3 | Every time an image is submitted to the server, it is stored in the uploads folder. After processing, the bounding boxes of the recognised classes and their respective confidence levels are plotted on the image, and the result is stored in the output folder.
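
For example, after a single request the folder might look like this (the filename is generated from the upload timestamp plus a random suffix, so the name below is only illustrative):

```
instance/
├── uploads/
│   └── 20200523_120754_313.jpg   # the image as submitted
└── output/
    └── 20200523_120754_313.jpg   # the same image with bounding boxes and labels drawn
```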
--------------------------------------------------------------------------------
/iti/Title Background.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/iti/Title Background.gif
--------------------------------------------------------------------------------
/iti/image:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/iti/postman.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/iti/postman.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | click==7.1.2
2 | cycler==0.10.0
3 | Flask==1.1.2
4 | Flask-Cors==3.0.8
5 | itsdangerous==1.1.0
6 | Jinja2==2.11.2
7 | kiwisolver==1.2.0
8 | MarkupSafe==1.1.1
9 | matplotlib==3.2.1
10 | opencv-python==4.2.0.34
11 | pyparsing==2.4.7
12 | python-dateutil==2.8.1
13 | six==1.14.0
14 | Werkzeug==1.0.1
--------------------------------------------------------------------------------
/sample_output/20200521_233133_570.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/sample_output/20200521_233133_570.jpg
--------------------------------------------------------------------------------
/sample_output/20200521_233208_33.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/sample_output/20200521_233208_33.jpg
--------------------------------------------------------------------------------
/sample_output/20200521_233222_914.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/sample_output/20200521_233222_914.jpg
--------------------------------------------------------------------------------
/sample_output/20200521_233233_695.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yankai364/Object-Detection-Flask-API/f14751ca3606e59d677a1634faf8be8e916bef53/sample_output/20200521_233233_695.jpg
--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
1 | import time
2 | import torch
3 | import numpy as np
4 | import matplotlib.pyplot as plt
5 | import matplotlib.patches as patches
6 | import os
7 |
8 |
9 | def boxes_iou(box1, box2):
10 |
11 | # Get the Width and Height of each bounding box
12 | width_box1 = box1[2]
13 | height_box1 = box1[3]
14 | width_box2 = box2[2]
15 | height_box2 = box2[3]
16 |
17 |     # Calculate the area of each bounding box
18 | area_box1 = width_box1 * height_box1
19 | area_box2 = width_box2 * height_box2
20 |
21 | # Find the vertical edges of the union of the two bounding boxes
22 | mx = min(box1[0] - width_box1/2.0, box2[0] - width_box2/2.0)
23 | Mx = max(box1[0] + width_box1/2.0, box2[0] + width_box2/2.0)
24 |
25 | # Calculate the width of the union of the two bounding boxes
26 | union_width = Mx - mx
27 |
28 | # Find the horizontal edges of the union of the two bounding boxes
29 | my = min(box1[1] - height_box1/2.0, box2[1] - height_box2/2.0)
30 | My = max(box1[1] + height_box1/2.0, box2[1] + height_box2/2.0)
31 |
32 | # Calculate the height of the union of the two bounding boxes
33 | union_height = My - my
34 |
35 | # Calculate the width and height of the area of intersection of the two bounding boxes
36 | intersection_width = width_box1 + width_box2 - union_width
37 | intersection_height = height_box1 + height_box2 - union_height
38 |
39 |     # If the boxes don't overlap, their IOU is zero
40 | if intersection_width <= 0 or intersection_height <= 0:
41 | return 0.0
42 |
43 | # Calculate the area of intersection of the two bounding boxes
44 | intersection_area = intersection_width * intersection_height
45 |
46 | # Calculate the area of the union of the two bounding boxes
47 | union_area = area_box1 + area_box2 - intersection_area
48 |
49 | # Calculate the IOU
50 | iou = intersection_area/union_area
51 |
52 | return iou
53 |
54 |
55 | def nms(boxes, iou_thresh):
56 |
57 | # If there are no bounding boxes do nothing
58 | if len(boxes) == 0:
59 | return boxes
60 |
61 | # Create a PyTorch Tensor to keep track of the detection confidence
62 | # of each predicted bounding box
63 | det_confs = torch.zeros(len(boxes))
64 |
65 | # Get the detection confidence of each predicted bounding box
66 | for i in range(len(boxes)):
67 | det_confs[i] = boxes[i][4]
68 |
69 | # Sort the indices of the bounding boxes by detection confidence value in descending order.
70 | # We ignore the first returned element since we are only interested in the sorted indices
71 | _,sortIds = torch.sort(det_confs, descending = True)
72 |
73 | # Create an empty list to hold the best bounding boxes after
74 | # Non-Maximal Suppression (NMS) is performed
75 | best_boxes = []
76 |
77 | # Perform Non-Maximal Suppression
78 | for i in range(len(boxes)):
79 |
80 | # Get the bounding box with the highest detection confidence first
81 | box_i = boxes[sortIds[i]]
82 |
83 | # Check that the detection confidence is not zero
84 | if box_i[4] > 0:
85 |
86 | # Save the bounding box
87 | best_boxes.append(box_i)
88 |
89 | # Go through the rest of the bounding boxes in the list and calculate their IOU with
90 | # respect to the previous selected box_i.
91 | for j in range(i + 1, len(boxes)):
92 | box_j = boxes[sortIds[j]]
93 |
94 | # If the IOU of box_i and box_j is higher than the given IOU threshold set
95 | # box_j's detection confidence to zero.
96 | if boxes_iou(box_i, box_j) > iou_thresh:
97 | box_j[4] = 0
98 |
99 | return best_boxes
100 |
101 |
102 | def detect_objects(model, img, iou_thresh, nms_thresh):
103 |
104 | # Start the time. This is done to calculate how long the detection takes.
105 | start = time.time()
106 |
107 | # Set the model to evaluation mode.
108 | model.eval()
109 |
110 | # Convert the image from a NumPy ndarray to a PyTorch Tensor of the correct shape.
111 | # The image is transposed, then converted to a FloatTensor of dtype float32, then
112 | # Normalized to values between 0 and 1, and finally unsqueezed to have the correct
113 | # shape of 1 x 3 x 416 x 416
114 | img = torch.from_numpy(img.transpose(2,0,1)).float().div(255.0).unsqueeze(0)
115 |
116 | # Feed the image to the neural network with the corresponding NMS threshold.
117 | # The first step in NMS is to remove all bounding boxes that have a very low
118 | # probability of detection. All predicted bounding boxes with a value less than
119 | # the given NMS threshold will be removed.
120 | list_boxes = model(img, nms_thresh)
121 |
122 | # Make a new list with all the bounding boxes returned by the neural network
123 | boxes = list_boxes[0][0] + list_boxes[1][0] + list_boxes[2][0]
124 |
125 | # Perform the second step of NMS on the bounding boxes returned by the neural network.
126 | # In this step, we only keep the best bounding boxes by eliminating all the bounding boxes
127 | # whose IOU value is higher than the given IOU threshold
128 | boxes = nms(boxes, iou_thresh)
129 |
130 | # Stop the time.
131 | finish = time.time()
132 |
133 | # Print the time it took to detect objects
134 | print('\n\nIt took {:.3f}'.format(finish - start), 'seconds to detect the objects in the image.\n')
135 |
136 | # Print the number of objects detected
137 | print('Number of Objects Detected:', len(boxes), '\n')
138 |
139 | return boxes
140 |
141 |
142 | def load_class_names(namesfile):
143 |
144 | # Create an empty list to hold the object classes
145 | class_names = []
146 |
147 | # Open the file containing the COCO object classes in read-only mode
148 | with open(namesfile, 'r') as fp:
149 |
150 | # The coco.names file contains only one object class per line.
151 | # Read the file line by line and save all the lines in a list.
152 | lines = fp.readlines()
153 |
154 | # Get the object class names
155 | for line in lines:
156 |
157 | # Make a copy of each line with any trailing whitespace removed
158 | line = line.rstrip()
159 |
160 | # Save the object class name into class_names
161 | class_names.append(line)
162 |
163 | return class_names
164 |
165 |
166 | def print_objects(boxes, class_names):
167 | print('Objects Found and Confidence Level:\n')
168 | objects_count = {}
169 | objects_confidence = []
170 | for i in range(len(boxes)):
171 | box = boxes[i]
172 | if len(box) >= 7 and class_names:
173 | cls_conf = box[5]
174 | cls_id = box[6]
175 | print('%i. %s: %f' % (i + 1, class_names[cls_id], cls_conf))
176 | if class_names[cls_id] in objects_count:
177 | objects_count[class_names[cls_id]] += 1
178 | else:
179 | objects_count[class_names[cls_id]] = 1
180 | objects_confidence.append({class_names[cls_id]: round(float(cls_conf), 6)})
181 |
182 | return objects_count, objects_confidence
183 |
184 |
185 | def plot_boxes(img, boxes, class_names, output_dir, filename, plot_labels = True, color = None):
186 |
187 | # Define a tensor used to set the colors of the bounding boxes
188 | colors = torch.FloatTensor([[1,0,1],[0,0,1],[0,1,1],[0,1,0],[1,1,0],[1,0,0]])
189 |
190 | # Define a function to set the colors of the bounding boxes
191 | def get_color(c, x, max_val):
192 | ratio = float(x) / max_val * 5
193 | i = int(np.floor(ratio))
194 | j = int(np.ceil(ratio))
195 |
196 | ratio = ratio - i
197 | r = (1 - ratio) * colors[i][c] + ratio * colors[j][c]
198 |
199 | return int(r * 255)
200 |
201 | # Get the width and height of the image
202 | width = img.shape[1]
203 | height = img.shape[0]
204 |
205 | # Create a figure and plot the image
206 | fig, a = plt.subplots(1,1)
207 | a.imshow(img)
208 |
209 | # Plot the bounding boxes and corresponding labels on top of the image
210 | for i in range(len(boxes)):
211 |
212 | # Get the ith bounding box
213 | box = boxes[i]
214 |
215 |         # Get the (x,y) pixel coordinates of the upper-left and lower-right corners
216 |         # of the bounding box, scaled to the size of the image.
217 | x1 = int(np.around((box[0] - box[2]/2.0) * width))
218 | y1 = int(np.around((box[1] - box[3]/2.0) * height))
219 | x2 = int(np.around((box[0] + box[2]/2.0) * width))
220 | y2 = int(np.around((box[1] + box[3]/2.0) * height))
221 |
222 | # Set the default rgb value to red
223 | rgb = (1, 0, 0)
224 |
225 | # Use the same color to plot the bounding boxes of the same object class
226 | if len(box) >= 7 and class_names:
227 | cls_conf = box[5]
228 | cls_id = box[6]
229 | classes = len(class_names)
230 | offset = cls_id * 123457 % classes
231 | red = get_color(2, offset, classes) / 255
232 | green = get_color(1, offset, classes) / 255
233 | blue = get_color(0, offset, classes) / 255
234 |
235 | # If a color is given then set rgb to the given color instead
236 | if color is None:
237 | rgb = (red, green, blue)
238 | else:
239 | rgb = color
240 |
241 |         # Calculate the width and height of the bounding box in pixels.
242 | width_x = x2 - x1
243 | width_y = y1 - y2
244 |
245 |         # Set the position and size of the bounding box. (x1, y2) is the pixel coordinate of the
246 | # lower-left corner of the bounding box relative to the size of the image.
247 | rect = patches.Rectangle((x1, y2),
248 | width_x, width_y,
249 | linewidth = 2,
250 | edgecolor = rgb,
251 | facecolor = 'none')
252 |
253 | # Draw the bounding box on top of the image
254 | a.add_patch(rect)
255 |
256 | # If plot_labels = True then plot the corresponding label
257 | if plot_labels:
258 |
259 | # Create a string with the object class name and the corresponding object class probability
260 | conf_tx = class_names[cls_id] + ': {0}%'.format(int(cls_conf * 100))
261 |
262 | # Define x and y offsets for the labels
263 | lxc = (img.shape[1] * 0.266) / 100
264 | lyc = (img.shape[0] * 1.180) / 100
265 |
266 | # Draw the labels on top of the image
267 | a.text(x1 + lxc, y1 - lyc, conf_tx, fontsize = 24, color = 'k',
268 | bbox = dict(facecolor = rgb, edgecolor = rgb, alpha = 0.8))
269 |
270 | plt.axis('off')
271 | plt.savefig(os.path.join(output_dir, filename + '.jpg'), bbox_inches='tight', pad_inches = 0)
--------------------------------------------------------------------------------
/weights/README.md:
--------------------------------------------------------------------------------
1 | YOLOv3 .weights files are to be placed here.
2 |
3 | If you do not have your own trained model, you can download the YOLOv3 pre-trained weight file by Darknet here:
4 | https://pjreddie.com/media/files/yolov3.weights
--------------------------------------------------------------------------------
/yolo.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | import matplotlib.pyplot as plt
3 |
4 | from utils import *
5 | from darknet import Darknet
6 |
7 |
8 | def process(uploads_dir, output_dir, filename):
9 |
10 | # Set the location and name of the cfg file
11 | cfg_file = './cfg/yolov3.cfg'
12 |
13 | # Set the location and name of the pre-trained weights file
14 | weight_file = './weights/yolov3.weights'
15 |
16 | # Set the location and name of the COCO object classes file
17 | namesfile = 'data/coco.names'
18 |
19 | # Load the network architecture
20 | m = Darknet(cfg_file)
21 |
22 | # Load the pre-trained weights
23 | m.load_weights(weight_file)
24 |
25 | # Load the COCO object classes
26 | class_names = load_class_names(namesfile)
27 |
28 | # Set the default figure size
29 | plt.rcParams['figure.figsize'] = [24.0, 14.0]
30 |
31 | # Set the NMS threshold
32 | nms_thresh = 0.6
33 |
34 | # Set the IOU threshold
35 | iou_thresh = 0.4
36 |
40 | # Load the image
41 | img = cv2.imread(uploads_dir + '/' + filename + '.jpg')
42 |
43 | # Convert the image to RGB
44 | original_image = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
45 |
46 | # We resize the image to the input width and height of the first layer of the network.
47 | resized_image = cv2.resize(original_image, (m.width, m.height))
48 |
55 | # Detect objects in the image
56 | boxes = detect_objects(m, resized_image, iou_thresh, nms_thresh)
57 |
58 | # Print and save the objects found and their confidence levels
59 | objects_count, objects_confidence = print_objects(boxes, class_names)
60 |
61 | # Plot the image with bounding boxes and corresponding object class labels
62 | plot_boxes(original_image, boxes, class_names, output_dir, filename)
63 |
64 | return objects_count, objects_confidence
--------------------------------------------------------------------------------