├── .gitignore ├── LICENSE ├── README.md ├── keras_video_object_detector ├── __init__.py ├── demo │ ├── __init__.py │ ├── detect_objects_in_camera.py │ ├── detect_objects_in_video.py │ └── font │ │ ├── FiraMono-Medium.otf │ │ └── SIL Open Font License.txt ├── library │ ├── __init__.py │ ├── download_utils.py │ ├── video_utils.py │ ├── yad2k │ │ ├── models │ │ │ ├── keras_darknet19.py │ │ │ └── keras_yolo.py │ │ └── utils │ │ │ ├── __init__.py │ │ │ └── utils.py │ ├── yolo.py │ └── yolo_utils.py └── models │ ├── coco_classes.txt │ ├── object_classes.txt │ └── yolo_anchors.txt ├── notes ├── ReadMe.md └── evaluation.md ├── requirements.txt ├── setup.cfg └── setup.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | .idea/ 10 | *.iml 11 | 12 | keras_video_object_detector/models/yolo.h5 13 | keras_video_object_detector/demo/videos 14 | keras_video_object_detector/demo/frames 15 | 16 | # Distribution / packaging 17 | .Python 18 | env/ 19 | build/ 20 | develop-eggs/ 21 | dist/ 22 | downloads/ 23 | eggs/ 24 | .eggs/ 25 | lib/ 26 | lib64/ 27 | parts/ 28 | sdist/ 29 | var/ 30 | wheels/ 31 | *.egg-info/ 32 | .installed.cfg 33 | *.egg 34 | 35 | # PyInstaller 36 | # Usually these files are written by a python script from a template 37 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 38 | *.manifest 39 | *.spec 40 | 41 | # Installer logs 42 | pip-log.txt 43 | pip-delete-this-directory.txt 44 | 45 | # Unit test / coverage reports 46 | htmlcov/ 47 | .tox/ 48 | .coverage 49 | .coverage.* 50 | .cache 51 | nosetests.xml 52 | coverage.xml 53 | *.cover 54 | .hypothesis/ 55 | 56 | # Translations 57 | *.mo 58 | *.pot 59 | 60 | # Django stuff: 61 | *.log 62 | local_settings.py 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | target/ 76 | 77 | # Jupyter Notebook 78 | .ipynb_checkpoints 79 | 80 | # pyenv 81 | .python-version 82 | 83 | # celery beat schedule file 84 | celerybeat-schedule 85 | 86 | # SageMath parsed files 87 | *.sage.py 88 | 89 | # dotenv 90 | .env 91 | 92 | # virtualenv 93 | .venv 94 | venv/ 95 | ENV/ 96 | 97 | # Spyder project settings 98 | .spyderproject 99 | .spyproject 100 | 101 | # Rope project settings 102 | .ropeproject 103 | 104 | # mkdocs documentation 105 | /site 106 | 107 | # mypy 108 | .mypy_cache/ 109 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 Xianshun Chen 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # keras-video-object-detector
2 | 
3 | Object detection in videos using Keras and YOLO
4 | 
5 | # Usage
6 | 
7 | ### Detect objects in a video file using the YOLO algorithm
8 | 
9 | The demo code below can be found in [keras_video_object_detector/demo/detect_objects_in_video.py](keras_video_object_detector/demo/detect_objects_in_video.py)
10 | 
11 | The demo code takes in a sample video and outputs another video annotated with the detected boxes and their class labels:
12 | 
13 | ```python
14 | from keras_video_object_detector.library.download_utils import download_file
15 | from keras_video_object_detector.library.yolo import YoloObjectDetector
16 | 
17 | model_dir_path = 'keras_video_object_detector/models'
18 | 
19 | video_file_path = 'keras_video_object_detector/demo/videos/road_video.mp4'
20 | output_video_file_path = 'keras_video_object_detector/demo/videos/predicted_video.mp4'
21 | temp_image_folder = 'frames'
22 | 
23 | # download the test video file if it does not exist
24 | download_file(video_file_path, url_path='https://www.dropbox.com/s/9nlph8ha6g1kxhw/road_video.mp4?dl=1')
25 | 
26 | detector = YoloObjectDetector()
27 | detector.load_model(model_dir_path)
28 | 
29 | result = detector.detect_objects_in_video(video_file_path=video_file_path,
30 |                                           output_video_path=output_video_file_path,
31 |                                           temp_image_folder=temp_image_folder)
32 | ```
33 | 
34 | ### Detect objects from a camera in real time using the YOLO algorithm
35 | 
36 | The demo code below can be found in [keras_video_object_detector/demo/detect_objects_in_camera.py](keras_video_object_detector/demo/detect_objects_in_camera.py)
37 | 
38 | The demo code reads frames from the camera via opencv-python and adds the detected boxes with class labels to the camera frames:
39 | 
40 | ```python
41 | import cv2
42 | from keras_video_object_detector.library.yolo import YoloObjectDetector
43 | 
44 | model_dir_path = 'keras_video_object_detector/models'
45 | 
46 | detector = YoloObjectDetector()
47 | detector.load_model(model_dir_path)
48 | 
49 | camera = cv2.VideoCapture(0)
50 | 
51 | detector.detect_objects_in_camera(camera=camera)
52 | 
53 | camera.release()
54 | cv2.destroyAllWindows()
55 | ```
--------------------------------------------------------------------------------
/keras_video_object_detector/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chen0040/keras-video-object-detector/52f07ff4047dcc8732015c3debba1fa3eb7f2c56/keras_video_object_detector/__init__.py
--------------------------------------------------------------------------------
/keras_video_object_detector/demo/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chen0040/keras-video-object-detector/52f07ff4047dcc8732015c3debba1fa3eb7f2c56/keras_video_object_detector/demo/__init__.py
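The call to `detect_objects_in_video(...)` in the first README snippet also returns a list with one entry per extracted frame, in the form `[frame_file_name, scores, boxes, classes]` (see `library/yolo.py`). A minimal sketch of post-processing that return value; the variable names follow the video demo above and the loop body is illustrative only:

```python
# Minimal sketch: assumes `detector` and `result` from the video demo above.
# Each entry of `result` is [frame_file_name, scores, boxes, classes], where
# each box is (top, left, bottom, right) in pixel coordinates.
for frame_name, scores, boxes, classes in result:
    for score, box, class_index in zip(scores, boxes, classes):
        label = detector.class_names[class_index]
        top, left, bottom, right = box
        print('%s: %s %.2f at (%d, %d, %d, %d)'
              % (frame_name, label, score, top, left, bottom, right))
```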
-------------------------------------------------------------------------------- /keras_video_object_detector/demo/detect_objects_in_camera.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | from keras_video_object_detector.library.yolo import YoloObjectDetector 3 | 4 | 5 | def main(): 6 | model_dir_path = '../models' 7 | 8 | detector = YoloObjectDetector() 9 | detector.load_model(model_dir_path) 10 | 11 | camera = cv2.VideoCapture(0) 12 | 13 | detector.detect_objects_in_camera(camera=camera) 14 | 15 | camera.release() 16 | cv2.destroyAllWindows() 17 | 18 | 19 | if __name__ == '__main__': 20 | main() 21 | -------------------------------------------------------------------------------- /keras_video_object_detector/demo/detect_objects_in_video.py: -------------------------------------------------------------------------------- 1 | from keras_video_object_detector.library.download_utils import download_file 2 | from keras_video_object_detector.library.yolo import YoloObjectDetector 3 | 4 | 5 | def main(): 6 | model_dir_path = '../models' 7 | 8 | video_file_path = 'videos/road_video.mp4' 9 | output_video_file_path = 'videos/predicted_video.mp4' 10 | temp_image_folder = 'frames' 11 | 12 | # download the test video file if not exists 13 | download_file(video_file_path, url_path='https://www.dropbox.com/s/9nlph8ha6g1kxhw/road_video.mp4?dl=1') 14 | 15 | detector = YoloObjectDetector() 16 | detector.load_model(model_dir_path) 17 | 18 | detector.detect_objects_in_video(video_file_path=video_file_path, 19 | output_video_path=output_video_file_path, 20 | temp_image_folder=temp_image_folder) 21 | 22 | 23 | if __name__ == '__main__': 24 | main() 25 | -------------------------------------------------------------------------------- /keras_video_object_detector/demo/font/FiraMono-Medium.otf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/chen0040/keras-video-object-detector/52f07ff4047dcc8732015c3debba1fa3eb7f2c56/keras_video_object_detector/demo/font/FiraMono-Medium.otf -------------------------------------------------------------------------------- /keras_video_object_detector/demo/font/SIL Open Font License.txt: -------------------------------------------------------------------------------- 1 | Copyright (c) 2014, Mozilla Foundation https://mozilla.org/ with Reserved Font Name Fira Mono. 2 | 3 | Copyright (c) 2014, Telefonica S.A. 4 | 5 | This Font Software is licensed under the SIL Open Font License, Version 1.1. 6 | This license is copied below, and is also available with a FAQ at: http://scripts.sil.org/OFL 7 | 8 | ----------------------------------------------------------- 9 | SIL OPEN FONT LICENSE Version 1.1 - 26 February 2007 10 | ----------------------------------------------------------- 11 | 12 | PREAMBLE 13 | The goals of the Open Font License (OFL) are to stimulate worldwide development of collaborative font projects, to support the font creation efforts of academic and linguistic communities, and to provide a free and open framework in which fonts may be shared and improved in partnership with others. 14 | 15 | The OFL allows the licensed fonts to be used, studied, modified and redistributed freely as long as they are not sold by themselves. The fonts, including any derivative works, can be bundled, embedded, redistributed and/or sold with any software provided that any reserved names are not used by derivative works. 
The fonts and derivatives, however, cannot be released under any other type of license. The requirement for fonts to remain under this license does not apply to any document created using the fonts or their derivatives. 16 | 17 | DEFINITIONS 18 | "Font Software" refers to the set of files released by the Copyright Holder(s) under this license and clearly marked as such. This may include source files, build scripts and documentation. 19 | 20 | "Reserved Font Name" refers to any names specified as such after the copyright statement(s). 21 | 22 | "Original Version" refers to the collection of Font Software components as distributed by the Copyright Holder(s). 23 | 24 | "Modified Version" refers to any derivative made by adding to, deleting, or substituting -- in part or in whole -- any of the components of the Original Version, by changing formats or by porting the Font Software to a new environment. 25 | 26 | "Author" refers to any designer, engineer, programmer, technical writer or other person who contributed to the Font Software. 27 | 28 | PERMISSION & CONDITIONS 29 | Permission is hereby granted, free of charge, to any person obtaining a copy of the Font Software, to use, study, copy, merge, embed, modify, redistribute, and sell modified and unmodified copies of the Font Software, subject to the following conditions: 30 | 31 | 1) Neither the Font Software nor any of its individual components, in Original or Modified Versions, may be sold by itself. 32 | 33 | 2) Original or Modified Versions of the Font Software may be bundled, redistributed and/or sold with any software, provided that each copy contains the above copyright notice and this license. These can be included either as stand-alone text files, human-readable headers or in the appropriate machine-readable metadata fields within text or binary files as long as those fields can be easily viewed by the user. 34 | 35 | 3) No Modified Version of the Font Software may use the Reserved Font Name(s) unless explicit written permission is granted by the corresponding Copyright Holder. This restriction only applies to the primary font name as presented to the users. 36 | 37 | 4) The name(s) of the Copyright Holder(s) or the Author(s) of the Font Software shall not be used to promote, endorse or advertise any Modified Version, except to acknowledge the contribution(s) of the Copyright Holder(s) and the Author(s) or with their explicit written permission. 38 | 39 | 5) The Font Software, modified or unmodified, in part or in whole, must be distributed entirely under this license, and must not be distributed under any other license. The requirement for fonts to remain under this license does not apply to any document created using the Font Software. 40 | 41 | TERMINATION 42 | This license becomes null and void if any of the above conditions are not met. 43 | 44 | DISCLAIMER 45 | THE FONT SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF COPYRIGHT, PATENT, TRADEMARK, OR OTHER RIGHT. IN NO EVENT SHALL THE COPYRIGHT HOLDER BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, INCLUDING ANY GENERAL, SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF THE USE OR INABILITY TO USE THE FONT SOFTWARE OR FROM OTHER DEALINGS IN THE FONT SOFTWARE. 
--------------------------------------------------------------------------------
/keras_video_object_detector/library/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chen0040/keras-video-object-detector/52f07ff4047dcc8732015c3debba1fa3eb7f2c56/keras_video_object_detector/library/__init__.py
--------------------------------------------------------------------------------
/keras_video_object_detector/library/download_utils.py:
--------------------------------------------------------------------------------
1 | import urllib.request
2 | import os
3 | 
4 | import sys
5 | 
6 | 
7 | def reporthook(block_num, block_size, total_size):
8 |     read_so_far = block_num * block_size
9 |     if total_size > 0:
10 |         percent = read_so_far * 1e2 / total_size
11 |         s = "\r%5.1f%% %*d / %d" % (
12 |             percent, len(str(total_size)), read_so_far, total_size)
13 |         sys.stderr.write(s)
14 |         if read_so_far >= total_size:  # near the end
15 |             sys.stderr.write("\n")
16 |     else:  # total size is unknown
17 |         sys.stderr.write("read %d\n" % (read_so_far,))
18 | 
19 | 
20 | def download_file(file_path, url_path):
21 |     if not os.path.exists(file_path):
22 |         print('file does not exist, downloading from internet')
23 |         urllib.request.urlretrieve(url=url_path, filename=file_path,
24 |                                    reporthook=reporthook)
25 | 
--------------------------------------------------------------------------------
/keras_video_object_detector/library/video_utils.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | import os
3 | 
4 | 
5 | def extract_images(video_input_file_path, image_output_dir_path, image_shape=None, frames_per_second=None):
6 |     if frames_per_second is None:
7 |         frames_per_second = 20
8 |     if not os.path.exists(image_output_dir_path):
9 |         os.mkdir(image_output_dir_path)
10 |     count = 0
11 |     print('Extracting frames from video: ', video_input_file_path)
12 |     vidcap = cv2.VideoCapture(video_input_file_path)
13 |     success = True
14 |     while success:
15 |         # seek to the timestamp of the next frame to sample
16 |         vidcap.set(cv2.CAP_PROP_POS_MSEC, (count * 1000 // frames_per_second))
17 |         success, image = vidcap.read()
18 |         if success:
19 |             if image_shape is not None:
20 |                 image = cv2.resize(image, image_shape, interpolation=cv2.INTER_AREA)
21 |             # zero-pad the frame index so the files sort in frame order
22 |             image_file = image_output_dir_path + os.path.sep + "frame%04d.jpg" % count
23 |             cv2.imwrite(image_file, image)  # save frame as JPEG file
24 |             print('extracting ' + image_file)
25 |             count = count + 1
26 |     vidcap.release()
--------------------------------------------------------------------------------
/keras_video_object_detector/library/yad2k/models/keras_darknet19.py:
--------------------------------------------------------------------------------
1 | """Darknet19 Model Defined in Keras."""
2 | import functools
3 | from functools import partial
4 | 
5 | from keras.layers import Conv2D, MaxPooling2D
6 | from keras.layers.advanced_activations import LeakyReLU
7 | from keras.layers.normalization import BatchNormalization
8 | from keras.models import Model
9 | from keras.regularizers import l2
10 | 
11 | from ..utils import compose
12 | 
13 | # Partial wrapper for Convolution2D with static default argument.
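# functools.partial pins padding='same' so every Darknet convolution preserves
# spatial size; downsampling happens only in the explicit MaxPooling2D layers.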
14 | _DarknetConv2D = partial(Conv2D, padding='same') 15 | 16 | 17 | @functools.wraps(Conv2D) 18 | def DarknetConv2D(*args, **kwargs): 19 | """Wrapper to set Darknet weight regularizer for Convolution2D.""" 20 | darknet_conv_kwargs = {'kernel_regularizer': l2(5e-4)} 21 | darknet_conv_kwargs.update(kwargs) 22 | return _DarknetConv2D(*args, **darknet_conv_kwargs) 23 | 24 | 25 | def DarknetConv2D_BN_Leaky(*args, **kwargs): 26 | """Darknet Convolution2D followed by BatchNormalization and LeakyReLU.""" 27 | no_bias_kwargs = {'use_bias': False} 28 | no_bias_kwargs.update(kwargs) 29 | return compose( 30 | DarknetConv2D(*args, **no_bias_kwargs), 31 | BatchNormalization(), 32 | LeakyReLU(alpha=0.1)) 33 | 34 | 35 | def bottleneck_block(outer_filters, bottleneck_filters): 36 | """Bottleneck block of 3x3, 1x1, 3x3 convolutions.""" 37 | return compose( 38 | DarknetConv2D_BN_Leaky(outer_filters, (3, 3)), 39 | DarknetConv2D_BN_Leaky(bottleneck_filters, (1, 1)), 40 | DarknetConv2D_BN_Leaky(outer_filters, (3, 3))) 41 | 42 | 43 | def bottleneck_x2_block(outer_filters, bottleneck_filters): 44 | """Bottleneck block of 3x3, 1x1, 3x3, 1x1, 3x3 convolutions.""" 45 | return compose( 46 | bottleneck_block(outer_filters, bottleneck_filters), 47 | DarknetConv2D_BN_Leaky(bottleneck_filters, (1, 1)), 48 | DarknetConv2D_BN_Leaky(outer_filters, (3, 3))) 49 | 50 | 51 | def darknet_body(): 52 | """Generate first 18 conv layers of Darknet-19.""" 53 | return compose( 54 | DarknetConv2D_BN_Leaky(32, (3, 3)), 55 | MaxPooling2D(), 56 | DarknetConv2D_BN_Leaky(64, (3, 3)), 57 | MaxPooling2D(), 58 | bottleneck_block(128, 64), 59 | MaxPooling2D(), 60 | bottleneck_block(256, 128), 61 | MaxPooling2D(), 62 | bottleneck_x2_block(512, 256), 63 | MaxPooling2D(), 64 | bottleneck_x2_block(1024, 512)) 65 | 66 | 67 | def darknet19(inputs): 68 | """Generate Darknet-19 model for Imagenet classification.""" 69 | body = darknet_body()(inputs) 70 | logits = DarknetConv2D(1000, (1, 1), activation='softmax')(body) 71 | return Model(inputs, logits) 72 | -------------------------------------------------------------------------------- /keras_video_object_detector/library/yad2k/models/keras_yolo.py: -------------------------------------------------------------------------------- 1 | """YOLO_v2 Model Defined in Keras.""" 2 | import sys 3 | 4 | import numpy as np 5 | import tensorflow as tf 6 | from keras import backend as K 7 | from keras.layers import Lambda 8 | from keras.layers.merge import concatenate 9 | from keras.models import Model 10 | 11 | from ..utils import compose 12 | from .keras_darknet19 import (DarknetConv2D, DarknetConv2D_BN_Leaky, darknet_body) 13 | 14 | sys.path.append('..') 15 | 16 | voc_anchors = np.array( 17 | [[1.08, 1.19], [3.42, 4.41], [6.63, 11.38], [9.42, 5.11], [16.62, 10.52]]) 18 | 19 | voc_classes = [ 20 | "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", 21 | "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", 22 | "pottedplant", "sheep", "sofa", "train", "tvmonitor" 23 | ] 24 | 25 | 26 | def space_to_depth_x2(x): 27 | """Thin wrapper for Tensorflow space_to_depth with block_size=2.""" 28 | # Import currently required to make Lambda work. 29 | # See: https://github.com/fchollet/keras/issues/5088#issuecomment-273851273 30 | import tensorflow as tf 31 | return tf.space_to_depth(x, block_size=2) 32 | 33 | 34 | def space_to_depth_x2_output_shape(input_shape): 35 | """Determine space_to_depth output shape for block_size=2. 
36 | 37 | Note: For Lambda with TensorFlow backend, output shape may not be needed. 38 | """ 39 | return (input_shape[0], input_shape[1] // 2, input_shape[2] // 2, 4 * 40 | input_shape[3]) if input_shape[1] else (input_shape[0], None, None, 41 | 4 * input_shape[3]) 42 | 43 | 44 | def yolo_body(inputs, num_anchors, num_classes): 45 | """Create YOLO_V2 model CNN body in Keras.""" 46 | darknet = Model(inputs, darknet_body()(inputs)) 47 | conv20 = compose( 48 | DarknetConv2D_BN_Leaky(1024, (3, 3)), 49 | DarknetConv2D_BN_Leaky(1024, (3, 3)))(darknet.output) 50 | 51 | conv13 = darknet.layers[43].output 52 | conv21 = DarknetConv2D_BN_Leaky(64, (1, 1))(conv13) 53 | # TODO: Allow Keras Lambda to use func arguments for output_shape? 54 | conv21_reshaped = Lambda( 55 | space_to_depth_x2, 56 | output_shape=space_to_depth_x2_output_shape, 57 | name='space_to_depth')(conv21) 58 | 59 | x = concatenate([conv21_reshaped, conv20]) 60 | x = DarknetConv2D_BN_Leaky(1024, (3, 3))(x) 61 | x = DarknetConv2D(num_anchors * (num_classes + 5), (1, 1))(x) 62 | return Model(inputs, x) 63 | 64 | 65 | def yolo_head(feats, anchors, num_classes): 66 | """Convert final layer features to bounding box parameters. 67 | 68 | Parameters 69 | ---------- 70 | feats : tensor 71 | Final convolutional layer features. 72 | anchors : array-like 73 | Anchor box widths and heights. 74 | num_classes : int 75 | Number of target classes. 76 | 77 | Returns 78 | ------- 79 | box_xy : tensor 80 | x, y box predictions adjusted by spatial location in conv layer. 81 | box_wh : tensor 82 | w, h box predictions adjusted by anchors and conv spatial resolution. 83 | box_conf : tensor 84 | Probability estimate for whether each box contains any object. 85 | box_class_pred : tensor 86 | Probability distribution estimate for each box over class labels. 87 | """ 88 | num_anchors = len(anchors) 89 | # Reshape to batch, height, width, num_anchors, box_params. 90 | anchors_tensor = K.reshape(K.variable(anchors), [1, 1, 1, num_anchors, 2]) 91 | # Static implementation for fixed models. 92 | # TODO: Remove or add option for static implementation. 93 | # _, conv_height, conv_width, _ = K.int_shape(feats) 94 | # conv_dims = K.variable([conv_width, conv_height]) 95 | 96 | # Dynamic implementation of conv dims for fully convolutional model. 97 | conv_dims = K.shape(feats)[1:3] # assuming channels last 98 | # In YOLO the height index is the inner most iteration. 99 | conv_height_index = K.arange(0, stop=conv_dims[0]) 100 | conv_width_index = K.arange(0, stop=conv_dims[1]) 101 | conv_height_index = K.tile(conv_height_index, [conv_dims[1]]) 102 | 103 | # TODO: Repeat_elements and tf.split doesn't support dynamic splits. 104 | # conv_width_index = K.repeat_elements(conv_width_index, conv_dims[1], axis=0) 105 | conv_width_index = K.tile(K.expand_dims(conv_width_index, 0), [conv_dims[0], 1]) 106 | conv_width_index = K.flatten(K.transpose(conv_width_index)) 107 | conv_index = K.transpose(K.stack([conv_height_index, conv_width_index])) 108 | conv_index = K.reshape(conv_index, [1, conv_dims[0], conv_dims[1], 1, 2]) 109 | conv_index = K.cast(conv_index, K.dtype(feats)) 110 | 111 | feats = K.reshape(feats, [-1, conv_dims[0], conv_dims[1], num_anchors, num_classes + 5]) 112 | conv_dims = K.cast(K.reshape(conv_dims, [1, 1, 1, 1, 2]), K.dtype(feats)) 113 | 114 | # Static generation of conv_index: 115 | # conv_index = np.array([_ for _ in np.ndindex(conv_width, conv_height)]) 116 | # conv_index = conv_index[:, [1, 0]] # swap columns for YOLO ordering. 
117 |     # conv_index = K.variable(
118 |     #     conv_index.reshape(1, conv_height, conv_width, 1, 2))
119 |     # feats = Reshape(
120 |     #     (conv_dims[0], conv_dims[1], num_anchors, num_classes + 5))(feats)
121 | 
122 |     box_confidence = K.sigmoid(feats[..., 4:5])
123 |     box_xy = K.sigmoid(feats[..., :2])
124 |     box_wh = K.exp(feats[..., 2:4])
125 |     box_class_probs = K.softmax(feats[..., 5:])
126 | 
127 |     # Adjust predictions to each spatial grid point and anchor size.
128 |     # Note: YOLO iterates over height index before width index.
129 |     box_xy = (box_xy + conv_index) / conv_dims
130 |     box_wh = box_wh * anchors_tensor / conv_dims
131 | 
132 |     return box_confidence, box_xy, box_wh, box_class_probs
133 | 
134 | 
135 | def yolo_boxes_to_corners(box_xy, box_wh):
136 |     """Convert YOLO box predictions to bounding box corners."""
137 |     box_mins = box_xy - (box_wh / 2.)
138 |     box_maxes = box_xy + (box_wh / 2.)
139 | 
140 |     return K.concatenate([
141 |         box_mins[..., 1:2],  # y_min
142 |         box_mins[..., 0:1],  # x_min
143 |         box_maxes[..., 1:2],  # y_max
144 |         box_maxes[..., 0:1]  # x_max
145 |     ])
146 | 
147 | 
148 | def yolo_loss(args,
149 |               anchors,
150 |               num_classes,
151 |               rescore_confidence=False,
152 |               print_loss=False):
153 |     """YOLO localization loss function.
154 | 
155 |     Parameters
156 |     ----------
157 |     yolo_output : tensor
158 |         Final convolutional layer features.
159 | 
160 |     true_boxes : tensor
161 |         Ground truth boxes tensor with shape [batch, num_true_boxes, 5]
162 |         containing box x_center, y_center, width, height, and class.
163 | 
164 |     detectors_mask : array
165 |         0/1 mask for detector positions where there is a matching ground truth.
166 | 
167 |     matching_true_boxes : array
168 |         Corresponding ground truth boxes for positive detector positions.
169 |         Already adjusted for conv height and width.
170 | 
171 |     anchors : tensor
172 |         Anchor boxes for model.
173 | 
174 |     num_classes : int
175 |         Number of object classes.
176 | 
177 |     rescore_confidence : bool, default=False
178 |         If true then set confidence target to IOU of best predicted box with
179 |         the closest matching ground truth box.
180 | 
181 |     print_loss : bool, default=False
182 |         If True then use a tf.Print() to print the loss components.
183 | 
184 |     Returns
185 |     -------
186 |     mean_loss : float
187 |         mean localization loss across minibatch
188 |     """
189 |     (yolo_output, true_boxes, detectors_mask, matching_true_boxes) = args
190 |     num_anchors = len(anchors)
191 |     object_scale = 5
192 |     no_object_scale = 1
193 |     class_scale = 1
194 |     coordinates_scale = 1
195 |     pred_confidence, pred_xy, pred_wh, pred_class_prob = yolo_head(
196 |         yolo_output, anchors, num_classes)  # yolo_head returns (confidence, xy, wh, class_probs)
197 | 
198 |     # Unadjusted box predictions for loss.
199 |     # TODO: Remove extra computation shared with yolo_head.
200 |     yolo_output_shape = K.shape(yolo_output)
201 |     feats = K.reshape(yolo_output, [
202 |         -1, yolo_output_shape[1], yolo_output_shape[2], num_anchors,
203 |         num_classes + 5
204 |     ])
205 |     pred_boxes = K.concatenate(
206 |         (K.sigmoid(feats[..., 0:2]), feats[..., 2:4]), axis=-1)
207 | 
208 |     # TODO: Adjust predictions by image width/height for non-square images?
209 |     # IOUs may be off due to different aspect ratio.
210 | 
211 |     # Expand pred x,y,w,h to allow comparison with ground truth.
212 |     # batch, conv_height, conv_width, num_anchors, num_true_boxes, box_params
213 |     pred_xy = K.expand_dims(pred_xy, 4)
214 |     pred_wh = K.expand_dims(pred_wh, 4)
215 | 
216 |     pred_wh_half = pred_wh / 2.
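    # Convert the predicted (center, size) boxes to (min, max) corners so they
    # can be intersected with the ground-truth boxes below.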
217 | pred_mins = pred_xy - pred_wh_half 218 | pred_maxes = pred_xy + pred_wh_half 219 | 220 | true_boxes_shape = K.shape(true_boxes) 221 | 222 | # batch, conv_height, conv_width, num_anchors, num_true_boxes, box_params 223 | true_boxes = K.reshape(true_boxes, [ 224 | true_boxes_shape[0], 1, 1, 1, true_boxes_shape[1], true_boxes_shape[2] 225 | ]) 226 | true_xy = true_boxes[..., 0:2] 227 | true_wh = true_boxes[..., 2:4] 228 | 229 | # Find IOU of each predicted box with each ground truth box. 230 | true_wh_half = true_wh / 2. 231 | true_mins = true_xy - true_wh_half 232 | true_maxes = true_xy + true_wh_half 233 | 234 | intersect_mins = K.maximum(pred_mins, true_mins) 235 | intersect_maxes = K.minimum(pred_maxes, true_maxes) 236 | intersect_wh = K.maximum(intersect_maxes - intersect_mins, 0.) 237 | intersect_areas = intersect_wh[..., 0] * intersect_wh[..., 1] 238 | 239 | pred_areas = pred_wh[..., 0] * pred_wh[..., 1] 240 | true_areas = true_wh[..., 0] * true_wh[..., 1] 241 | 242 | union_areas = pred_areas + true_areas - intersect_areas 243 | iou_scores = intersect_areas / union_areas 244 | 245 | # Best IOUs for each location. 246 | best_ious = K.max(iou_scores, axis=4) # Best IOU scores. 247 | best_ious = K.expand_dims(best_ious) 248 | 249 | # A detector has found an object if IOU > thresh for some true box. 250 | object_detections = K.cast(best_ious > 0.6, K.dtype(best_ious)) 251 | 252 | # TODO: Darknet region training includes extra coordinate loss for early 253 | # training steps to encourage predictions to match anchor priors. 254 | 255 | # Determine confidence weights from object and no_object weights. 256 | # NOTE: YOLO does not use binary cross-entropy here. 257 | no_object_weights = (no_object_scale * (1 - object_detections) * 258 | (1 - detectors_mask)) 259 | no_objects_loss = no_object_weights * K.square(-pred_confidence) 260 | 261 | if rescore_confidence: 262 | objects_loss = (object_scale * detectors_mask * 263 | K.square(best_ious - pred_confidence)) 264 | else: 265 | objects_loss = (object_scale * detectors_mask * 266 | K.square(1 - pred_confidence)) 267 | confidence_loss = objects_loss + no_objects_loss 268 | 269 | # Classification loss for matching detections. 270 | # NOTE: YOLO does not use categorical cross-entropy loss here. 271 | matching_classes = K.cast(matching_true_boxes[..., 4], 'int32') 272 | matching_classes = K.one_hot(matching_classes, num_classes) 273 | classification_loss = (class_scale * detectors_mask * 274 | K.square(matching_classes - pred_class_prob)) 275 | 276 | # Coordinate loss for matching detection boxes. 
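    # Only the x,y offsets and log-space w,h stored in matching_true_boxes are
    # compared against the raw predictions, masked to the responsible detectors.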
277 | matching_boxes = matching_true_boxes[..., 0:4] 278 | coordinates_loss = (coordinates_scale * detectors_mask * 279 | K.square(matching_boxes - pred_boxes)) 280 | 281 | confidence_loss_sum = K.sum(confidence_loss) 282 | classification_loss_sum = K.sum(classification_loss) 283 | coordinates_loss_sum = K.sum(coordinates_loss) 284 | total_loss = 0.5 * ( 285 | confidence_loss_sum + classification_loss_sum + coordinates_loss_sum) 286 | if print_loss: 287 | total_loss = tf.Print( 288 | total_loss, [ 289 | total_loss, confidence_loss_sum, classification_loss_sum, 290 | coordinates_loss_sum 291 | ], 292 | message='yolo_loss, conf_loss, class_loss, box_coord_loss:') 293 | 294 | return total_loss 295 | 296 | 297 | def yolo(inputs, anchors, num_classes): 298 | """Generate a complete YOLO_v2 localization model.""" 299 | num_anchors = len(anchors) 300 | body = yolo_body(inputs, num_anchors, num_classes) 301 | outputs = yolo_head(body.output, anchors, num_classes) 302 | return outputs 303 | 304 | 305 | def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold=.6): 306 | """Filter YOLO boxes based on object and class confidence.""" 307 | 308 | box_scores = box_confidence * box_class_probs 309 | box_classes = K.argmax(box_scores, axis=-1) 310 | box_class_scores = K.max(box_scores, axis=-1) 311 | prediction_mask = box_class_scores >= threshold 312 | 313 | # TODO: Expose tf.boolean_mask to Keras backend? 314 | boxes = tf.boolean_mask(boxes, prediction_mask) 315 | scores = tf.boolean_mask(box_class_scores, prediction_mask) 316 | classes = tf.boolean_mask(box_classes, prediction_mask) 317 | 318 | return boxes, scores, classes 319 | 320 | 321 | def yolo_eval(yolo_outputs, 322 | image_shape, 323 | max_boxes=10, 324 | score_threshold=.6, 325 | iou_threshold=.5): 326 | """Evaluate YOLO model on given input batch and return filtered boxes.""" 327 | box_confidence, box_xy, box_wh, box_class_probs = yolo_outputs 328 | boxes = yolo_boxes_to_corners(box_xy, box_wh) 329 | boxes, scores, classes = yolo_filter_boxes( 330 | box_confidence, boxes, box_class_probs, threshold=score_threshold) 331 | 332 | # Scale boxes back to original image shape. 333 | height = image_shape[0] 334 | width = image_shape[1] 335 | image_dims = K.stack([height, width, height, width]) 336 | image_dims = K.reshape(image_dims, [1, 4]) 337 | boxes = boxes * image_dims 338 | 339 | # TODO: Something must be done about this ugly hack! 340 | max_boxes_tensor = K.variable(max_boxes, dtype='int32') 341 | K.get_session().run(tf.variables_initializer([max_boxes_tensor])) 342 | nms_index = tf.image.non_max_suppression( 343 | boxes, scores, max_boxes_tensor, iou_threshold=iou_threshold) 344 | boxes = K.gather(boxes, nms_index) 345 | scores = K.gather(scores, nms_index) 346 | classes = K.gather(classes, nms_index) 347 | 348 | return boxes, scores, classes 349 | 350 | 351 | def preprocess_true_boxes(true_boxes, anchors, image_size): 352 | """Find detector in YOLO where ground truth box should appear. 353 | 354 | Parameters 355 | ---------- 356 | true_boxes : array 357 | List of ground truth boxes in form of relative x, y, w, h, class. 358 | Relative coordinates are in the range [0, 1] indicating a percentage 359 | of the original image dimensions. 360 | anchors : array 361 | List of anchors in form of w, h. 362 | Anchors are assumed to be in the range [0, conv_size] where conv_size 363 | is the spatial dimension of the final convolutional features. 364 | image_size : array-like 365 | List of image dimensions in form of h, w in pixels. 
366 | 
367 |     Returns
368 |     -------
369 |     detectors_mask : array
370 |         0/1 mask for detectors in [conv_height, conv_width, num_anchors, 1]
371 |         that should be compared with a matching ground truth box.
372 |     matching_true_boxes: array
373 |         Same shape as detectors_mask with the corresponding ground truth box
374 |         adjusted for comparison with predicted parameters at training time.
375 |     """
376 |     height, width = image_size
377 |     num_anchors = len(anchors)
378 |     # Downsampling factor of 5x 2-stride max_pools == 32.
379 |     # TODO: Remove hardcoding of downscaling calculations.
380 |     assert height % 32 == 0, 'Image sizes in YOLO_v2 must be multiples of 32.'
381 |     assert width % 32 == 0, 'Image sizes in YOLO_v2 must be multiples of 32.'
382 |     conv_height = height // 32
383 |     conv_width = width // 32
384 |     num_box_params = true_boxes.shape[1]
385 |     detectors_mask = np.zeros(
386 |         (conv_height, conv_width, num_anchors, 1), dtype=np.float32)
387 |     matching_true_boxes = np.zeros(
388 |         (conv_height, conv_width, num_anchors, num_box_params),
389 |         dtype=np.float32)
390 | 
391 |     for box in true_boxes:
392 |         # scale box to convolutional feature spatial dimensions
393 |         box_class = box[4:5]
394 |         box = box[0:4] * np.array(
395 |             [conv_width, conv_height, conv_width, conv_height])
396 |         i = np.floor(box[1]).astype('int')  # grid row of the box center
397 |         j = np.floor(box[0]).astype('int')  # grid column of the box center
398 |         best_iou = 0
399 |         best_anchor = 0
400 | 
401 |         for k, anchor in enumerate(anchors):
402 |             # Find IOU between box shifted to origin and anchor box.
403 |             box_maxes = box[2:4] / 2.
404 |             box_mins = -box_maxes
405 |             anchor_maxes = (anchor / 2.)
406 |             anchor_mins = -anchor_maxes
407 | 
408 |             intersect_mins = np.maximum(box_mins, anchor_mins)
409 |             intersect_maxes = np.minimum(box_maxes, anchor_maxes)
410 |             intersect_wh = np.maximum(intersect_maxes - intersect_mins, 0.)
411 |             intersect_area = intersect_wh[0] * intersect_wh[1]
412 |             box_area = box[2] * box[3]
413 |             anchor_area = anchor[0] * anchor[1]
414 |             iou = intersect_area / (box_area + anchor_area - intersect_area)
415 |             if iou > best_iou:
416 |                 best_iou = iou
417 |                 best_anchor = k
418 | 
419 |         if best_iou > 0:
420 |             detectors_mask[i, j, best_anchor] = 1
421 |             adjusted_box = np.array(
422 |                 [
423 |                     box[0] - j, box[1] - i,
424 |                     np.log(box[2] / anchors[best_anchor][0]),
425 |                     np.log(box[3] / anchors[best_anchor][1]), box_class
426 |                 ],
427 |                 dtype=np.float32)
428 |             matching_true_boxes[i, j, best_anchor] = adjusted_box
429 |     return detectors_mask, matching_true_boxes
--------------------------------------------------------------------------------
/keras_video_object_detector/library/yad2k/utils/__init__.py:
--------------------------------------------------------------------------------
1 | from .utils import *
2 | 
--------------------------------------------------------------------------------
/keras_video_object_detector/library/yad2k/utils/utils.py:
--------------------------------------------------------------------------------
1 | """Miscellaneous utility functions."""
2 | 
3 | from functools import reduce
4 | 
5 | 
6 | def compose(*funcs):
7 |     """Compose arbitrarily many functions, evaluated left to right.
8 | 9 | Reference: https://mathieularose.com/function-composition-in-python/ 10 | """ 11 | # return lambda x: reduce(lambda v, f: f(v), funcs, x) 12 | if funcs: 13 | return reduce(lambda f, g: lambda *a, **kw: g(f(*a, **kw)), funcs) 14 | else: 15 | raise ValueError('Composition of empty sequence not supported.') 16 | -------------------------------------------------------------------------------- /keras_video_object_detector/library/yolo.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | 4 | import cv2 5 | import matplotlib.pyplot as plt 6 | from PIL import Image 7 | from keras.models import load_model 8 | from matplotlib.pyplot import imshow 9 | import scipy.io 10 | import scipy.misc 11 | import numpy as np 12 | import pandas as pd 13 | import PIL 14 | import tensorflow as tf 15 | from keras import backend as K 16 | from keras.layers import Input, Lambda, Conv2D 17 | 18 | from keras_video_object_detector.library.download_utils import download_file 19 | from keras_video_object_detector.library.video_utils import extract_images 20 | from keras_video_object_detector.library.yolo_utils import read_classes, read_anchors, generate_colors, \ 21 | preprocess_image, \ 22 | draw_boxes, scale_boxes, preprocess_image_data 23 | from keras_video_object_detector.library.yad2k.models.keras_yolo import yolo_head, yolo_boxes_to_corners, \ 24 | preprocess_true_boxes, yolo_loss, yolo_body 25 | 26 | 27 | def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold=.6): 28 | """Filters YOLO boxes by thresholding on object and class confidence. 29 | 30 | Arguments: 31 | box_confidence -- tensor of shape (19, 19, 5, 1) 32 | boxes -- tensor of shape (19, 19, 5, 4) 33 | box_class_probs -- tensor of shape (19, 19, 5, 80) 34 | threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box 35 | 36 | Returns: 37 | scores -- tensor of shape (None,), containing the class probability score for selected boxes 38 | boxes -- tensor of shape (None, 4), containing (b_x, b_y, b_h, b_w) coordinates of selected boxes 39 | classes -- tensor of shape (None,), containing the index of the class detected by the selected boxes 40 | 41 | Note: "None" is here because you don't know the exact number of selected boxes, as it depends on the threshold. 42 | For example, the actual output size of scores would be (10,) if there are 10 boxes. 43 | """ 44 | 45 | # Step 1: Compute box scores 46 | box_scores = box_confidence * box_class_probs 47 | 48 | # Step 2: Find the box_classes thanks to the max box_scores, keep track of the corresponding score 49 | box_classes = K.argmax(box_scores, axis=-1) 50 | box_class_scores = K.max(box_scores, axis=-1) 51 | 52 | # Step 3: Create a filtering mask based on "box_class_scores" by using "threshold". 
The mask should have the
53 |     # same dimension as box_class_scores, and be True for the boxes you want to keep (with probability >= threshold)
54 |     filtering_mask = box_class_scores >= threshold
55 | 
56 |     # Step 4: Apply the mask to scores, boxes and classes
57 |     scores = tf.boolean_mask(box_class_scores, filtering_mask)
58 |     boxes = tf.boolean_mask(boxes, filtering_mask)
59 |     classes = tf.boolean_mask(box_classes, filtering_mask)
60 | 
61 |     return scores, boxes, classes
62 | 
63 | 
64 | def yolo_filter_boxes_test():
65 |     with tf.Session() as test_a:
66 |         box_confidence = tf.random_normal([19, 19, 5, 1], mean=1, stddev=4, seed=1)
67 |         boxes = tf.random_normal([19, 19, 5, 4], mean=1, stddev=4, seed=1)
68 |         box_class_probs = tf.random_normal([19, 19, 5, 80], mean=1, stddev=4, seed=1)
69 |         scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold=0.5)
70 |         print("scores[2] = " + str(scores[2].eval()))
71 |         print("boxes[2] = " + str(boxes[2].eval()))
72 |         print("classes[2] = " + str(classes[2].eval()))
73 |         print("scores.shape = " + str(scores.shape))
74 |         print("boxes.shape = " + str(boxes.shape))
75 |         print("classes.shape = " + str(classes.shape))
76 | 
77 | 
78 | def iou(box1, box2):
79 |     """Compute the intersection over union (IoU) between box1 and box2
80 | 
81 |     Arguments:
82 |     box1 -- first box, list object with coordinates (x1, y1, x2, y2)
83 |     box2 -- second box, list object with coordinates (x1, y1, x2, y2)
84 |     """
85 | 
86 |     # Calculate the (y1, x1, y2, x2) coordinates of the intersection of box1 and box2. Calculate its area.
87 |     xi1 = max(box1[0], box2[0])
88 |     yi1 = max(box1[1], box2[1])
89 |     xi2 = min(box1[2], box2[2])
90 |     yi2 = min(box1[3], box2[3])
91 |     inter_area = max(xi2 - xi1, 0) * max(yi2 - yi1, 0)  # zero if the boxes do not overlap
92 | 
93 |     # Calculate the union area by using the formula: Union(A,B) = A + B - Inter(A,B)
94 |     box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])
95 |     box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])
96 |     union_area = box1_area + box2_area - inter_area
97 | 
98 |     # compute the IoU
99 |     iou = inter_area / union_area
100 | 
101 |     return iou
102 | 
103 | 
104 | def iou_test():
105 |     box1 = (2, 1, 4, 3)
106 |     box2 = (1, 2, 3, 4)
107 |     print("iou = " + str(iou(box1, box2)))
108 | 
109 | 
110 | def yolo_non_max_suppression(scores, boxes, classes, max_boxes=10, iou_threshold=0.5):
111 |     """
112 |     Applies non-max suppression (NMS) to a set of boxes
113 | 
114 |     Arguments:
115 |     scores -- tensor of shape (None,), output of yolo_filter_boxes()
116 |     boxes -- tensor of shape (None, 4), output of yolo_filter_boxes() that have been scaled to the image size
117 |     classes -- tensor of shape (None,), output of yolo_filter_boxes()
118 |     max_boxes -- integer, maximum number of predicted boxes you'd like
119 |     iou_threshold -- real value, "intersection over union" threshold used for NMS filtering
120 | 
121 |     Returns:
122 |     scores -- tensor of shape (None,), predicted score for each box
123 |     boxes -- tensor of shape (None, 4), predicted box coordinates
124 |     classes -- tensor of shape (None,), predicted class for each box
125 | 
126 |     Note: The "None" dimension of the output tensors is at most max_boxes, since
127 |     tf.image.non_max_suppression() keeps no more than that many boxes.
128 | """ 129 | 130 | max_boxes_tensor = K.variable(max_boxes, dtype='int32') # tensor to be used in tf.image.non_max_suppression() 131 | K.get_session().run(tf.variables_initializer([max_boxes_tensor])) # initialize variable max_boxes_tensor 132 | 133 | # Use tf.image.non_max_suppression() to get the list of indices corresponding to boxes you keep 134 | nms_indices = tf.image.non_max_suppression(boxes, scores, max_boxes_tensor, iou_threshold) 135 | 136 | # Use K.gather() to select only nms_indices from scores, boxes and classes 137 | scores = K.gather(scores, nms_indices) 138 | boxes = K.gather(boxes, nms_indices) 139 | classes = K.gather(classes, nms_indices) 140 | 141 | return scores, boxes, classes 142 | 143 | 144 | def yolo_non_max_suppression_test(): 145 | with tf.Session() as test_b: 146 | scores = tf.random_normal([54, ], mean=1, stddev=4, seed=1) 147 | boxes = tf.random_normal([54, 4], mean=1, stddev=4, seed=1) 148 | classes = tf.random_normal([54, ], mean=1, stddev=4, seed=1) 149 | scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes) 150 | print("scores[2] = " + str(scores[2].eval())) 151 | print("boxes[2] = " + str(boxes[2].eval())) 152 | print("classes[2] = " + str(classes[2].eval())) 153 | print("scores.shape = " + str(scores.eval().shape)) 154 | print("boxes.shape = " + str(boxes.eval().shape)) 155 | print("classes.shape = " + str(classes.eval().shape)) 156 | 157 | 158 | def yolo_eval(yolo_outputs, image_shape=(720., 1280.), max_boxes=10, score_threshold=.6, iou_threshold=.5): 159 | """ 160 | Converts the output of YOLO encoding (a lot of boxes) to your predicted boxes along with their scores, box coordinates and classes. 161 | 162 | Arguments: 163 | yolo_outputs -- output of the encoding model (for image_shape of (608, 608, 3)), contains 4 tensors: 164 | box_confidence: tensor of shape (None, 19, 19, 5, 1) 165 | box_xy: tensor of shape (None, 19, 19, 5, 2) 166 | box_wh: tensor of shape (None, 19, 19, 5, 2) 167 | box_class_probs: tensor of shape (None, 19, 19, 5, 80) 168 | image_shape -- tensor of shape (2,) containing the input shape, in this notebook we use (608., 608.) (has to be float32 dtype) 169 | max_boxes -- integer, maximum number of predicted boxes you'd like 170 | score_threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box 171 | iou_threshold -- real value, "intersection over union" threshold used for NMS filtering 172 | 173 | Returns: 174 | scores -- tensor of shape (None, ), predicted score for each box 175 | boxes -- tensor of shape (None, 4), predicted box coordinates 176 | classes -- tensor of shape (None,), predicted class for each box 177 | """ 178 | 179 | # Retrieve outputs of the YOLO model (≈1 line) 180 | box_confidence, box_xy, box_wh, box_class_probs = yolo_outputs 181 | 182 | # Convert boxes to be ready for filtering functions 183 | boxes = yolo_boxes_to_corners(box_xy, box_wh) 184 | 185 | # Use one of the functions you've implemented to perform Score-filtering with a threshold of score_threshold (≈1 line) 186 | scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold=score_threshold) 187 | 188 | # Scale boxes back to original image shape. 
189 |     boxes = scale_boxes(boxes, image_shape)
190 | 
191 |     # Perform non-max suppression with a threshold of iou_threshold
192 |     scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes, max_boxes=max_boxes,
193 |                                                       iou_threshold=iou_threshold)
194 | 
195 |     return scores, boxes, classes
196 | 
197 | 
198 | def yolo_eval_test():
199 |     with tf.Session() as test_b:
200 |         yolo_outputs = (tf.random_normal([19, 19, 5, 1], mean=1, stddev=4, seed=1),
201 |                         tf.random_normal([19, 19, 5, 2], mean=1, stddev=4, seed=1),
202 |                         tf.random_normal([19, 19, 5, 2], mean=1, stddev=4, seed=1),
203 |                         tf.random_normal([19, 19, 5, 80], mean=1, stddev=4, seed=1))
204 |         scores, boxes, classes = yolo_eval(yolo_outputs)
205 |         print("scores[2] = " + str(scores[2].eval()))
206 |         print("boxes[2] = " + str(boxes[2].eval()))
207 |         print("classes[2] = " + str(classes[2].eval()))
208 |         print("scores.shape = " + str(scores.eval().shape))
209 |         print("boxes.shape = " + str(boxes.eval().shape))
210 |         print("classes.shape = " + str(classes.eval().shape))
211 | 
212 | 
213 | class YoloObjectDetector(object):
214 | 
215 |     def __init__(self, frame_width=None, frame_height=None):
216 |         if frame_width is None:
217 |             frame_width = 1280
218 |         if frame_height is None:
219 |             frame_height = 720
220 |         self.frame_width = frame_width
221 |         self.frame_height = frame_height
222 | 
223 |         self.scores = None
224 |         self.boxes = None
225 |         self.classes = None
226 |         self.yolo_model = None
227 |         self.sess = K.get_session()
228 |         self.class_names = None
229 |         self.anchors = None
230 |         self.image_shape = (float(self.frame_height), float(self.frame_width))
231 |         self.yolo_outputs = None
232 | 
233 |     def load_model(self, model_dir_path):
234 |         self.class_names = read_classes(model_dir_path + "/coco_classes.txt")
235 |         self.anchors = read_anchors(model_dir_path + "/yolo_anchors.txt")
236 | 
237 |         yolo_model_file = model_dir_path + "/yolo.h5"
238 |         yolo_model_file_download_link = 'https://www.dropbox.com/s/krwz5xtpuorah48/yolo.h5?dl=1'
239 |         download_file(yolo_model_file, url_path=yolo_model_file_download_link)
240 |         self.yolo_model = load_model(yolo_model_file)
241 |         self.yolo_model.summary()
242 | 
243 |         # The output of yolo_model is a (m, 19, 19, 5, 85) tensor that needs to pass through non-trivial
244 |         # processing and conversion.
245 |         self.yolo_outputs = yolo_head(self.yolo_model.output, self.anchors, len(self.class_names))
246 | 
247 |         # yolo_outputs contains all the predicted boxes of yolo_model in the correct format;
248 |         # yolo_eval then filters them down to the best boxes.
249 |         self.scores, self.boxes, self.classes = yolo_eval(self.yolo_outputs, self.image_shape)
250 | 
251 |     def predict_objects_in_image(self, image_file):
252 |         """
253 |         Runs the YOLO graph stored in self.sess to predict boxes for "image_file".
254 | 
255 |         Arguments:
256 |         image_file -- path of the image file to run detection on
257 |         (the TensorFlow session and the YOLO graph come from self.sess and self.yolo_model)
258 | 
259 |         Returns:
260 |         out_scores -- tensor of shape (None, ), scores of the predicted boxes
261 |         out_boxes -- tensor of shape (None, 4), coordinates of the predicted boxes
262 |         out_classes -- tensor of shape (None, ), class index of the predicted boxes
263 | 
264 |         Note: "None" actually represents the number of predicted boxes; it varies between 0 and max_boxes.
265 | """ 266 | 267 | # Preprocess your image 268 | image, image_data = preprocess_image(image_file, model_image_size=(608, 608)) 269 | 270 | # Run the session with the correct tensors and choose the correct placeholders in the feed_dict. 271 | out_scores, out_boxes, out_classes = self.sess.run([self.scores, self.boxes, self.classes], 272 | feed_dict={self.yolo_model.input: image_data, 273 | K.learning_phase(): 0 274 | }) 275 | 276 | return [image, out_scores, out_boxes, out_classes] 277 | 278 | def predict_objects_in_image_frame(self, image): 279 | """ 280 | Runs the graph stored in "sess" to predict boxes for "image_file". Prints and plots the preditions. 281 | 282 | Arguments: 283 | sess -- your tensorflow/Keras session containing the YOLO graph 284 | image_file -- name of an image stored in the "images" folder. 285 | 286 | Returns: 287 | out_scores -- tensor of shape (None, ), scores of the predicted boxes 288 | out_boxes -- tensor of shape (None, 4), coordinates of the predicted boxes 289 | out_classes -- tensor of shape (None, ), class index of the predicted boxes 290 | 291 | Note: "None" actually represents the number of predicted boxes, it varies between 0 and max_boxes. 292 | """ 293 | 294 | # Preprocess your image 295 | model_image_size = (608, 608) 296 | resized_image = image.resize(tuple(reversed(model_image_size)), Image.BICUBIC) 297 | image_scaled, image_data = preprocess_image_data(resized_image) 298 | 299 | # Run the session with the correct tensors and choose the correct placeholders in the feed_dict. 300 | out_scores, out_boxes, out_classes = self.sess.run([self.scores, self.boxes, self.classes], 301 | feed_dict={self.yolo_model.input: image_data, 302 | K.learning_phase(): 0 303 | }) 304 | 305 | return [image, out_scores, out_boxes, out_classes] 306 | 307 | def detect_objects_in_video(self, video_file_path, output_video_path, temp_image_folder=None): 308 | if temp_image_folder is None: 309 | temp_image_folder = 'temp_images' 310 | 311 | if not os.path.exists(temp_image_folder): 312 | os.mkdir(temp_image_folder) 313 | 314 | source_image_folder = temp_image_folder + os.path.sep + 'source' 315 | target_image_folder = temp_image_folder + os.path.sep + 'output' 316 | 317 | if not os.path.exists(source_image_folder): 318 | os.mkdir(source_image_folder) 319 | 320 | if not os.path.exists(target_image_folder): 321 | os.mkdir(target_image_folder) 322 | 323 | files_to_delete = [] 324 | for f in os.listdir(source_image_folder): 325 | image_file = source_image_folder + os.path.sep + f 326 | if os.path.isfile(image_file) and image_file.endswith('.jpg'): 327 | files_to_delete.append(image_file) 328 | 329 | for image_file in files_to_delete: 330 | os.remove(image_file) 331 | 332 | frames_per_second = 5 333 | extract_images(video_file_path, source_image_folder, image_shape=(self.frame_width, self.frame_height), 334 | frames_per_second=frames_per_second) 335 | 336 | _fourcc = cv2.VideoWriter.fourcc(*'MP4V') 337 | out = cv2.VideoWriter(output_video_path, _fourcc, frames_per_second, (self.frame_width, self.frame_height)) 338 | 339 | result = [] 340 | 341 | for f in os.listdir(source_image_folder): 342 | image_file = source_image_folder + os.path.sep + f 343 | 344 | if os.path.isfile(image_file) and image_file.endswith('.jpg'): 345 | image, out_scores, out_boxes, out_classes = self.predict_objects_in_image(image_file) 346 | # Print predictions info 347 | print('Found {} boxes for {}'.format(len(out_boxes), image_file)) 348 | # Generate colors for drawing bounding boxes. 
349 | colors = generate_colors(self.class_names) 350 | # Draw bounding boxes on the image file 351 | draw_boxes(image, out_scores, out_boxes, out_classes, self.class_names, colors) 352 | # Save the predicted bounding box on the image 353 | output_image_file = target_image_folder + os.path.sep + f 354 | image.save(output_image_file, quality=90) 355 | out.write(np.array(image)) # Write out frame to video 356 | result.append([f, out_scores, out_boxes, out_classes]) 357 | 358 | out.release() 359 | return result 360 | 361 | def detect_objects_in_camera(self, camera): 362 | while True: 363 | # grab the current frame 364 | (grabbed, frame) = camera.read() 365 | 366 | # check to see if we have reached the end of the 367 | # video 368 | if not grabbed: 369 | break 370 | 371 | cv2_im = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) 372 | pil_im = Image.fromarray(cv2_im) 373 | 374 | image, out_scores, out_boxes, out_classes = self.predict_objects_in_image_frame(pil_im) 375 | # Print predictions info 376 | print('Found {} boxes'.format(len(out_boxes))) 377 | # Generate colors for drawing bounding boxes. 378 | colors = generate_colors(self.class_names) 379 | # Draw bounding boxes on the image file 380 | draw_boxes(image, out_scores, out_boxes, out_classes, self.class_names, colors) 381 | 382 | cv2.imshow("Press q key to quit", np.array(image)) 383 | key = cv2.waitKey(1) & 0xFF 384 | 385 | # if the 'q' key is pressed, stop the loop 386 | if key == ord("q"): 387 | break 388 | 389 | 390 | def main(): 391 | yolo_filter_boxes_test() 392 | iou_test() 393 | yolo_non_max_suppression_test() 394 | yolo_eval_test() 395 | 396 | 397 | if __name__ == '__main__': 398 | main() 399 | -------------------------------------------------------------------------------- /keras_video_object_detector/library/yolo_utils.py: -------------------------------------------------------------------------------- 1 | import colorsys 2 | import imghdr 3 | import os 4 | import random 5 | from keras import backend as K 6 | 7 | import numpy as np 8 | from PIL import Image, ImageDraw, ImageFont 9 | 10 | 11 | def read_classes(classes_path): 12 | with open(classes_path) as f: 13 | class_names = f.readlines() 14 | class_names = [c.strip() for c in class_names] 15 | return class_names 16 | 17 | 18 | def read_anchors(anchors_path): 19 | with open(anchors_path) as f: 20 | anchors = f.readline() 21 | anchors = [float(x) for x in anchors.split(',')] 22 | anchors = np.array(anchors).reshape(-1, 2) 23 | return anchors 24 | 25 | 26 | def generate_colors(class_names): 27 | hsv_tuples = [(x / len(class_names), 1., 1.) for x in range(len(class_names))] 28 | colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples)) 29 | colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), colors)) 30 | random.seed(10101) # Fixed seed for consistent colors across runs. 31 | random.shuffle(colors) # Shuffle colors to decorrelate adjacent classes. 32 | random.seed(None) # Reset seed to default. 
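    # The fixed seed above makes the shuffled class-to-color mapping stable across
    # runs, and resetting the seed avoids affecting the caller's random state.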
33 | return colors 34 | 35 | 36 | def scale_boxes(boxes, image_shape): 37 | """ Scales the predicted boxes in order to be drawable on the image""" 38 | height = image_shape[0] 39 | width = image_shape[1] 40 | image_dims = K.stack([height, width, height, width]) 41 | image_dims = K.reshape(image_dims, [1, 4]) 42 | boxes = boxes * image_dims 43 | return boxes 44 | 45 | 46 | def preprocess_image(img_path, model_image_size): 47 | image_type = imghdr.what(img_path) 48 | image = Image.open(img_path) 49 | resized_image = image.resize(tuple(reversed(model_image_size)), Image.BICUBIC) 50 | image_data = np.array(resized_image, dtype='float32') 51 | image_data /= 255. 52 | image_data = np.expand_dims(image_data, 0) # Add batch dimension. 53 | return image, image_data 54 | 55 | 56 | def preprocess_image_data(image): 57 | image_data = np.array(image, dtype='float32') 58 | image_data /= 255. 59 | image_data = np.expand_dims(image_data, 0) # Add batch dimension. 60 | return image, image_data 61 | 62 | 63 | def draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors): 64 | font = ImageFont.truetype(font='font/FiraMono-Medium.otf', 65 | size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32')) 66 | thickness = (image.size[0] + image.size[1]) // 300 67 | 68 | for i, c in reversed(list(enumerate(out_classes))): 69 | predicted_class = class_names[c] 70 | box = out_boxes[i] 71 | score = out_scores[i] 72 | 73 | label = '{} {:.2f}'.format(predicted_class, score) 74 | 75 | draw = ImageDraw.Draw(image) 76 | label_size = draw.textsize(label, font) 77 | 78 | top, left, bottom, right = box 79 | top = max(0, np.floor(top + 0.5).astype('int32')) 80 | left = max(0, np.floor(left + 0.5).astype('int32')) 81 | bottom = min(image.size[1], np.floor(bottom + 0.5).astype('int32')) 82 | right = min(image.size[0], np.floor(right + 0.5).astype('int32')) 83 | print(label, (left, top), (right, bottom)) 84 | 85 | if top - label_size[1] >= 0: 86 | text_origin = np.array([left, top - label_size[1]]) 87 | else: 88 | text_origin = np.array([left, top + 1]) 89 | 90 | # My kingdom for a good redistributable image drawing library. 
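        # Draw the rectangle `thickness` times, inset by one pixel each pass, to
        # simulate a thick outline (ImageDraw.rectangle draws a 1-pixel outline).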
91 | for i in range(thickness): 92 | draw.rectangle([left + i, top + i, right - i, bottom - i], outline=colors[c]) 93 | draw.rectangle([tuple(text_origin), tuple(text_origin + label_size)], fill=colors[c]) 94 | draw.text(text_origin, label, fill=(0, 0, 0), font=font) 95 | del draw -------------------------------------------------------------------------------- /keras_video_object_detector/models/coco_classes.txt: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorbike 5 | aeroplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | stop sign 13 | parking meter 14 | bench 15 | bird 16 | cat 17 | dog 18 | horse 19 | sheep 20 | cow 21 | elephant 22 | bear 23 | zebra 24 | giraffe 25 | backpack 26 | umbrella 27 | handbag 28 | tie 29 | suitcase 30 | frisbee 31 | skis 32 | snowboard 33 | sports ball 34 | kite 35 | baseball bat 36 | baseball glove 37 | skateboard 38 | surfboard 39 | tennis racket 40 | bottle 41 | wine glass 42 | cup 43 | fork 44 | knife 45 | spoon 46 | bowl 47 | banana 48 | apple 49 | sandwich 50 | orange 51 | broccoli 52 | carrot 53 | hot dog 54 | pizza 55 | donut 56 | cake 57 | chair 58 | sofa 59 | pottedplant 60 | bed 61 | diningtable 62 | toilet 63 | tvmonitor 64 | laptop 65 | mouse 66 | remote 67 | keyboard 68 | cell phone 69 | microwave 70 | oven 71 | toaster 72 | sink 73 | refrigerator 74 | book 75 | clock 76 | vase 77 | scissors 78 | teddy bear 79 | hair drier 80 | toothbrush 81 | -------------------------------------------------------------------------------- /keras_video_object_detector/models/object_classes.txt: -------------------------------------------------------------------------------- 1 | car -------------------------------------------------------------------------------- /keras_video_object_detector/models/yolo_anchors.txt: -------------------------------------------------------------------------------- 1 | 0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828 2 | -------------------------------------------------------------------------------- /notes/ReadMe.md: -------------------------------------------------------------------------------- 1 | # References 2 | 3 | * https://github.com/shahariarrabby/deeplearning.ai -------------------------------------------------------------------------------- /notes/evaluation.md: -------------------------------------------------------------------------------- 1 | # Model Evaluation 2 | 3 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | scikit-learn 2 | keras==2.0.8 3 | tensorflow 4 | pandas 5 | numpy 6 | scipy 7 | h5py 8 | matplotlib 9 | pillow 10 | opencv-python -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [aliases] 2 | test=pytest -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup 2 | 3 | setup( 4 | name='keras_video_object_detector', 5 | packages=['keras_video_object_detector'], 6 | include_package_data=True, 7 | install_requires=[ 8 | 'flask', 9 | 'keras', 10 | 'sklearn' 11 | ], 12 | setup_requires=[ 13 | 'pytest-runner', 14 | ], 15 | tests_require=[ 16 | 'pytest', 17 | 
], 18 | ) --------------------------------------------------------------------------------