├── .github
│   ├── FUNDING.yml
│   ├── ISSUE_TEMPLATE
│   │   └── bug_report.md
│   └── workflows
│       └── stale.yml
├── .gitignore
├── MaskRCNN Microcontroller Detection.ipynb
├── MaskRCNN Microcontroller Segmentation.ipynb
├── MaskRCNN Using pretrained model.ipynb
├── README.md
├── doc
│   ├── detection_example.png
│   ├── microcontroller_detection.png
│   ├── microcontroller_segmentation.png
│   └── visualize_masks.PNG
└── video_detection.py

--------------------------------------------------------------------------------
/.github/FUNDING.yml:
--------------------------------------------------------------------------------
# These are supported funding model platforms

github: TannerGilbert
patreon: gilberttanner
open_collective: # Replace with a single Open Collective username
ko_fi: # Replace with a single Ko-fi username
tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
liberapay: # Replace with a single Liberapay username
issuehunt: # Replace with a single IssueHunt username
otechie: # Replace with a single Otechie username
custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']

--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/bug_report.md:
--------------------------------------------------------------------------------
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: ''
assignees: ''

---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Desktop (please complete the following information):**
 - OS: [e.g. iOS]
 - Browser [e.g. chrome, safari]
 - Version [e.g. 22]

**Smartphone (please complete the following information):**
 - Device: [e.g. iPhone6]
 - OS: [e.g. iOS8.1]
 - Browser [e.g. stock browser, safari]
 - Version [e.g. 22]

**Additional context**
Add any other context about the problem here.

--------------------------------------------------------------------------------
/.github/workflows/stale.yml:
--------------------------------------------------------------------------------
name: Mark stale issues and pull requests

on:
  schedule:
  - cron: "0 0 * * *"

jobs:
  stale:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/stale@v1
      with:
        repo-token: ${{ secrets.GITHUB_TOKEN }}
        stale-pr-message: 'Stale pull request message'
        stale-issue-label: 'no-issue-activity'
        stale-pr-label: 'no-pr-activity'
        stale-issue-message: 'This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days'
        days-before-stale: 30
        days-before-close: 5

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TannerGilbert/MaskRCNN-Object-Detection-and-Segmentation/7e24fa66f5a7ab8861448d0c582e8a10645bdbc9/.gitignore
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# MaskRCNN Object Detection and Segmentation
![MaskRCNN Inference example](doc/detection_example.png)

This repository shows you how to do object detection and instance segmentation with [MaskRCNN in Keras](https://github.com/matterport/Mask_RCNN).

## Installation

1. Clone the repository
```git clone https://github.com/matterport/Mask_RCNN```

2. Install dependencies
```bash
cd Mask_RCNN
pip3 install -r requirements.txt
```

3. Run setup
```bash
python3 setup.py install
```

4. (Optional) To train or test on MS COCO, install pycocotools from one of the repos below. They are forks of the original pycocotools with fixes for Python3 and Windows (the official repo doesn't seem to be active anymore).
    * Linux: https://github.com/waleedka/coco
    * Windows: https://github.com/philferriere/cocoapi. You must have the Visual C++ 2015 build tools on your path (see the repo for additional details)

## Running a pre-trained model

To use a pre-trained model for inference, we need to download the weights, create an inference config, and create a MaskRCNN object.
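In outline, that looks like the following — a minimal sketch modeled on this repository's own `video_detection.py`, where `ROOT_DIR` and the image path are placeholders you'd adapt:

```python
import os
import sys
import skimage.io

# Path to the cloned Mask_RCNN repository (placeholder)
ROOT_DIR = os.path.abspath("Mask_RCNN")
sys.path.append(ROOT_DIR)
sys.path.append(os.path.join(ROOT_DIR, "samples/coco/"))

from mrcnn import utils
import mrcnn.model as modellib
import coco

# Batch size = GPU_COUNT * IMAGES_PER_GPU, so this runs one image at a time
class InferenceConfig(coco.CocoConfig):
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

# Download the COCO-trained weights on first use
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

model = modellib.MaskRCNN(mode="inference", model_dir=os.path.join(ROOT_DIR, "logs"),
                          config=InferenceConfig())
model.load_weights(COCO_MODEL_PATH, by_name=True)

image = skimage.io.imread("your_image.jpg")  # any RGB image
r = model.detect([image], verbose=1)[0]
# r is a dict with 'rois' (boxes), 'masks', 'class_ids', and 'scores'
```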
For a complete example of how to run a pre-trained model on an image, take a look at ["MaskRCNN Using pretrained model.ipynb"](MaskRCNN%20Using%20pretrained%20model.ipynb).

![](doc/detection_example.png)

I also created [a Python script](video_detection.py) that allows you to run MaskRCNN models on videos or a webcam stream.
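For example (the file names here are just placeholders):

```bash
# Run on the default webcam stream; press q to quit
python3 video_detection.py

# Run on a video file and save the annotated output
python3 video_detection.py -v input.mp4 -sp output.avi
```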
[![pedestrian detection](https://img.youtube.com/vi/g1-TRixHhls/0.jpg)](https://youtu.be/g1-TRixHhls)

## Training a custom object detection model

MaskRCNN also allows you to train your own custom object detection and instance segmentation models. To train a model, you'll need to create a class that loads your data, as well as a training config that defines properties for training. You can find the complete code inside the [MaskRCNN Microcontroller Detection.ipynb](MaskRCNN%20Microcontroller%20Detection.ipynb) file.

### Creating the dataloader class

As an example I'll use my [Microcontroller Detection dataset](https://www.kaggle.com/tannergi/microcontroller-detection), which was labeled with [labelImg](https://github.com/tzutalin/labelImg).

The annotation files are in Pascal VOC format, so every annotation file looks as follows:
```xml
<annotation>
	<folder>object_detection</folder>
	<filename>IMG_20181228_101826.jpg</filename>
	<path>object_detection/IMG_20181228_101826.jpg</path>
	<source>
		<database>Unknown</database>
	</source>
	<size>
		<width>800</width>
		<height>600</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>
	<object>
		<name>Arduino_Nano</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>317</xmin>
			<ymin>265</ymin>
			<xmax>556</xmax>
			<ymax>342</ymax>
		</bndbox>
	</object>
</annotation>
```

The dataloader class has three methods we need to implement:
* load_dataset()
* load_mask()
* image_reference()

```python
class MicrocontrollerDataset(utils.Dataset):
    def load_dataset(self, dataset_dir):
        pass

    def load_mask(self, image_id):
        pass

    def image_reference(self, image_id):
        pass
```

The ```load_dataset``` method defines all the classes and registers all the images using the ```add_image``` method. The ```load_mask``` method loads the masks for a given image, and the ```image_reference``` method returns the path to an image given its id.

For the Microcontroller dataset the dataloader class looks as follows:

```python
import os
import xml.etree.ElementTree as ET
import numpy as np
from mrcnn import utils

class MicrocontrollerDataset(utils.Dataset):
    def load_dataset(self, dataset_dir):
        self.add_class('dataset', 1, 'Raspberry_Pi_3')
        self.add_class('dataset', 2, 'Arduino_Nano')
        self.add_class('dataset', 3, 'ESP8266')
        self.add_class('dataset', 4, 'Heltec_ESP32_Lora')

        # find all images
        for i, filename in enumerate(os.listdir(dataset_dir)):
            if '.jpg' in filename:
                self.add_image('dataset',
                               image_id=i,
                               path=os.path.join(dataset_dir, filename),
                               annotation=os.path.join(dataset_dir, filename.replace('.jpg', '.xml')))

    # extract bounding boxes from an annotation file
    def extract_boxes(self, filename):
        # load and parse the file
        tree = ET.parse(filename)
        # get the root of the document
        root = tree.getroot()
        # extract each bounding box
        boxes = []
        classes = []
        for member in root.findall('object'):
            # bndbox is the 5th child of each object element: xmin, ymin, xmax, ymax
            xmin = int(member[4][0].text)
            ymin = int(member[4][1].text)
            xmax = int(member[4][2].text)
            ymax = int(member[4][3].text)
            boxes.append([xmin, ymin, xmax, ymax])
            classes.append(self.class_names.index(member[0].text))
        # extract image dimensions
        width = int(root.find('size')[0].text)
        height = int(root.find('size')[1].text)
        return boxes, classes, width, height

    # load the masks for an image
    def load_mask(self, image_id):
        # get details of image
        info = self.image_info[image_id]
        # path to the annotation file
        path = info['annotation']
        # parse the XML annotation
        boxes, classes, w, h = self.extract_boxes(path)
        # create one array for all masks, each on a different channel
        masks = np.zeros([h, w, len(boxes)], dtype='uint8')
        # create masks (rectangular, filled in from the bounding boxes)
        for i in range(len(boxes)):
            box = boxes[i]
            row_s, row_e = box[1], box[3]
            col_s, col_e = box[0], box[2]
            masks[row_s:row_e, col_s:col_e, i] = 1
        return masks, np.asarray(classes, dtype='int32')

    def image_reference(self, image_id):
        info = self.image_info[image_id]
        return info['path']
```

Now that we have the dataloader class, we can load both the training and test sets and visualize a few random images and their masks.

```python
from mrcnn import visualize

# Create training and validation set
# train set
dataset_train = MicrocontrollerDataset()
dataset_train.load_dataset('Microcontroller Detection/train')
dataset_train.prepare()
print('Train: %d' % len(dataset_train.image_ids))

# test/val set
dataset_val = MicrocontrollerDataset()
dataset_val.load_dataset('Microcontroller Detection/test')
dataset_val.prepare()
print('Test: %d' % len(dataset_val.image_ids))

# Load and display random samples
image_ids = np.random.choice(dataset_train.image_ids, 4)
for image_id in image_ids:
    image = dataset_train.load_image(image_id)
    mask, class_ids = dataset_train.load_mask(image_id)
    visualize.display_top_masks(image, mask, class_ids, dataset_train.class_names)
```

![](doc/visualize_masks.PNG)

### Creating the config object

MaskRCNN has a Config class that defines properties for both training and prediction, including the number of classes, the GPU count, and the learning rate.

You can take a look at the default config using the following code:
```python
from mrcnn.config import Config
config = Config()
config.display()
```

```
Configurations:
BACKBONE                       resnet101
BACKBONE_STRIDES               [4, 8, 16, 32, 64]
BATCH_SIZE                     2
BBOX_STD_DEV                   [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE         None
DETECTION_MAX_INSTANCES        100
DETECTION_MIN_CONFIDENCE       0.7
DETECTION_NMS_THRESHOLD        0.3
FPN_CLASSIF_FC_LAYERS_SIZE     1024
GPU_COUNT                      1
GRADIENT_CLIP_NORM             5.0
IMAGES_PER_GPU                 2
IMAGE_CHANNEL_COUNT            3
IMAGE_MAX_DIM                  1024
IMAGE_META_SIZE                13
IMAGE_MIN_DIM                  800
IMAGE_MIN_SCALE                0
IMAGE_RESIZE_MODE              square
IMAGE_SHAPE                    [1024 1024    3]
LEARNING_MOMENTUM              0.9
LEARNING_RATE                  0.001
LOSS_WEIGHTS                   {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE                 14
MASK_SHAPE                     [28, 28]
MAX_GT_INSTANCES               100
MEAN_PIXEL                     [123.7 116.8 103.9]
MINI_MASK_SHAPE                (56, 56)
NAME                           None
NUM_CLASSES                    1
POOL_SIZE                      7
POST_NMS_ROIS_INFERENCE        1000
POST_NMS_ROIS_TRAINING         2000
PRE_NMS_LIMIT                  6000
ROI_POSITIVE_RATIO             0.33
RPN_ANCHOR_RATIOS              [0.5, 1, 2]
RPN_ANCHOR_SCALES              (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE              1
RPN_BBOX_STD_DEV               [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD              0.7
RPN_TRAIN_ANCHORS_PER_IMAGE    256
STEPS_PER_EPOCH                1000
TOP_DOWN_PYRAMID_SIZE          256
TRAIN_BN                       False
TRAIN_ROIS_PER_IMAGE           200
USE_MINI_MASK                  True
USE_RPN_ROIS                   True
VALIDATION_STEPS               50
WEIGHT_DECAY                   0.0001
```

For training, we need to change at least two properties: NAME and NUM_CLASSES.

```python
class MicrocontrollerConfig(Config):
    # Give the configuration a recognizable name
    NAME = "microcontroller_detection"

    NUM_CLASSES = 1 + 4  # background + 4 microcontroller classes

    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

config = MicrocontrollerConfig()
config.display()
```
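For a dataset this small you may also want to shorten the epochs — the defaults (STEPS_PER_EPOCH of 1000, VALIDATION_STEPS of 50) assume a much larger dataset. The values below are illustrative suggestions, not taken from the original notebook:

```python
class MicrocontrollerConfig(Config):
    NAME = "microcontroller_detection"
    NUM_CLASSES = 1 + 4

    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

    # Illustrative overrides for a small dataset; tune to taste
    STEPS_PER_EPOCH = 100
    VALIDATION_STEPS = 10
```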
### Creating and training the model

Now that we have both the config and dataset classes, we can create and train a model using the following code:

```python
# Create model in training mode
model = modellib.MaskRCNN(mode="training", config=config,
                          model_dir=MODEL_DIR)

# Which weights to start with?
init_with = "coco"  # imagenet, coco, or last

if init_with == "imagenet":
    model.load_weights(model.get_imagenet_weights(), by_name=True)
elif init_with == "coco":
    # Load weights trained on MS COCO, but skip layers that
    # are different due to the different number of classes
    # See README for instructions to download the COCO weights
    model.load_weights(COCO_MODEL_PATH, by_name=True,
                       exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                                "mrcnn_bbox", "mrcnn_mask"])
elif init_with == "last":
    # Load the last model you trained and continue training
    model.load_weights(model.find_last(), by_name=True)

# Train the head branches
# Passing layers="heads" freezes all layers except the head
# layers. You can also pass a regular expression to select
# which layers to train by name pattern.
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=5,
            layers='heads')

# Fine tune all layers
# Passing layers="all" trains all layers. You can also
# pass a regular expression to select which layers to
# train by name pattern.
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE / 10,
            epochs=10,
            layers="all")
```
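Once training has finished, you can create a second model in inference mode, point it at the last checkpoint, and test it on a validation image. A minimal sketch (not part of the original notebook; `find_last()` returns the most recent checkpoint in `MODEL_DIR`):

```python
# Create the model in inference mode and load the trained weights
model = modellib.MaskRCNN(mode="inference", config=config, model_dir=MODEL_DIR)
model.load_weights(model.find_last(), by_name=True)

# Run detection on a random validation image and display the result
image_id = np.random.choice(dataset_val.image_ids)
image = dataset_val.load_image(image_id)
r = model.detect([image], verbose=0)[0]
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
                            dataset_val.class_names, r['scores'])
```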
![](doc/microcontroller_detection.png)

## Training a custom instance segmentation model

For instance segmentation I'll make use of [my Microcontroller instance segmentation dataset](https://github.com/TannerGilbert/Detectron2-Train-a-Instance-Segmentation-Model/blob/master/microcontroller_segmentation_data.zip), which I labeled with [labelme](https://github.com/wkentaro/labelme).

Instead of XML files we now have JSON files with the following format:
```json
{
  "version": "4.2.9",
  "flags": {},
  "shapes": [
    {
      "label": "Arduino_Nano",
      "points": [
        [
          318.9368770764119,
          307.30897009966776
        ],
        [
          328.4053156146179,
          307.1428571428571
        ],
        [
          323.75415282392026,
          293.0232558139535
        ],
        [
          530.2839116719243,
          269.4006309148265
        ],
        [
          549.2146596858638,
          315.9685863874345
        ],
        [
          339.79057591623035,
          341.0994764397906
        ],
        [
          336.1256544502618,
          327.7486910994764
        ],
        [
          326.1780104712042,
          328.5340314136125
        ]
      ],
      "group_id": null,
      "shape_type": "polygon",
      "flags": {}
    }
  ],
  "imagePath": "IMG_20181228_101826.jpg",
  "imageData": "...",
  "imageHeight": 600,
  "imageWidth": 800
}
```

To load the polygon annotations, we replace ```extract_boxes``` with an ```extract_masks``` method and adapt ```load_mask``` accordingly:

```python
import json
import cv2

class MicrocontrollerDataset(utils.Dataset):
    def load_dataset(self, dataset_dir):
        self.add_class('dataset', 1, 'Raspberry_Pi_3')
        self.add_class('dataset', 2, 'Arduino_Nano')
        self.add_class('dataset', 3, 'ESP8266')
        self.add_class('dataset', 4, 'Heltec_ESP32_Lora')

        # find all images
        for i, filename in enumerate(os.listdir(dataset_dir)):
            if '.jpg' in filename:
                self.add_image('dataset',
                               image_id=i,
                               path=os.path.join(dataset_dir, filename),
                               annotation=os.path.join(dataset_dir, filename.replace('.jpg', '.json')))

    def extract_masks(self, filename):
        with open(filename) as f:
            img_anns = json.load(f)

        # use the image size stored in the annotation file
        height, width = img_anns['imageHeight'], img_anns['imageWidth']
        masks = np.zeros([height, width, len(img_anns['shapes'])], dtype='uint8')
        classes = []
        for i, anno in enumerate(img_anns['shapes']):
            # draw each labeled polygon as a filled binary mask
            mask = np.zeros([height, width], dtype=np.uint8)
            cv2.fillPoly(mask, np.array([anno['points']], dtype=np.int32), 1)
            masks[:, :, i] = mask
            classes.append(self.class_names.index(anno['label']))
        return masks, classes

    # load the masks for an image
    def load_mask(self, image_id):
        # get details of image
        info = self.image_info[image_id]
        # path to the annotation file
        path = info['annotation']
        # load the JSON annotation
        masks, classes = self.extract_masks(path)
        return masks, np.asarray(classes, dtype='int32')

    def image_reference(self, image_id):
        info = self.image_info[image_id]
        return info['path']
```

Everything else stays the same as in the object detection example. You can find the full code in the [MaskRCNN Microcontroller Segmentation notebook](MaskRCNN%20Microcontroller%20Segmentation.ipynb).

![](doc/microcontroller_segmentation.png)

--------------------------------------------------------------------------------
/doc/detection_example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TannerGilbert/MaskRCNN-Object-Detection-and-Segmentation/7e24fa66f5a7ab8861448d0c582e8a10645bdbc9/doc/detection_example.png

--------------------------------------------------------------------------------
/doc/microcontroller_detection.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TannerGilbert/MaskRCNN-Object-Detection-and-Segmentation/7e24fa66f5a7ab8861448d0c582e8a10645bdbc9/doc/microcontroller_detection.png

--------------------------------------------------------------------------------
/doc/microcontroller_segmentation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TannerGilbert/MaskRCNN-Object-Detection-and-Segmentation/7e24fa66f5a7ab8861448d0c582e8a10645bdbc9/doc/microcontroller_segmentation.png

--------------------------------------------------------------------------------
/doc/visualize_masks.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TannerGilbert/MaskRCNN-Object-Detection-and-Segmentation/7e24fa66f5a7ab8861448d0c582e8a10645bdbc9/doc/visualize_masks.PNG

--------------------------------------------------------------------------------
/video_detection.py:
--------------------------------------------------------------------------------
import os
import sys
import numpy as np
np.random.seed(0)  # deterministic instance colors
import matplotlib.pyplot as plt
import cv2
import argparse

# Root directory of the project
ROOT_DIR = os.path.abspath("../")

# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn.config import Config
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize
from mrcnn.model import log

sys.path.append(os.path.join(ROOT_DIR, "samples/coco/"))
import coco


def apply_mask(image, mask, color, alpha=0.5):
    """Apply the given mask to the image."""
    for n, c in enumerate(color):
        image[:, :, n] = np.where(
            mask == 1,
            image[:, :, n] * (1 - alpha) + alpha * c,
            image[:, :, n]
        )
    return image


# based on https://github.com/matterport/Mask_RCNN/blob/master/mrcnn/visualize.py
# and https://github.com/markjay4k/Mask-RCNN-series/blob/887404d990695a7bf7f180e3ffaee939fbd9a1cf/visualize_cv.py
def display_instances(image, boxes, masks, class_ids, class_names, scores=None):
    assert boxes.shape[0] == masks.shape[-1] == class_ids.shape[0]

    N = boxes.shape[0]

    # one random color per detected instance
    colors = [tuple(255 * np.random.rand(3)) for _ in range(N)]

    for i, c in enumerate(colors):
        # skip instances with an empty bounding box
        if not np.any(boxes[i]):
            continue

        y1, x1, y2, x2 = boxes[i]
        label = class_names[class_ids[i]]
        score = scores[i] if scores is not None else None
        caption = "{} {:.3f}".format(label, score) if score else label

        # Mask
        mask = masks[:, :, i]
        image = apply_mask(image, mask, c)
        image = cv2.rectangle(image, (x1, y1), (x2, y2), c, 2)
        image = cv2.putText(image, caption, (x1, y1), cv2.FONT_HERSHEY_COMPLEX, 0.7, c, 2)
    return image


class InferenceConfig(coco.CocoConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='MaskRCNN Video Object Detection/Instance Segmentation')
    parser.add_argument('-v', '--video_path', type=str, default='', help='Path to video. If None camera will be used')
    parser.add_argument('-sp', '--save_path', type=str, default='', help='Path to save the output. If None output won\'t be saved')
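    # --show uses store_false: the preview window is shown by default, and passing -s hides it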
    parser.add_argument('-s', '--show', default=True, action="store_false", help='Pass to hide the output window')
    args = parser.parse_args()

    # Directory to save logs and trained model
    MODEL_DIR = os.path.join(ROOT_DIR, "logs")

    # Local path to trained weights file
    COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
    # Download COCO trained weights from Releases if needed
    if not os.path.exists(COCO_MODEL_PATH):
        utils.download_trained_weights(COCO_MODEL_PATH)

    class_names = ['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
                   'bus', 'train', 'truck', 'boat', 'traffic light',
                   'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird',
                   'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear',
                   'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
                   'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
                   'kite', 'baseball bat', 'baseball glove', 'skateboard',
                   'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
                   'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
                   'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
                   'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
                   'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
                   'keyboard', 'cell phone', 'microwave', 'oven', 'toaster',
                   'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
                   'teddy bear', 'hair drier', 'toothbrush']

    config = InferenceConfig()

    # Create model object in inference mode.
    model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)

    # Load weights trained on MS-COCO
    model.load_weights(COCO_MODEL_PATH, by_name=True)

    if args.video_path != '':
        cap = cv2.VideoCapture(args.video_path)
    else:
        cap = cv2.VideoCapture(0)

    if args.save_path:
        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        fps = cap.get(cv2.CAP_PROP_FPS)
        out = cv2.VideoWriter(args.save_path, cv2.VideoWriter_fourcc('M','J','P','G'), fps, (width, height))

    while cap.isOpened():
        ret, image = cap.read()
        # stop when the stream ends or a frame can't be read
        if not ret:
            break
        results = model.detect([image], verbose=1)
        r = results[0]
        image = display_instances(image, r['rois'], r['masks'], r['class_ids'], class_names, r['scores'])
        if args.show:
            cv2.imshow('MaskRCNN Object Detection/Instance Segmentation', image)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
        if args.save_path:
            out.write(image)
    cap.release()
    if args.save_path:
        out.release()
    cv2.destroyAllWindows()
--------------------------------------------------------------------------------