├── .dockerignore
├── .gitignore
├── Dockerfile
├── LICENSE
├── README.md
├── classes.py
├── coco.py
├── config.py
├── docker-entrypoint.sh
├── example_output
    ├── img1_blocked.png
    ├── img2_blocked.png
    ├── img3_blocked.png
    ├── img4_blocked.gif
    ├── img4_blocked.png
    └── img4_labels.png
├── images
    ├── img1.jpg
    ├── img2.jpg
    ├── img3.jpg
    └── img4.jpg
├── model.py
├── person_blocker.py
├── requirements.txt
├── utils.py
└── visualize.py


/.dockerignore:
--------------------------------------------------------------------------------
1 | .git
2 | example_output
3 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | *.h5
2 | __pycache__


--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
 1 | FROM python:slim
 2 | 
 3 | WORKDIR /app
 4 | 
 5 | COPY requirements.txt ./
 6 | 
 7 | RUN apt-get update -qq && \
 8 | 	DEBIAN_FRONTEND=noninteractive apt-get install -qq \
 9 | 		python3-tk \
10 | 		xvfb \
11 | 		curl && \
12 | 	curl -OJL https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5 && \
13 | 	pip install --no-cache-dir -r requirements.txt && \
14 | 	apt-get remove --purge -qq curl && \
15 | 	apt-get autoremove --purge -qq && \
16 | 	apt-get clean -qq && \
17 | 	rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* /init /root/.cache
18 | 
19 | COPY . .
20 | 
21 | WORKDIR /data
22 | 
23 | ENTRYPOINT ["/app/docker-entrypoint.sh"]
24 | CMD ["-n"]
25 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2018 Max Woolf
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
23 | ---
24 | 
25 | Mask R-CNN
26 | 
27 | The MIT License (MIT)
28 | 
29 | Copyright (c) 2017 Matterport, Inc.
30 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Person Blocker
 2 | 
 3 | ![img4](example_output/img4_blocked.gif)
 4 | 
 5 | A script to automatically "block" people in images (like the [Black Mirror](https://en.wikipedia.org/wiki/Black_Mirror) episode [White Christmas](https://en.wikipedia.org/wiki/White_Christmas_(Black_Mirror))) using [Mask R-CNN](https://github.com/matterport/Mask_RCNN) pretrained on the [MS COCO](https://arxiv.org/abs/1405.0312) dataset. No GPU required!
 6 | 
 7 | But you can block more than just people: up to [80 different types](https://github.com/minimaxir/person-blocker/blob/master/classes.py) of objects can be blocked, including giraffes and busses!
 8 | 
 9 | ## Setup
10 | 
11 | This project relies on a handful of dependencies, use the following command to install your dependencies:
12 | 
13 | ```shell
14 | pip3 install -r requirements.txt
15 | ```
16 | 
17 | _Note_: Depending on your environment, you may need to use `sudo`. You may also want to use virtualenv.
18 | 
19 | ## Usage
20 | 
21 | Person Blocker is used from the command line:
22 | 
23 | ```shell
24 | python3 person_blocker.py -i images/img3.jpg -c '(128, 128, 128)' -o 'bus' 'truck'
25 | ```
26 | 
27 | * `-i/--image`: specifies the image file.
28 | * `-m/--model`: path to the pretrained COCO model weights (default: current directory): if not specified, it will download them automatically to the current directory if not already present (note: the weights are 258 MB!)
29 | * `-c/--color`: color of the mask, in either quote-wrapped hexidecimal or 3-element RGB tuple format. (default: white)
30 | * `-o/--object`: list of types of objects to block (or object IDs of specific objects). You can see the allowable choices of objects to block in `classes.py` or by using the `-names` flag. (default: person)
31 | * `-l/--labeled`: saves a labeled image annotated with detected objects and their object ID.
32 | * `-n/--names`: prints the class options for objects, then exits.
33 | 
34 | The script outputs two images: a static (pun intended) image `person_blocked.png` and an animated image `person_blocked.gif` like the one at the beginning of this README.
35 | 
36 | ## Examples
37 | 
38 | ```shell
39 | python3 person_blocker.py -i images/img1.jpg
40 | ```
41 | 
42 | ![img1](example_output/img1_blocked.png)
43 | 
44 | ```shell
45 | python3 person_blocker.py -i images/img2.jpg -c '#c0392b' -o 'giraffe'
46 | ```
47 | 
48 | ![img2](example_output/img2_blocked.png)
49 | 
50 | ```shell
51 | python3 person_blocker.py -i images/img3.jpg -c '(128, 128, 128)' -o 'bus' 'truck'
52 | ```
53 | 
54 | ![img3](example_output/img3_blocked.png)
55 | 
56 | Blocking specific object(s) requires 2 steps: running in inference mode to get the object IDs for each object, and then blocking those object IDs.
57 | 
58 | ```shell
59 | python3 person_blocker.py -i images/img4.jpg -l
60 | ```
61 | 
62 | ![img4 labels](example_output/img4_labels.png)
63 | 
64 | ```shell
65 | python3 person_blocker.py -i images/img4.jpg -o 1
66 | ```
67 | 
68 | ![img4](example_output/img4_blocked.png)
69 | 
70 | ## Requirements
71 | 
72 | The same requirements as Mask R-CNN:
73 | * Python 3.4+
74 | * TensorFlow 1.3+
75 | * Keras 2.0.8+
76 | * Numpy, skimage, scipy, Pillow, cython, h5py
77 | 
78 | plus matplotlib and imageio
79 | 
80 | ## Maintainer
81 | 
82 | Max Woolf ([@minimaxir](http://minimaxir.com))
83 | 
84 | *Max's open-source projects are supported by his [Patreon](https://www.patreon.com/minimaxir). If you found this project helpful, any monetary contributions to the Patreon are appreciated and will be put to good creative use.*
85 | 
86 | ## License
87 | 
88 | MIT
89 | 
90 | Code used from Mask R-CNN by Matterport, Inc. (MIT-Licensed), with minor alterations and copyright notices retained.
91 | 


--------------------------------------------------------------------------------
/classes.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | import coco
 3 | 
 4 | 
 5 | def get_class_names():
 6 |     return np.array(['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
 7 |                      'bus', 'train', 'truck', 'boat', 'traffic light',
 8 |                      'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird',
 9 |                      'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear',
10 |                      'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
11 |                      'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
12 |                      'kite', 'baseball bat', 'baseball glove', 'skateboard',
13 |                      'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
14 |                      'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
15 |                      'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
16 |                      'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
17 |                      'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
18 |                      'keyboard', 'cell phone', 'microwave', 'oven', 'toaster',
19 |                      'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
20 |                      'teddy bear', 'hair drier', 'toothbrush'])
21 | 
22 | 
23 | class InferenceConfig(coco.CocoConfig):
24 |     GPU_COUNT = 1
25 |     IMAGES_PER_GPU = 1
26 | 


--------------------------------------------------------------------------------
/coco.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Mask R-CNN
  3 | Configurations and data loading code for MS COCO.
  4 | 
  5 | Copyright (c) 2017 Matterport, Inc.
  6 | Licensed under the MIT License (see LICENSE for details)
  7 | Written by Waleed Abdulla
  8 | 
  9 | ------------------------------------------------------------
 10 | 
 11 | Usage: import the module (see Jupyter notebooks for examples), or run from
 12 |        the command line as such:
 13 | 
 14 |     # Train a new model starting from pre-trained COCO weights
 15 |     python3 coco.py train --dataset=/path/to/coco/ --model=coco
 16 | 
 17 |     # Train a new model starting from ImageNet weights
 18 |     python3 coco.py train --dataset=/path/to/coco/ --model=imagenet
 19 | 
 20 |     # Continue training a model that you had trained earlier
 21 |     python3 coco.py train --dataset=/path/to/coco/ --model=/path/to/weights.h5
 22 | 
 23 |     # Continue training the last model you trained
 24 |     python3 coco.py train --dataset=/path/to/coco/ --model=last
 25 | 
 26 |     # Run COCO evaluatoin on the last model you trained
 27 |     python3 coco.py evaluate --dataset=/path/to/coco/ --model=last
 28 | """
 29 | 
 30 | import os
 31 | import time
 32 | import numpy as np
 33 | 
 34 | # Download and install the Python COCO tools from https://github.com/waleedka/coco
 35 | # That's a fork from the original https://github.com/pdollar/coco with a bug
 36 | # fix for Python 3.
 37 | # I submitted a pull request https://github.com/cocodataset/cocoapi/pull/50
 38 | # If the PR is merged then use the original repo.
 39 | # Note: Edit PythonAPI/Makefile and replace "python" with "python3".
 40 | # from pycocotools.coco import COCO
 41 | # from pycocotools.cocoeval import COCOeval
 42 | # from pycocotools import mask as maskUtils
 43 | 
 44 | import zipfile
 45 | import urllib.request
 46 | import shutil
 47 | 
 48 | from config import Config
 49 | import utils
 50 | import model as modellib
 51 | 
 52 | # Root directory of the project
 53 | ROOT_DIR = os.getcwd()
 54 | 
 55 | # Path to trained weights file
 56 | COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
 57 | 
 58 | # Directory to save logs and model checkpoints, if not provided
 59 | # through the command line argument --logs
 60 | DEFAULT_LOGS_DIR = os.path.join(ROOT_DIR, "logs")
 61 | DEFAULT_DATASET_YEAR = "2014"
 62 | 
 63 | ############################################################
 64 | #  Configurations
 65 | ############################################################
 66 | 
 67 | 
 68 | class CocoConfig(Config):
 69 |     """Configuration for training on MS COCO.
 70 |     Derives from the base Config class and overrides values specific
 71 |     to the COCO dataset.
 72 |     """
 73 |     # Give the configuration a recognizable name
 74 |     NAME = "coco"
 75 | 
 76 |     # We use a GPU with 12GB memory, which can fit two images.
 77 |     # Adjust down if you use a smaller GPU.
 78 |     IMAGES_PER_GPU = 2
 79 | 
 80 |     # Uncomment to train on 8 GPUs (default is 1)
 81 |     # GPU_COUNT = 8
 82 | 
 83 |     # Number of classes (including background)
 84 |     NUM_CLASSES = 1 + 80  # COCO has 80 classes
 85 | 
 86 | 
 87 | ############################################################
 88 | #  Dataset
 89 | ############################################################
 90 | 
 91 | class CocoDataset(utils.Dataset):
 92 |     def load_coco(self, dataset_dir, subset, year=DEFAULT_DATASET_YEAR, class_ids=None,
 93 |                   class_map=None, return_coco=False, auto_download=False):
 94 |         """Load a subset of the COCO dataset.
 95 |         dataset_dir: The root directory of the COCO dataset.
 96 |         subset: What to load (train, val, minival, valminusminival)
 97 |         year: What dataset year to load (2014, 2017) as a string, not an integer
 98 |         class_ids: If provided, only loads images that have the given classes.
 99 |         class_map: TODO: Not implemented yet. Supports maping classes from
100 |             different datasets to the same class ID.
101 |         return_coco: If True, returns the COCO object.
102 |         auto_download: Automatically download and unzip MS-COCO images and annotations
103 |         """
104 | 
105 |         if auto_download is True:
106 |             self.auto_download(dataset_dir, subset, year)
107 | 
108 |         coco = COCO("{}/annotations/instances_{}{}.json".format(dataset_dir, subset, year))
109 |         if subset == "minival" or subset == "valminusminival":
110 |             subset = "val"
111 |         image_dir = "{}/{}{}".format(dataset_dir, subset, year)
112 | 
113 |         # Load all classes or a subset?
114 |         if not class_ids:
115 |             # All classes
116 |             class_ids = sorted(coco.getCatIds())
117 | 
118 |         # All images or a subset?
119 |         if class_ids:
120 |             image_ids = []
121 |             for id in class_ids:
122 |                 image_ids.extend(list(coco.getImgIds(catIds=[id])))
123 |             # Remove duplicates
124 |             image_ids = list(set(image_ids))
125 |         else:
126 |             # All images
127 |             image_ids = list(coco.imgs.keys())
128 | 
129 |         # Add classes
130 |         for i in class_ids:
131 |             self.add_class("coco", i, coco.loadCats(i)[0]["name"])
132 | 
133 |         # Add images
134 |         for i in image_ids:
135 |             self.add_image(
136 |                 "coco", image_id=i,
137 |                 path=os.path.join(image_dir, coco.imgs[i]['file_name']),
138 |                 width=coco.imgs[i]["width"],
139 |                 height=coco.imgs[i]["height"],
140 |                 annotations=coco.loadAnns(coco.getAnnIds(
141 |                     imgIds=[i], catIds=class_ids, iscrowd=None)))
142 |         if return_coco:
143 |             return coco
144 | 
145 |     def auto_download(self, dataDir, dataType, dataYear):
146 |         """Download the COCO dataset/annotations if requested.
147 |         dataDir: The root directory of the COCO dataset.
148 |         dataType: What to load (train, val, minival, valminusminival)
149 |         dataYear: What dataset year to load (2014, 2017) as a string, not an integer
150 |         Note:
151 |             For 2014, use "train", "val", "minival", or "valminusminival"
152 |             For 2017, only "train" and "val" annotations are available
153 |         """
154 | 
155 |         # Setup paths and file names
156 |         if dataType == "minival" or dataType == "valminusminival":
157 |             imgDir = "{}/{}{}".format(dataDir, "val", dataYear)
158 |             imgZipFile = "{}/{}{}.zip".format(dataDir, "val", dataYear)
159 |             imgURL = "http://images.cocodataset.org/zips/{}{}.zip".format("val", dataYear)
160 |         else:
161 |             imgDir = "{}/{}{}".format(dataDir, dataType, dataYear)
162 |             imgZipFile = "{}/{}{}.zip".format(dataDir, dataType, dataYear)
163 |             imgURL = "http://images.cocodataset.org/zips/{}{}.zip".format(dataType, dataYear)
164 |         # print("Image paths:"); print(imgDir); print(imgZipFile); print(imgURL)
165 | 
166 |         # Create main folder if it doesn't exist yet
167 |         if not os.path.exists(dataDir):
168 |             os.makedirs(dataDir)
169 | 
170 |         # Download images if not available locally
171 |         if not os.path.exists(imgDir):
172 |             os.makedirs(imgDir)
173 |             print("Downloading images to " + imgZipFile + " ...")
174 |             with urllib.request.urlopen(imgURL) as resp, open(imgZipFile, 'wb') as out:
175 |                 shutil.copyfileobj(resp, out)
176 |             print("... done downloading.")
177 |             print("Unzipping " + imgZipFile)
178 |             with zipfile.ZipFile(imgZipFile, "r") as zip_ref:
179 |                 zip_ref.extractall(dataDir)
180 |             print("... done unzipping")
181 |         print("Will use images in " + imgDir)
182 | 
183 |         # Setup annotations data paths
184 |         annDir = "{}/annotations".format(dataDir)
185 |         if dataType == "minival":
186 |             annZipFile = "{}/instances_minival2014.json.zip".format(dataDir)
187 |             annFile = "{}/instances_minival2014.json".format(annDir)
188 |             annURL = "https://dl.dropboxusercontent.com/s/o43o90bna78omob/instances_minival2014.json.zip?dl=0"
189 |             unZipDir = annDir
190 |         elif dataType == "valminusminival":
191 |             annZipFile = "{}/instances_valminusminival2014.json.zip".format(dataDir)
192 |             annFile = "{}/instances_valminusminival2014.json".format(annDir)
193 |             annURL = "https://dl.dropboxusercontent.com/s/s3tw5zcg7395368/instances_valminusminival2014.json.zip?dl=0"
194 |             unZipDir = annDir
195 |         else:
196 |             annZipFile = "{}/annotations_trainval{}.zip".format(dataDir, dataYear)
197 |             annFile = "{}/instances_{}{}.json".format(annDir, dataType, dataYear)
198 |             annURL = "http://images.cocodataset.org/annotations/annotations_trainval{}.zip".format(dataYear)
199 |             unZipDir = dataDir
200 |         # print("Annotations paths:"); print(annDir); print(annFile); print(annZipFile); print(annURL)
201 | 
202 |         # Download annotations if not available locally
203 |         if not os.path.exists(annDir):
204 |             os.makedirs(annDir)
205 |         if not os.path.exists(annFile):
206 |             if not os.path.exists(annZipFile):
207 |                 print("Downloading zipped annotations to " + annZipFile + " ...")
208 |                 with urllib.request.urlopen(annURL) as resp, open(annZipFile, 'wb') as out:
209 |                     shutil.copyfileobj(resp, out)
210 |                 print("... done downloading.")
211 |             print("Unzipping " + annZipFile)
212 |             with zipfile.ZipFile(annZipFile, "r") as zip_ref:
213 |                 zip_ref.extractall(unZipDir)
214 |             print("... done unzipping")
215 |         print("Will use annotations in " + annFile)
216 | 
217 |     def load_mask(self, image_id):
218 |         """Load instance masks for the given image.
219 | 
220 |         Different datasets use different ways to store masks. This
221 |         function converts the different mask format to one format
222 |         in the form of a bitmap [height, width, instances].
223 | 
224 |         Returns:
225 |         masks: A bool array of shape [height, width, instance count] with
226 |             one mask per instance.
227 |         class_ids: a 1D array of class IDs of the instance masks.
228 |         """
229 |         # If not a COCO image, delegate to parent class.
230 |         image_info = self.image_info[image_id]
231 |         if image_info["source"] != "coco":
232 |             return super(CocoDataset, self).load_mask(image_id)
233 | 
234 |         instance_masks = []
235 |         class_ids = []
236 |         annotations = self.image_info[image_id]["annotations"]
237 |         # Build mask of shape [height, width, instance_count] and list
238 |         # of class IDs that correspond to each channel of the mask.
239 |         for annotation in annotations:
240 |             class_id = self.map_source_class_id(
241 |                 "coco.{}".format(annotation['category_id']))
242 |             if class_id:
243 |                 m = self.annToMask(annotation, image_info["height"],
244 |                                    image_info["width"])
245 |                 # Some objects are so small that they're less than 1 pixel area
246 |                 # and end up rounded out. Skip those objects.
247 |                 if m.max() < 1:
248 |                     continue
249 |                 # Is it a crowd? If so, use a negative class ID.
250 |                 if annotation['iscrowd']:
251 |                     # Use negative class ID for crowds
252 |                     class_id *= -1
253 |                     # For crowd masks, annToMask() sometimes returns a mask
254 |                     # smaller than the given dimensions. If so, resize it.
255 |                     if m.shape[0] != image_info["height"] or m.shape[1] != image_info["width"]:
256 |                         m = np.ones([image_info["height"], image_info["width"]], dtype=bool)
257 |                 instance_masks.append(m)
258 |                 class_ids.append(class_id)
259 | 
260 |         # Pack instance masks into an array
261 |         if class_ids:
262 |             mask = np.stack(instance_masks, axis=2)
263 |             class_ids = np.array(class_ids, dtype=np.int32)
264 |             return mask, class_ids
265 |         else:
266 |             # Call super class to return an empty mask
267 |             return super(CocoDataset, self).load_mask(image_id)
268 | 
269 |     def image_reference(self, image_id):
270 |         """Return a link to the image in the COCO Website."""
271 |         info = self.image_info[image_id]
272 |         if info["source"] == "coco":
273 |             return "http://cocodataset.org/#explore?id={}".format(info["id"])
274 |         else:
275 |             super(CocoDataset, self).image_reference(image_id)
276 | 
277 |     # The following two functions are from pycocotools with a few changes.
278 | 
279 |     def annToRLE(self, ann, height, width):
280 |         """
281 |         Convert annotation which can be polygons, uncompressed RLE to RLE.
282 |         :return: binary mask (numpy 2D array)
283 |         """
284 |         segm = ann['segmentation']
285 |         if isinstance(segm, list):
286 |             # polygon -- a single object might consist of multiple parts
287 |             # we merge all parts into one mask rle code
288 |             rles = maskUtils.frPyObjects(segm, height, width)
289 |             rle = maskUtils.merge(rles)
290 |         elif isinstance(segm['counts'], list):
291 |             # uncompressed RLE
292 |             rle = maskUtils.frPyObjects(segm, height, width)
293 |         else:
294 |             # rle
295 |             rle = ann['segmentation']
296 |         return rle
297 | 
298 |     def annToMask(self, ann, height, width):
299 |         """
300 |         Convert annotation which can be polygons, uncompressed RLE, or RLE to binary mask.
301 |         :return: binary mask (numpy 2D array)
302 |         """
303 |         rle = self.annToRLE(ann, height, width)
304 |         m = maskUtils.decode(rle)
305 |         return m
306 | 
307 | 
308 | ############################################################
309 | #  COCO Evaluation
310 | ############################################################
311 | 
312 | def build_coco_results(dataset, image_ids, rois, class_ids, scores, masks):
313 |     """Arrange resutls to match COCO specs in http://cocodataset.org/#format
314 |     """
315 |     # If no results, return an empty list
316 |     if rois is None:
317 |         return []
318 | 
319 |     results = []
320 |     for image_id in image_ids:
321 |         # Loop through detections
322 |         for i in range(rois.shape[0]):
323 |             class_id = class_ids[i]
324 |             score = scores[i]
325 |             bbox = np.around(rois[i], 1)
326 |             mask = masks[:, :, i]
327 | 
328 |             result = {
329 |                 "image_id": image_id,
330 |                 "category_id": dataset.get_source_class_id(class_id, "coco"),
331 |                 "bbox": [bbox[1], bbox[0], bbox[3] - bbox[1], bbox[2] - bbox[0]],
332 |                 "score": score,
333 |                 "segmentation": maskUtils.encode(np.asfortranarray(mask))
334 |             }
335 |             results.append(result)
336 |     return results
337 | 
338 | 
339 | def evaluate_coco(model, dataset, coco, eval_type="bbox", limit=0, image_ids=None):
340 |     """Runs official COCO evaluation.
341 |     dataset: A Dataset object with valiadtion data
342 |     eval_type: "bbox" or "segm" for bounding box or segmentation evaluation
343 |     limit: if not 0, it's the number of images to use for evaluation
344 |     """
345 |     # Pick COCO images from the dataset
346 |     image_ids = image_ids or dataset.image_ids
347 | 
348 |     # Limit to a subset
349 |     if limit:
350 |         image_ids = image_ids[:limit]
351 | 
352 |     # Get corresponding COCO image IDs.
353 |     coco_image_ids = [dataset.image_info[id]["id"] for id in image_ids]
354 | 
355 |     t_prediction = 0
356 |     t_start = time.time()
357 | 
358 |     results = []
359 |     for i, image_id in enumerate(image_ids):
360 |         # Load image
361 |         image = dataset.load_image(image_id)
362 | 
363 |         # Run detection
364 |         t = time.time()
365 |         r = model.detect([image], verbose=0)[0]
366 |         t_prediction += (time.time() - t)
367 | 
368 |         # Convert results to COCO format
369 |         image_results = build_coco_results(dataset, coco_image_ids[i:i + 1],
370 |                                            r["rois"], r["class_ids"],
371 |                                            r["scores"], r["masks"])
372 |         results.extend(image_results)
373 | 
374 |     # Load results. This modifies results with additional attributes.
375 |     coco_results = coco.loadRes(results)
376 | 
377 |     # Evaluate
378 |     cocoEval = COCOeval(coco, coco_results, eval_type)
379 |     cocoEval.params.imgIds = coco_image_ids
380 |     cocoEval.evaluate()
381 |     cocoEval.accumulate()
382 |     cocoEval.summarize()
383 | 
384 |     print("Prediction time: {}. Average {}/image".format(
385 |         t_prediction, t_prediction / len(image_ids)))
386 |     print("Total time: ", time.time() - t_start)
387 | 
388 | 
389 | ############################################################
390 | #  Training
391 | ############################################################
392 | 
393 | 
394 | if __name__ == '__main__':
395 |     import argparse
396 | 
397 |     # Parse command line arguments
398 |     parser = argparse.ArgumentParser(
399 |         description='Train Mask R-CNN on MS COCO.')
400 |     parser.add_argument("command",
401 |                         metavar="<command>",
402 |                         help="'train' or 'evaluate' on MS COCO")
403 |     parser.add_argument('--dataset', required=True,
404 |                         metavar="/path/to/coco/",
405 |                         help='Directory of the MS-COCO dataset')
406 |     parser.add_argument('--year', required=False,
407 |                         default=DEFAULT_DATASET_YEAR,
408 |                         metavar="<year>",
409 |                         help='Year of the MS-COCO dataset (2014 or 2017) (default=2014)')
410 |     parser.add_argument('--model', required=True,
411 |                         metavar="/path/to/weights.h5",
412 |                         help="Path to weights .h5 file or 'coco'")
413 |     parser.add_argument('--logs', required=False,
414 |                         default=DEFAULT_LOGS_DIR,
415 |                         metavar="/path/to/logs/",
416 |                         help='Logs and checkpoints directory (default=logs/)')
417 |     parser.add_argument('--limit', required=False,
418 |                         default=500,
419 |                         metavar="<image count>",
420 |                         help='Images to use for evaluation (default=500)')
421 |     parser.add_argument('--download', required=False,
422 |                         default=False,
423 |                         metavar="<True|False>",
424 |                         help='Automatically download and unzip MS-COCO files (default=False)',
425 |                         type=bool)
426 |     args = parser.parse_args()
427 |     print("Command: ", args.command)
428 |     print("Model: ", args.model)
429 |     print("Dataset: ", args.dataset)
430 |     print("Year: ", args.year)
431 |     print("Logs: ", args.logs)
432 |     print("Auto Download: ", args.download)
433 | 
434 |     # Configurations
435 |     if args.command == "train":
436 |         config = CocoConfig()
437 |     else:
438 |         class InferenceConfig(CocoConfig):
439 |             # Set batch size to 1 since we'll be running inference on
440 |             # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
441 |             GPU_COUNT = 1
442 |             IMAGES_PER_GPU = 1
443 |             DETECTION_MIN_CONFIDENCE = 0
444 |         config = InferenceConfig()
445 |     config.display()
446 | 
447 |     # Create model
448 |     if args.command == "train":
449 |         model = modellib.MaskRCNN(mode="training", config=config,
450 |                                   model_dir=args.logs)
451 |     else:
452 |         model = modellib.MaskRCNN(mode="inference", config=config,
453 |                                   model_dir=args.logs)
454 | 
455 |     # Select weights file to load
456 |     if args.model.lower() == "coco":
457 |         model_path = COCO_MODEL_PATH
458 |     elif args.model.lower() == "last":
459 |         # Find last trained weights
460 |         model_path = model.find_last()[1]
461 |     elif args.model.lower() == "imagenet":
462 |         # Start from ImageNet trained weights
463 |         model_path = model.get_imagenet_weights()
464 |     else:
465 |         model_path = args.model
466 | 
467 |     # Load weights
468 |     print("Loading weights ", model_path)
469 |     model.load_weights(model_path, by_name=True)
470 | 
471 |     # Train or evaluate
472 |     if args.command == "train":
473 |         # Training dataset. Use the training set and 35K from the
474 |         # validation set, as as in the Mask RCNN paper.
475 |         dataset_train = CocoDataset()
476 |         dataset_train.load_coco(args.dataset, "train", year=args.year, auto_download=args.download)
477 |         dataset_train.load_coco(args.dataset, "valminusminival", year=args.year, auto_download=args.download)
478 |         dataset_train.prepare()
479 | 
480 |         # Validation dataset
481 |         dataset_val = CocoDataset()
482 |         dataset_val.load_coco(args.dataset, "minival", year=args.year, auto_download=args.download)
483 |         dataset_val.prepare()
484 | 
485 |         # *** This training schedule is an example. Update to your needs ***
486 | 
487 |         # Training - Stage 1
488 |         print("Training network heads")
489 |         model.train(dataset_train, dataset_val,
490 |                     learning_rate=config.LEARNING_RATE,
491 |                     epochs=40,
492 |                     layers='heads')
493 | 
494 |         # Training - Stage 2
495 |         # Finetune layers from ResNet stage 4 and up
496 |         print("Fine tune Resnet stage 4 and up")
497 |         model.train(dataset_train, dataset_val,
498 |                     learning_rate=config.LEARNING_RATE,
499 |                     epochs=120,
500 |                     layers='4+')
501 | 
502 |         # Training - Stage 3
503 |         # Fine tune all layers
504 |         print("Fine tune all layers")
505 |         model.train(dataset_train, dataset_val,
506 |                     learning_rate=config.LEARNING_RATE / 10,
507 |                     epochs=160,
508 |                     layers='all')
509 | 
510 |     elif args.command == "evaluate":
511 |         # Validation dataset
512 |         dataset_val = CocoDataset()
513 |         coco = dataset_val.load_coco(args.dataset, "minival", year=args.year, return_coco=True, auto_download=args.download)
514 |         dataset_val.prepare()
515 |         print("Running COCO evaluation on {} images.".format(args.limit))
516 |         evaluate_coco(model, dataset_val, coco, "bbox", limit=int(args.limit))
517 |     else:
518 |         print("'{}' is not recognized. "
519 |               "Use 'train' or 'evaluate'".format(args.command))
520 | 


--------------------------------------------------------------------------------
/config.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Mask R-CNN
  3 | Base Configurations class.
  4 | 
  5 | Copyright (c) 2017 Matterport, Inc.
  6 | Licensed under the MIT License (see LICENSE for details)
  7 | Written by Waleed Abdulla
  8 | """
  9 | 
 10 | import math
 11 | import numpy as np
 12 | 
 13 | 
 14 | # Base Configuration Class
 15 | # Don't use this class directly. Instead, sub-class it and override
 16 | # the configurations you need to change.
 17 | 
 18 | class Config(object):
 19 |     """Base configuration class. For custom configurations, create a
 20 |     sub-class that inherits from this one and override properties
 21 |     that need to be changed.
 22 |     """
 23 |     # Name the configurations. For example, 'COCO', 'Experiment 3', ...etc.
 24 |     # Useful if your code needs to do things differently depending on which
 25 |     # experiment is running.
 26 |     NAME = None  # Override in sub-classes
 27 | 
 28 |     # NUMBER OF GPUs to use. For CPU training, use 1
 29 |     GPU_COUNT = 1
 30 | 
 31 |     # Number of images to train with on each GPU. A 12GB GPU can typically
 32 |     # handle 2 images of 1024x1024px.
 33 |     # Adjust based on your GPU memory and image sizes. Use the highest
 34 |     # number that your GPU can handle for best performance.
 35 |     IMAGES_PER_GPU = 2
 36 | 
 37 |     # Number of training steps per epoch
 38 |     # This doesn't need to match the size of the training set. Tensorboard
 39 |     # updates are saved at the end of each epoch, so setting this to a
 40 |     # smaller number means getting more frequent TensorBoard updates.
 41 |     # Validation stats are also calculated at each epoch end and they
 42 |     # might take a while, so don't set this too small to avoid spending
 43 |     # a lot of time on validation stats.
 44 |     STEPS_PER_EPOCH = 1000
 45 | 
 46 |     # Number of validation steps to run at the end of every training epoch.
 47 |     # A bigger number improves accuracy of validation stats, but slows
 48 |     # down the training.
 49 |     VALIDATION_STEPS = 50
 50 | 
 51 |     # Backbone network architecture
 52 |     # Supported values are: resnet50, resnet101
 53 |     BACKBONE = "resnet101"
 54 | 
 55 |     # The strides of each layer of the FPN Pyramid. These values
 56 |     # are based on a Resnet101 backbone.
 57 |     BACKBONE_STRIDES = [4, 8, 16, 32, 64]
 58 | 
 59 |     # Number of classification classes (including background)
 60 |     NUM_CLASSES = 1  # Override in sub-classes
 61 | 
 62 |     # Length of square anchor side in pixels
 63 |     RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)
 64 | 
 65 |     # Ratios of anchors at each cell (width/height)
 66 |     # A value of 1 represents a square anchor, and 0.5 is a wide anchor
 67 |     RPN_ANCHOR_RATIOS = [0.5, 1, 2]
 68 | 
 69 |     # Anchor stride
 70 |     # If 1 then anchors are created for each cell in the backbone feature map.
 71 |     # If 2, then anchors are created for every other cell, and so on.
 72 |     RPN_ANCHOR_STRIDE = 1
 73 | 
 74 |     # Non-max suppression threshold to filter RPN proposals.
 75 |     # You can reduce this during training to generate more propsals.
 76 |     RPN_NMS_THRESHOLD = 0.7
 77 | 
 78 |     # How many anchors per image to use for RPN training
 79 |     RPN_TRAIN_ANCHORS_PER_IMAGE = 256
 80 | 
 81 |     # ROIs kept after non-maximum supression (training and inference)
 82 |     POST_NMS_ROIS_TRAINING = 2000
 83 |     POST_NMS_ROIS_INFERENCE = 1000
 84 | 
 85 |     # If enabled, resizes instance masks to a smaller size to reduce
 86 |     # memory load. Recommended when using high-resolution images.
 87 |     USE_MINI_MASK = True
 88 |     MINI_MASK_SHAPE = (56, 56)  # (height, width) of the mini-mask
 89 | 
 90 |     # Input image resing
 91 |     # Images are resized such that the smallest side is >= IMAGE_MIN_DIM and
 92 |     # the longest side is <= IMAGE_MAX_DIM. In case both conditions can't
 93 |     # be satisfied together the IMAGE_MAX_DIM is enforced.
 94 |     IMAGE_MIN_DIM = 800
 95 |     IMAGE_MAX_DIM = 1024
 96 |     # If True, pad images with zeros such that they're (max_dim by max_dim)
 97 |     IMAGE_PADDING = True  # currently, the False option is not supported
 98 | 
 99 |     # Image mean (RGB)
100 |     MEAN_PIXEL = np.array([123.7, 116.8, 103.9])
101 | 
102 |     # Number of ROIs per image to feed to classifier/mask heads
103 |     # The Mask RCNN paper uses 512 but often the RPN doesn't generate
104 |     # enough positive proposals to fill this and keep a positive:negative
105 |     # ratio of 1:3. You can increase the number of proposals by adjusting
106 |     # the RPN NMS threshold.
107 |     TRAIN_ROIS_PER_IMAGE = 200
108 | 
109 |     # Percent of positive ROIs used to train classifier/mask heads
110 |     ROI_POSITIVE_RATIO = 0.33
111 | 
112 |     # Pooled ROIs
113 |     POOL_SIZE = 7
114 |     MASK_POOL_SIZE = 14
115 |     MASK_SHAPE = [28, 28]
116 | 
117 |     # Maximum number of ground truth instances to use in one image
118 |     MAX_GT_INSTANCES = 100
119 | 
120 |     # Bounding box refinement standard deviation for RPN and final detections.
121 |     RPN_BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2])
122 |     BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2])
123 | 
124 |     # Max number of final detections
125 |     DETECTION_MAX_INSTANCES = 100
126 | 
127 |     # Minimum probability value to accept a detected instance
128 |     # ROIs below this threshold are skipped
129 |     DETECTION_MIN_CONFIDENCE = 0.7
130 | 
131 |     # Non-maximum suppression threshold for detection
132 |     DETECTION_NMS_THRESHOLD = 0.3
133 | 
134 |     # Learning rate and momentum
135 |     # The Mask RCNN paper uses lr=0.02, but on TensorFlow it causes
136 |     # weights to explode. Likely due to differences in optimzer
137 |     # implementation.
138 |     LEARNING_RATE = 0.001
139 |     LEARNING_MOMENTUM = 0.9
140 | 
141 |     # Weight decay regularization
142 |     WEIGHT_DECAY = 0.0001
143 | 
144 |     # Use RPN ROIs or externally generated ROIs for training
145 |     # Keep this True for most situations. Set to False if you want to train
146 |     # the head branches on ROI generated by code rather than the ROIs from
147 |     # the RPN. For example, to debug the classifier head without having to
148 |     # train the RPN.
149 |     USE_RPN_ROIS = True
150 | 
151 |     def __init__(self):
152 |         """Set values of computed attributes."""
153 |         # Effective batch size
154 |         self.BATCH_SIZE = self.IMAGES_PER_GPU * self.GPU_COUNT
155 | 
156 |         # Input image size
157 |         self.IMAGE_SHAPE = np.array(
158 |             [self.IMAGE_MAX_DIM, self.IMAGE_MAX_DIM, 3])
159 | 
160 |         # Compute backbone size from input image size
161 |         self.BACKBONE_SHAPES = np.array(
162 |             [[int(math.ceil(self.IMAGE_SHAPE[0] / stride)),
163 |               int(math.ceil(self.IMAGE_SHAPE[1] / stride))]
164 |              for stride in self.BACKBONE_STRIDES])
165 | 
166 |     def display(self):
167 |         """Display Configuration values."""
168 |         print("\nConfigurations:")
169 |         for a in dir(self):
170 |             if not a.startswith("__") and not callable(getattr(self, a)):
171 |                 print("{:30} {}".format(a, getattr(self, a)))
172 |         print("\n")
173 | 


--------------------------------------------------------------------------------
/docker-entrypoint.sh:
--------------------------------------------------------------------------------
1 | #!/bin/sh
2 | set -e
3 | xvfb-run python -W ignore /app/person_blocker.py -m /app/mask_rcnn_coco.h5 "$@"
4 | 


--------------------------------------------------------------------------------
/example_output/img1_blocked.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/minimaxir/person-blocker/82cc1bab629ff9faf610861bf94660d0131c38ec/example_output/img1_blocked.png


--------------------------------------------------------------------------------
/example_output/img2_blocked.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/minimaxir/person-blocker/82cc1bab629ff9faf610861bf94660d0131c38ec/example_output/img2_blocked.png


--------------------------------------------------------------------------------
/example_output/img3_blocked.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/minimaxir/person-blocker/82cc1bab629ff9faf610861bf94660d0131c38ec/example_output/img3_blocked.png


--------------------------------------------------------------------------------
/example_output/img4_blocked.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/minimaxir/person-blocker/82cc1bab629ff9faf610861bf94660d0131c38ec/example_output/img4_blocked.gif


--------------------------------------------------------------------------------
/example_output/img4_blocked.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/minimaxir/person-blocker/82cc1bab629ff9faf610861bf94660d0131c38ec/example_output/img4_blocked.png


--------------------------------------------------------------------------------
/example_output/img4_labels.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/minimaxir/person-blocker/82cc1bab629ff9faf610861bf94660d0131c38ec/example_output/img4_labels.png


--------------------------------------------------------------------------------
/images/img1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/minimaxir/person-blocker/82cc1bab629ff9faf610861bf94660d0131c38ec/images/img1.jpg


--------------------------------------------------------------------------------
/images/img2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/minimaxir/person-blocker/82cc1bab629ff9faf610861bf94660d0131c38ec/images/img2.jpg


--------------------------------------------------------------------------------
/images/img3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/minimaxir/person-blocker/82cc1bab629ff9faf610861bf94660d0131c38ec/images/img3.jpg


--------------------------------------------------------------------------------
/images/img4.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/minimaxir/person-blocker/82cc1bab629ff9faf610861bf94660d0131c38ec/images/img4.jpg


--------------------------------------------------------------------------------
/model.py:
--------------------------------------------------------------------------------
   1 | """
   2 | Mask R-CNN
   3 | The main Mask R-CNN model implemenetation.
   4 | 
   5 | Copyright (c) 2017 Matterport, Inc.
   6 | Licensed under the MIT License (see LICENSE for details)
   7 | Written by Waleed Abdulla
   8 | """
   9 | 
  10 | import os
  11 | import sys
  12 | import glob
  13 | import random
  14 | import math
  15 | import datetime
  16 | import itertools
  17 | import json
  18 | import re
  19 | import logging
  20 | from collections import OrderedDict
  21 | import numpy as np
  22 | import scipy.misc
  23 | import tensorflow as tf
  24 | import keras
  25 | import keras.backend as K
  26 | import keras.layers as KL
  27 | import keras.initializers as KI
  28 | import keras.engine as KE
  29 | import keras.models as KM
  30 | 
  31 | import utils
  32 | 
  33 | # Requires TensorFlow 1.3+ and Keras 2.0.8+.
  34 | from distutils.version import LooseVersion
  35 | assert LooseVersion(tf.__version__) >= LooseVersion("1.3")
  36 | assert LooseVersion(keras.__version__) >= LooseVersion('2.0.8')
  37 | 
  38 | 
  39 | ############################################################
  40 | #  Utility Functions
  41 | ############################################################
  42 | 
  43 | def log(text, array=None):
  44 |     """Prints a text message. And, optionally, if a Numpy array is provided it
  45 |     prints it's shape, min, and max values.
  46 |     """
  47 |     if array is not None:
  48 |         text = text.ljust(25)
  49 |         text += ("shape: {:20}  min: {:10.5f}  max: {:10.5f}".format(
  50 |             str(array.shape),
  51 |             array.min() if array.size else "",
  52 |             array.max() if array.size else ""))
  53 |     print(text)
  54 | 
  55 | 
  56 | class BatchNorm(KL.BatchNormalization):
  57 |     """Batch Normalization class. Subclasses the Keras BN class and
  58 |     hardcodes training=False so the BN layer doesn't update
  59 |     during training.
  60 | 
  61 |     Batch normalization has a negative effect on training if batches are small
  62 |     so we disable it here.
  63 |     """
  64 | 
  65 |     def call(self, inputs, training=None):
  66 |         return super(self.__class__, self).call(inputs, training=False)
  67 | 
  68 | 
  69 | ############################################################
  70 | #  Resnet Graph
  71 | ############################################################
  72 | 
  73 | # Code adopted from:
  74 | # https://github.com/fchollet/deep-learning-models/blob/master/resnet50.py
  75 | 
  76 | def identity_block(input_tensor, kernel_size, filters, stage, block,
  77 |                    use_bias=True):
  78 |     """The identity_block is the block that has no conv layer at shortcut
  79 |     # Arguments
  80 |         input_tensor: input tensor
  81 |         kernel_size: defualt 3, the kernel size of middle conv layer at main path
  82 |         filters: list of integers, the nb_filters of 3 conv layer at main path
  83 |         stage: integer, current stage label, used for generating layer names
  84 |         block: 'a','b'..., current block label, used for generating layer names
  85 |     """
  86 |     nb_filter1, nb_filter2, nb_filter3 = filters
  87 |     conv_name_base = 'res' + str(stage) + block + '_branch'
  88 |     bn_name_base = 'bn' + str(stage) + block + '_branch'
  89 | 
  90 |     x = KL.Conv2D(nb_filter1, (1, 1), name=conv_name_base + '2a',
  91 |                   use_bias=use_bias)(input_tensor)
  92 |     x = BatchNorm(axis=3, name=bn_name_base + '2a')(x)
  93 |     x = KL.Activation('relu')(x)
  94 | 
  95 |     x = KL.Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
  96 |                   name=conv_name_base + '2b', use_bias=use_bias)(x)
  97 |     x = BatchNorm(axis=3, name=bn_name_base + '2b')(x)
  98 |     x = KL.Activation('relu')(x)
  99 | 
 100 |     x = KL.Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c',
 101 |                   use_bias=use_bias)(x)
 102 |     x = BatchNorm(axis=3, name=bn_name_base + '2c')(x)
 103 | 
 104 |     x = KL.Add()([x, input_tensor])
 105 |     x = KL.Activation('relu', name='res' + str(stage) + block + '_out')(x)
 106 |     return x
 107 | 
 108 | 
 109 | def conv_block(input_tensor, kernel_size, filters, stage, block,
 110 |                strides=(2, 2), use_bias=True):
 111 |     """conv_block is the block that has a conv layer at shortcut
 112 |     # Arguments
 113 |         input_tensor: input tensor
 114 |         kernel_size: defualt 3, the kernel size of middle conv layer at main path
 115 |         filters: list of integers, the nb_filters of 3 conv layer at main path
 116 |         stage: integer, current stage label, used for generating layer names
 117 |         block: 'a','b'..., current block label, used for generating layer names
 118 |     Note that from stage 3, the first conv layer at main path is with subsample=(2,2)
 119 |     And the shortcut should have subsample=(2,2) as well
 120 |     """
 121 |     nb_filter1, nb_filter2, nb_filter3 = filters
 122 |     conv_name_base = 'res' + str(stage) + block + '_branch'
 123 |     bn_name_base = 'bn' + str(stage) + block + '_branch'
 124 | 
 125 |     x = KL.Conv2D(nb_filter1, (1, 1), strides=strides,
 126 |                   name=conv_name_base + '2a', use_bias=use_bias)(input_tensor)
 127 |     x = BatchNorm(axis=3, name=bn_name_base + '2a')(x)
 128 |     x = KL.Activation('relu')(x)
 129 | 
 130 |     x = KL.Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
 131 |                   name=conv_name_base + '2b', use_bias=use_bias)(x)
 132 |     x = BatchNorm(axis=3, name=bn_name_base + '2b')(x)
 133 |     x = KL.Activation('relu')(x)
 134 | 
 135 |     x = KL.Conv2D(nb_filter3, (1, 1), name=conv_name_base +
 136 |                   '2c', use_bias=use_bias)(x)
 137 |     x = BatchNorm(axis=3, name=bn_name_base + '2c')(x)
 138 | 
 139 |     shortcut = KL.Conv2D(nb_filter3, (1, 1), strides=strides,
 140 |                          name=conv_name_base + '1', use_bias=use_bias)(input_tensor)
 141 |     shortcut = BatchNorm(axis=3, name=bn_name_base + '1')(shortcut)
 142 | 
 143 |     x = KL.Add()([x, shortcut])
 144 |     x = KL.Activation('relu', name='res' + str(stage) + block + '_out')(x)
 145 |     return x
 146 | 
 147 | 
 148 | def resnet_graph(input_image, architecture, stage5=False):
 149 |     assert architecture in ["resnet50", "resnet101"]
 150 |     # Stage 1
 151 |     x = KL.ZeroPadding2D((3, 3))(input_image)
 152 |     x = KL.Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=True)(x)
 153 |     x = BatchNorm(axis=3, name='bn_conv1')(x)
 154 |     x = KL.Activation('relu')(x)
 155 |     C1 = x = KL.MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)
 156 |     # Stage 2
 157 |     x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))
 158 |     x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')
 159 |     C2 = x = identity_block(x, 3, [64, 64, 256], stage=2, block='c')
 160 |     # Stage 3
 161 |     x = conv_block(x, 3, [128, 128, 512], stage=3, block='a')
 162 |     x = identity_block(x, 3, [128, 128, 512], stage=3, block='b')
 163 |     x = identity_block(x, 3, [128, 128, 512], stage=3, block='c')
 164 |     C3 = x = identity_block(x, 3, [128, 128, 512], stage=3, block='d')
 165 |     # Stage 4
 166 |     x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a')
 167 |     block_count = {"resnet50": 5, "resnet101": 22}[architecture]
 168 |     for i in range(block_count):
 169 |         x = identity_block(x, 3, [256, 256, 1024], stage=4, block=chr(98 + i))
 170 |     C4 = x
 171 |     # Stage 5
 172 |     if stage5:
 173 |         x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a')
 174 |         x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b')
 175 |         C5 = x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c')
 176 |     else:
 177 |         C5 = None
 178 |     return [C1, C2, C3, C4, C5]
 179 | 
 180 | 
 181 | ############################################################
 182 | #  Proposal Layer
 183 | ############################################################
 184 | 
 185 | def apply_box_deltas_graph(boxes, deltas):
 186 |     """Applies the given deltas to the given boxes.
 187 |     boxes: [N, 4] where each row is y1, x1, y2, x2
 188 |     deltas: [N, 4] where each row is [dy, dx, log(dh), log(dw)]
 189 |     """
 190 |     # Convert to y, x, h, w
 191 |     height = boxes[:, 2] - boxes[:, 0]
 192 |     width = boxes[:, 3] - boxes[:, 1]
 193 |     center_y = boxes[:, 0] + 0.5 * height
 194 |     center_x = boxes[:, 1] + 0.5 * width
 195 |     # Apply deltas
 196 |     center_y += deltas[:, 0] * height
 197 |     center_x += deltas[:, 1] * width
 198 |     height *= tf.exp(deltas[:, 2])
 199 |     width *= tf.exp(deltas[:, 3])
 200 |     # Convert back to y1, x1, y2, x2
 201 |     y1 = center_y - 0.5 * height
 202 |     x1 = center_x - 0.5 * width
 203 |     y2 = y1 + height
 204 |     x2 = x1 + width
 205 |     result = tf.stack([y1, x1, y2, x2], axis=1, name="apply_box_deltas_out")
 206 |     return result
 207 | 
 208 | 
 209 | def clip_boxes_graph(boxes, window):
 210 |     """
 211 |     boxes: [N, 4] each row is y1, x1, y2, x2
 212 |     window: [4] in the form y1, x1, y2, x2
 213 |     """
 214 |     # Split corners
 215 |     wy1, wx1, wy2, wx2 = tf.split(window, 4)
 216 |     y1, x1, y2, x2 = tf.split(boxes, 4, axis=1)
 217 |     # Clip
 218 |     y1 = tf.maximum(tf.minimum(y1, wy2), wy1)
 219 |     x1 = tf.maximum(tf.minimum(x1, wx2), wx1)
 220 |     y2 = tf.maximum(tf.minimum(y2, wy2), wy1)
 221 |     x2 = tf.maximum(tf.minimum(x2, wx2), wx1)
 222 |     clipped = tf.concat([y1, x1, y2, x2], axis=1, name="clipped_boxes")
 223 |     clipped.set_shape((clipped.shape[0], 4))
 224 |     return clipped
 225 | 
 226 | 
 227 | class ProposalLayer(KE.Layer):
 228 |     """Receives anchor scores and selects a subset to pass as proposals
 229 |     to the second stage. Filtering is done based on anchor scores and
 230 |     non-max suppression to remove overlaps. It also applies bounding
 231 |     box refinement deltas to anchors.
 232 | 
 233 |     Inputs:
 234 |         rpn_probs: [batch, anchors, (bg prob, fg prob)]
 235 |         rpn_bbox: [batch, anchors, (dy, dx, log(dh), log(dw))]
 236 | 
 237 |     Returns:
 238 |         Proposals in normalized coordinates [batch, rois, (y1, x1, y2, x2)]
 239 |     """
 240 | 
 241 |     def __init__(self, proposal_count, nms_threshold, anchors,
 242 |                  config=None, **kwargs):
 243 |         """
 244 |         anchors: [N, (y1, x1, y2, x2)] anchors defined in image coordinates
 245 |         """
 246 |         super(ProposalLayer, self).__init__(**kwargs)
 247 |         self.config = config
 248 |         self.proposal_count = proposal_count
 249 |         self.nms_threshold = nms_threshold
 250 |         self.anchors = anchors.astype(np.float32)
 251 | 
 252 |     def call(self, inputs):
 253 |         # Box Scores. Use the foreground class confidence. [Batch, num_rois, 1]
 254 |         scores = inputs[0][:, :, 1]
 255 |         # Box deltas [batch, num_rois, 4]
 256 |         deltas = inputs[1]
 257 |         deltas = deltas * np.reshape(self.config.RPN_BBOX_STD_DEV, [1, 1, 4])
 258 |         # Base anchors
 259 |         anchors = self.anchors
 260 | 
 261 |         # Improve performance by trimming to top anchors by score
 262 |         # and doing the rest on the smaller subset.
 263 |         pre_nms_limit = min(6000, self.anchors.shape[0])
 264 |         ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True,
 265 |                          name="top_anchors").indices
 266 |         scores = utils.batch_slice([scores, ix], lambda x, y: tf.gather(x, y),
 267 |                                    self.config.IMAGES_PER_GPU)
 268 |         deltas = utils.batch_slice([deltas, ix], lambda x, y: tf.gather(x, y),
 269 |                                    self.config.IMAGES_PER_GPU)
 270 |         anchors = utils.batch_slice(ix, lambda x: tf.gather(anchors, x),
 271 |                                     self.config.IMAGES_PER_GPU,
 272 |                                     names=["pre_nms_anchors"])
 273 | 
 274 |         # Apply deltas to anchors to get refined anchors.
 275 |         # [batch, N, (y1, x1, y2, x2)]
 276 |         boxes = utils.batch_slice([anchors, deltas],
 277 |                                   lambda x, y: apply_box_deltas_graph(x, y),
 278 |                                   self.config.IMAGES_PER_GPU,
 279 |                                   names=["refined_anchors"])
 280 | 
 281 |         # Clip to image boundaries. [batch, N, (y1, x1, y2, x2)]
 282 |         height, width = self.config.IMAGE_SHAPE[:2]
 283 |         window = np.array([0, 0, height, width]).astype(np.float32)
 284 |         boxes = utils.batch_slice(boxes,
 285 |                                   lambda x: clip_boxes_graph(x, window),
 286 |                                   self.config.IMAGES_PER_GPU,
 287 |                                   names=["refined_anchors_clipped"])
 288 | 
 289 |         # Filter out small boxes
 290 |         # According to Xinlei Chen's paper, this reduces detection accuracy
 291 |         # for small objects, so we're skipping it.
 292 | 
 293 |         # Normalize dimensions to range of 0 to 1.
 294 |         normalized_boxes = boxes / np.array([[height, width, height, width]])
 295 | 
 296 |         # Non-max suppression
 297 |         def nms(normalized_boxes, scores):
 298 |             indices = tf.image.non_max_suppression(
 299 |                 normalized_boxes, scores, self.proposal_count,
 300 |                 self.nms_threshold, name="rpn_non_max_suppression")
 301 |             proposals = tf.gather(normalized_boxes, indices)
 302 |             # Pad if needed
 303 |             padding = tf.maximum(self.proposal_count - tf.shape(proposals)[0], 0)
 304 |             proposals = tf.pad(proposals, [(0, padding), (0, 0)])
 305 |             return proposals
 306 |         proposals = utils.batch_slice([normalized_boxes, scores], nms,
 307 |                                       self.config.IMAGES_PER_GPU)
 308 |         return proposals
 309 | 
 310 |     def compute_output_shape(self, input_shape):
 311 |         return (None, self.proposal_count, 4)
 312 | 
 313 | 
 314 | ############################################################
 315 | #  ROIAlign Layer
 316 | ############################################################
 317 | 
 318 | def log2_graph(x):
 319 |     """Implementatin of Log2. TF doesn't have a native implemenation."""
 320 |     return tf.log(x) / tf.log(2.0)
 321 | 
 322 | 
 323 | class PyramidROIAlign(KE.Layer):
 324 |     """Implements ROI Pooling on multiple levels of the feature pyramid.
 325 | 
 326 |     Params:
 327 |     - pool_shape: [height, width] of the output pooled regions. Usually [7, 7]
 328 |     - image_shape: [height, width, channels]. Shape of input image in pixels
 329 | 
 330 |     Inputs:
 331 |     - boxes: [batch, num_boxes, (y1, x1, y2, x2)] in normalized
 332 |              coordinates. Possibly padded with zeros if not enough
 333 |              boxes to fill the array.
 334 |     - Feature maps: List of feature maps from different levels of the pyramid.
 335 |                     Each is [batch, height, width, channels]
 336 | 
 337 |     Output:
 338 |     Pooled regions in the shape: [batch, num_boxes, height, width, channels].
 339 |     The width and height are those specific in the pool_shape in the layer
 340 |     constructor.
 341 |     """
 342 | 
 343 |     def __init__(self, pool_shape, image_shape, **kwargs):
 344 |         super(PyramidROIAlign, self).__init__(**kwargs)
 345 |         self.pool_shape = tuple(pool_shape)
 346 |         self.image_shape = tuple(image_shape)
 347 | 
 348 |     def call(self, inputs):
 349 |         # Crop boxes [batch, num_boxes, (y1, x1, y2, x2)] in normalized coords
 350 |         boxes = inputs[0]
 351 | 
 352 |         # Feature Maps. List of feature maps from different level of the
 353 |         # feature pyramid. Each is [batch, height, width, channels]
 354 |         feature_maps = inputs[1:]
 355 | 
 356 |         # Assign each ROI to a level in the pyramid based on the ROI area.
 357 |         y1, x1, y2, x2 = tf.split(boxes, 4, axis=2)
 358 |         h = y2 - y1
 359 |         w = x2 - x1
 360 |         # Equation 1 in the Feature Pyramid Networks paper. Account for
 361 |         # the fact that our coordinates are normalized here.
 362 |         # e.g. a 224x224 ROI (in pixels) maps to P4
 363 |         image_area = tf.cast(
 364 |             self.image_shape[0] * self.image_shape[1], tf.float32)
 365 |         roi_level = log2_graph(tf.sqrt(h * w) / (224.0 / tf.sqrt(image_area)))
 366 |         roi_level = tf.minimum(5, tf.maximum(
 367 |             2, 4 + tf.cast(tf.round(roi_level), tf.int32)))
 368 |         roi_level = tf.squeeze(roi_level, 2)
 369 | 
 370 |         # Loop through levels and apply ROI pooling to each. P2 to P5.
 371 |         pooled = []
 372 |         box_to_level = []
 373 |         for i, level in enumerate(range(2, 6)):
 374 |             ix = tf.where(tf.equal(roi_level, level))
 375 |             level_boxes = tf.gather_nd(boxes, ix)
 376 | 
 377 |             # Box indicies for crop_and_resize.
 378 |             box_indices = tf.cast(ix[:, 0], tf.int32)
 379 | 
 380 |             # Keep track of which box is mapped to which level
 381 |             box_to_level.append(ix)
 382 | 
 383 |             # Stop gradient propogation to ROI proposals
 384 |             level_boxes = tf.stop_gradient(level_boxes)
 385 |             box_indices = tf.stop_gradient(box_indices)
 386 | 
 387 |             # Crop and Resize
 388 |             # From Mask R-CNN paper: "We sample four regular locations, so
 389 |             # that we can evaluate either max or average pooling. In fact,
 390 |             # interpolating only a single value at each bin center (without
 391 |             # pooling) is nearly as effective."
 392 |             #
 393 |             # Here we use the simplified approach of a single value per bin,
 394 |             # which is how it's done in tf.crop_and_resize()
 395 |             # Result: [batch * num_boxes, pool_height, pool_width, channels]
 396 |             pooled.append(tf.image.crop_and_resize(
 397 |                 feature_maps[i], level_boxes, box_indices, self.pool_shape,
 398 |                 method="bilinear"))
 399 | 
 400 |         # Pack pooled features into one tensor
 401 |         pooled = tf.concat(pooled, axis=0)
 402 | 
 403 |         # Pack box_to_level mapping into one array and add another
 404 |         # column representing the order of pooled boxes
 405 |         box_to_level = tf.concat(box_to_level, axis=0)
 406 |         box_range = tf.expand_dims(tf.range(tf.shape(box_to_level)[0]), 1)
 407 |         box_to_level = tf.concat([tf.cast(box_to_level, tf.int32), box_range],
 408 |                                  axis=1)
 409 | 
 410 |         # Rearrange pooled features to match the order of the original boxes
 411 |         # Sort box_to_level by batch then box index
 412 |         # TF doesn't have a way to sort by two columns, so merge them and sort.
 413 |         sorting_tensor = box_to_level[:, 0] * 100000 + box_to_level[:, 1]
 414 |         ix = tf.nn.top_k(sorting_tensor, k=tf.shape(
 415 |             box_to_level)[0]).indices[::-1]
 416 |         ix = tf.gather(box_to_level[:, 2], ix)
 417 |         pooled = tf.gather(pooled, ix)
 418 | 
 419 |         # Re-add the batch dimension
 420 |         pooled = tf.expand_dims(pooled, 0)
 421 |         return pooled
 422 | 
 423 |     def compute_output_shape(self, input_shape):
 424 |         return input_shape[0][:2] + self.pool_shape + (input_shape[1][-1], )
 425 | 
 426 | 
 427 | ############################################################
 428 | #  Detection Target Layer
 429 | ############################################################
 430 | 
 431 | def overlaps_graph(boxes1, boxes2):
 432 |     """Computes IoU overlaps between two sets of boxes.
 433 |     boxes1, boxes2: [N, (y1, x1, y2, x2)].
 434 |     """
 435 |     # 1. Tile boxes2 and repeate boxes1. This allows us to compare
 436 |     # every boxes1 against every boxes2 without loops.
 437 |     # TF doesn't have an equivalent to np.repeate() so simulate it
 438 |     # using tf.tile() and tf.reshape.
 439 |     b1 = tf.reshape(tf.tile(tf.expand_dims(boxes1, 1),
 440 |                             [1, 1, tf.shape(boxes2)[0]]), [-1, 4])
 441 |     b2 = tf.tile(boxes2, [tf.shape(boxes1)[0], 1])
 442 |     # 2. Compute intersections
 443 |     b1_y1, b1_x1, b1_y2, b1_x2 = tf.split(b1, 4, axis=1)
 444 |     b2_y1, b2_x1, b2_y2, b2_x2 = tf.split(b2, 4, axis=1)
 445 |     y1 = tf.maximum(b1_y1, b2_y1)
 446 |     x1 = tf.maximum(b1_x1, b2_x1)
 447 |     y2 = tf.minimum(b1_y2, b2_y2)
 448 |     x2 = tf.minimum(b1_x2, b2_x2)
 449 |     intersection = tf.maximum(x2 - x1, 0) * tf.maximum(y2 - y1, 0)
 450 |     # 3. Compute unions
 451 |     b1_area = (b1_y2 - b1_y1) * (b1_x2 - b1_x1)
 452 |     b2_area = (b2_y2 - b2_y1) * (b2_x2 - b2_x1)
 453 |     union = b1_area + b2_area - intersection
 454 |     # 4. Compute IoU and reshape to [boxes1, boxes2]
 455 |     iou = intersection / union
 456 |     overlaps = tf.reshape(iou, [tf.shape(boxes1)[0], tf.shape(boxes2)[0]])
 457 |     return overlaps
 458 | 
 459 | 
 460 | def detection_targets_graph(proposals, gt_class_ids, gt_boxes, gt_masks, config):
 461 |     """Generates detection targets for one image. Subsamples proposals and
 462 |     generates target class IDs, bounding box deltas, and masks for each.
 463 | 
 464 |     Inputs:
 465 |     proposals: [N, (y1, x1, y2, x2)] in normalized coordinates. Might
 466 |                be zero padded if there are not enough proposals.
 467 |     gt_class_ids: [MAX_GT_INSTANCES] int class IDs
 468 |     gt_boxes: [MAX_GT_INSTANCES, (y1, x1, y2, x2)] in normalized coordinates.
 469 |     gt_masks: [height, width, MAX_GT_INSTANCES] of boolean type.
 470 | 
 471 |     Returns: Target ROIs and corresponding class IDs, bounding box shifts,
 472 |     and masks.
 473 |     rois: [TRAIN_ROIS_PER_IMAGE, (y1, x1, y2, x2)] in normalized coordinates
 474 |     class_ids: [TRAIN_ROIS_PER_IMAGE]. Integer class IDs. Zero padded.
 475 |     deltas: [TRAIN_ROIS_PER_IMAGE, NUM_CLASSES, (dy, dx, log(dh), log(dw))]
 476 |             Class-specific bbox refinements.
 477 |     masks: [TRAIN_ROIS_PER_IMAGE, height, width). Masks cropped to bbox
 478 |            boundaries and resized to neural network output size.
 479 | 
 480 |     Note: Returned arrays might be zero padded if not enough target ROIs.
 481 |     """
 482 |     # Assertions
 483 |     asserts = [
 484 |         tf.Assert(tf.greater(tf.shape(proposals)[0], 0), [proposals],
 485 |                   name="roi_assertion"),
 486 |     ]
 487 |     with tf.control_dependencies(asserts):
 488 |         proposals = tf.identity(proposals)
 489 | 
 490 |     # Remove zero padding
 491 |     proposals, _ = trim_zeros_graph(proposals, name="trim_proposals")
 492 |     gt_boxes, non_zeros = trim_zeros_graph(gt_boxes, name="trim_gt_boxes")
 493 |     gt_class_ids = tf.boolean_mask(gt_class_ids, non_zeros,
 494 |                                    name="trim_gt_class_ids")
 495 |     gt_masks = tf.gather(gt_masks, tf.where(non_zeros)[:, 0], axis=2,
 496 |                          name="trim_gt_masks")
 497 | 
 498 |     # Handle COCO crowds
 499 |     # A crowd box in COCO is a bounding box around several instances. Exclude
 500 |     # them from training. A crowd box is given a negative class ID.
 501 |     crowd_ix = tf.where(gt_class_ids < 0)[:, 0]
 502 |     non_crowd_ix = tf.where(gt_class_ids > 0)[:, 0]
 503 |     crowd_boxes = tf.gather(gt_boxes, crowd_ix)
 504 |     crowd_masks = tf.gather(gt_masks, crowd_ix, axis=2)
 505 |     gt_class_ids = tf.gather(gt_class_ids, non_crowd_ix)
 506 |     gt_boxes = tf.gather(gt_boxes, non_crowd_ix)
 507 |     gt_masks = tf.gather(gt_masks, non_crowd_ix, axis=2)
 508 | 
 509 |     # Compute overlaps matrix [proposals, gt_boxes]
 510 |     overlaps = overlaps_graph(proposals, gt_boxes)
 511 | 
 512 |     # Compute overlaps with crowd boxes [anchors, crowds]
 513 |     crowd_overlaps = overlaps_graph(proposals, crowd_boxes)
 514 |     crowd_iou_max = tf.reduce_max(crowd_overlaps, axis=1)
 515 |     no_crowd_bool = (crowd_iou_max < 0.001)
 516 | 
 517 |     # Determine postive and negative ROIs
 518 |     roi_iou_max = tf.reduce_max(overlaps, axis=1)
 519 |     # 1. Positive ROIs are those with >= 0.5 IoU with a GT box
 520 |     positive_roi_bool = (roi_iou_max >= 0.5)
 521 |     positive_indices = tf.where(positive_roi_bool)[:, 0]
 522 |     # 2. Negative ROIs are those with < 0.5 with every GT box. Skip crowds.
 523 |     negative_indices = tf.where(tf.logical_and(roi_iou_max < 0.5, no_crowd_bool))[:, 0]
 524 | 
 525 |     # Subsample ROIs. Aim for 33% positive
 526 |     # Positive ROIs
 527 |     positive_count = int(config.TRAIN_ROIS_PER_IMAGE *
 528 |                          config.ROI_POSITIVE_RATIO)
 529 |     positive_indices = tf.random_shuffle(positive_indices)[:positive_count]
 530 |     positive_count = tf.shape(positive_indices)[0]
 531 |     # Negative ROIs. Add enough to maintain positive:negative ratio.
 532 |     r = 1.0 / config.ROI_POSITIVE_RATIO
 533 |     negative_count = tf.cast(r * tf.cast(positive_count, tf.float32), tf.int32) - positive_count
 534 |     negative_indices = tf.random_shuffle(negative_indices)[:negative_count]
 535 |     # Gather selected ROIs
 536 |     positive_rois = tf.gather(proposals, positive_indices)
 537 |     negative_rois = tf.gather(proposals, negative_indices)
 538 | 
 539 |     # Assign positive ROIs to GT boxes.
 540 |     positive_overlaps = tf.gather(overlaps, positive_indices)
 541 |     roi_gt_box_assignment = tf.argmax(positive_overlaps, axis=1)
 542 |     roi_gt_boxes = tf.gather(gt_boxes, roi_gt_box_assignment)
 543 |     roi_gt_class_ids = tf.gather(gt_class_ids, roi_gt_box_assignment)
 544 | 
 545 |     # Compute bbox refinement for positive ROIs
 546 |     deltas = utils.box_refinement_graph(positive_rois, roi_gt_boxes)
 547 |     deltas /= config.BBOX_STD_DEV
 548 | 
 549 |     # Assign positive ROIs to GT masks
 550 |     # Permute masks to [N, height, width, 1]
 551 |     transposed_masks = tf.expand_dims(tf.transpose(gt_masks, [2, 0, 1]), -1)
 552 |     # Pick the right mask for each ROI
 553 |     roi_masks = tf.gather(transposed_masks, roi_gt_box_assignment)
 554 | 
 555 |     # Compute mask targets
 556 |     boxes = positive_rois
 557 |     if config.USE_MINI_MASK:
 558 |         # Transform ROI corrdinates from normalized image space
 559 |         # to normalized mini-mask space.
 560 |         y1, x1, y2, x2 = tf.split(positive_rois, 4, axis=1)
 561 |         gt_y1, gt_x1, gt_y2, gt_x2 = tf.split(roi_gt_boxes, 4, axis=1)
 562 |         gt_h = gt_y2 - gt_y1
 563 |         gt_w = gt_x2 - gt_x1
 564 |         y1 = (y1 - gt_y1) / gt_h
 565 |         x1 = (x1 - gt_x1) / gt_w
 566 |         y2 = (y2 - gt_y1) / gt_h
 567 |         x2 = (x2 - gt_x1) / gt_w
 568 |         boxes = tf.concat([y1, x1, y2, x2], 1)
 569 |     box_ids = tf.range(0, tf.shape(roi_masks)[0])
 570 |     masks = tf.image.crop_and_resize(tf.cast(roi_masks, tf.float32), boxes,
 571 |                                      box_ids,
 572 |                                      config.MASK_SHAPE)
 573 |     # Remove the extra dimension from masks.
 574 |     masks = tf.squeeze(masks, axis=3)
 575 | 
 576 |     # Threshold mask pixels at 0.5 to have GT masks be 0 or 1 to use with
 577 |     # binary cross entropy loss.
 578 |     masks = tf.round(masks)
 579 | 
 580 |     # Append negative ROIs and pad bbox deltas and masks that
 581 |     # are not used for negative ROIs with zeros.
 582 |     rois = tf.concat([positive_rois, negative_rois], axis=0)
 583 |     N = tf.shape(negative_rois)[0]
 584 |     P = tf.maximum(config.TRAIN_ROIS_PER_IMAGE - tf.shape(rois)[0], 0)
 585 |     rois = tf.pad(rois, [(0, P), (0, 0)])
 586 |     roi_gt_boxes = tf.pad(roi_gt_boxes, [(0, N + P), (0, 0)])
 587 |     roi_gt_class_ids = tf.pad(roi_gt_class_ids, [(0, N + P)])
 588 |     deltas = tf.pad(deltas, [(0, N + P), (0, 0)])
 589 |     masks = tf.pad(masks, [[0, N + P], (0, 0), (0, 0)])
 590 | 
 591 |     return rois, roi_gt_class_ids, deltas, masks
 592 | 
 593 | 
 594 | class DetectionTargetLayer(KE.Layer):
 595 |     """Subsamples proposals and generates target box refinement, class_ids,
 596 |     and masks for each.
 597 | 
 598 |     Inputs:
 599 |     proposals: [batch, N, (y1, x1, y2, x2)] in normalized coordinates. Might
 600 |                be zero padded if there are not enough proposals.
 601 |     gt_class_ids: [batch, MAX_GT_INSTANCES] Integer class IDs.
 602 |     gt_boxes: [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in normalized
 603 |               coordinates.
 604 |     gt_masks: [batch, height, width, MAX_GT_INSTANCES] of boolean type
 605 | 
 606 |     Returns: Target ROIs and corresponding class IDs, bounding box shifts,
 607 |     and masks.
 608 |     rois: [batch, TRAIN_ROIS_PER_IMAGE, (y1, x1, y2, x2)] in normalized
 609 |           coordinates
 610 |     target_class_ids: [batch, TRAIN_ROIS_PER_IMAGE]. Integer class IDs.
 611 |     target_deltas: [batch, TRAIN_ROIS_PER_IMAGE, NUM_CLASSES,
 612 |                     (dy, dx, log(dh), log(dw), class_id)]
 613 |                    Class-specific bbox refinements.
 614 |     target_mask: [batch, TRAIN_ROIS_PER_IMAGE, height, width)
 615 |                  Masks cropped to bbox boundaries and resized to neural
 616 |                  network output size.
 617 | 
 618 |     Note: Returned arrays might be zero padded if not enough target ROIs.
 619 |     """
 620 | 
 621 |     def __init__(self, config, **kwargs):
 622 |         super(DetectionTargetLayer, self).__init__(**kwargs)
 623 |         self.config = config
 624 | 
 625 |     def call(self, inputs):
 626 |         proposals = inputs[0]
 627 |         gt_class_ids = inputs[1]
 628 |         gt_boxes = inputs[2]
 629 |         gt_masks = inputs[3]
 630 | 
 631 |         # Slice the batch and run a graph for each slice
 632 |         # TODO: Rename target_bbox to target_deltas for clarity
 633 |         names = ["rois", "target_class_ids", "target_bbox", "target_mask"]
 634 |         outputs = utils.batch_slice(
 635 |             [proposals, gt_class_ids, gt_boxes, gt_masks],
 636 |             lambda w, x, y, z: detection_targets_graph(
 637 |                 w, x, y, z, self.config),
 638 |             self.config.IMAGES_PER_GPU, names=names)
 639 |         return outputs
 640 | 
 641 |     def compute_output_shape(self, input_shape):
 642 |         return [
 643 |             (None, self.config.TRAIN_ROIS_PER_IMAGE, 4),  # rois
 644 |             (None, 1),  # class_ids
 645 |             (None, self.config.TRAIN_ROIS_PER_IMAGE, 4),  # deltas
 646 |             (None, self.config.TRAIN_ROIS_PER_IMAGE, self.config.MASK_SHAPE[0],
 647 |              self.config.MASK_SHAPE[1])  # masks
 648 |         ]
 649 | 
 650 |     def compute_mask(self, inputs, mask=None):
 651 |         return [None, None, None, None]
 652 | 
 653 | 
 654 | ############################################################
 655 | #  Detection Layer
 656 | ############################################################
 657 | 
 658 | def clip_to_window(window, boxes):
 659 |     """
 660 |     window: (y1, x1, y2, x2). The window in the image we want to clip to.
 661 |     boxes: [N, (y1, x1, y2, x2)]
 662 |     """
 663 |     boxes[:, 0] = np.maximum(np.minimum(boxes[:, 0], window[2]), window[0])
 664 |     boxes[:, 1] = np.maximum(np.minimum(boxes[:, 1], window[3]), window[1])
 665 |     boxes[:, 2] = np.maximum(np.minimum(boxes[:, 2], window[2]), window[0])
 666 |     boxes[:, 3] = np.maximum(np.minimum(boxes[:, 3], window[3]), window[1])
 667 |     return boxes
 668 | 
 669 | 
 670 | def refine_detections_graph(rois, probs, deltas, window, config):
 671 |     """Refine classified proposals and filter overlaps and return final
 672 |     detections.
 673 | 
 674 |     Inputs:
 675 |         rois: [N, (y1, x1, y2, x2)] in normalized coordinates
 676 |         probs: [N, num_classes]. Class probabilities.
 677 |         deltas: [N, num_classes, (dy, dx, log(dh), log(dw))]. Class-specific
 678 |                 bounding box deltas.
 679 |         window: (y1, x1, y2, x2) in image coordinates. The part of the image
 680 |             that contains the image excluding the padding.
 681 | 
 682 |     Returns detections shaped: [N, (y1, x1, y2, x2, class_id, score)] where
 683 |         coordinates are in image domain.
 684 |     """
 685 |     # Class IDs per ROI
 686 |     class_ids = tf.argmax(probs, axis=1, output_type=tf.int32)
 687 |     # Class probability of the top class of each ROI
 688 |     indices = tf.stack([tf.range(probs.shape[0]), class_ids], axis=1)
 689 |     class_scores = tf.gather_nd(probs, indices)
 690 |     # Class-specific bounding box deltas
 691 |     deltas_specific = tf.gather_nd(deltas, indices)
 692 |     # Apply bounding box deltas
 693 |     # Shape: [boxes, (y1, x1, y2, x2)] in normalized coordinates
 694 |     refined_rois = apply_box_deltas_graph(
 695 |         rois, deltas_specific * config.BBOX_STD_DEV)
 696 |     # Convert coordiates to image domain
 697 |     # TODO: better to keep them normalized until later
 698 |     height, width = config.IMAGE_SHAPE[:2]
 699 |     refined_rois *= tf.constant([height, width, height, width], dtype=tf.float32)
 700 |     # Clip boxes to image window
 701 |     refined_rois = clip_boxes_graph(refined_rois, window)
 702 |     # Round and cast to int since we're deadling with pixels now
 703 |     refined_rois = tf.to_int32(tf.rint(refined_rois))
 704 | 
 705 |     # TODO: Filter out boxes with zero area
 706 | 
 707 |     # Filter out background boxes
 708 |     keep = tf.where(class_ids > 0)[:, 0]
 709 |     # Filter out low confidence boxes
 710 |     if config.DETECTION_MIN_CONFIDENCE:
 711 |         conf_keep = tf.where(class_scores >= config.DETECTION_MIN_CONFIDENCE)[:, 0]
 712 |         keep = tf.sets.set_intersection(tf.expand_dims(keep, 0),
 713 |                                         tf.expand_dims(conf_keep, 0))
 714 |         keep = tf.sparse_tensor_to_dense(keep)[0]
 715 | 
 716 |     # Apply per-class NMS
 717 |     # 1. Prepare variables
 718 |     pre_nms_class_ids = tf.gather(class_ids, keep)
 719 |     pre_nms_scores = tf.gather(class_scores, keep)
 720 |     pre_nms_rois = tf.gather(refined_rois,   keep)
 721 |     unique_pre_nms_class_ids = tf.unique(pre_nms_class_ids)[0]
 722 | 
 723 |     def nms_keep_map(class_id):
 724 |         """Apply Non-Maximum Suppression on ROIs of the given class."""
 725 |         # Indices of ROIs of the given class
 726 |         ixs = tf.where(tf.equal(pre_nms_class_ids, class_id))[:, 0]
 727 |         # Apply NMS
 728 |         class_keep = tf.image.non_max_suppression(
 729 |                 tf.to_float(tf.gather(pre_nms_rois, ixs)),
 730 |                 tf.gather(pre_nms_scores, ixs),
 731 |                 max_output_size=config.DETECTION_MAX_INSTANCES,
 732 |                 iou_threshold=config.DETECTION_NMS_THRESHOLD)
 733 |         # Map indicies
 734 |         class_keep = tf.gather(keep, tf.gather(ixs, class_keep))
 735 |         # Pad with -1 so returned tensors have the same shape
 736 |         gap = config.DETECTION_MAX_INSTANCES - tf.shape(class_keep)[0]
 737 |         class_keep = tf.pad(class_keep, [(0, gap)],
 738 |                             mode='CONSTANT', constant_values=-1)
 739 |         # Set shape so map_fn() can infer result shape
 740 |         class_keep.set_shape([config.DETECTION_MAX_INSTANCES])
 741 |         return class_keep
 742 | 
 743 |     # 2. Map over class IDs
 744 |     nms_keep = tf.map_fn(nms_keep_map, unique_pre_nms_class_ids,
 745 |                          dtype=tf.int64)
 746 |     # 3. Merge results into one list, and remove -1 padding
 747 |     nms_keep = tf.reshape(nms_keep, [-1])
 748 |     nms_keep = tf.gather(nms_keep, tf.where(nms_keep > -1)[:, 0])
 749 |     # 4. Compute intersection between keep and nms_keep
 750 |     keep = tf.sets.set_intersection(tf.expand_dims(keep, 0),
 751 |                                     tf.expand_dims(nms_keep, 0))
 752 |     keep = tf.sparse_tensor_to_dense(keep)[0]
 753 |     # Keep top detections
 754 |     roi_count = config.DETECTION_MAX_INSTANCES
 755 |     class_scores_keep = tf.gather(class_scores, keep)
 756 |     num_keep = tf.minimum(tf.shape(class_scores_keep)[0], roi_count)
 757 |     top_ids = tf.nn.top_k(class_scores_keep, k=num_keep, sorted=True)[1]
 758 |     keep = tf.gather(keep, top_ids)
 759 | 
 760 |     # Arrange output as [N, (y1, x1, y2, x2, class_id, score)]
 761 |     # Coordinates are in image domain.
 762 |     detections = tf.concat([
 763 |         tf.to_float(tf.gather(refined_rois, keep)),
 764 |         tf.to_float(tf.gather(class_ids, keep))[..., tf.newaxis],
 765 |         tf.gather(class_scores, keep)[..., tf.newaxis]
 766 |         ], axis=1)
 767 | 
 768 |     # Pad with zeros if detections < DETECTION_MAX_INSTANCES
 769 |     gap = config.DETECTION_MAX_INSTANCES - tf.shape(detections)[0]
 770 |     detections = tf.pad(detections, [(0, gap), (0, 0)], "CONSTANT")
 771 |     return detections
 772 | 
 773 | 
 774 | class DetectionLayer(KE.Layer):
 775 |     """Takes classified proposal boxes and their bounding box deltas and
 776 |     returns the final detection boxes.
 777 | 
 778 |     Returns:
 779 |     [batch, num_detections, (y1, x1, y2, x2, class_id, class_score)] where
 780 |     coordinates are in image domain
 781 |     """
 782 | 
 783 |     def __init__(self, config=None, **kwargs):
 784 |         super(DetectionLayer, self).__init__(**kwargs)
 785 |         self.config = config
 786 | 
 787 |     def call(self, inputs):
 788 |         rois = inputs[0]
 789 |         mrcnn_class = inputs[1]
 790 |         mrcnn_bbox = inputs[2]
 791 |         image_meta = inputs[3]
 792 | 
 793 |         # Run detection refinement graph on each item in the batch
 794 |         _, _, window, _ = parse_image_meta_graph(image_meta)
 795 |         detections_batch = utils.batch_slice(
 796 |             [rois, mrcnn_class, mrcnn_bbox, window],
 797 |             lambda x, y, w, z: refine_detections_graph(x, y, w, z, self.config),
 798 |             self.config.IMAGES_PER_GPU)
 799 | 
 800 |         # Reshape output
 801 |         # [batch, num_detections, (y1, x1, y2, x2, class_score)] in pixels
 802 |         return tf.reshape(
 803 |             detections_batch,
 804 |             [self.config.BATCH_SIZE, self.config.DETECTION_MAX_INSTANCES, 6])
 805 | 
 806 |     def compute_output_shape(self, input_shape):
 807 |         return (None, self.config.DETECTION_MAX_INSTANCES, 6)
 808 | 
 809 | 
 810 | # Region Proposal Network (RPN)
 811 | 
 812 | def rpn_graph(feature_map, anchors_per_location, anchor_stride):
 813 |     """Builds the computation graph of Region Proposal Network.
 814 | 
 815 |     feature_map: backbone features [batch, height, width, depth]
 816 |     anchors_per_location: number of anchors per pixel in the feature map
 817 |     anchor_stride: Controls the density of anchors. Typically 1 (anchors for
 818 |                    every pixel in the feature map), or 2 (every other pixel).
 819 | 
 820 |     Returns:
 821 |         rpn_logits: [batch, H, W, 2] Anchor classifier logits (before softmax)
 822 |         rpn_probs: [batch, H, W, 2] Anchor classifier probabilities.
 823 |         rpn_bbox: [batch, H, W, (dy, dx, log(dh), log(dw))] Deltas to be
 824 |                   applied to anchors.
 825 |     """
 826 |     # TODO: check if stride of 2 causes alignment issues if the featuremap
 827 |     #       is not even.
 828 |     # Shared convolutional base of the RPN
 829 |     shared = KL.Conv2D(512, (3, 3), padding='same', activation='relu',
 830 |                        strides=anchor_stride,
 831 |                        name='rpn_conv_shared')(feature_map)
 832 | 
 833 |     # Anchor Score. [batch, height, width, anchors per location * 2].
 834 |     x = KL.Conv2D(2 * anchors_per_location, (1, 1), padding='valid',
 835 |                   activation='linear', name='rpn_class_raw')(shared)
 836 | 
 837 |     # Reshape to [batch, anchors, 2]
 838 |     rpn_class_logits = KL.Lambda(
 839 |         lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 2]))(x)
 840 | 
 841 |     # Softmax on last dimension of BG/FG.
 842 |     rpn_probs = KL.Activation(
 843 |         "softmax", name="rpn_class_xxx")(rpn_class_logits)
 844 | 
 845 |     # Bounding box refinement. [batch, H, W, anchors per location, depth]
 846 |     # where depth is [x, y, log(w), log(h)]
 847 |     x = KL.Conv2D(anchors_per_location * 4, (1, 1), padding="valid",
 848 |                   activation='linear', name='rpn_bbox_pred')(shared)
 849 | 
 850 |     # Reshape to [batch, anchors, 4]
 851 |     rpn_bbox = KL.Lambda(lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 4]))(x)
 852 | 
 853 |     return [rpn_class_logits, rpn_probs, rpn_bbox]
 854 | 
 855 | 
 856 | def build_rpn_model(anchor_stride, anchors_per_location, depth):
 857 |     """Builds a Keras model of the Region Proposal Network.
 858 |     It wraps the RPN graph so it can be used multiple times with shared
 859 |     weights.
 860 | 
 861 |     anchors_per_location: number of anchors per pixel in the feature map
 862 |     anchor_stride: Controls the density of anchors. Typically 1 (anchors for
 863 |                    every pixel in the feature map), or 2 (every other pixel).
 864 |     depth: Depth of the backbone feature map.
 865 | 
 866 |     Returns a Keras Model object. The model outputs, when called, are:
 867 |     rpn_logits: [batch, H, W, 2] Anchor classifier logits (before softmax)
 868 |     rpn_probs: [batch, W, W, 2] Anchor classifier probabilities.
 869 |     rpn_bbox: [batch, H, W, (dy, dx, log(dh), log(dw))] Deltas to be
 870 |                 applied to anchors.
 871 |     """
 872 |     input_feature_map = KL.Input(shape=[None, None, depth],
 873 |                                  name="input_rpn_feature_map")
 874 |     outputs = rpn_graph(input_feature_map, anchors_per_location, anchor_stride)
 875 |     return KM.Model([input_feature_map], outputs, name="rpn_model")
 876 | 
 877 | 
 878 | ############################################################
 879 | #  Feature Pyramid Network Heads
 880 | ############################################################
 881 | 
 882 | def fpn_classifier_graph(rois, feature_maps,
 883 |                          image_shape, pool_size, num_classes):
 884 |     """Builds the computation graph of the feature pyramid network classifier
 885 |     and regressor heads.
 886 | 
 887 |     rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized
 888 |           coordinates.
 889 |     feature_maps: List of feature maps from diffent layers of the pyramid,
 890 |                   [P2, P3, P4, P5]. Each has a different resolution.
 891 |     image_shape: [height, width, depth]
 892 |     pool_size: The width of the square feature map generated from ROI Pooling.
 893 |     num_classes: number of classes, which determines the depth of the results
 894 | 
 895 |     Returns:
 896 |         logits: [N, NUM_CLASSES] classifier logits (before softmax)
 897 |         probs: [N, NUM_CLASSES] classifier probabilities
 898 |         bbox_deltas: [N, (dy, dx, log(dh), log(dw))] Deltas to apply to
 899 |                      proposal boxes
 900 |     """
 901 |     # ROI Pooling
 902 |     # Shape: [batch, num_boxes, pool_height, pool_width, channels]
 903 |     x = PyramidROIAlign([pool_size, pool_size], image_shape,
 904 |                         name="roi_align_classifier")([rois] + feature_maps)
 905 |     # Two 1024 FC layers (implemented with Conv2D for consistency)
 906 |     x = KL.TimeDistributed(KL.Conv2D(1024, (pool_size, pool_size), padding="valid"),
 907 |                            name="mrcnn_class_conv1")(x)
 908 |     x = KL.TimeDistributed(BatchNorm(axis=3), name='mrcnn_class_bn1')(x)
 909 |     x = KL.Activation('relu')(x)
 910 |     x = KL.TimeDistributed(KL.Conv2D(1024, (1, 1)),
 911 |                            name="mrcnn_class_conv2")(x)
 912 |     x = KL.TimeDistributed(BatchNorm(axis=3),
 913 |                            name='mrcnn_class_bn2')(x)
 914 |     x = KL.Activation('relu')(x)
 915 | 
 916 |     shared = KL.Lambda(lambda x: K.squeeze(K.squeeze(x, 3), 2),
 917 |                        name="pool_squeeze")(x)
 918 | 
 919 |     # Classifier head
 920 |     mrcnn_class_logits = KL.TimeDistributed(KL.Dense(num_classes),
 921 |                                             name='mrcnn_class_logits')(shared)
 922 |     mrcnn_probs = KL.TimeDistributed(KL.Activation("softmax"),
 923 |                                      name="mrcnn_class")(mrcnn_class_logits)
 924 | 
 925 |     # BBox head
 926 |     # [batch, boxes, num_classes * (dy, dx, log(dh), log(dw))]
 927 |     x = KL.TimeDistributed(KL.Dense(num_classes * 4, activation='linear'),
 928 |                            name='mrcnn_bbox_fc')(shared)
 929 |     # Reshape to [batch, boxes, num_classes, (dy, dx, log(dh), log(dw))]
 930 |     s = K.int_shape(x)
 931 |     mrcnn_bbox = KL.Reshape((s[1], num_classes, 4), name="mrcnn_bbox")(x)
 932 | 
 933 |     return mrcnn_class_logits, mrcnn_probs, mrcnn_bbox
 934 | 
 935 | 
 936 | def build_fpn_mask_graph(rois, feature_maps,
 937 |                          image_shape, pool_size, num_classes):
 938 |     """Builds the computation graph of the mask head of Feature Pyramid Network.
 939 | 
 940 |     rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized
 941 |           coordinates.
 942 |     feature_maps: List of feature maps from diffent layers of the pyramid,
 943 |                   [P2, P3, P4, P5]. Each has a different resolution.
 944 |     image_shape: [height, width, depth]
 945 |     pool_size: The width of the square feature map generated from ROI Pooling.
 946 |     num_classes: number of classes, which determines the depth of the results
 947 | 
 948 |     Returns: Masks [batch, roi_count, height, width, num_classes]
 949 |     """
 950 |     # ROI Pooling
 951 |     # Shape: [batch, boxes, pool_height, pool_width, channels]
 952 |     x = PyramidROIAlign([pool_size, pool_size], image_shape,
 953 |                         name="roi_align_mask")([rois] + feature_maps)
 954 | 
 955 |     # Conv layers
 956 |     x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
 957 |                            name="mrcnn_mask_conv1")(x)
 958 |     x = KL.TimeDistributed(BatchNorm(axis=3),
 959 |                            name='mrcnn_mask_bn1')(x)
 960 |     x = KL.Activation('relu')(x)
 961 | 
 962 |     x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
 963 |                            name="mrcnn_mask_conv2")(x)
 964 |     x = KL.TimeDistributed(BatchNorm(axis=3),
 965 |                            name='mrcnn_mask_bn2')(x)
 966 |     x = KL.Activation('relu')(x)
 967 | 
 968 |     x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
 969 |                            name="mrcnn_mask_conv3")(x)
 970 |     x = KL.TimeDistributed(BatchNorm(axis=3),
 971 |                            name='mrcnn_mask_bn3')(x)
 972 |     x = KL.Activation('relu')(x)
 973 | 
 974 |     x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
 975 |                            name="mrcnn_mask_conv4")(x)
 976 |     x = KL.TimeDistributed(BatchNorm(axis=3),
 977 |                            name='mrcnn_mask_bn4')(x)
 978 |     x = KL.Activation('relu')(x)
 979 | 
 980 |     x = KL.TimeDistributed(KL.Conv2DTranspose(256, (2, 2), strides=2, activation="relu"),
 981 |                            name="mrcnn_mask_deconv")(x)
 982 |     x = KL.TimeDistributed(KL.Conv2D(num_classes, (1, 1), strides=1, activation="sigmoid"),
 983 |                            name="mrcnn_mask")(x)
 984 |     return x
 985 | 
 986 | 
 987 | ############################################################
 988 | #  Loss Functions
 989 | ############################################################
 990 | 
 991 | def smooth_l1_loss(y_true, y_pred):
 992 |     """Implements Smooth-L1 loss.
 993 |     y_true and y_pred are typicallly: [N, 4], but could be any shape.
 994 |     """
 995 |     diff = K.abs(y_true - y_pred)
 996 |     less_than_one = K.cast(K.less(diff, 1.0), "float32")
 997 |     loss = (less_than_one * 0.5 * diff**2) + (1 - less_than_one) * (diff - 0.5)
 998 |     return loss
 999 | 
1000 | 
1001 | def rpn_class_loss_graph(rpn_match, rpn_class_logits):
1002 |     """RPN anchor classifier loss.
1003 | 
1004 |     rpn_match: [batch, anchors, 1]. Anchor match type. 1=positive,
1005 |                -1=negative, 0=neutral anchor.
1006 |     rpn_class_logits: [batch, anchors, 2]. RPN classifier logits for FG/BG.
1007 |     """
1008 |     # Squeeze last dim to simplify
1009 |     rpn_match = tf.squeeze(rpn_match, -1)
1010 |     # Get anchor classes. Convert the -1/+1 match to 0/1 values.
1011 |     anchor_class = K.cast(K.equal(rpn_match, 1), tf.int32)
1012 |     # Positive and Negative anchors contribute to the loss,
1013 |     # but neutral anchors (match value = 0) don't.
1014 |     indices = tf.where(K.not_equal(rpn_match, 0))
1015 |     # Pick rows that contribute to the loss and filter out the rest.
1016 |     rpn_class_logits = tf.gather_nd(rpn_class_logits, indices)
1017 |     anchor_class = tf.gather_nd(anchor_class, indices)
1018 |     # Crossentropy loss
1019 |     loss = K.sparse_categorical_crossentropy(target=anchor_class,
1020 |                                              output=rpn_class_logits,
1021 |                                              from_logits=True)
1022 |     loss = K.switch(tf.size(loss) > 0, K.mean(loss), tf.constant(0.0))
1023 |     return loss
1024 | 
1025 | 
1026 | def rpn_bbox_loss_graph(config, target_bbox, rpn_match, rpn_bbox):
1027 |     """Return the RPN bounding box loss graph.
1028 | 
1029 |     config: the model config object.
1030 |     target_bbox: [batch, max positive anchors, (dy, dx, log(dh), log(dw))].
1031 |         Uses 0 padding to fill in unsed bbox deltas.
1032 |     rpn_match: [batch, anchors, 1]. Anchor match type. 1=positive,
1033 |                -1=negative, 0=neutral anchor.
1034 |     rpn_bbox: [batch, anchors, (dy, dx, log(dh), log(dw))]
1035 |     """
1036 |     # Positive anchors contribute to the loss, but negative and
1037 |     # neutral anchors (match value of 0 or -1) don't.
1038 |     rpn_match = K.squeeze(rpn_match, -1)
1039 |     indices = tf.where(K.equal(rpn_match, 1))
1040 | 
1041 |     # Pick bbox deltas that contribute to the loss
1042 |     rpn_bbox = tf.gather_nd(rpn_bbox, indices)
1043 | 
1044 |     # Trim target bounding box deltas to the same length as rpn_bbox.
1045 |     batch_counts = K.sum(K.cast(K.equal(rpn_match, 1), tf.int32), axis=1)
1046 |     target_bbox = batch_pack_graph(target_bbox, batch_counts,
1047 |                                    config.IMAGES_PER_GPU)
1048 | 
1049 |     # TODO: use smooth_l1_loss() rather than reimplementing here
1050 |     #       to reduce code duplication
1051 |     diff = K.abs(target_bbox - rpn_bbox)
1052 |     less_than_one = K.cast(K.less(diff, 1.0), "float32")
1053 |     loss = (less_than_one * 0.5 * diff**2) + (1 - less_than_one) * (diff - 0.5)
1054 | 
1055 |     loss = K.switch(tf.size(loss) > 0, K.mean(loss), tf.constant(0.0))
1056 |     return loss
1057 | 
1058 | 
1059 | def mrcnn_class_loss_graph(target_class_ids, pred_class_logits,
1060 |                            active_class_ids):
1061 |     """Loss for the classifier head of Mask RCNN.
1062 | 
1063 |     target_class_ids: [batch, num_rois]. Integer class IDs. Uses zero
1064 |         padding to fill in the array.
1065 |     pred_class_logits: [batch, num_rois, num_classes]
1066 |     active_class_ids: [batch, num_classes]. Has a value of 1 for
1067 |         classes that are in the dataset of the image, and 0
1068 |         for classes that are not in the dataset.
1069 |     """
1070 |     target_class_ids = tf.cast(target_class_ids, 'int64')
1071 | 
1072 |     # Find predictions of classes that are not in the dataset.
1073 |     pred_class_ids = tf.argmax(pred_class_logits, axis=2)
1074 |     # TODO: Update this line to work with batch > 1. Right now it assumes all
1075 |     #       images in a batch have the same active_class_ids
1076 |     pred_active = tf.gather(active_class_ids[0], pred_class_ids)
1077 | 
1078 |     # Loss
1079 |     loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
1080 |         labels=target_class_ids, logits=pred_class_logits)
1081 | 
1082 |     # Erase losses of predictions of classes that are not in the active
1083 |     # classes of the image.
1084 |     loss = loss * pred_active
1085 | 
1086 |     # Computer loss mean. Use only predictions that contribute
1087 |     # to the loss to get a correct mean.
1088 |     loss = tf.reduce_sum(loss) / tf.reduce_sum(pred_active)
1089 |     return loss
1090 | 
1091 | 
1092 | def mrcnn_bbox_loss_graph(target_bbox, target_class_ids, pred_bbox):
1093 |     """Loss for Mask R-CNN bounding box refinement.
1094 | 
1095 |     target_bbox: [batch, num_rois, (dy, dx, log(dh), log(dw))]
1096 |     target_class_ids: [batch, num_rois]. Integer class IDs.
1097 |     pred_bbox: [batch, num_rois, num_classes, (dy, dx, log(dh), log(dw))]
1098 |     """
1099 |     # Reshape to merge batch and roi dimensions for simplicity.
1100 |     target_class_ids = K.reshape(target_class_ids, (-1,))
1101 |     target_bbox = K.reshape(target_bbox, (-1, 4))
1102 |     pred_bbox = K.reshape(pred_bbox, (-1, K.int_shape(pred_bbox)[2], 4))
1103 | 
1104 |     # Only positive ROIs contribute to the loss. And only
1105 |     # the right class_id of each ROI. Get their indicies.
1106 |     positive_roi_ix = tf.where(target_class_ids > 0)[:, 0]
1107 |     positive_roi_class_ids = tf.cast(
1108 |         tf.gather(target_class_ids, positive_roi_ix), tf.int64)
1109 |     indices = tf.stack([positive_roi_ix, positive_roi_class_ids], axis=1)
1110 | 
1111 |     # Gather the deltas (predicted and true) that contribute to loss
1112 |     target_bbox = tf.gather(target_bbox, positive_roi_ix)
1113 |     pred_bbox = tf.gather_nd(pred_bbox, indices)
1114 | 
1115 |     # Smooth-L1 Loss
1116 |     loss = K.switch(tf.size(target_bbox) > 0,
1117 |                     smooth_l1_loss(y_true=target_bbox, y_pred=pred_bbox),
1118 |                     tf.constant(0.0))
1119 |     loss = K.mean(loss)
1120 |     loss = K.reshape(loss, [1, 1])
1121 |     return loss
1122 | 
1123 | 
1124 | def mrcnn_mask_loss_graph(target_masks, target_class_ids, pred_masks):
1125 |     """Mask binary cross-entropy loss for the masks head.
1126 | 
1127 |     target_masks: [batch, num_rois, height, width].
1128 |         A float32 tensor of values 0 or 1. Uses zero padding to fill array.
1129 |     target_class_ids: [batch, num_rois]. Integer class IDs. Zero padded.
1130 |     pred_masks: [batch, proposals, height, width, num_classes] float32 tensor
1131 |                 with values from 0 to 1.
1132 |     """
1133 |     # Reshape for simplicity. Merge first two dimensions into one.
1134 |     target_class_ids = K.reshape(target_class_ids, (-1,))
1135 |     mask_shape = tf.shape(target_masks)
1136 |     target_masks = K.reshape(target_masks, (-1, mask_shape[2], mask_shape[3]))
1137 |     pred_shape = tf.shape(pred_masks)
1138 |     pred_masks = K.reshape(pred_masks,
1139 |                            (-1, pred_shape[2], pred_shape[3], pred_shape[4]))
1140 |     # Permute predicted masks to [N, num_classes, height, width]
1141 |     pred_masks = tf.transpose(pred_masks, [0, 3, 1, 2])
1142 | 
1143 |     # Only positive ROIs contribute to the loss. And only
1144 |     # the class specific mask of each ROI.
1145 |     positive_ix = tf.where(target_class_ids > 0)[:, 0]
1146 |     positive_class_ids = tf.cast(
1147 |         tf.gather(target_class_ids, positive_ix), tf.int64)
1148 |     indices = tf.stack([positive_ix, positive_class_ids], axis=1)
1149 | 
1150 |     # Gather the masks (predicted and true) that contribute to loss
1151 |     y_true = tf.gather(target_masks, positive_ix)
1152 |     y_pred = tf.gather_nd(pred_masks, indices)
1153 | 
1154 |     # Compute binary cross entropy. If no positive ROIs, then return 0.
1155 |     # shape: [batch, roi, num_classes]
1156 |     loss = K.switch(tf.size(y_true) > 0,
1157 |                     K.binary_crossentropy(target=y_true, output=y_pred),
1158 |                     tf.constant(0.0))
1159 |     loss = K.mean(loss)
1160 |     loss = K.reshape(loss, [1, 1])
1161 |     return loss
1162 | 
1163 | 
1164 | ############################################################
1165 | #  Data Generator
1166 | ############################################################
1167 | 
1168 | def load_image_gt(dataset, config, image_id, augment=False,
1169 |                   use_mini_mask=False):
1170 |     """Load and return ground truth data for an image (image, mask, bounding boxes).
1171 | 
1172 |     augment: If true, apply random image augmentation. Currently, only
1173 |         horizontal flipping is offered.
1174 |     use_mini_mask: If False, returns full-size masks that are the same height
1175 |         and width as the original image. These can be big, for example
1176 |         1024x1024x100 (for 100 instances). Mini masks are smaller, typically,
1177 |         224x224 and are generated by extracting the bounding box of the
1178 |         object and resizing it to MINI_MASK_SHAPE.
1179 | 
1180 |     Returns:
1181 |     image: [height, width, 3]
1182 |     shape: the original shape of the image before resizing and cropping.
1183 |     class_ids: [instance_count] Integer class IDs
1184 |     bbox: [instance_count, (y1, x1, y2, x2)]
1185 |     mask: [height, width, instance_count]. The height and width are those
1186 |         of the image unless use_mini_mask is True, in which case they are
1187 |         defined in MINI_MASK_SHAPE.
1188 |     """
1189 |     # Load image and mask
1190 |     image = dataset.load_image(image_id)
1191 |     mask, class_ids = dataset.load_mask(image_id)
1192 |     shape = image.shape
1193 |     image, window, scale, padding = utils.resize_image(
1194 |         image,
1195 |         min_dim=config.IMAGE_MIN_DIM,
1196 |         max_dim=config.IMAGE_MAX_DIM,
1197 |         padding=config.IMAGE_PADDING)
1198 |     mask = utils.resize_mask(mask, scale, padding)
1199 | 
1200 |     # Random horizontal flips.
1201 |     if augment:
1202 |         if random.randint(0, 1):
1203 |             image = np.fliplr(image)
1204 |             mask = np.fliplr(mask)
1205 | 
1206 |     # Note that some boxes might be all zeros if the corresponding mask got cropped out.
1207 |     # and here is to filter them out
1208 |     _idx = np.sum(mask, axis=(0, 1)) > 0
1209 |     mask = mask[:, :, _idx]
1210 |     class_ids = class_ids[_idx]
1211 |     # Bounding boxes. Note that some boxes might be all zeros
1212 |     # if the corresponding mask got cropped out.
1213 |     # bbox: [num_instances, (y1, x1, y2, x2)]
1214 |     bbox = utils.extract_bboxes(mask)
1215 | 
1216 |     # Active classes
1217 |     # Different datasets have different classes, so track the
1218 |     # classes supported in the dataset of this image.
1219 |     active_class_ids = np.zeros([dataset.num_classes], dtype=np.int32)
1220 |     source_class_ids = dataset.source_class_ids[dataset.image_info[image_id]["source"]]
1221 |     active_class_ids[source_class_ids] = 1
1222 | 
1223 |     # Resize masks to smaller size to reduce memory usage
1224 |     if use_mini_mask:
1225 |         mask = utils.minimize_mask(bbox, mask, config.MINI_MASK_SHAPE)
1226 | 
1227 |     # Image meta data
1228 |     image_meta = compose_image_meta(image_id, shape, window, active_class_ids)
1229 | 
1230 |     return image, image_meta, class_ids, bbox, mask
1231 | 
1232 | 
1233 | def build_detection_targets(rpn_rois, gt_class_ids, gt_boxes, gt_masks, config):
1234 |     """Generate targets for training Stage 2 classifier and mask heads.
1235 |     This is not used in normal training. It's useful for debugging or to train
1236 |     the Mask RCNN heads without using the RPN head.
1237 | 
1238 |     Inputs:
1239 |     rpn_rois: [N, (y1, x1, y2, x2)] proposal boxes.
1240 |     gt_class_ids: [instance count] Integer class IDs
1241 |     gt_boxes: [instance count, (y1, x1, y2, x2)]
1242 |     gt_masks: [height, width, instance count] Grund truth masks. Can be full
1243 |               size or mini-masks.
1244 | 
1245 |     Returns:
1246 |     rois: [TRAIN_ROIS_PER_IMAGE, (y1, x1, y2, x2)]
1247 |     class_ids: [TRAIN_ROIS_PER_IMAGE]. Integer class IDs.
1248 |     bboxes: [TRAIN_ROIS_PER_IMAGE, NUM_CLASSES, (y, x, log(h), log(w))]. Class-specific
1249 |             bbox refinements.
1250 |     masks: [TRAIN_ROIS_PER_IMAGE, height, width, NUM_CLASSES). Class specific masks cropped
1251 |            to bbox boundaries and resized to neural network output size.
1252 |     """
1253 |     assert rpn_rois.shape[0] > 0
1254 |     assert gt_class_ids.dtype == np.int32, "Expected int but got {}".format(
1255 |         gt_class_ids.dtype)
1256 |     assert gt_boxes.dtype == np.int32, "Expected int but got {}".format(
1257 |         gt_boxes.dtype)
1258 |     assert gt_masks.dtype == np.bool_, "Expected bool but got {}".format(
1259 |         gt_masks.dtype)
1260 | 
1261 |     # It's common to add GT Boxes to ROIs but we don't do that here because
1262 |     # according to XinLei Chen's paper, it doesn't help.
1263 | 
1264 |     # Trim empty padding in gt_boxes and gt_masks parts
1265 |     instance_ids = np.where(gt_class_ids > 0)[0]
1266 |     assert instance_ids.shape[0] > 0, "Image must contain instances."
1267 |     gt_class_ids = gt_class_ids[instance_ids]
1268 |     gt_boxes = gt_boxes[instance_ids]
1269 |     gt_masks = gt_masks[:, :, instance_ids]
1270 | 
1271 |     # Compute areas of ROIs and ground truth boxes.
1272 |     rpn_roi_area = (rpn_rois[:, 2] - rpn_rois[:, 0]) * \
1273 |         (rpn_rois[:, 3] - rpn_rois[:, 1])
1274 |     gt_box_area = (gt_boxes[:, 2] - gt_boxes[:, 0]) * \
1275 |         (gt_boxes[:, 3] - gt_boxes[:, 1])
1276 | 
1277 |     # Compute overlaps [rpn_rois, gt_boxes]
1278 |     overlaps = np.zeros((rpn_rois.shape[0], gt_boxes.shape[0]))
1279 |     for i in range(overlaps.shape[1]):
1280 |         gt = gt_boxes[i]
1281 |         overlaps[:, i] = utils.compute_iou(
1282 |             gt, rpn_rois, gt_box_area[i], rpn_roi_area)
1283 | 
1284 |     # Assign ROIs to GT boxes
1285 |     rpn_roi_iou_argmax = np.argmax(overlaps, axis=1)
1286 |     rpn_roi_iou_max = overlaps[np.arange(
1287 |         overlaps.shape[0]), rpn_roi_iou_argmax]
1288 |     # GT box assigned to each ROI
1289 |     rpn_roi_gt_boxes = gt_boxes[rpn_roi_iou_argmax]
1290 |     rpn_roi_gt_class_ids = gt_class_ids[rpn_roi_iou_argmax]
1291 | 
1292 |     # Positive ROIs are those with >= 0.5 IoU with a GT box.
1293 |     fg_ids = np.where(rpn_roi_iou_max > 0.5)[0]
1294 | 
1295 |     # Negative ROIs are those with max IoU 0.1-0.5 (hard example mining)
1296 |     # TODO: To hard example mine or not to hard example mine, that's the question
1297 | #     bg_ids = np.where((rpn_roi_iou_max >= 0.1) & (rpn_roi_iou_max < 0.5))[0]
1298 |     bg_ids = np.where(rpn_roi_iou_max < 0.5)[0]
1299 | 
1300 |     # Subsample ROIs. Aim for 33% foreground.
1301 |     # FG
1302 |     fg_roi_count = int(config.TRAIN_ROIS_PER_IMAGE * config.ROI_POSITIVE_RATIO)
1303 |     if fg_ids.shape[0] > fg_roi_count:
1304 |         keep_fg_ids = np.random.choice(fg_ids, fg_roi_count, replace=False)
1305 |     else:
1306 |         keep_fg_ids = fg_ids
1307 |     # BG
1308 |     remaining = config.TRAIN_ROIS_PER_IMAGE - keep_fg_ids.shape[0]
1309 |     if bg_ids.shape[0] > remaining:
1310 |         keep_bg_ids = np.random.choice(bg_ids, remaining, replace=False)
1311 |     else:
1312 |         keep_bg_ids = bg_ids
1313 |     # Combine indicies of ROIs to keep
1314 |     keep = np.concatenate([keep_fg_ids, keep_bg_ids])
1315 |     # Need more?
1316 |     remaining = config.TRAIN_ROIS_PER_IMAGE - keep.shape[0]
1317 |     if remaining > 0:
1318 |         # Looks like we don't have enough samples to maintain the desired
1319 |         # balance. Reduce requirements and fill in the rest. This is
1320 |         # likely different from the Mask RCNN paper.
1321 | 
1322 |         # There is a small chance we have neither fg nor bg samples.
1323 |         if keep.shape[0] == 0:
1324 |             # Pick bg regions with easier IoU threshold
1325 |             bg_ids = np.where(rpn_roi_iou_max < 0.5)[0]
1326 |             assert bg_ids.shape[0] >= remaining
1327 |             keep_bg_ids = np.random.choice(bg_ids, remaining, replace=False)
1328 |             assert keep_bg_ids.shape[0] == remaining
1329 |             keep = np.concatenate([keep, keep_bg_ids])
1330 |         else:
1331 |             # Fill the rest with repeated bg rois.
1332 |             keep_extra_ids = np.random.choice(
1333 |                 keep_bg_ids, remaining, replace=True)
1334 |             keep = np.concatenate([keep, keep_extra_ids])
1335 |     assert keep.shape[0] == config.TRAIN_ROIS_PER_IMAGE, \
1336 |         "keep doesn't match ROI batch size {}, {}".format(
1337 |             keep.shape[0], config.TRAIN_ROIS_PER_IMAGE)
1338 | 
1339 |     # Reset the gt boxes assigned to BG ROIs.
1340 |     rpn_roi_gt_boxes[keep_bg_ids, :] = 0
1341 |     rpn_roi_gt_class_ids[keep_bg_ids] = 0
1342 | 
1343 |     # For each kept ROI, assign a class_id, and for FG ROIs also add bbox refinement.
1344 |     rois = rpn_rois[keep]
1345 |     roi_gt_boxes = rpn_roi_gt_boxes[keep]
1346 |     roi_gt_class_ids = rpn_roi_gt_class_ids[keep]
1347 |     roi_gt_assignment = rpn_roi_iou_argmax[keep]
1348 | 
1349 |     # Class-aware bbox deltas. [y, x, log(h), log(w)]
1350 |     bboxes = np.zeros((config.TRAIN_ROIS_PER_IMAGE,
1351 |                        config.NUM_CLASSES, 4), dtype=np.float32)
1352 |     pos_ids = np.where(roi_gt_class_ids > 0)[0]
1353 |     bboxes[pos_ids, roi_gt_class_ids[pos_ids]] = utils.box_refinement(
1354 |         rois[pos_ids], roi_gt_boxes[pos_ids, :4])
1355 |     # Normalize bbox refinements
1356 |     bboxes /= config.BBOX_STD_DEV
1357 | 
1358 |     # Generate class-specific target masks.
1359 |     masks = np.zeros((config.TRAIN_ROIS_PER_IMAGE, config.MASK_SHAPE[0], config.MASK_SHAPE[1], config.NUM_CLASSES),
1360 |                      dtype=np.float32)
1361 |     for i in pos_ids:
1362 |         class_id = roi_gt_class_ids[i]
1363 |         assert class_id > 0, "class id must be greater than 0"
1364 |         gt_id = roi_gt_assignment[i]
1365 |         class_mask = gt_masks[:, :, gt_id]
1366 | 
1367 |         if config.USE_MINI_MASK:
1368 |             # Create a mask placeholder, the size of the image
1369 |             placeholder = np.zeros(config.IMAGE_SHAPE[:2], dtype=bool)
1370 |             # GT box
1371 |             gt_y1, gt_x1, gt_y2, gt_x2 = gt_boxes[gt_id]
1372 |             gt_w = gt_x2 - gt_x1
1373 |             gt_h = gt_y2 - gt_y1
1374 |             # Resize mini mask to size of GT box
1375 |             placeholder[gt_y1:gt_y2, gt_x1:gt_x2] = \
1376 |                 np.round(scipy.misc.imresize(class_mask.astype(float), (gt_h, gt_w),
1377 |                                              interp='nearest') / 255.0).astype(bool)
1378 |             # Place the mini batch in the placeholder
1379 |             class_mask = placeholder
1380 | 
1381 |         # Pick part of the mask and resize it
1382 |         y1, x1, y2, x2 = rois[i].astype(np.int32)
1383 |         m = class_mask[y1:y2, x1:x2]
1384 |         mask = scipy.misc.imresize(
1385 |             m.astype(float), config.MASK_SHAPE, interp='nearest') / 255.0
1386 |         masks[i, :, :, class_id] = mask
1387 | 
1388 |     return rois, roi_gt_class_ids, bboxes, masks
1389 | 
1390 | 
1391 | def build_rpn_targets(image_shape, anchors, gt_class_ids, gt_boxes, config):
1392 |     """Given the anchors and GT boxes, compute overlaps and identify positive
1393 |     anchors and deltas to refine them to match their corresponding GT boxes.
1394 | 
1395 |     anchors: [num_anchors, (y1, x1, y2, x2)]
1396 |     gt_class_ids: [num_gt_boxes] Integer class IDs.
1397 |     gt_boxes: [num_gt_boxes, (y1, x1, y2, x2)]
1398 | 
1399 |     Returns:
1400 |     rpn_match: [N] (int32) matches between anchors and GT boxes.
1401 |                1 = positive anchor, -1 = negative anchor, 0 = neutral
1402 |     rpn_bbox: [N, (dy, dx, log(dh), log(dw))] Anchor bbox deltas.
1403 |     """
1404 |     # RPN Match: 1 = positive anchor, -1 = negative anchor, 0 = neutral
1405 |     rpn_match = np.zeros([anchors.shape[0]], dtype=np.int32)
1406 |     # RPN bounding boxes: [max anchors per image, (dy, dx, log(dh), log(dw))]
1407 |     rpn_bbox = np.zeros((config.RPN_TRAIN_ANCHORS_PER_IMAGE, 4))
1408 | 
1409 |     # Handle COCO crowds
1410 |     # A crowd box in COCO is a bounding box around several instances. Exclude
1411 |     # them from training. A crowd box is given a negative class ID.
1412 |     crowd_ix = np.where(gt_class_ids < 0)[0]
1413 |     if crowd_ix.shape[0] > 0:
1414 |         # Filter out crowds from ground truth class IDs and boxes
1415 |         non_crowd_ix = np.where(gt_class_ids > 0)[0]
1416 |         crowd_boxes = gt_boxes[crowd_ix]
1417 |         gt_class_ids = gt_class_ids[non_crowd_ix]
1418 |         gt_boxes = gt_boxes[non_crowd_ix]
1419 |         # Compute overlaps with crowd boxes [anchors, crowds]
1420 |         crowd_overlaps = utils.compute_overlaps(anchors, crowd_boxes)
1421 |         crowd_iou_max = np.amax(crowd_overlaps, axis=1)
1422 |         no_crowd_bool = (crowd_iou_max < 0.001)
1423 |     else:
1424 |         # All anchors don't intersect a crowd
1425 |         no_crowd_bool = np.ones([anchors.shape[0]], dtype=bool)
1426 | 
1427 |     # Compute overlaps [num_anchors, num_gt_boxes]
1428 |     overlaps = utils.compute_overlaps(anchors, gt_boxes)
1429 | 
1430 |     # Match anchors to GT Boxes
1431 |     # If an anchor overlaps a GT box with IoU >= 0.7 then it's positive.
1432 |     # If an anchor overlaps a GT box with IoU < 0.3 then it's negative.
1433 |     # Neutral anchors are those that don't match the conditions above,
1434 |     # and they don't influence the loss function.
1435 |     # However, don't keep any GT box unmatched (rare, but happens). Instead,
1436 |     # match it to the closest anchor (even if its max IoU is < 0.3).
1437 |     #
1438 |     # 1. Set negative anchors first. They get overwritten below if a GT box is
1439 |     # matched to them. Skip boxes in crowd areas.
1440 |     anchor_iou_argmax = np.argmax(overlaps, axis=1)
1441 |     anchor_iou_max = overlaps[np.arange(overlaps.shape[0]), anchor_iou_argmax]
1442 |     rpn_match[(anchor_iou_max < 0.3) & (no_crowd_bool)] = -1
1443 |     # 2. Set an anchor for each GT box (regardless of IoU value).
1444 |     # TODO: If multiple anchors have the same IoU match all of them
1445 |     gt_iou_argmax = np.argmax(overlaps, axis=0)
1446 |     rpn_match[gt_iou_argmax] = 1
1447 |     # 3. Set anchors with high overlap as positive.
1448 |     rpn_match[anchor_iou_max >= 0.7] = 1
1449 | 
1450 |     # Subsample to balance positive and negative anchors
1451 |     # Don't let positives be more than half the anchors
1452 |     ids = np.where(rpn_match == 1)[0]
1453 |     extra = len(ids) - (config.RPN_TRAIN_ANCHORS_PER_IMAGE // 2)
1454 |     if extra > 0:
1455 |         # Reset the extra ones to neutral
1456 |         ids = np.random.choice(ids, extra, replace=False)
1457 |         rpn_match[ids] = 0
1458 |     # Same for negative proposals
1459 |     ids = np.where(rpn_match == -1)[0]
1460 |     extra = len(ids) - (config.RPN_TRAIN_ANCHORS_PER_IMAGE -
1461 |                         np.sum(rpn_match == 1))
1462 |     if extra > 0:
1463 |         # Rest the extra ones to neutral
1464 |         ids = np.random.choice(ids, extra, replace=False)
1465 |         rpn_match[ids] = 0
1466 | 
1467 |     # For positive anchors, compute shift and scale needed to transform them
1468 |     # to match the corresponding GT boxes.
1469 |     ids = np.where(rpn_match == 1)[0]
1470 |     ix = 0  # index into rpn_bbox
1471 |     # TODO: use box_refinement() rather than duplicating the code here
1472 |     for i, a in zip(ids, anchors[ids]):
1473 |         # Closest gt box (it might have IoU < 0.7)
1474 |         gt = gt_boxes[anchor_iou_argmax[i]]
1475 | 
1476 |         # Convert coordinates to center plus width/height.
1477 |         # GT Box
1478 |         gt_h = gt[2] - gt[0]
1479 |         gt_w = gt[3] - gt[1]
1480 |         gt_center_y = gt[0] + 0.5 * gt_h
1481 |         gt_center_x = gt[1] + 0.5 * gt_w
1482 |         # Anchor
1483 |         a_h = a[2] - a[0]
1484 |         a_w = a[3] - a[1]
1485 |         a_center_y = a[0] + 0.5 * a_h
1486 |         a_center_x = a[1] + 0.5 * a_w
1487 | 
1488 |         # Compute the bbox refinement that the RPN should predict.
1489 |         rpn_bbox[ix] = [
1490 |             (gt_center_y - a_center_y) / a_h,
1491 |             (gt_center_x - a_center_x) / a_w,
1492 |             np.log(gt_h / a_h),
1493 |             np.log(gt_w / a_w),
1494 |         ]
1495 |         # Normalize
1496 |         rpn_bbox[ix] /= config.RPN_BBOX_STD_DEV
1497 |         ix += 1
1498 | 
1499 |     return rpn_match, rpn_bbox
1500 | 
1501 | 
1502 | def generate_random_rois(image_shape, count, gt_class_ids, gt_boxes):
1503 |     """Generates ROI proposals similar to what a region proposal network
1504 |     would generate.
1505 | 
1506 |     image_shape: [Height, Width, Depth]
1507 |     count: Number of ROIs to generate
1508 |     gt_class_ids: [N] Integer ground truth class IDs
1509 |     gt_boxes: [N, (y1, x1, y2, x2)] Ground truth boxes in pixels.
1510 | 
1511 |     Returns: [count, (y1, x1, y2, x2)] ROI boxes in pixels.
1512 |     """
1513 |     # placeholder
1514 |     rois = np.zeros((count, 4), dtype=np.int32)
1515 | 
1516 |     # Generate random ROIs around GT boxes (90% of count)
1517 |     rois_per_box = int(0.9 * count / gt_boxes.shape[0])
1518 |     for i in range(gt_boxes.shape[0]):
1519 |         gt_y1, gt_x1, gt_y2, gt_x2 = gt_boxes[i]
1520 |         h = gt_y2 - gt_y1
1521 |         w = gt_x2 - gt_x1
1522 |         # random boundaries
1523 |         r_y1 = max(gt_y1 - h, 0)
1524 |         r_y2 = min(gt_y2 + h, image_shape[0])
1525 |         r_x1 = max(gt_x1 - w, 0)
1526 |         r_x2 = min(gt_x2 + w, image_shape[1])
1527 | 
1528 |         # To avoid generating boxes with zero area, we generate double what
1529 |         # we need and filter out the extra. If we get fewer valid boxes
1530 |         # than we need, we loop and try again.
1531 |         while True:
1532 |             y1y2 = np.random.randint(r_y1, r_y2, (rois_per_box * 2, 2))
1533 |             x1x2 = np.random.randint(r_x1, r_x2, (rois_per_box * 2, 2))
1534 |             # Filter out zero area boxes
1535 |             threshold = 1
1536 |             y1y2 = y1y2[np.abs(y1y2[:, 0] - y1y2[:, 1]) >=
1537 |                         threshold][:rois_per_box]
1538 |             x1x2 = x1x2[np.abs(x1x2[:, 0] - x1x2[:, 1]) >=
1539 |                         threshold][:rois_per_box]
1540 |             if y1y2.shape[0] == rois_per_box and x1x2.shape[0] == rois_per_box:
1541 |                 break
1542 | 
1543 |         # Sort on axis 1 to ensure x1 <= x2 and y1 <= y2 and then reshape
1544 |         # into x1, y1, x2, y2 order
1545 |         x1, x2 = np.split(np.sort(x1x2, axis=1), 2, axis=1)
1546 |         y1, y2 = np.split(np.sort(y1y2, axis=1), 2, axis=1)
1547 |         box_rois = np.hstack([y1, x1, y2, x2])
1548 |         rois[rois_per_box * i:rois_per_box * (i + 1)] = box_rois
1549 | 
1550 |     # Generate random ROIs anywhere in the image (10% of count)
1551 |     remaining_count = count - (rois_per_box * gt_boxes.shape[0])
1552 |     # To avoid generating boxes with zero area, we generate double what
1553 |     # we need and filter out the extra. If we get fewer valid boxes
1554 |     # than we need, we loop and try again.
1555 |     while True:
1556 |         y1y2 = np.random.randint(0, image_shape[0], (remaining_count * 2, 2))
1557 |         x1x2 = np.random.randint(0, image_shape[1], (remaining_count * 2, 2))
1558 |         # Filter out zero area boxes
1559 |         threshold = 1
1560 |         y1y2 = y1y2[np.abs(y1y2[:, 0] - y1y2[:, 1]) >=
1561 |                     threshold][:remaining_count]
1562 |         x1x2 = x1x2[np.abs(x1x2[:, 0] - x1x2[:, 1]) >=
1563 |                     threshold][:remaining_count]
1564 |         if y1y2.shape[0] == remaining_count and x1x2.shape[0] == remaining_count:
1565 |             break
1566 | 
1567 |     # Sort on axis 1 to ensure x1 <= x2 and y1 <= y2 and then reshape
1568 |     # into x1, y1, x2, y2 order
1569 |     x1, x2 = np.split(np.sort(x1x2, axis=1), 2, axis=1)
1570 |     y1, y2 = np.split(np.sort(y1y2, axis=1), 2, axis=1)
1571 |     global_rois = np.hstack([y1, x1, y2, x2])
1572 |     rois[-remaining_count:] = global_rois
1573 |     return rois
1574 | 
1575 | 
1576 | def data_generator(dataset, config, shuffle=True, augment=True, random_rois=0,
1577 |                    batch_size=1, detection_targets=False):
1578 |     """A generator that returns images and corresponding target class ids,
1579 |     bounding box deltas, and masks.
1580 | 
1581 |     dataset: The Dataset object to pick data from
1582 |     config: The model config object
1583 |     shuffle: If True, shuffles the samples before every epoch
1584 |     augment: If True, applies image augmentation to images (currently only
1585 |              horizontal flips are supported)
1586 |     random_rois: If > 0 then generate proposals to be used to train the
1587 |                  network classifier and mask heads. Useful if training
1588 |                  the Mask RCNN part without the RPN.
1589 |     batch_size: How many images to return in each call
1590 |     detection_targets: If True, generate detection targets (class IDs, bbox
1591 |         deltas, and masks). Typically for debugging or visualizations because
1592 |         in trainig detection targets are generated by DetectionTargetLayer.
1593 | 
1594 |     Returns a Python generator. Upon calling next() on it, the
1595 |     generator returns two lists, inputs and outputs. The containtes
1596 |     of the lists differs depending on the received arguments:
1597 |     inputs list:
1598 |     - images: [batch, H, W, C]
1599 |     - image_meta: [batch, size of image meta]
1600 |     - rpn_match: [batch, N] Integer (1=positive anchor, -1=negative, 0=neutral)
1601 |     - rpn_bbox: [batch, N, (dy, dx, log(dh), log(dw))] Anchor bbox deltas.
1602 |     - gt_class_ids: [batch, MAX_GT_INSTANCES] Integer class IDs
1603 |     - gt_boxes: [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)]
1604 |     - gt_masks: [batch, height, width, MAX_GT_INSTANCES]. The height and width
1605 |                 are those of the image unless use_mini_mask is True, in which
1606 |                 case they are defined in MINI_MASK_SHAPE.
1607 | 
1608 |     outputs list: Usually empty in regular training. But if detection_targets
1609 |         is True then the outputs list contains target class_ids, bbox deltas,
1610 |         and masks.
1611 |     """
1612 |     b = 0  # batch item index
1613 |     image_index = -1
1614 |     image_ids = np.copy(dataset.image_ids)
1615 |     error_count = 0
1616 | 
1617 |     # Anchors
1618 |     # [anchor_count, (y1, x1, y2, x2)]
1619 |     anchors = utils.generate_pyramid_anchors(config.RPN_ANCHOR_SCALES,
1620 |                                              config.RPN_ANCHOR_RATIOS,
1621 |                                              config.BACKBONE_SHAPES,
1622 |                                              config.BACKBONE_STRIDES,
1623 |                                              config.RPN_ANCHOR_STRIDE)
1624 | 
1625 |     # Keras requires a generator to run indefinately.
1626 |     while True:
1627 |         try:
1628 |             # Increment index to pick next image. Shuffle if at the start of an epoch.
1629 |             image_index = (image_index + 1) % len(image_ids)
1630 |             if shuffle and image_index == 0:
1631 |                 np.random.shuffle(image_ids)
1632 | 
1633 |             # Get GT bounding boxes and masks for image.
1634 |             image_id = image_ids[image_index]
1635 |             image, image_meta, gt_class_ids, gt_boxes, gt_masks = \
1636 |                 load_image_gt(dataset, config, image_id, augment=augment,
1637 |                               use_mini_mask=config.USE_MINI_MASK)
1638 | 
1639 |             # Skip images that have no instances. This can happen in cases
1640 |             # where we train on a subset of classes and the image doesn't
1641 |             # have any of the classes we care about.
1642 |             if not np.any(gt_class_ids > 0):
1643 |                 continue
1644 | 
1645 |             # RPN Targets
1646 |             rpn_match, rpn_bbox = build_rpn_targets(image.shape, anchors,
1647 |                                                     gt_class_ids, gt_boxes, config)
1648 | 
1649 |             # Mask R-CNN Targets
1650 |             if random_rois:
1651 |                 rpn_rois = generate_random_rois(
1652 |                     image.shape, random_rois, gt_class_ids, gt_boxes)
1653 |                 if detection_targets:
1654 |                     rois, mrcnn_class_ids, mrcnn_bbox, mrcnn_mask =\
1655 |                         build_detection_targets(
1656 |                             rpn_rois, gt_class_ids, gt_boxes, gt_masks, config)
1657 | 
1658 |             # Init batch arrays
1659 |             if b == 0:
1660 |                 batch_image_meta = np.zeros(
1661 |                     (batch_size,) + image_meta.shape, dtype=image_meta.dtype)
1662 |                 batch_rpn_match = np.zeros(
1663 |                     [batch_size, anchors.shape[0], 1], dtype=rpn_match.dtype)
1664 |                 batch_rpn_bbox = np.zeros(
1665 |                     [batch_size, config.RPN_TRAIN_ANCHORS_PER_IMAGE, 4], dtype=rpn_bbox.dtype)
1666 |                 batch_images = np.zeros(
1667 |                     (batch_size,) + image.shape, dtype=np.float32)
1668 |                 batch_gt_class_ids = np.zeros(
1669 |                     (batch_size, config.MAX_GT_INSTANCES), dtype=np.int32)
1670 |                 batch_gt_boxes = np.zeros(
1671 |                     (batch_size, config.MAX_GT_INSTANCES, 4), dtype=np.int32)
1672 |                 if config.USE_MINI_MASK:
1673 |                     batch_gt_masks = np.zeros((batch_size, config.MINI_MASK_SHAPE[0], config.MINI_MASK_SHAPE[1],
1674 |                                                config.MAX_GT_INSTANCES))
1675 |                 else:
1676 |                     batch_gt_masks = np.zeros(
1677 |                         (batch_size, image.shape[0], image.shape[1], config.MAX_GT_INSTANCES))
1678 |                 if random_rois:
1679 |                     batch_rpn_rois = np.zeros(
1680 |                         (batch_size, rpn_rois.shape[0], 4), dtype=rpn_rois.dtype)
1681 |                     if detection_targets:
1682 |                         batch_rois = np.zeros(
1683 |                             (batch_size,) + rois.shape, dtype=rois.dtype)
1684 |                         batch_mrcnn_class_ids = np.zeros(
1685 |                             (batch_size,) + mrcnn_class_ids.shape, dtype=mrcnn_class_ids.dtype)
1686 |                         batch_mrcnn_bbox = np.zeros(
1687 |                             (batch_size,) + mrcnn_bbox.shape, dtype=mrcnn_bbox.dtype)
1688 |                         batch_mrcnn_mask = np.zeros(
1689 |                             (batch_size,) + mrcnn_mask.shape, dtype=mrcnn_mask.dtype)
1690 | 
1691 |             # If more instances than fits in the array, sub-sample from them.
1692 |             if gt_boxes.shape[0] > config.MAX_GT_INSTANCES:
1693 |                 ids = np.random.choice(
1694 |                     np.arange(gt_boxes.shape[0]), config.MAX_GT_INSTANCES, replace=False)
1695 |                 gt_class_ids = gt_class_ids[ids]
1696 |                 gt_boxes = gt_boxes[ids]
1697 |                 gt_masks = gt_masks[:, :, ids]
1698 | 
1699 |             # Add to batch
1700 |             batch_image_meta[b] = image_meta
1701 |             batch_rpn_match[b] = rpn_match[:, np.newaxis]
1702 |             batch_rpn_bbox[b] = rpn_bbox
1703 |             batch_images[b] = mold_image(image.astype(np.float32), config)
1704 |             batch_gt_class_ids[b, :gt_class_ids.shape[0]] = gt_class_ids
1705 |             batch_gt_boxes[b, :gt_boxes.shape[0]] = gt_boxes
1706 |             batch_gt_masks[b, :, :, :gt_masks.shape[-1]] = gt_masks
1707 |             if random_rois:
1708 |                 batch_rpn_rois[b] = rpn_rois
1709 |                 if detection_targets:
1710 |                     batch_rois[b] = rois
1711 |                     batch_mrcnn_class_ids[b] = mrcnn_class_ids
1712 |                     batch_mrcnn_bbox[b] = mrcnn_bbox
1713 |                     batch_mrcnn_mask[b] = mrcnn_mask
1714 |             b += 1
1715 | 
1716 |             # Batch full?
1717 |             if b >= batch_size:
1718 |                 inputs = [batch_images, batch_image_meta, batch_rpn_match, batch_rpn_bbox,
1719 |                           batch_gt_class_ids, batch_gt_boxes, batch_gt_masks]
1720 |                 outputs = []
1721 | 
1722 |                 if random_rois:
1723 |                     inputs.extend([batch_rpn_rois])
1724 |                     if detection_targets:
1725 |                         inputs.extend([batch_rois])
1726 |                         # Keras requires that output and targets have the same number of dimensions
1727 |                         batch_mrcnn_class_ids = np.expand_dims(
1728 |                             batch_mrcnn_class_ids, -1)
1729 |                         outputs.extend(
1730 |                             [batch_mrcnn_class_ids, batch_mrcnn_bbox, batch_mrcnn_mask])
1731 | 
1732 |                 yield inputs, outputs
1733 | 
1734 |                 # start a new batch
1735 |                 b = 0
1736 |         except (GeneratorExit, KeyboardInterrupt):
1737 |             raise
1738 |         except:
1739 |             # Log it and skip the image
1740 |             logging.exception("Error processing image {}".format(
1741 |                 dataset.image_info[image_id]))
1742 |             error_count += 1
1743 |             if error_count > 5:
1744 |                 raise
1745 | 
1746 | 
1747 | ############################################################
1748 | #  MaskRCNN Class
1749 | ############################################################
1750 | 
1751 | class MaskRCNN():
1752 |     """Encapsulates the Mask RCNN model functionality.
1753 | 
1754 |     The actual Keras model is in the keras_model property.
1755 |     """
1756 | 
1757 |     def __init__(self, mode, config, model_dir):
1758 |         """
1759 |         mode: Either "training" or "inference"
1760 |         config: A Sub-class of the Config class
1761 |         model_dir: Directory to save training logs and trained weights
1762 |         """
1763 |         assert mode in ['training', 'inference']
1764 |         self.mode = mode
1765 |         self.config = config
1766 |         self.model_dir = model_dir
1767 |         self.set_log_dir()
1768 |         self.keras_model = self.build(mode=mode, config=config)
1769 | 
1770 |     def build(self, mode, config):
1771 |         """Build Mask R-CNN architecture.
1772 |             input_shape: The shape of the input image.
1773 |             mode: Either "training" or "inference". The inputs and
1774 |                 outputs of the model differ accordingly.
1775 |         """
1776 |         assert mode in ['training', 'inference']
1777 | 
1778 |         # Image size must be dividable by 2 multiple times
1779 |         h, w = config.IMAGE_SHAPE[:2]
1780 |         if h / 2**6 != int(h / 2**6) or w / 2**6 != int(w / 2**6):
1781 |             raise Exception("Image size must be dividable by 2 at least 6 times "
1782 |                             "to avoid fractions when downscaling and upscaling."
1783 |                             "For example, use 256, 320, 384, 448, 512, ... etc. ")
1784 | 
1785 |         # Inputs
1786 |         input_image = KL.Input(
1787 |             shape=config.IMAGE_SHAPE.tolist(), name="input_image")
1788 |         input_image_meta = KL.Input(shape=[None], name="input_image_meta")
1789 |         if mode == "training":
1790 |             # RPN GT
1791 |             input_rpn_match = KL.Input(
1792 |                 shape=[None, 1], name="input_rpn_match", dtype=tf.int32)
1793 |             input_rpn_bbox = KL.Input(
1794 |                 shape=[None, 4], name="input_rpn_bbox", dtype=tf.float32)
1795 | 
1796 |             # Detection GT (class IDs, bounding boxes, and masks)
1797 |             # 1. GT Class IDs (zero padded)
1798 |             input_gt_class_ids = KL.Input(
1799 |                 shape=[None], name="input_gt_class_ids", dtype=tf.int32)
1800 |             # 2. GT Boxes in pixels (zero padded)
1801 |             # [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in image coordinates
1802 |             input_gt_boxes = KL.Input(
1803 |                 shape=[None, 4], name="input_gt_boxes", dtype=tf.float32)
1804 |             # Normalize coordinates
1805 |             h, w = K.shape(input_image)[1], K.shape(input_image)[2]
1806 |             image_scale = K.cast(K.stack([h, w, h, w], axis=0), tf.float32)
1807 |             gt_boxes = KL.Lambda(lambda x: x / image_scale)(input_gt_boxes)
1808 |             # 3. GT Masks (zero padded)
1809 |             # [batch, height, width, MAX_GT_INSTANCES]
1810 |             if config.USE_MINI_MASK:
1811 |                 input_gt_masks = KL.Input(
1812 |                     shape=[config.MINI_MASK_SHAPE[0],
1813 |                            config.MINI_MASK_SHAPE[1], None],
1814 |                     name="input_gt_masks", dtype=bool)
1815 |             else:
1816 |                 input_gt_masks = KL.Input(
1817 |                     shape=[config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1], None],
1818 |                     name="input_gt_masks", dtype=bool)
1819 | 
1820 |         # Build the shared convolutional layers.
1821 |         # Bottom-up Layers
1822 |         # Returns a list of the last layers of each stage, 5 in total.
1823 |         # Don't create the thead (stage 5), so we pick the 4th item in the list.
1824 |         _, C2, C3, C4, C5 = resnet_graph(input_image, config.BACKBONE, stage5=True)
1825 |         # Top-down Layers
1826 |         # TODO: add assert to varify feature map sizes match what's in config
1827 |         P5 = KL.Conv2D(256, (1, 1), name='fpn_c5p5')(C5)
1828 |         P4 = KL.Add(name="fpn_p4add")([
1829 |             KL.UpSampling2D(size=(2, 2), name="fpn_p5upsampled")(P5),
1830 |             KL.Conv2D(256, (1, 1), name='fpn_c4p4')(C4)])
1831 |         P3 = KL.Add(name="fpn_p3add")([
1832 |             KL.UpSampling2D(size=(2, 2), name="fpn_p4upsampled")(P4),
1833 |             KL.Conv2D(256, (1, 1), name='fpn_c3p3')(C3)])
1834 |         P2 = KL.Add(name="fpn_p2add")([
1835 |             KL.UpSampling2D(size=(2, 2), name="fpn_p3upsampled")(P3),
1836 |             KL.Conv2D(256, (1, 1), name='fpn_c2p2')(C2)])
1837 |         # Attach 3x3 conv to all P layers to get the final feature maps.
1838 |         P2 = KL.Conv2D(256, (3, 3), padding="SAME", name="fpn_p2")(P2)
1839 |         P3 = KL.Conv2D(256, (3, 3), padding="SAME", name="fpn_p3")(P3)
1840 |         P4 = KL.Conv2D(256, (3, 3), padding="SAME", name="fpn_p4")(P4)
1841 |         P5 = KL.Conv2D(256, (3, 3), padding="SAME", name="fpn_p5")(P5)
1842 |         # P6 is used for the 5th anchor scale in RPN. Generated by
1843 |         # subsampling from P5 with stride of 2.
1844 |         P6 = KL.MaxPooling2D(pool_size=(1, 1), strides=2, name="fpn_p6")(P5)
1845 | 
1846 |         # Note that P6 is used in RPN, but not in the classifier heads.
1847 |         rpn_feature_maps = [P2, P3, P4, P5, P6]
1848 |         mrcnn_feature_maps = [P2, P3, P4, P5]
1849 | 
1850 |         # Generate Anchors
1851 |         self.anchors = utils.generate_pyramid_anchors(config.RPN_ANCHOR_SCALES,
1852 |                                                       config.RPN_ANCHOR_RATIOS,
1853 |                                                       config.BACKBONE_SHAPES,
1854 |                                                       config.BACKBONE_STRIDES,
1855 |                                                       config.RPN_ANCHOR_STRIDE)
1856 | 
1857 |         # RPN Model
1858 |         rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE,
1859 |                               len(config.RPN_ANCHOR_RATIOS), 256)
1860 |         # Loop through pyramid layers
1861 |         layer_outputs = []  # list of lists
1862 |         for p in rpn_feature_maps:
1863 |             layer_outputs.append(rpn([p]))
1864 |         # Concatenate layer outputs
1865 |         # Convert from list of lists of level outputs to list of lists
1866 |         # of outputs across levels.
1867 |         # e.g. [[a1, b1, c1], [a2, b2, c2]] => [[a1, a2], [b1, b2], [c1, c2]]
1868 |         output_names = ["rpn_class_logits", "rpn_class", "rpn_bbox"]
1869 |         outputs = list(zip(*layer_outputs))
1870 |         outputs = [KL.Concatenate(axis=1, name=n)(list(o))
1871 |                    for o, n in zip(outputs, output_names)]
1872 | 
1873 |         rpn_class_logits, rpn_class, rpn_bbox = outputs
1874 | 
1875 |         # Generate proposals
1876 |         # Proposals are [batch, N, (y1, x1, y2, x2)] in normalized coordinates
1877 |         # and zero padded.
1878 |         proposal_count = config.POST_NMS_ROIS_TRAINING if mode == "training"\
1879 |             else config.POST_NMS_ROIS_INFERENCE
1880 |         rpn_rois = ProposalLayer(proposal_count=proposal_count,
1881 |                                  nms_threshold=config.RPN_NMS_THRESHOLD,
1882 |                                  name="ROI",
1883 |                                  anchors=self.anchors,
1884 |                                  config=config)([rpn_class, rpn_bbox])
1885 | 
1886 |         if mode == "training":
1887 |             # Class ID mask to mark class IDs supported by the dataset the image
1888 |             # came from.
1889 |             _, _, _, active_class_ids = KL.Lambda(lambda x: parse_image_meta_graph(x),
1890 |                                                   mask=[None, None, None, None])(input_image_meta)
1891 | 
1892 |             if not config.USE_RPN_ROIS:
1893 |                 # Ignore predicted ROIs and use ROIs provided as an input.
1894 |                 input_rois = KL.Input(shape=[config.POST_NMS_ROIS_TRAINING, 4],
1895 |                                       name="input_roi", dtype=np.int32)
1896 |                 # Normalize coordinates to 0-1 range.
1897 |                 target_rois = KL.Lambda(lambda x: K.cast(
1898 |                     x, tf.float32) / image_scale[:4])(input_rois)
1899 |             else:
1900 |                 target_rois = rpn_rois
1901 | 
1902 |             # Generate detection targets
1903 |             # Subsamples proposals and generates target outputs for training
1904 |             # Note that proposal class IDs, gt_boxes, and gt_masks are zero
1905 |             # padded. Equally, returned rois and targets are zero padded.
1906 |             rois, target_class_ids, target_bbox, target_mask =\
1907 |                 DetectionTargetLayer(config, name="proposal_targets")([
1908 |                     target_rois, input_gt_class_ids, gt_boxes, input_gt_masks])
1909 | 
1910 |             # Network Heads
1911 |             # TODO: verify that this handles zero padded ROIs
1912 |             mrcnn_class_logits, mrcnn_class, mrcnn_bbox =\
1913 |                 fpn_classifier_graph(rois, mrcnn_feature_maps, config.IMAGE_SHAPE,
1914 |                                      config.POOL_SIZE, config.NUM_CLASSES)
1915 | 
1916 |             mrcnn_mask = build_fpn_mask_graph(rois, mrcnn_feature_maps,
1917 |                                               config.IMAGE_SHAPE,
1918 |                                               config.MASK_POOL_SIZE,
1919 |                                               config.NUM_CLASSES)
1920 | 
1921 |             # TODO: clean up (use tf.identify if necessary)
1922 |             output_rois = KL.Lambda(lambda x: x * 1, name="output_rois")(rois)
1923 | 
1924 |             # Losses
1925 |             rpn_class_loss = KL.Lambda(lambda x: rpn_class_loss_graph(*x), name="rpn_class_loss")(
1926 |                 [input_rpn_match, rpn_class_logits])
1927 |             rpn_bbox_loss = KL.Lambda(lambda x: rpn_bbox_loss_graph(config, *x), name="rpn_bbox_loss")(
1928 |                 [input_rpn_bbox, input_rpn_match, rpn_bbox])
1929 |             class_loss = KL.Lambda(lambda x: mrcnn_class_loss_graph(*x), name="mrcnn_class_loss")(
1930 |                 [target_class_ids, mrcnn_class_logits, active_class_ids])
1931 |             bbox_loss = KL.Lambda(lambda x: mrcnn_bbox_loss_graph(*x), name="mrcnn_bbox_loss")(
1932 |                 [target_bbox, target_class_ids, mrcnn_bbox])
1933 |             mask_loss = KL.Lambda(lambda x: mrcnn_mask_loss_graph(*x), name="mrcnn_mask_loss")(
1934 |                 [target_mask, target_class_ids, mrcnn_mask])
1935 | 
1936 |             # Model
1937 |             inputs = [input_image, input_image_meta,
1938 |                       input_rpn_match, input_rpn_bbox, input_gt_class_ids, input_gt_boxes, input_gt_masks]
1939 |             if not config.USE_RPN_ROIS:
1940 |                 inputs.append(input_rois)
1941 |             outputs = [rpn_class_logits, rpn_class, rpn_bbox,
1942 |                        mrcnn_class_logits, mrcnn_class, mrcnn_bbox, mrcnn_mask,
1943 |                        rpn_rois, output_rois,
1944 |                        rpn_class_loss, rpn_bbox_loss, class_loss, bbox_loss, mask_loss]
1945 |             model = KM.Model(inputs, outputs, name='mask_rcnn')
1946 |         else:
1947 |             # Network Heads
1948 |             # Proposal classifier and BBox regressor heads
1949 |             mrcnn_class_logits, mrcnn_class, mrcnn_bbox =\
1950 |                 fpn_classifier_graph(rpn_rois, mrcnn_feature_maps, config.IMAGE_SHAPE,
1951 |                                      config.POOL_SIZE, config.NUM_CLASSES)
1952 | 
1953 |             # Detections
1954 |             # output is [batch, num_detections, (y1, x1, y2, x2, class_id, score)] in image coordinates
1955 |             detections = DetectionLayer(config, name="mrcnn_detection")(
1956 |                 [rpn_rois, mrcnn_class, mrcnn_bbox, input_image_meta])
1957 | 
1958 |             # Convert boxes to normalized coordinates
1959 |             # TODO: let DetectionLayer return normalized coordinates to avoid
1960 |             #       unnecessary conversions
1961 |             h, w = config.IMAGE_SHAPE[:2]
1962 |             detection_boxes = KL.Lambda(
1963 |                 lambda x: x[..., :4] / np.array([h, w, h, w]))(detections)
1964 | 
1965 |             # Create masks for detections
1966 |             mrcnn_mask = build_fpn_mask_graph(detection_boxes, mrcnn_feature_maps,
1967 |                                               config.IMAGE_SHAPE,
1968 |                                               config.MASK_POOL_SIZE,
1969 |                                               config.NUM_CLASSES)
1970 | 
1971 |             model = KM.Model([input_image, input_image_meta],
1972 |                              [detections, mrcnn_class, mrcnn_bbox,
1973 |                                  mrcnn_mask, rpn_rois, rpn_class, rpn_bbox],
1974 |                              name='mask_rcnn')
1975 | 
1976 |         # Add multi-GPU support.
1977 |         if config.GPU_COUNT > 1:
1978 |             from parallel_model import ParallelModel
1979 |             model = ParallelModel(model, config.GPU_COUNT)
1980 | 
1981 |         return model
1982 | 
1983 |     def find_last(self):
1984 |         """Finds the last checkpoint file of the last trained model in the
1985 |         model directory.
1986 |         Returns:
1987 |             log_dir: The directory where events and weights are saved
1988 |             checkpoint_path: the path to the last checkpoint file
1989 |         """
1990 |         # Get directory names. Each directory corresponds to a model
1991 |         dir_names = next(os.walk(self.model_dir))[1]
1992 |         key = self.config.NAME.lower()
1993 |         dir_names = filter(lambda f: f.startswith(key), dir_names)
1994 |         dir_names = sorted(dir_names)
1995 |         if not dir_names:
1996 |             return None, None
1997 |         # Pick last directory
1998 |         dir_name = os.path.join(self.model_dir, dir_names[-1])
1999 |         # Find the last checkpoint
2000 |         checkpoints = next(os.walk(dir_name))[2]
2001 |         checkpoints = filter(lambda f: f.startswith("mask_rcnn"), checkpoints)
2002 |         checkpoints = sorted(checkpoints)
2003 |         if not checkpoints:
2004 |             return dir_name, None
2005 |         checkpoint = os.path.join(dir_name, checkpoints[-1])
2006 |         return dir_name, checkpoint
2007 | 
2008 |     def load_weights(self, filepath, by_name=False, exclude=None):
2009 |         """Modified version of the correspoding Keras function with
2010 |         the addition of multi-GPU support and the ability to exclude
2011 |         some layers from loading.
2012 |         exlude: list of layer names to excluce
2013 |         """
2014 |         import h5py
2015 |         from keras.engine import topology
2016 | 
2017 |         if exclude:
2018 |             by_name = True
2019 | 
2020 |         if h5py is None:
2021 |             raise ImportError('`load_weights` requires h5py.')
2022 |         f = h5py.File(filepath, mode='r')
2023 |         if 'layer_names' not in f.attrs and 'model_weights' in f:
2024 |             f = f['model_weights']
2025 | 
2026 |         # In multi-GPU training, we wrap the model. Get layers
2027 |         # of the inner model because they have the weights.
2028 |         keras_model = self.keras_model
2029 |         layers = keras_model.inner_model.layers if hasattr(keras_model, "inner_model")\
2030 |             else keras_model.layers
2031 | 
2032 |         # Exclude some layers
2033 |         if exclude:
2034 |             layers = filter(lambda l: l.name not in exclude, layers)
2035 | 
2036 |         if by_name:
2037 |             topology.load_weights_from_hdf5_group_by_name(f, layers)
2038 |         else:
2039 |             topology.load_weights_from_hdf5_group(f, layers)
2040 |         if hasattr(f, 'close'):
2041 |             f.close()
2042 | 
2043 |         # Update the log directory
2044 |         self.set_log_dir(filepath)
2045 | 
2046 |     def get_imagenet_weights(self):
2047 |         """Downloads ImageNet trained weights from Keras.
2048 |         Returns path to weights file.
2049 |         """
2050 |         from keras.utils.data_utils import get_file
2051 |         TF_WEIGHTS_PATH_NO_TOP = 'https://github.com/fchollet/deep-learning-models/'\
2052 |                                  'releases/download/v0.2/'\
2053 |                                  'resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5'
2054 |         weights_path = get_file('resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5',
2055 |                                 TF_WEIGHTS_PATH_NO_TOP,
2056 |                                 cache_subdir='models',
2057 |                                 md5_hash='a268eb855778b3df3c7506639542a6af')
2058 |         return weights_path
2059 | 
2060 |     def compile(self, learning_rate, momentum):
2061 |         """Gets the model ready for training. Adds losses, regularization, and
2062 |         metrics. Then calls the Keras compile() function.
2063 |         """
2064 |         # Optimizer object
2065 |         optimizer = keras.optimizers.SGD(lr=learning_rate, momentum=momentum,
2066 |                                          clipnorm=5.0)
2067 |         # Add Losses
2068 |         # First, clear previously set losses to avoid duplication
2069 |         self.keras_model._losses = []
2070 |         self.keras_model._per_input_losses = {}
2071 |         loss_names = ["rpn_class_loss", "rpn_bbox_loss",
2072 |                       "mrcnn_class_loss", "mrcnn_bbox_loss", "mrcnn_mask_loss"]
2073 |         for name in loss_names:
2074 |             layer = self.keras_model.get_layer(name)
2075 |             if layer.output in self.keras_model.losses:
2076 |                 continue
2077 |             self.keras_model.add_loss(
2078 |                 tf.reduce_mean(layer.output, keep_dims=True))
2079 | 
2080 |         # Add L2 Regularization
2081 |         # Skip gamma and beta weights of batch normalization layers.
2082 |         reg_losses = [keras.regularizers.l2(self.config.WEIGHT_DECAY)(w) / tf.cast(tf.size(w), tf.float32)
2083 |                       for w in self.keras_model.trainable_weights
2084 |                       if 'gamma' not in w.name and 'beta' not in w.name]
2085 |         self.keras_model.add_loss(tf.add_n(reg_losses))
2086 | 
2087 |         # Compile
2088 |         self.keras_model.compile(optimizer=optimizer, loss=[
2089 |                                  None] * len(self.keras_model.outputs))
2090 | 
2091 |         # Add metrics for losses
2092 |         for name in loss_names:
2093 |             if name in self.keras_model.metrics_names:
2094 |                 continue
2095 |             layer = self.keras_model.get_layer(name)
2096 |             self.keras_model.metrics_names.append(name)
2097 |             self.keras_model.metrics_tensors.append(tf.reduce_mean(
2098 |                 layer.output, keep_dims=True))
2099 | 
2100 |     def set_trainable(self, layer_regex, keras_model=None, indent=0, verbose=1):
2101 |         """Sets model layers as trainable if their names match
2102 |         the given regular expression.
2103 |         """
2104 |         # Print message on the first call (but not on recursive calls)
2105 |         if verbose > 0 and keras_model is None:
2106 |             log("Selecting layers to train")
2107 | 
2108 |         keras_model = keras_model or self.keras_model
2109 | 
2110 |         # In multi-GPU training, we wrap the model. Get layers
2111 |         # of the inner model because they have the weights.
2112 |         layers = keras_model.inner_model.layers if hasattr(keras_model, "inner_model")\
2113 |             else keras_model.layers
2114 | 
2115 |         for layer in layers:
2116 |             # Is the layer a model?
2117 |             if layer.__class__.__name__ == 'Model':
2118 |                 print("In model: ", layer.name)
2119 |                 self.set_trainable(
2120 |                     layer_regex, keras_model=layer, indent=indent + 4)
2121 |                 continue
2122 | 
2123 |             if not layer.weights:
2124 |                 continue
2125 |             # Is it trainable?
2126 |             trainable = bool(re.fullmatch(layer_regex, layer.name))
2127 |             # Update layer. If layer is a container, update inner layer.
2128 |             if layer.__class__.__name__ == 'TimeDistributed':
2129 |                 layer.layer.trainable = trainable
2130 |             else:
2131 |                 layer.trainable = trainable
2132 |             # Print trainble layer names
2133 |             if trainable and verbose > 0:
2134 |                 log("{}{:20}   ({})".format(" " * indent, layer.name,
2135 |                                             layer.__class__.__name__))
2136 | 
2137 |     def set_log_dir(self, model_path=None):
2138 |         """Sets the model log directory and epoch counter.
2139 | 
2140 |         model_path: If None, or a format different from what this code uses
2141 |             then set a new log directory and start epochs from 0. Otherwise,
2142 |             extract the log directory and the epoch counter from the file
2143 |             name.
2144 |         """
2145 |         # Set date and epoch counter as if starting a new model
2146 |         self.epoch = 0
2147 |         now = datetime.datetime.now()
2148 | 
2149 |         # If we have a model path with date and epochs use them
2150 |         if model_path:
2151 |             # Continue from we left of. Get epoch and date from the file name
2152 |             # A sample model path might look like:
2153 |             # /path/to/logs/coco20171029T2315/mask_rcnn_coco_0001.h5
2154 |             regex = r".*/\w+(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})/mask\_rcnn\_\w+(\d{4})\.h5"
2155 |             m = re.match(regex, model_path)
2156 |             if m:
2157 |                 now = datetime.datetime(int(m.group(1)), int(m.group(2)), int(m.group(3)),
2158 |                                         int(m.group(4)), int(m.group(5)))
2159 |                 self.epoch = int(m.group(6)) + 1
2160 | 
2161 |         # Directory for training logs
2162 |         self.log_dir = os.path.join(self.model_dir, "{}{:%Y%m%dT%H%M}".format(
2163 |             self.config.NAME.lower(), now))
2164 | 
2165 |         # Path to save after each epoch. Include placeholders that get filled by Keras.
2166 |         self.checkpoint_path = os.path.join(self.log_dir, "mask_rcnn_{}_*epoch*.h5".format(
2167 |             self.config.NAME.lower()))
2168 |         self.checkpoint_path = self.checkpoint_path.replace(
2169 |             "*epoch*", "{epoch:04d}")
2170 | 
2171 |     def train(self, train_dataset, val_dataset, learning_rate, epochs, layers):
2172 |         """Train the model.
2173 |         train_dataset, val_dataset: Training and validation Dataset objects.
2174 |         learning_rate: The learning rate to train with
2175 |         epochs: Number of training epochs. Note that previous training epochs
2176 |                 are considered to be done alreay, so this actually determines
2177 |                 the epochs to train in total rather than in this particaular
2178 |                 call.
2179 |         layers: Allows selecting wich layers to train. It can be:
2180 |             - A regular expression to match layer names to train
2181 |             - One of these predefined values:
2182 |               heaads: The RPN, classifier and mask heads of the network
2183 |               all: All the layers
2184 |               3+: Train Resnet stage 3 and up
2185 |               4+: Train Resnet stage 4 and up
2186 |               5+: Train Resnet stage 5 and up
2187 |         """
2188 |         assert self.mode == "training", "Create model in training mode."
2189 | 
2190 |         # Pre-defined layer regular expressions
2191 |         layer_regex = {
2192 |             # all layers but the backbone
2193 |             "heads": r"(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
2194 |             # From a specific Resnet stage and up
2195 |             "3+": r"(res3.*)|(bn3.*)|(res4.*)|(bn4.*)|(res5.*)|(bn5.*)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
2196 |             "4+": r"(res4.*)|(bn4.*)|(res5.*)|(bn5.*)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
2197 |             "5+": r"(res5.*)|(bn5.*)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
2198 |             # All layers
2199 |             "all": ".*",
2200 |         }
2201 |         if layers in layer_regex.keys():
2202 |             layers = layer_regex[layers]
2203 | 
2204 |         # Data generators
2205 |         train_generator = data_generator(train_dataset, self.config, shuffle=True,
2206 |                                          batch_size=self.config.BATCH_SIZE)
2207 |         val_generator = data_generator(val_dataset, self.config, shuffle=True,
2208 |                                        batch_size=self.config.BATCH_SIZE,
2209 |                                        augment=False)
2210 | 
2211 |         # Callbacks
2212 |         callbacks = [
2213 |             keras.callbacks.TensorBoard(log_dir=self.log_dir,
2214 |                                         histogram_freq=0, write_graph=True, write_images=False),
2215 |             keras.callbacks.ModelCheckpoint(self.checkpoint_path,
2216 |                                             verbose=0, save_weights_only=True),
2217 |         ]
2218 | 
2219 |         # Train
2220 |         log("\nStarting at epoch {}. LR={}\n".format(self.epoch, learning_rate))
2221 |         log("Checkpoint Path: {}".format(self.checkpoint_path))
2222 |         self.set_trainable(layers)
2223 |         self.compile(learning_rate, self.config.LEARNING_MOMENTUM)
2224 | 
2225 |         # Work-around for Windows: Keras fails on Windows when using
2226 |         # multiprocessing workers. See discussion here:
2227 |         # https://github.com/matterport/Mask_RCNN/issues/13#issuecomment-353124009
2228 |         if os.name is 'nt':
2229 |             workers = 0
2230 |         else:
2231 |             workers = max(self.config.BATCH_SIZE // 2, 2)
2232 | 
2233 |         self.keras_model.fit_generator(
2234 |             train_generator,
2235 |             initial_epoch=self.epoch,
2236 |             epochs=epochs,
2237 |             steps_per_epoch=self.config.STEPS_PER_EPOCH,
2238 |             callbacks=callbacks,
2239 |             validation_data=next(val_generator),
2240 |             validation_steps=self.config.VALIDATION_STEPS,
2241 |             max_queue_size=100,
2242 |             workers=workers,
2243 |             use_multiprocessing=True,
2244 |         )
2245 |         self.epoch = max(self.epoch, epochs)
2246 | 
2247 |     def mold_inputs(self, images):
2248 |         """Takes a list of images and modifies them to the format expected
2249 |         as an input to the neural network.
2250 |         images: List of image matricies [height,width,depth]. Images can have
2251 |             different sizes.
2252 | 
2253 |         Returns 3 Numpy matricies:
2254 |         molded_images: [N, h, w, 3]. Images resized and normalized.
2255 |         image_metas: [N, length of meta data]. Details about each image.
2256 |         windows: [N, (y1, x1, y2, x2)]. The portion of the image that has the
2257 |             original image (padding excluded).
2258 |         """
2259 |         molded_images = []
2260 |         image_metas = []
2261 |         windows = []
2262 |         for image in images:
2263 |             # Resize image to fit the model expected size
2264 |             # TODO: move resizing to mold_image()
2265 |             molded_image, window, scale, padding = utils.resize_image(
2266 |                 image,
2267 |                 min_dim=self.config.IMAGE_MIN_DIM,
2268 |                 max_dim=self.config.IMAGE_MAX_DIM,
2269 |                 padding=self.config.IMAGE_PADDING)
2270 |             molded_image = mold_image(molded_image, self.config)
2271 |             # Build image_meta
2272 |             image_meta = compose_image_meta(
2273 |                 0, image.shape, window,
2274 |                 np.zeros([self.config.NUM_CLASSES], dtype=np.int32))
2275 |             # Append
2276 |             molded_images.append(molded_image)
2277 |             windows.append(window)
2278 |             image_metas.append(image_meta)
2279 |         # Pack into arrays
2280 |         molded_images = np.stack(molded_images)
2281 |         image_metas = np.stack(image_metas)
2282 |         windows = np.stack(windows)
2283 |         return molded_images, image_metas, windows
2284 | 
2285 |     def unmold_detections(self, detections, mrcnn_mask, image_shape, window):
2286 |         """Reformats the detections of one image from the format of the neural
2287 |         network output to a format suitable for use in the rest of the
2288 |         application.
2289 | 
2290 |         detections: [N, (y1, x1, y2, x2, class_id, score)]
2291 |         mrcnn_mask: [N, height, width, num_classes]
2292 |         image_shape: [height, width, depth] Original size of the image before resizing
2293 |         window: [y1, x1, y2, x2] Box in the image where the real image is
2294 |                 excluding the padding.
2295 | 
2296 |         Returns:
2297 |         boxes: [N, (y1, x1, y2, x2)] Bounding boxes in pixels
2298 |         class_ids: [N] Integer class IDs for each bounding box
2299 |         scores: [N] Float probability scores of the class_id
2300 |         masks: [height, width, num_instances] Instance masks
2301 |         """
2302 |         # How many detections do we have?
2303 |         # Detections array is padded with zeros. Find the first class_id == 0.
2304 |         zero_ix = np.where(detections[:, 4] == 0)[0]
2305 |         N = zero_ix[0] if zero_ix.shape[0] > 0 else detections.shape[0]
2306 | 
2307 |         # Extract boxes, class_ids, scores, and class-specific masks
2308 |         boxes = detections[:N, :4]
2309 |         class_ids = detections[:N, 4].astype(np.int32)
2310 |         scores = detections[:N, 5]
2311 |         masks = mrcnn_mask[np.arange(N), :, :, class_ids]
2312 | 
2313 |         # Compute scale and shift to translate coordinates to image domain.
2314 |         h_scale = image_shape[0] / (window[2] - window[0])
2315 |         w_scale = image_shape[1] / (window[3] - window[1])
2316 |         scale = min(h_scale, w_scale)
2317 |         shift = window[:2]  # y, x
2318 |         scales = np.array([scale, scale, scale, scale])
2319 |         shifts = np.array([shift[0], shift[1], shift[0], shift[1]])
2320 | 
2321 |         # Translate bounding boxes to image domain
2322 |         boxes = np.multiply(boxes - shifts, scales).astype(np.int32)
2323 | 
2324 |         # Filter out detections with zero area. Often only happens in early
2325 |         # stages of training when the network weights are still a bit random.
2326 |         exclude_ix = np.where(
2327 |             (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1]) <= 0)[0]
2328 |         if exclude_ix.shape[0] > 0:
2329 |             boxes = np.delete(boxes, exclude_ix, axis=0)
2330 |             class_ids = np.delete(class_ids, exclude_ix, axis=0)
2331 |             scores = np.delete(scores, exclude_ix, axis=0)
2332 |             masks = np.delete(masks, exclude_ix, axis=0)
2333 |             N = class_ids.shape[0]
2334 | 
2335 |         # Resize masks to original image size and set boundary threshold.
2336 |         full_masks = []
2337 |         for i in range(N):
2338 |             # Convert neural network mask to full size mask
2339 |             full_mask = utils.unmold_mask(masks[i], boxes[i], image_shape)
2340 |             full_masks.append(full_mask)
2341 |         full_masks = np.stack(full_masks, axis=-1)\
2342 |             if full_masks else np.empty((0,) + masks.shape[1:3])
2343 | 
2344 |         return boxes, class_ids, scores, full_masks
2345 | 
2346 |     def detect(self, images, verbose=0):
2347 |         """Runs the detection pipeline.
2348 | 
2349 |         images: List of images, potentially of different sizes.
2350 | 
2351 |         Returns a list of dicts, one dict per image. The dict contains:
2352 |         rois: [N, (y1, x1, y2, x2)] detection bounding boxes
2353 |         class_ids: [N] int class IDs
2354 |         scores: [N] float probability scores for the class IDs
2355 |         masks: [H, W, N] instance binary masks
2356 |         """
2357 |         assert self.mode == "inference", "Create model in inference mode."
2358 |         assert len(
2359 |             images) == self.config.BATCH_SIZE, "len(images) must be equal to BATCH_SIZE"
2360 | 
2361 |         if verbose:
2362 |             log("Processing {} images".format(len(images)))
2363 |             for image in images:
2364 |                 log("image", image)
2365 |         # Mold inputs to format expected by the neural network
2366 |         molded_images, image_metas, windows = self.mold_inputs(images)
2367 |         if verbose:
2368 |             log("molded_images", molded_images)
2369 |             log("image_metas", image_metas)
2370 |         # Run object detection
2371 |         detections, mrcnn_class, mrcnn_bbox, mrcnn_mask, \
2372 |             rois, rpn_class, rpn_bbox =\
2373 |             self.keras_model.predict([molded_images, image_metas], verbose=0)
2374 |         # Process detections
2375 |         results = []
2376 |         for i, image in enumerate(images):
2377 |             final_rois, final_class_ids, final_scores, final_masks =\
2378 |                 self.unmold_detections(detections[i], mrcnn_mask[i],
2379 |                                        image.shape, windows[i])
2380 |             results.append({
2381 |                 "rois": final_rois,
2382 |                 "class_ids": final_class_ids,
2383 |                 "scores": final_scores,
2384 |                 "masks": final_masks,
2385 |             })
2386 |         return results
2387 | 
2388 |     def ancestor(self, tensor, name, checked=None):
2389 |         """Finds the ancestor of a TF tensor in the computation graph.
2390 |         tensor: TensorFlow symbolic tensor.
2391 |         name: Name of ancestor tensor to find
2392 |         checked: For internal use. A list of tensors that were already
2393 |                  searched to avoid loops in traversing the graph.
2394 |         """
2395 |         checked = checked if checked is not None else []
2396 |         # Put a limit on how deep we go to avoid very long loops
2397 |         if len(checked) > 500:
2398 |             return None
2399 |         # Convert name to a regex and allow matching a number prefix
2400 |         # because Keras adds them automatically
2401 |         if isinstance(name, str):
2402 |             name = re.compile(name.replace("/", r"(\_\d+)*/"))
2403 | 
2404 |         parents = tensor.op.inputs
2405 |         for p in parents:
2406 |             if p in checked:
2407 |                 continue
2408 |             if bool(re.fullmatch(name, p.name)):
2409 |                 return p
2410 |             checked.append(p)
2411 |             a = self.ancestor(p, name, checked)
2412 |             if a is not None:
2413 |                 return a
2414 |         return None
2415 | 
2416 |     def find_trainable_layer(self, layer):
2417 |         """If a layer is encapsulated by another layer, this function
2418 |         digs through the encapsulation and returns the layer that holds
2419 |         the weights.
2420 |         """
2421 |         if layer.__class__.__name__ == 'TimeDistributed':
2422 |             return self.find_trainable_layer(layer.layer)
2423 |         return layer
2424 | 
2425 |     def get_trainable_layers(self):
2426 |         """Returns a list of layers that have weights."""
2427 |         layers = []
2428 |         # Loop through all layers
2429 |         for l in self.keras_model.layers:
2430 |             # If layer is a wrapper, find inner trainable layer
2431 |             l = self.find_trainable_layer(l)
2432 |             # Include layer if it has weights
2433 |             if l.get_weights():
2434 |                 layers.append(l)
2435 |         return layers
2436 | 
2437 |     def run_graph(self, images, outputs):
2438 |         """Runs a sub-set of the computation graph that computes the given
2439 |         outputs.
2440 | 
2441 |         outputs: List of tuples (name, tensor) to compute. The tensors are
2442 |             symbolic TensorFlow tensors and the names are for easy tracking.
2443 | 
2444 |         Returns an ordered dict of results. Keys are the names received in the
2445 |         input and values are Numpy arrays.
2446 |         """
2447 |         model = self.keras_model
2448 | 
2449 |         # Organize desired outputs into an ordered dict
2450 |         outputs = OrderedDict(outputs)
2451 |         for o in outputs.values():
2452 |             assert o is not None
2453 | 
2454 |         # Build a Keras function to run parts of the computation graph
2455 |         inputs = model.inputs
2456 |         if model.uses_learning_phase and not isinstance(K.learning_phase(), int):
2457 |             inputs += [K.learning_phase()]
2458 |         kf = K.function(model.inputs, list(outputs.values()))
2459 | 
2460 |         # Run inference
2461 |         molded_images, image_metas, windows = self.mold_inputs(images)
2462 |         # TODO: support training mode?
2463 |         # if TEST_MODE == "training":
2464 |         #     model_in = [molded_images, image_metas,
2465 |         #                 target_rpn_match, target_rpn_bbox,
2466 |         #                 gt_boxes, gt_masks]
2467 |         #     if not config.USE_RPN_ROIS:
2468 |         #         model_in.append(target_rois)
2469 |         #     if model.uses_learning_phase and not isinstance(K.learning_phase(), int):
2470 |         #         model_in.append(1.)
2471 |         #     outputs_np = kf(model_in)
2472 |         # else:
2473 | 
2474 |         model_in = [molded_images, image_metas]
2475 |         if model.uses_learning_phase and not isinstance(K.learning_phase(), int):
2476 |             model_in.append(0.)
2477 |         outputs_np = kf(model_in)
2478 | 
2479 |         # Pack the generated Numpy arrays into a a dict and log the results.
2480 |         outputs_np = OrderedDict([(k, v)
2481 |                                   for k, v in zip(outputs.keys(), outputs_np)])
2482 |         for k, v in outputs_np.items():
2483 |             log(k, v)
2484 |         return outputs_np
2485 | 
2486 | 
2487 | ############################################################
2488 | #  Data Formatting
2489 | ############################################################
2490 | 
2491 | def compose_image_meta(image_id, image_shape, window, active_class_ids):
2492 |     """Takes attributes of an image and puts them in one 1D array.
2493 | 
2494 |     image_id: An int ID of the image. Useful for debugging.
2495 |     image_shape: [height, width, channels]
2496 |     window: (y1, x1, y2, x2) in pixels. The area of the image where the real
2497 |             image is (excluding the padding)
2498 |     active_class_ids: List of class_ids available in the dataset from which
2499 |         the image came. Useful if training on images from multiple datasets
2500 |         where not all classes are present in all datasets.
2501 |     """
2502 |     meta = np.array(
2503 |         [image_id] +            # size=1
2504 |         list(image_shape) +     # size=3
2505 |         list(window) +          # size=4 (y1, x1, y2, x2) in image cooredinates
2506 |         list(active_class_ids)  # size=num_classes
2507 |     )
2508 |     return meta
2509 | 
2510 | 
2511 | def parse_image_meta_graph(meta):
2512 |     """Parses a tensor that contains image attributes to its components.
2513 |     See compose_image_meta() for more details.
2514 | 
2515 |     meta: [batch, meta length] where meta length depends on NUM_CLASSES
2516 |     """
2517 |     image_id = meta[:, 0]
2518 |     image_shape = meta[:, 1:4]
2519 |     window = meta[:, 4:8]   # (y1, x1, y2, x2) window of image in in pixels
2520 |     active_class_ids = meta[:, 8:]
2521 |     return [image_id, image_shape, window, active_class_ids]
2522 | 
2523 | 
2524 | def mold_image(images, config):
2525 |     """Takes RGB images with 0-255 values and subtraces
2526 |     the mean pixel and converts it to float. Expects image
2527 |     colors in RGB order.
2528 |     """
2529 |     return images.astype(np.float32) - config.MEAN_PIXEL
2530 | 
2531 | 
2532 | def unmold_image(normalized_images, config):
2533 |     """Takes a image normalized with mold() and returns the original."""
2534 |     return (normalized_images + config.MEAN_PIXEL).astype(np.uint8)
2535 | 
2536 | 
2537 | ############################################################
2538 | #  Miscellenous Graph Functions
2539 | ############################################################
2540 | 
2541 | def trim_zeros_graph(boxes, name=None):
2542 |     """Often boxes are represented with matricies of shape [N, 4] and
2543 |     are padded with zeros. This removes zero boxes.
2544 | 
2545 |     boxes: [N, 4] matrix of boxes.
2546 |     non_zeros: [N] a 1D boolean mask identifying the rows to keep
2547 |     """
2548 |     non_zeros = tf.cast(tf.reduce_sum(tf.abs(boxes), axis=1), tf.bool)
2549 |     boxes = tf.boolean_mask(boxes, non_zeros, name=name)
2550 |     return boxes, non_zeros
2551 | 
2552 | 
2553 | def batch_pack_graph(x, counts, num_rows):
2554 |     """Picks different number of values from each row
2555 |     in x depending on the values in counts.
2556 |     """
2557 |     outputs = []
2558 |     for i in range(num_rows):
2559 |         outputs.append(x[i, :counts[i]])
2560 |     return tf.concat(outputs, axis=0)
2561 | 


--------------------------------------------------------------------------------
/person_blocker.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | import sys
  3 | import argparse
  4 | import numpy as np
  5 | import coco
  6 | import utils
  7 | import model as modellib
  8 | from classes import get_class_names, InferenceConfig
  9 | from ast import literal_eval as make_tuple
 10 | import imageio
 11 | import visualize
 12 | 
 13 | # Creates a color layer and adds Gaussian noise.
 14 | # For each pixel, the same noise value is added to each channel
 15 | # to mitigate hue shfting.
 16 | 
 17 | 
 18 | def create_noisy_color(image, color):
 19 |     color_mask = np.full(shape=(image.shape[0], image.shape[1], 3),
 20 |                          fill_value=color)
 21 | 
 22 |     noise = np.random.normal(0, 25, (image.shape[0], image.shape[1]))
 23 |     noise = np.repeat(np.expand_dims(noise, axis=2), repeats=3, axis=2)
 24 |     mask_noise = np.clip(color_mask + noise, 0., 255.)
 25 |     return mask_noise
 26 | 
 27 | 
 28 | # Helper function to allow both RGB triplet + hex CL input
 29 | 
 30 | def string_to_rgb_triplet(triplet):
 31 | 
 32 |     if '#' in triplet:
 33 |         # http://stackoverflow.com/a/4296727
 34 |         triplet = triplet.lstrip('#')
 35 |         _NUMERALS = '0123456789abcdefABCDEF'
 36 |         _HEXDEC = {v: int(v, 16)
 37 |                    for v in (x + y for x in _NUMERALS for y in _NUMERALS)}
 38 |         return (_HEXDEC[triplet[0:2]], _HEXDEC[triplet[2:4]],
 39 |                 _HEXDEC[triplet[4:6]])
 40 | 
 41 |     else:
 42 |         # https://stackoverflow.com/a/9763133
 43 |         triplet = make_tuple(triplet)
 44 |         return triplet
 45 | 
 46 | 
 47 | def person_blocker(args):
 48 | 
 49 |     # Required to load model, but otherwise unused
 50 |     ROOT_DIR = os.getcwd()
 51 |     COCO_MODEL_PATH = args.model or os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
 52 | 
 53 |     MODEL_DIR = os.path.join(ROOT_DIR, "logs")  # Required to load model
 54 | 
 55 |     if not os.path.exists(COCO_MODEL_PATH):
 56 |         utils.download_trained_weights(COCO_MODEL_PATH)
 57 | 
 58 |     # Load model and config
 59 |     config = InferenceConfig()
 60 |     model = modellib.MaskRCNN(mode="inference",
 61 |                               model_dir=MODEL_DIR, config=config)
 62 |     model.load_weights(COCO_MODEL_PATH, by_name=True)
 63 | 
 64 |     image = imageio.imread(args.image)
 65 | 
 66 |     # Create masks for all objects
 67 |     results = model.detect([image], verbose=0)
 68 |     r = results[0]
 69 | 
 70 |     if args.labeled:
 71 |         position_ids = ['[{}]'.format(x)
 72 |                         for x in range(r['class_ids'].shape[0])]
 73 |         visualize.display_instances(image, r['rois'],
 74 |                                     r['masks'], r['class_ids'],
 75 |                                     get_class_names(), position_ids)
 76 |         sys.exit()
 77 | 
 78 |     # Filter masks to only the selected objects
 79 |     objects = np.array(args.objects)
 80 | 
 81 |     # Object IDs:
 82 |     if np.all(np.chararray.isnumeric(objects)):
 83 |         object_indices = objects.astype(int)
 84 |     # Types of objects:
 85 |     else:
 86 |         selected_class_ids = np.flatnonzero(np.in1d(get_class_names(),
 87 |                                                     objects))
 88 |         object_indices = np.flatnonzero(
 89 |             np.in1d(r['class_ids'], selected_class_ids))
 90 | 
 91 |     mask_selected = np.sum(r['masks'][:, :, object_indices], axis=2)
 92 | 
 93 |     # Replace object masks with noise
 94 |     mask_color = string_to_rgb_triplet(args.color)
 95 |     image_masked = image.copy()
 96 |     noisy_color = create_noisy_color(image, mask_color)
 97 |     image_masked[mask_selected > 0] = noisy_color[mask_selected > 0]
 98 | 
 99 |     imageio.imwrite('person_blocked.png', image_masked)
100 | 
101 |     # Create GIF. The noise will be random for each frame,
102 |     # which creates a "static" effect
103 | 
104 |     images = [image_masked]
105 |     num_images = 10   # should be a divisor of 30
106 | 
107 |     for _ in range(num_images - 1):
108 |         new_image = image.copy()
109 |         noisy_color = create_noisy_color(image, mask_color)
110 |         new_image[mask_selected > 0] = noisy_color[mask_selected > 0]
111 |         images.append(new_image)
112 | 
113 |     imageio.mimsave('person_blocked.gif', images, fps=30., subrectangles=True)
114 | 
115 | 
116 | if __name__ == '__main__':
117 |     parser = argparse.ArgumentParser(
118 |         description='Person Blocker - Automatically "block" people '
119 |                     'in images using a neural network.')
120 |     parser.add_argument('-i', '--image',  help='Image file name.',
121 |                         required=False)
122 |     parser.add_argument(
123 |         '-m', '--model',  help='path to COCO model', default=None)
124 |     parser.add_argument('-o',
125 |                         '--objects', nargs='+',
126 |                         help='object(s)/object ID(s) to block. ' +
127 |                         'Use the -names flag to print a list of ' +
128 |                         'valid objects',
129 |                         default='person')
130 |     parser.add_argument('-c',
131 |                         '--color', nargs='?', default='(255, 255, 255)',
132 |                         help='color of the "block"')
133 |     parser.add_argument('-l',
134 |                         '--labeled', dest='labeled',
135 |                         action='store_true',
136 |                         help='generate labeled image instead')
137 |     parser.add_argument('-n',
138 |                         '--names', dest='names',
139 |                         action='store_true',
140 |                         help='prints class names and exits.')
141 |     parser.set_defaults(labeled=False, names=False)
142 |     args = parser.parse_args()
143 | 
144 |     if args.names:
145 |         print(get_class_names())
146 |         sys.exit()
147 | 
148 |     person_blocker(args)
149 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | h5py
2 | imageio
3 | ipython
4 | keras
5 | scipy
6 | scikit-image
7 | tensorflow
8 | 


--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Mask R-CNN
  3 | Common utility functions and classes.
  4 | 
  5 | Copyright (c) 2017 Matterport, Inc.
  6 | Licensed under the MIT License (see LICENSE for details)
  7 | Written by Waleed Abdulla
  8 | """
  9 | 
 10 | import sys
 11 | import os
 12 | import math
 13 | import random
 14 | import numpy as np
 15 | import tensorflow as tf
 16 | import scipy.misc
 17 | import skimage.color
 18 | import skimage.io
 19 | import urllib.request
 20 | import shutil
 21 | 
 22 | # URL from which to download the latest COCO trained weights
 23 | COCO_MODEL_URL = "https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5"
 24 | 
 25 | 
 26 | ############################################################
 27 | #  Bounding Boxes
 28 | ############################################################
 29 | 
 30 | def extract_bboxes(mask):
 31 |     """Compute bounding boxes from masks.
 32 |     mask: [height, width, num_instances]. Mask pixels are either 1 or 0.
 33 | 
 34 |     Returns: bbox array [num_instances, (y1, x1, y2, x2)].
 35 |     """
 36 |     boxes = np.zeros([mask.shape[-1], 4], dtype=np.int32)
 37 |     for i in range(mask.shape[-1]):
 38 |         m = mask[:, :, i]
 39 |         # Bounding box.
 40 |         horizontal_indicies = np.where(np.any(m, axis=0))[0]
 41 |         vertical_indicies = np.where(np.any(m, axis=1))[0]
 42 |         if horizontal_indicies.shape[0]:
 43 |             x1, x2 = horizontal_indicies[[0, -1]]
 44 |             y1, y2 = vertical_indicies[[0, -1]]
 45 |             # x2 and y2 should not be part of the box. Increment by 1.
 46 |             x2 += 1
 47 |             y2 += 1
 48 |         else:
 49 |             # No mask for this instance. Might happen due to
 50 |             # resizing or cropping. Set bbox to zeros
 51 |             x1, x2, y1, y2 = 0, 0, 0, 0
 52 |         boxes[i] = np.array([y1, x1, y2, x2])
 53 |     return boxes.astype(np.int32)
 54 | 
 55 | 
 56 | def compute_iou(box, boxes, box_area, boxes_area):
 57 |     """Calculates IoU of the given box with the array of the given boxes.
 58 |     box: 1D vector [y1, x1, y2, x2]
 59 |     boxes: [boxes_count, (y1, x1, y2, x2)]
 60 |     box_area: float. the area of 'box'
 61 |     boxes_area: array of length boxes_count.
 62 | 
 63 |     Note: the areas are passed in rather than calculated here for
 64 |           efficency. Calculate once in the caller to avoid duplicate work.
 65 |     """
 66 |     # Calculate intersection areas
 67 |     y1 = np.maximum(box[0], boxes[:, 0])
 68 |     y2 = np.minimum(box[2], boxes[:, 2])
 69 |     x1 = np.maximum(box[1], boxes[:, 1])
 70 |     x2 = np.minimum(box[3], boxes[:, 3])
 71 |     intersection = np.maximum(x2 - x1, 0) * np.maximum(y2 - y1, 0)
 72 |     union = box_area + boxes_area[:] - intersection[:]
 73 |     iou = intersection / union
 74 |     return iou
 75 | 
 76 | 
 77 | def compute_overlaps(boxes1, boxes2):
 78 |     """Computes IoU overlaps between two sets of boxes.
 79 |     boxes1, boxes2: [N, (y1, x1, y2, x2)].
 80 | 
 81 |     For better performance, pass the largest set first and the smaller second.
 82 |     """
 83 |     # Areas of anchors and GT boxes
 84 |     area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
 85 |     area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
 86 | 
 87 |     # Compute overlaps to generate matrix [boxes1 count, boxes2 count]
 88 |     # Each cell contains the IoU value.
 89 |     overlaps = np.zeros((boxes1.shape[0], boxes2.shape[0]))
 90 |     for i in range(overlaps.shape[1]):
 91 |         box2 = boxes2[i]
 92 |         overlaps[:, i] = compute_iou(box2, boxes1, area2[i], area1)
 93 |     return overlaps
 94 | 
 95 | 
 96 | def compute_overlaps_masks(masks1, masks2):
 97 |     '''Computes IoU overlaps between two sets of masks.
 98 |     masks1, masks2: [Height, Width, instances]
 99 |     '''
100 |     # flatten masks
101 |     masks1 = np.reshape(masks1 > .5, (-1, masks1.shape[-1])).astype(np.float32)
102 |     masks2 = np.reshape(masks2 > .5, (-1, masks2.shape[-1])).astype(np.float32)
103 |     area1 = np.sum(masks1, axis=0)
104 |     area2 = np.sum(masks2, axis=0)
105 | 
106 |     # intersections and union
107 |     intersections = np.dot(masks1.T, masks2)
108 |     union = area1[:, None] + area2[None, :] - intersections
109 |     overlaps = intersections / union
110 | 
111 |     return overlaps
112 | 
113 | 
114 | def non_max_suppression(boxes, scores, threshold):
115 |     """Performs non-maximum supression and returns indicies of kept boxes.
116 |     boxes: [N, (y1, x1, y2, x2)]. Notice that (y2, x2) lays outside the box.
117 |     scores: 1-D array of box scores.
118 |     threshold: Float. IoU threshold to use for filtering.
119 |     """
120 |     assert boxes.shape[0] > 0
121 |     if boxes.dtype.kind != "f":
122 |         boxes = boxes.astype(np.float32)
123 | 
124 |     # Compute box areas
125 |     y1 = boxes[:, 0]
126 |     x1 = boxes[:, 1]
127 |     y2 = boxes[:, 2]
128 |     x2 = boxes[:, 3]
129 |     area = (y2 - y1) * (x2 - x1)
130 | 
131 |     # Get indicies of boxes sorted by scores (highest first)
132 |     ixs = scores.argsort()[::-1]
133 | 
134 |     pick = []
135 |     while len(ixs) > 0:
136 |         # Pick top box and add its index to the list
137 |         i = ixs[0]
138 |         pick.append(i)
139 |         # Compute IoU of the picked box with the rest
140 |         iou = compute_iou(boxes[i], boxes[ixs[1:]], area[i], area[ixs[1:]])
141 |         # Identify boxes with IoU over the threshold. This
142 |         # returns indicies into ixs[1:], so add 1 to get
143 |         # indicies into ixs.
144 |         remove_ixs = np.where(iou > threshold)[0] + 1
145 |         # Remove indicies of the picked and overlapped boxes.
146 |         ixs = np.delete(ixs, remove_ixs)
147 |         ixs = np.delete(ixs, 0)
148 |     return np.array(pick, dtype=np.int32)
149 | 
150 | 
151 | def apply_box_deltas(boxes, deltas):
152 |     """Applies the given deltas to the given boxes.
153 |     boxes: [N, (y1, x1, y2, x2)]. Note that (y2, x2) is outside the box.
154 |     deltas: [N, (dy, dx, log(dh), log(dw))]
155 |     """
156 |     boxes = boxes.astype(np.float32)
157 |     # Convert to y, x, h, w
158 |     height = boxes[:, 2] - boxes[:, 0]
159 |     width = boxes[:, 3] - boxes[:, 1]
160 |     center_y = boxes[:, 0] + 0.5 * height
161 |     center_x = boxes[:, 1] + 0.5 * width
162 |     # Apply deltas
163 |     center_y += deltas[:, 0] * height
164 |     center_x += deltas[:, 1] * width
165 |     height *= np.exp(deltas[:, 2])
166 |     width *= np.exp(deltas[:, 3])
167 |     # Convert back to y1, x1, y2, x2
168 |     y1 = center_y - 0.5 * height
169 |     x1 = center_x - 0.5 * width
170 |     y2 = y1 + height
171 |     x2 = x1 + width
172 |     return np.stack([y1, x1, y2, x2], axis=1)
173 | 
174 | 
175 | def box_refinement_graph(box, gt_box):
176 |     """Compute refinement needed to transform box to gt_box.
177 |     box and gt_box are [N, (y1, x1, y2, x2)]
178 |     """
179 |     box = tf.cast(box, tf.float32)
180 |     gt_box = tf.cast(gt_box, tf.float32)
181 | 
182 |     height = box[:, 2] - box[:, 0]
183 |     width = box[:, 3] - box[:, 1]
184 |     center_y = box[:, 0] + 0.5 * height
185 |     center_x = box[:, 1] + 0.5 * width
186 | 
187 |     gt_height = gt_box[:, 2] - gt_box[:, 0]
188 |     gt_width = gt_box[:, 3] - gt_box[:, 1]
189 |     gt_center_y = gt_box[:, 0] + 0.5 * gt_height
190 |     gt_center_x = gt_box[:, 1] + 0.5 * gt_width
191 | 
192 |     dy = (gt_center_y - center_y) / height
193 |     dx = (gt_center_x - center_x) / width
194 |     dh = tf.log(gt_height / height)
195 |     dw = tf.log(gt_width / width)
196 | 
197 |     result = tf.stack([dy, dx, dh, dw], axis=1)
198 |     return result
199 | 
200 | 
201 | def box_refinement(box, gt_box):
202 |     """Compute refinement needed to transform box to gt_box.
203 |     box and gt_box are [N, (y1, x1, y2, x2)]. (y2, x2) is
204 |     assumed to be outside the box.
205 |     """
206 |     box = box.astype(np.float32)
207 |     gt_box = gt_box.astype(np.float32)
208 | 
209 |     height = box[:, 2] - box[:, 0]
210 |     width = box[:, 3] - box[:, 1]
211 |     center_y = box[:, 0] + 0.5 * height
212 |     center_x = box[:, 1] + 0.5 * width
213 | 
214 |     gt_height = gt_box[:, 2] - gt_box[:, 0]
215 |     gt_width = gt_box[:, 3] - gt_box[:, 1]
216 |     gt_center_y = gt_box[:, 0] + 0.5 * gt_height
217 |     gt_center_x = gt_box[:, 1] + 0.5 * gt_width
218 | 
219 |     dy = (gt_center_y - center_y) / height
220 |     dx = (gt_center_x - center_x) / width
221 |     dh = np.log(gt_height / height)
222 |     dw = np.log(gt_width / width)
223 | 
224 |     return np.stack([dy, dx, dh, dw], axis=1)
225 | 
226 | 
227 | ############################################################
228 | #  Dataset
229 | ############################################################
230 | 
231 | class Dataset(object):
232 |     """The base class for dataset classes.
233 |     To use it, create a new class that adds functions specific to the dataset
234 |     you want to use. For example:
235 | 
236 |     class CatsAndDogsDataset(Dataset):
237 |         def load_cats_and_dogs(self):
238 |             ...
239 |         def load_mask(self, image_id):
240 |             ...
241 |         def image_reference(self, image_id):
242 |             ...
243 | 
244 |     See COCODataset and ShapesDataset as examples.
245 |     """
246 | 
247 |     def __init__(self, class_map=None):
248 |         self._image_ids = []
249 |         self.image_info = []
250 |         # Background is always the first class
251 |         self.class_info = [{"source": "", "id": 0, "name": "BG"}]
252 |         self.source_class_ids = {}
253 | 
254 |     def add_class(self, source, class_id, class_name):
255 |         assert "." not in source, "Source name cannot contain a dot"
256 |         # Does the class exist already?
257 |         for info in self.class_info:
258 |             if info['source'] == source and info["id"] == class_id:
259 |                 # source.class_id combination already available, skip
260 |                 return
261 |         # Add the class
262 |         self.class_info.append({
263 |             "source": source,
264 |             "id": class_id,
265 |             "name": class_name,
266 |         })
267 | 
268 |     def add_image(self, source, image_id, path, **kwargs):
269 |         image_info = {
270 |             "id": image_id,
271 |             "source": source,
272 |             "path": path,
273 |         }
274 |         image_info.update(kwargs)
275 |         self.image_info.append(image_info)
276 | 
277 |     def image_reference(self, image_id):
278 |         """Return a link to the image in its source Website or details about
279 |         the image that help looking it up or debugging it.
280 | 
281 |         Override for your dataset, but pass to this function
282 |         if you encounter images not in your dataset.
283 |         """
284 |         return ""
285 | 
286 |     def prepare(self, class_map=None):
287 |         """Prepares the Dataset class for use.
288 | 
289 |         TODO: class map is not supported yet. When done, it should handle mapping
290 |               classes from different datasets to the same class ID.
291 |         """
292 | 
293 |         def clean_name(name):
294 |             """Returns a shorter version of object names for cleaner display."""
295 |             return ",".join(name.split(",")[:1])
296 | 
297 |         # Build (or rebuild) everything else from the info dicts.
298 |         self.num_classes = len(self.class_info)
299 |         self.class_ids = np.arange(self.num_classes)
300 |         self.class_names = [clean_name(c["name"]) for c in self.class_info]
301 |         self.num_images = len(self.image_info)
302 |         self._image_ids = np.arange(self.num_images)
303 | 
304 |         self.class_from_source_map = {"{}.{}".format(info['source'], info['id']): id
305 |                                       for info, id in zip(self.class_info, self.class_ids)}
306 | 
307 |         # Map sources to class_ids they support
308 |         self.sources = list(set([i['source'] for i in self.class_info]))
309 |         self.source_class_ids = {}
310 |         # Loop over datasets
311 |         for source in self.sources:
312 |             self.source_class_ids[source] = []
313 |             # Find classes that belong to this dataset
314 |             for i, info in enumerate(self.class_info):
315 |                 # Include BG class in all datasets
316 |                 if i == 0 or source == info['source']:
317 |                     self.source_class_ids[source].append(i)
318 | 
319 |     def map_source_class_id(self, source_class_id):
320 |         """Takes a source class ID and returns the int class ID assigned to it.
321 | 
322 |         For example:
323 |         dataset.map_source_class_id("coco.12") -> 23
324 |         """
325 |         return self.class_from_source_map[source_class_id]
326 | 
327 |     def get_source_class_id(self, class_id, source):
328 |         """Map an internal class ID to the corresponding class ID in the source dataset."""
329 |         info = self.class_info[class_id]
330 |         assert info['source'] == source
331 |         return info['id']
332 | 
333 |     def append_data(self, class_info, image_info):
334 |         self.external_to_class_id = {}
335 |         for i, c in enumerate(self.class_info):
336 |             for ds, id in c["map"]:
337 |                 self.external_to_class_id[ds + str(id)] = i
338 | 
339 |         # Map external image IDs to internal ones.
340 |         self.external_to_image_id = {}
341 |         for i, info in enumerate(self.image_info):
342 |             self.external_to_image_id[info["ds"] + str(info["id"])] = i
343 | 
344 |     @property
345 |     def image_ids(self):
346 |         return self._image_ids
347 | 
348 |     def source_image_link(self, image_id):
349 |         """Returns the path or URL to the image.
350 |         Override this to return a URL to the image if it's availble online for easy
351 |         debugging.
352 |         """
353 |         return self.image_info[image_id]["path"]
354 | 
355 |     def load_image(self, image_id):
356 |         """Load the specified image and return a [H,W,3] Numpy array.
357 |         """
358 |         # Load image
359 |         image = skimage.io.imread(self.image_info[image_id]['path'])
360 |         # If grayscale. Convert to RGB for consistency.
361 |         if image.ndim != 3:
362 |             image = skimage.color.gray2rgb(image)
363 |         return image
364 | 
365 |     def load_mask(self, image_id):
366 |         """Load instance masks for the given image.
367 | 
368 |         Different datasets use different ways to store masks. Override this
369 |         method to load instance masks and return them in the form of am
370 |         array of binary masks of shape [height, width, instances].
371 | 
372 |         Returns:
373 |             masks: A bool array of shape [height, width, instance count] with
374 |                 a binary mask per instance.
375 |             class_ids: a 1D array of class IDs of the instance masks.
376 |         """
377 |         # Override this function to load a mask from your dataset.
378 |         # Otherwise, it returns an empty mask.
379 |         mask = np.empty([0, 0, 0])
380 |         class_ids = np.empty([0], np.int32)
381 |         return mask, class_ids
382 | 
383 | 
384 | def resize_image(image, min_dim=None, max_dim=None, padding=False):
385 |     """
386 |     Resizes an image keeping the aspect ratio.
387 | 
388 |     min_dim: if provided, resizes the image such that it's smaller
389 |         dimension == min_dim
390 |     max_dim: if provided, ensures that the image longest side doesn't
391 |         exceed this value.
392 |     padding: If true, pads image with zeros so it's size is max_dim x max_dim
393 | 
394 |     Returns:
395 |     image: the resized image
396 |     window: (y1, x1, y2, x2). If max_dim is provided, padding might
397 |         be inserted in the returned image. If so, this window is the
398 |         coordinates of the image part of the full image (excluding
399 |         the padding). The x2, y2 pixels are not included.
400 |     scale: The scale factor used to resize the image
401 |     padding: Padding added to the image [(top, bottom), (left, right), (0, 0)]
402 |     """
403 |     # Default window (y1, x1, y2, x2) and default scale == 1.
404 |     h, w = image.shape[:2]
405 |     window = (0, 0, h, w)
406 |     scale = 1
407 | 
408 |     # Scale?
409 |     if min_dim:
410 |         # Scale up but not down
411 |         scale = max(1, min_dim / min(h, w))
412 |     # Does it exceed max dim?
413 |     if max_dim:
414 |         image_max = max(h, w)
415 |         if round(image_max * scale) > max_dim:
416 |             scale = max_dim / image_max
417 |     # Resize image and mask
418 |     if scale != 1:
419 |         image = scipy.misc.imresize(
420 |             image, (round(h * scale), round(w * scale)))
421 |     # Need padding?
422 |     if padding:
423 |         # Get new height and width
424 |         h, w = image.shape[:2]
425 |         top_pad = (max_dim - h) // 2
426 |         bottom_pad = max_dim - h - top_pad
427 |         left_pad = (max_dim - w) // 2
428 |         right_pad = max_dim - w - left_pad
429 |         padding = [(top_pad, bottom_pad), (left_pad, right_pad), (0, 0)]
430 |         image = np.pad(image, padding, mode='constant', constant_values=0)
431 |         window = (top_pad, left_pad, h + top_pad, w + left_pad)
432 |     return image, window, scale, padding
433 | 
434 | 
435 | def resize_mask(mask, scale, padding):
436 |     """Resizes a mask using the given scale and padding.
437 |     Typically, you get the scale and padding from resize_image() to
438 |     ensure both, the image and the mask, are resized consistently.
439 | 
440 |     scale: mask scaling factor
441 |     padding: Padding to add to the mask in the form
442 |             [(top, bottom), (left, right), (0, 0)]
443 |     """
444 |     h, w = mask.shape[:2]
445 |     mask = scipy.ndimage.zoom(mask, zoom=[scale, scale, 1], order=0)
446 |     mask = np.pad(mask, padding, mode='constant', constant_values=0)
447 |     return mask
448 | 
449 | 
450 | def minimize_mask(bbox, mask, mini_shape):
451 |     """Resize masks to a smaller version to cut memory load.
452 |     Mini-masks can then resized back to image scale using expand_masks()
453 | 
454 |     See inspect_data.ipynb notebook for more details.
455 |     """
456 |     mini_mask = np.zeros(mini_shape + (mask.shape[-1],), dtype=bool)
457 |     for i in range(mask.shape[-1]):
458 |         m = mask[:, :, i]
459 |         y1, x1, y2, x2 = bbox[i][:4]
460 |         m = m[y1:y2, x1:x2]
461 |         if m.size == 0:
462 |             raise Exception("Invalid bounding box with area of zero")
463 |         m = scipy.misc.imresize(m.astype(float), mini_shape, interp='bilinear')
464 |         mini_mask[:, :, i] = np.where(m >= 128, 1, 0)
465 |     return mini_mask
466 | 
467 | 
468 | def expand_mask(bbox, mini_mask, image_shape):
469 |     """Resizes mini masks back to image size. Reverses the change
470 |     of minimize_mask().
471 | 
472 |     See inspect_data.ipynb notebook for more details.
473 |     """
474 |     mask = np.zeros(image_shape[:2] + (mini_mask.shape[-1],), dtype=bool)
475 |     for i in range(mask.shape[-1]):
476 |         m = mini_mask[:, :, i]
477 |         y1, x1, y2, x2 = bbox[i][:4]
478 |         h = y2 - y1
479 |         w = x2 - x1
480 |         m = scipy.misc.imresize(m.astype(float), (h, w), interp='bilinear')
481 |         mask[y1:y2, x1:x2, i] = np.where(m >= 128, 1, 0)
482 |     return mask
483 | 
484 | 
485 | # TODO: Build and use this function to reduce code duplication
486 | def mold_mask(mask, config):
487 |     pass
488 | 
489 | 
490 | def unmold_mask(mask, bbox, image_shape):
491 |     """Converts a mask generated by the neural network into a format similar
492 |     to it's original shape.
493 |     mask: [height, width] of type float. A small, typically 28x28 mask.
494 |     bbox: [y1, x1, y2, x2]. The box to fit the mask in.
495 | 
496 |     Returns a binary mask with the same size as the original image.
497 |     """
498 |     threshold = 0.5
499 |     y1, x1, y2, x2 = bbox
500 |     mask = scipy.misc.imresize(
501 |         mask, (y2 - y1, x2 - x1), interp='bilinear').astype(np.float32) / 255.0
502 |     mask = np.where(mask >= threshold, 1, 0).astype(np.uint8)
503 | 
504 |     # Put the mask in the right location.
505 |     full_mask = np.zeros(image_shape[:2], dtype=np.uint8)
506 |     full_mask[y1:y2, x1:x2] = mask
507 |     return full_mask
508 | 
509 | 
510 | ############################################################
511 | #  Anchors
512 | ############################################################
513 | 
514 | def generate_anchors(scales, ratios, shape, feature_stride, anchor_stride):
515 |     """
516 |     scales: 1D array of anchor sizes in pixels. Example: [32, 64, 128]
517 |     ratios: 1D array of anchor ratios of width/height. Example: [0.5, 1, 2]
518 |     shape: [height, width] spatial shape of the feature map over which
519 |             to generate anchors.
520 |     feature_stride: Stride of the feature map relative to the image in pixels.
521 |     anchor_stride: Stride of anchors on the feature map. For example, if the
522 |         value is 2 then generate anchors for every other feature map pixel.
523 |     """
524 |     # Get all combinations of scales and ratios
525 |     scales, ratios = np.meshgrid(np.array(scales), np.array(ratios))
526 |     scales = scales.flatten()
527 |     ratios = ratios.flatten()
528 | 
529 |     # Enumerate heights and widths from scales and ratios
530 |     heights = scales / np.sqrt(ratios)
531 |     widths = scales * np.sqrt(ratios)
532 | 
533 |     # Enumerate shifts in feature space
534 |     shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride
535 |     shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride
536 |     shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)
537 | 
538 |     # Enumerate combinations of shifts, widths, and heights
539 |     box_widths, box_centers_x = np.meshgrid(widths, shifts_x)
540 |     box_heights, box_centers_y = np.meshgrid(heights, shifts_y)
541 | 
542 |     # Reshape to get a list of (y, x) and a list of (h, w)
543 |     box_centers = np.stack(
544 |         [box_centers_y, box_centers_x], axis=2).reshape([-1, 2])
545 |     box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2])
546 | 
547 |     # Convert to corner coordinates (y1, x1, y2, x2)
548 |     boxes = np.concatenate([box_centers - 0.5 * box_sizes,
549 |                             box_centers + 0.5 * box_sizes], axis=1)
550 |     return boxes
551 | 
552 | 
553 | def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides,
554 |                              anchor_stride):
555 |     """Generate anchors at different levels of a feature pyramid. Each scale
556 |     is associated with a level of the pyramid, but each ratio is used in
557 |     all levels of the pyramid.
558 | 
559 |     Returns:
560 |     anchors: [N, (y1, x1, y2, x2)]. All generated anchors in one array. Sorted
561 |         with the same order of the given scales. So, anchors of scale[0] come
562 |         first, then anchors of scale[1], and so on.
563 |     """
564 |     # Anchors
565 |     # [anchor_count, (y1, x1, y2, x2)]
566 |     anchors = []
567 |     for i in range(len(scales)):
568 |         anchors.append(generate_anchors(scales[i], ratios, feature_shapes[i],
569 |                                         feature_strides[i], anchor_stride))
570 |     return np.concatenate(anchors, axis=0)
571 | 
572 | 
573 | ############################################################
574 | #  Miscellaneous
575 | ############################################################
576 | 
577 | def trim_zeros(x):
578 |     """It's common to have tensors larger than the available data and
579 |     pad with zeros. This function removes rows that are all zeros.
580 | 
581 |     x: [rows, columns].
582 |     """
583 |     assert len(x.shape) == 2
584 |     return x[~np.all(x == 0, axis=1)]
585 | 
586 | 
587 | def compute_ap(gt_boxes, gt_class_ids, gt_masks,
588 |                pred_boxes, pred_class_ids, pred_scores, pred_masks,
589 |                iou_threshold=0.5):
590 |     """Compute Average Precision at a set IoU threshold (default 0.5).
591 | 
592 |     Returns:
593 |     mAP: Mean Average Precision
594 |     precisions: List of precisions at different class score thresholds.
595 |     recalls: List of recall values at different class score thresholds.
596 |     overlaps: [pred_boxes, gt_boxes] IoU overlaps.
597 |     """
598 |     # Trim zero padding and sort predictions by score from high to low
599 |     # TODO: cleaner to do zero unpadding upstream
600 |     gt_boxes = trim_zeros(gt_boxes)
601 |     gt_masks = gt_masks[..., :gt_boxes.shape[0]]
602 |     pred_boxes = trim_zeros(pred_boxes)
603 |     pred_scores = pred_scores[:pred_boxes.shape[0]]
604 |     indices = np.argsort(pred_scores)[::-1]
605 |     pred_boxes = pred_boxes[indices]
606 |     pred_class_ids = pred_class_ids[indices]
607 |     pred_scores = pred_scores[indices]
608 |     pred_masks = pred_masks[..., indices]
609 | 
610 |     # Compute IoU overlaps [pred_masks, gt_masks]
611 |     overlaps = compute_overlaps_masks(pred_masks, gt_masks)
612 | 
613 |     # Loop through ground truth boxes and find matching predictions
614 |     match_count = 0
615 |     pred_match = np.zeros([pred_boxes.shape[0]])
616 |     gt_match = np.zeros([gt_boxes.shape[0]])
617 |     for i in range(len(pred_boxes)):
618 |         # Find best matching ground truth box
619 |         sorted_ixs = np.argsort(overlaps[i])[::-1]
620 |         for j in sorted_ixs:
621 |             # If ground truth box is already matched, go to next one
622 |             if gt_match[j] == 1:
623 |                 continue
624 |             # If we reach IoU smaller than the threshold, end the loop
625 |             iou = overlaps[i, j]
626 |             if iou < iou_threshold:
627 |                 break
628 |             # Do we have a match?
629 |             if pred_class_ids[i] == gt_class_ids[j]:
630 |                 match_count += 1
631 |                 gt_match[j] = 1
632 |                 pred_match[i] = 1
633 |                 break
634 | 
635 |     # Compute precision and recall at each prediction box step
636 |     precisions = np.cumsum(pred_match) / (np.arange(len(pred_match)) + 1)
637 |     recalls = np.cumsum(pred_match).astype(np.float32) / len(gt_match)
638 | 
639 |     # Pad with start and end values to simplify the math
640 |     precisions = np.concatenate([[0], precisions, [0]])
641 |     recalls = np.concatenate([[0], recalls, [1]])
642 | 
643 |     # Ensure precision values decrease but don't increase. This way, the
644 |     # precision value at each recall threshold is the maximum it can be
645 |     # for all following recall thresholds, as specified by the VOC paper.
646 |     for i in range(len(precisions) - 2, -1, -1):
647 |         precisions[i] = np.maximum(precisions[i], precisions[i + 1])
648 | 
649 |     # Compute mean AP over recall range
650 |     indices = np.where(recalls[:-1] != recalls[1:])[0] + 1
651 |     mAP = np.sum((recalls[indices] - recalls[indices - 1]) *
652 |                  precisions[indices])
653 | 
654 |     return mAP, precisions, recalls, overlaps
655 | 
656 | 
657 | def compute_recall(pred_boxes, gt_boxes, iou):
658 |     """Compute the recall at the given IoU threshold. It's an indication
659 |     of how many GT boxes were found by the given prediction boxes.
660 | 
661 |     pred_boxes: [N, (y1, x1, y2, x2)] in image coordinates
662 |     gt_boxes: [N, (y1, x1, y2, x2)] in image coordinates
663 |     """
664 |     # Measure overlaps
665 |     overlaps = compute_overlaps(pred_boxes, gt_boxes)
666 |     iou_max = np.max(overlaps, axis=1)
667 |     iou_argmax = np.argmax(overlaps, axis=1)
668 |     positive_ids = np.where(iou_max >= iou)[0]
669 |     matched_gt_boxes = iou_argmax[positive_ids]
670 | 
671 |     recall = len(set(matched_gt_boxes)) / gt_boxes.shape[0]
672 |     return recall, positive_ids
673 | 
674 | 
675 | # ## Batch Slicing
676 | # Some custom layers support a batch size of 1 only, and require a lot of work
677 | # to support batches greater than 1. This function slices an input tensor
678 | # across the batch dimension and feeds batches of size 1. Effectively,
679 | # an easy way to support batches > 1 quickly with little code modification.
680 | # In the long run, it's more efficient to modify the code to support large
681 | # batches and getting rid of this function. Consider this a temporary solution
682 | def batch_slice(inputs, graph_fn, batch_size, names=None):
683 |     """Splits inputs into slices and feeds each slice to a copy of the given
684 |     computation graph and then combines the results. It allows you to run a
685 |     graph on a batch of inputs even if the graph is written to support one
686 |     instance only.
687 | 
688 |     inputs: list of tensors. All must have the same first dimension length
689 |     graph_fn: A function that returns a TF tensor that's part of a graph.
690 |     batch_size: number of slices to divide the data into.
691 |     names: If provided, assigns names to the resulting tensors.
692 |     """
693 |     if not isinstance(inputs, list):
694 |         inputs = [inputs]
695 | 
696 |     outputs = []
697 |     for i in range(batch_size):
698 |         inputs_slice = [x[i] for x in inputs]
699 |         output_slice = graph_fn(*inputs_slice)
700 |         if not isinstance(output_slice, (tuple, list)):
701 |             output_slice = [output_slice]
702 |         outputs.append(output_slice)
703 |     # Change outputs from a list of slices where each is
704 |     # a list of outputs to a list of outputs and each has
705 |     # a list of slices
706 |     outputs = list(zip(*outputs))
707 | 
708 |     if names is None:
709 |         names = [None] * len(outputs)
710 | 
711 |     result = [tf.stack(o, axis=0, name=n)
712 |               for o, n in zip(outputs, names)]
713 |     if len(result) == 1:
714 |         result = result[0]
715 | 
716 |     return result
717 | 
718 | 
719 | def download_trained_weights(coco_model_path, verbose=1):
720 |     """Download COCO trained weights from Releases.
721 | 
722 |     coco_model_path: local path of COCO trained weights
723 |     """
724 |     if verbose > 0:
725 |         print("Downloading pretrained model to " + coco_model_path + " ...")
726 |     with urllib.request.urlopen(COCO_MODEL_URL) as resp, open(coco_model_path, 'wb') as out:
727 |         shutil.copyfileobj(resp, out)
728 |     if verbose > 0:
729 |         print("... done downloading pretrained model!")
730 | 


--------------------------------------------------------------------------------
/visualize.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Mask R-CNN
  3 | Display and Visualization Functions.
  4 | 
  5 | Copyright (c) 2017 Matterport, Inc.
  6 | Licensed under the MIT License (see LICENSE for details)
  7 | Written by Waleed Abdulla
  8 | """
  9 | 
 10 | import random
 11 | import itertools
 12 | import colorsys
 13 | import numpy as np
 14 | from skimage.measure import find_contours
 15 | import matplotlib.pyplot as plt
 16 | import matplotlib.patches as patches
 17 | import matplotlib.lines as lines
 18 | from matplotlib.patches import Polygon
 19 | import IPython.display
 20 | 
 21 | import utils
 22 | 
 23 | 
 24 | ############################################################
 25 | #  Visualization
 26 | ############################################################
 27 | 
 28 | def display_images(images, titles=None, cols=4, cmap=None, norm=None,
 29 |                    interpolation=None):
 30 |     """Display the given set of images, optionally with titles.
 31 |     images: list or array of image tensors in HWC format.
 32 |     titles: optional. A list of titles to display with each image.
 33 |     cols: number of images per row
 34 |     cmap: Optional. Color map to use. For example, "Blues".
 35 |     norm: Optional. A Normalize instance to map values to colors.
 36 |     interpolation: Optional. Image interporlation to use for display.
 37 |     """
 38 |     titles = titles if titles is not None else [""] * len(images)
 39 |     rows = len(images) // cols + 1
 40 |     plt.figure(figsize=(14, 14 * rows // cols))
 41 |     i = 1
 42 |     for image, title in zip(images, titles):
 43 |         plt.subplot(rows, cols, i)
 44 |         plt.title(title, fontsize=9)
 45 |         plt.axis('off')
 46 |         plt.imshow(image.astype(np.uint8), cmap=cmap,
 47 |                    norm=norm, interpolation=interpolation)
 48 |         i += 1
 49 |     plt.show()
 50 | 
 51 | 
 52 | def random_colors(N, bright=True):
 53 |     """
 54 |     Generate random colors.
 55 |     To get visually distinct colors, generate them in HSV space then
 56 |     convert to RGB.
 57 |     """
 58 |     brightness = 1.0 if bright else 0.7
 59 |     hsv = [(i / N, 1, brightness) for i in range(N)]
 60 |     colors = list(map(lambda c: colorsys.hsv_to_rgb(*c), hsv))
 61 |     random.shuffle(colors)
 62 |     return colors
 63 | 
 64 | 
 65 | def apply_mask(image, mask, color, alpha=0.5):
 66 |     """Apply the given mask to the image.
 67 |     """
 68 |     for c in range(3):
 69 |         image[:, :, c] = np.where(mask == 1,
 70 |                                   image[:, :, c] *
 71 |                                   (1 - alpha) + alpha * color[c] * 255,
 72 |                                   image[:, :, c])
 73 |     return image
 74 | 
 75 | 
 76 | def display_instances(image, boxes, masks, class_ids, class_names,
 77 |                       scores=None, title="",
 78 |                       figsize=(16, 16), ax=None):
 79 |     """
 80 |     boxes: [num_instance, (y1, x1, y2, x2, class_id)] in image coordinates.
 81 |     masks: [height, width, num_instances]
 82 |     class_ids: [num_instances]
 83 |     class_names: list of class names of the dataset
 84 |     scores: (optional) confidence scores for each box
 85 |     figsize: (optional) the size of the image.
 86 |     """
 87 |     # Number of instances
 88 |     N = boxes.shape[0]
 89 |     if not N:
 90 |         print("\n*** No instances to display *** \n")
 91 |     else:
 92 |         assert boxes.shape[0] == masks.shape[-1] == class_ids.shape[0]
 93 | 
 94 |     if not ax:
 95 |         _, ax = plt.subplots(1, figsize=figsize)
 96 | 
 97 |     # Generate random colors
 98 |     colors = random_colors(N)
 99 | 
100 |     # Show area outside image boundaries.
101 |     height, width = image.shape[:2]
102 |     ax.set_ylim(height + 10, -10)
103 |     ax.set_xlim(-10, width + 10)
104 |     ax.axis('off')
105 |     ax.set_title(title)
106 | 
107 |     masked_image = image.astype(np.uint32).copy()
108 |     for i in range(N):
109 |         color = colors[i]
110 | 
111 |         # Bounding box
112 |         if not np.any(boxes[i]):
113 |             # Skip this instance. Has no bbox. Likely lost in image cropping.
114 |             continue
115 |         y1, x1, y2, x2 = boxes[i]
116 |         p = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2,
117 |                               alpha=0.7, linestyle="dashed",
118 |                               edgecolor=color, facecolor='none')
119 |         ax.add_patch(p)
120 | 
121 |         # Label
122 |         class_id = class_ids[i]
123 |         score = scores[i] if scores is not None else None
124 |         label = class_names[class_id]
125 |         x = random.randint(x1, (x1 + x2) // 2)
126 |         caption = "{} {}".format(label, score) if score else label
127 |         ax.text(x1, y1 + 8, caption,
128 |                 color='w', size=11, backgroundcolor="none")
129 | 
130 |         # Mask
131 |         mask = masks[:, :, i]
132 |         masked_image = apply_mask(masked_image, mask, color)
133 | 
134 |         # Mask Polygon
135 |         # Pad to ensure proper polygons for masks that touch image edges.
136 |         padded_mask = np.zeros(
137 |             (mask.shape[0] + 2, mask.shape[1] + 2), dtype=np.uint8)
138 |         padded_mask[1:-1, 1:-1] = mask
139 |         contours = find_contours(padded_mask, 0.5)
140 |         for verts in contours:
141 |             # Subtract the padding and flip (y, x) to (x, y)
142 |             verts = np.fliplr(verts) - 1
143 |             p = Polygon(verts, facecolor="none", edgecolor=color)
144 |             ax.add_patch(p)
145 | 
146 |     ax.imshow(masked_image.astype(np.uint8))
147 |     #plt.show()
148 |     plt.savefig('person_blocked_labels.png', bbox_inches='tight')
149 |     
150 | 
151 | def draw_rois(image, rois, refined_rois, mask, class_ids, class_names, limit=10):
152 |     """
153 |     anchors: [n, (y1, x1, y2, x2)] list of anchors in image coordinates.
154 |     proposals: [n, 4] the same anchors but refined to fit objects better.
155 |     """
156 |     masked_image = image.copy()
157 | 
158 |     # Pick random anchors in case there are too many.
159 |     ids = np.arange(rois.shape[0], dtype=np.int32)
160 |     ids = np.random.choice(
161 |         ids, limit, replace=False) if ids.shape[0] > limit else ids
162 | 
163 |     fig, ax = plt.subplots(1, figsize=(12, 12))
164 |     if rois.shape[0] > limit:
165 |         plt.title("Showing {} random ROIs out of {}".format(
166 |             len(ids), rois.shape[0]))
167 |     else:
168 |         plt.title("{} ROIs".format(len(ids)))
169 | 
170 |     # Show area outside image boundaries.
171 |     ax.set_ylim(image.shape[0] + 20, -20)
172 |     ax.set_xlim(-50, image.shape[1] + 20)
173 |     ax.axis('off')
174 | 
175 |     for i, id in enumerate(ids):
176 |         color = np.random.rand(3)
177 |         class_id = class_ids[id]
178 |         # ROI
179 |         y1, x1, y2, x2 = rois[id]
180 |         p = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2,
181 |                               edgecolor=color if class_id else "gray",
182 |                               facecolor='none', linestyle="dashed")
183 |         ax.add_patch(p)
184 |         # Refined ROI
185 |         if class_id:
186 |             ry1, rx1, ry2, rx2 = refined_rois[id]
187 |             p = patches.Rectangle((rx1, ry1), rx2 - rx1, ry2 - ry1, linewidth=2,
188 |                                   edgecolor=color, facecolor='none')
189 |             ax.add_patch(p)
190 |             # Connect the top-left corners of the anchor and proposal for easy visualization
191 |             ax.add_line(lines.Line2D([x1, rx1], [y1, ry1], color=color))
192 | 
193 |             # Label
194 |             label = class_names[class_id]
195 |             ax.text(rx1, ry1 + 8, "{}".format(label),
196 |                     color='w', size=11, backgroundcolor="none")
197 | 
198 |             # Mask
199 |             m = utils.unmold_mask(mask[id], rois[id]
200 |                                   [:4].astype(np.int32), image.shape)
201 |             masked_image = apply_mask(masked_image, m, color)
202 | 
203 |     ax.imshow(masked_image)
204 | 
205 |     # Print stats
206 |     print("Positive ROIs: ", class_ids[class_ids > 0].shape[0])
207 |     print("Negative ROIs: ", class_ids[class_ids == 0].shape[0])
208 |     print("Positive Ratio: {:.2f}".format(
209 |         class_ids[class_ids > 0].shape[0] / class_ids.shape[0]))
210 | 
211 | 
212 | # TODO: Replace with matplotlib equivalent?
213 | def draw_box(image, box, color):
214 |     """Draw 3-pixel width bounding boxes on the given image array.
215 |     color: list of 3 int values for RGB.
216 |     """
217 |     y1, x1, y2, x2 = box
218 |     image[y1:y1 + 2, x1:x2] = color
219 |     image[y2:y2 + 2, x1:x2] = color
220 |     image[y1:y2, x1:x1 + 2] = color
221 |     image[y1:y2, x2:x2 + 2] = color
222 |     return image
223 | 
224 | 
225 | def display_top_masks(image, mask, class_ids, class_names, limit=4):
226 |     """Display the given image and the top few class masks."""
227 |     to_display = []
228 |     titles = []
229 |     to_display.append(image)
230 |     titles.append("H x W={}x{}".format(image.shape[0], image.shape[1]))
231 |     # Pick top prominent classes in this image
232 |     unique_class_ids = np.unique(class_ids)
233 |     mask_area = [np.sum(mask[:, :, np.where(class_ids == i)[0]])
234 |                  for i in unique_class_ids]
235 |     top_ids = [v[0] for v in sorted(zip(unique_class_ids, mask_area),
236 |                                     key=lambda r: r[1], reverse=True) if v[1] > 0]
237 |     # Generate images and titles
238 |     for i in range(limit):
239 |         class_id = top_ids[i] if i < len(top_ids) else -1
240 |         # Pull masks of instances belonging to the same class.
241 |         m = mask[:, :, np.where(class_ids == class_id)[0]]
242 |         m = np.sum(m * np.arange(1, m.shape[-1] + 1), -1)
243 |         to_display.append(m)
244 |         titles.append(class_names[class_id] if class_id != -1 else "-")
245 |     display_images(to_display, titles=titles, cols=limit + 1, cmap="Blues_r")
246 | 
247 | 
248 | def plot_precision_recall(AP, precisions, recalls):
249 |     """Draw the precision-recall curve.
250 | 
251 |     AP: Average precision at IoU >= 0.5
252 |     precisions: list of precision values
253 |     recalls: list of recall values
254 |     """
255 |     # Plot the Precision-Recall curve
256 |     _, ax = plt.subplots(1)
257 |     ax.set_title("Precision-Recall Curve. AP@50 = {:.3f}".format(AP))
258 |     ax.set_ylim(0, 1.1)
259 |     ax.set_xlim(0, 1.1)
260 |     _ = ax.plot(recalls, precisions)
261 | 
262 | 
263 | def plot_overlaps(gt_class_ids, pred_class_ids, pred_scores,
264 |                   overlaps, class_names, threshold=0.5):
265 |     """Draw a grid showing how ground truth objects are classified.
266 |     gt_class_ids: [N] int. Ground truth class IDs
267 |     pred_class_id: [N] int. Predicted class IDs
268 |     pred_scores: [N] float. The probability scores of predicted classes
269 |     overlaps: [pred_boxes, gt_boxes] IoU overlaps of predictins and GT boxes.
270 |     class_names: list of all class names in the dataset
271 |     threshold: Float. The prediction probability required to predict a class
272 |     """
273 |     gt_class_ids = gt_class_ids[gt_class_ids != 0]
274 |     pred_class_ids = pred_class_ids[pred_class_ids != 0]
275 | 
276 |     plt.figure(figsize=(12, 10))
277 |     plt.imshow(overlaps, interpolation='nearest', cmap=plt.cm.Blues)
278 |     plt.yticks(np.arange(len(pred_class_ids)),
279 |                ["{} ({:.2f})".format(class_names[int(id)], pred_scores[i])
280 |                 for i, id in enumerate(pred_class_ids)])
281 |     plt.xticks(np.arange(len(gt_class_ids)),
282 |                [class_names[int(id)] for id in gt_class_ids], rotation=90)
283 | 
284 |     thresh = overlaps.max() / 2.
285 |     for i, j in itertools.product(range(overlaps.shape[0]),
286 |                                   range(overlaps.shape[1])):
287 |         text = ""
288 |         if overlaps[i, j] > threshold:
289 |             text = "match" if gt_class_ids[j] == pred_class_ids[i] else "wrong"
290 |         color = ("white" if overlaps[i, j] > thresh
291 |                  else "black" if overlaps[i, j] > 0
292 |                  else "grey")
293 |         plt.text(j, i, "{:.3f}\n{}".format(overlaps[i, j], text),
294 |                  horizontalalignment="center", verticalalignment="center",
295 |                  fontsize=9, color=color)
296 | 
297 |     plt.tight_layout()
298 |     plt.xlabel("Ground Truth")
299 |     plt.ylabel("Predictions")
300 | 
301 | 
302 | def draw_boxes(image, boxes=None, refined_boxes=None,
303 |                masks=None, captions=None, visibilities=None,
304 |                title="", ax=None):
305 |     """Draw bounding boxes and segmentation masks with differnt
306 |     customizations.
307 | 
308 |     boxes: [N, (y1, x1, y2, x2, class_id)] in image coordinates.
309 |     refined_boxes: Like boxes, but draw with solid lines to show
310 |         that they're the result of refining 'boxes'.
311 |     masks: [N, height, width]
312 |     captions: List of N titles to display on each box
313 |     visibilities: (optional) List of values of 0, 1, or 2. Determine how
314 |         prominant each bounding box should be.
315 |     title: An optional title to show over the image
316 |     ax: (optional) Matplotlib axis to draw on.
317 |     """
318 |     # Number of boxes
319 |     assert boxes is not None or refined_boxes is not None
320 |     N = boxes.shape[0] if boxes is not None else refined_boxes.shape[0]
321 | 
322 |     # Matplotlib Axis
323 |     if not ax:
324 |         _, ax = plt.subplots(1, figsize=(12, 12))
325 | 
326 |     # Generate random colors
327 |     colors = random_colors(N)
328 | 
329 |     # Show area outside image boundaries.
330 |     margin = image.shape[0] // 10
331 |     ax.set_ylim(image.shape[0] + margin, -margin)
332 |     ax.set_xlim(-margin, image.shape[1] + margin)
333 |     ax.axis('off')
334 | 
335 |     ax.set_title(title)
336 | 
337 |     masked_image = image.astype(np.uint32).copy()
338 |     for i in range(N):
339 |         # Box visibility
340 |         visibility = visibilities[i] if visibilities is not None else 1
341 |         if visibility == 0:
342 |             color = "gray"
343 |             style = "dotted"
344 |             alpha = 0.5
345 |         elif visibility == 1:
346 |             color = colors[i]
347 |             style = "dotted"
348 |             alpha = 1
349 |         elif visibility == 2:
350 |             color = colors[i]
351 |             style = "solid"
352 |             alpha = 1
353 | 
354 |         # Boxes
355 |         if boxes is not None:
356 |             if not np.any(boxes[i]):
357 |                 # Skip this instance. Has no bbox. Likely lost in cropping.
358 |                 continue
359 |             y1, x1, y2, x2 = boxes[i]
360 |             p = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2,
361 |                                   alpha=alpha, linestyle=style,
362 |                                   edgecolor=color, facecolor='none')
363 |             ax.add_patch(p)
364 | 
365 |         # Refined boxes
366 |         if refined_boxes is not None and visibility > 0:
367 |             ry1, rx1, ry2, rx2 = refined_boxes[i].astype(np.int32)
368 |             p = patches.Rectangle((rx1, ry1), rx2 - rx1, ry2 - ry1, linewidth=2,
369 |                                   edgecolor=color, facecolor='none')
370 |             ax.add_patch(p)
371 |             # Connect the top-left corners of the anchor and proposal
372 |             if boxes is not None:
373 |                 ax.add_line(lines.Line2D([x1, rx1], [y1, ry1], color=color))
374 | 
375 |         # Captions
376 |         if captions is not None:
377 |             caption = captions[i]
378 |             # If there are refined boxes, display captions on them
379 |             if refined_boxes is not None:
380 |                 y1, x1, y2, x2 = ry1, rx1, ry2, rx2
381 |             x = random.randint(x1, (x1 + x2) // 2)
382 |             ax.text(x1, y1, caption, size=11, verticalalignment='top',
383 |                     color='w', backgroundcolor="none",
384 |                     bbox={'facecolor': color, 'alpha': 0.5,
385 |                           'pad': 2, 'edgecolor': 'none'})
386 | 
387 |         # Masks
388 |         if masks is not None:
389 |             mask = masks[:, :, i]
390 |             masked_image = apply_mask(masked_image, mask, color)
391 |             # Mask Polygon
392 |             # Pad to ensure proper polygons for masks that touch image edges.
393 |             padded_mask = np.zeros(
394 |                 (mask.shape[0] + 2, mask.shape[1] + 2), dtype=np.uint8)
395 |             padded_mask[1:-1, 1:-1] = mask
396 |             contours = find_contours(padded_mask, 0.5)
397 |             for verts in contours:
398 |                 # Subtract the padding and flip (y, x) to (x, y)
399 |                 verts = np.fliplr(verts) - 1
400 |                 p = Polygon(verts, facecolor="none", edgecolor=color)
401 |                 ax.add_patch(p)
402 |     ax.imshow(masked_image.astype(np.uint8))
403 | 
404 | 
405 | def display_table(table):
406 |     """Display values in a table format.
407 |     table: an iterable of rows, and each row is an iterable of values.
408 |     """
409 |     html = ""
410 |     for row in table:
411 |         row_html = ""
412 |         for col in row:
413 |             row_html += "<td>{:40}</td>".format(str(col))
414 |         html += "<tr>" + row_html + "</tr>"
415 |     html = "<table>" + html + "</table>"
416 |     IPython.display.display(IPython.display.HTML(html))
417 | 
418 | 
419 | def display_weight_stats(model):
420 |     """Scans all the weights in the model and returns a list of tuples
421 |     that contain stats about each weight.
422 |     """
423 |     layers = model.get_trainable_layers()
424 |     table = [["WEIGHT NAME", "SHAPE", "MIN", "MAX", "STD"]]
425 |     for l in layers:
426 |         weight_values = l.get_weights()  # list of Numpy arrays
427 |         weight_tensors = l.weights  # list of TF tensors
428 |         for i, w in enumerate(weight_values):
429 |             weight_name = weight_tensors[i].name
430 |             # Detect problematic layers. Exclude biases of conv layers.
431 |             alert = ""
432 |             if w.min() == w.max() and not (l.__class__.__name__ == "Conv2D" and i == 1):
433 |                 alert += "<span style='color:red'>*** dead?</span>"
434 |             if np.abs(w.min()) > 1000 or np.abs(w.max()) > 1000:
435 |                 alert += "<span style='color:red'>*** Overflow?</span>"
436 |             # Add row
437 |             table.append([
438 |                 weight_name + alert,
439 |                 str(w.shape),
440 |                 "{:+9.4f}".format(w.min()),
441 |                 "{:+10.4f}".format(w.max()),
442 |                 "{:+9.4f}".format(w.std()),
443 |             ])
444 |     display_table(table)
445 | 


--------------------------------------------------------------------------------