├── .dockerignore ├── .gitignore ├── Dockerfile ├── LICENSE ├── README.md ├── classes.py ├── coco.py ├── config.py ├── docker-entrypoint.sh ├── example_output ├── img1_blocked.png ├── img2_blocked.png ├── img3_blocked.png ├── img4_blocked.gif ├── img4_blocked.png └── img4_labels.png ├── images ├── img1.jpg ├── img2.jpg ├── img3.jpg └── img4.jpg ├── model.py ├── person_blocker.py ├── requirements.txt ├── utils.py └── visualize.py /.dockerignore: -------------------------------------------------------------------------------- 1 | .git 2 | example_output 3 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.h5 2 | __pycache__ -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | FROM python:slim 2 | 3 | WORKDIR /app 4 | 5 | COPY requirements.txt ./ 6 | 7 | RUN apt-get update -qq && \ 8 | DEBIAN_FRONTEND=noninteractive apt-get install -qq \ 9 | python3-tk \ 10 | xvfb \ 11 | curl && \ 12 | curl -OJL https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5 && \ 13 | pip install --no-cache-dir -r requirements.txt && \ 14 | apt-get remove --purge -qq curl && \ 15 | apt-get autoremove --purge -qq && \ 16 | apt-get clean -qq && \ 17 | rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* /init /root/.cache 18 | 19 | COPY . . 20 | 21 | WORKDIR /data 22 | 23 | ENTRYPOINT ["/app/docker-entrypoint.sh"] 24 | CMD ["-n"] 25 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 Max Woolf 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | 23 | --- 24 | 25 | Mask R-CNN 26 | 27 | The MIT License (MIT) 28 | 29 | Copyright (c) 2017 Matterport, Inc. 30 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Person Blocker 2 | 3 | ![img4](example_output/img4_blocked.gif) 4 | 5 | A script to automatically "block" people in images (like the [Black Mirror](https://en.wikipedia.org/wiki/Black_Mirror) episode [White Christmas](https://en.wikipedia.org/wiki/White_Christmas_(Black_Mirror))) using [Mask R-CNN](https://github.com/matterport/Mask_RCNN) pretrained on the [MS COCO](https://arxiv.org/abs/1405.0312) dataset. No GPU required! 6 | 7 | But you can block more than just people: up to [80 different types](https://github.com/minimaxir/person-blocker/blob/master/classes.py) of objects can be blocked, including giraffes and busses! 8 | 9 | ## Setup 10 | 11 | This project relies on a handful of dependencies, use the following command to install your dependencies: 12 | 13 | ```shell 14 | pip3 install -r requirements.txt 15 | ``` 16 | 17 | _Note_: Depending on your environment, you may need to use `sudo`. You may also want to use virtualenv. 18 | 19 | ## Usage 20 | 21 | Person Blocker is used from the command line: 22 | 23 | ```shell 24 | python3 person_blocker.py -i images/img3.jpg -c '(128, 128, 128)' -o 'bus' 'truck' 25 | ``` 26 | 27 | * `-i/--image`: specifies the image file. 28 | * `-m/--model`: path to the pretrained COCO model weights (default: current directory): if not specified, it will download them automatically to the current directory if not already present (note: the weights are 258 MB!) 29 | * `-c/--color`: color of the mask, in either quote-wrapped hexidecimal or 3-element RGB tuple format. (default: white) 30 | * `-o/--object`: list of types of objects to block (or object IDs of specific objects). You can see the allowable choices of objects to block in `classes.py` or by using the `-names` flag. (default: person) 31 | * `-l/--labeled`: saves a labeled image annotated with detected objects and their object ID. 32 | * `-n/--names`: prints the class options for objects, then exits. 33 | 34 | The script outputs two images: a static (pun intended) image `person_blocked.png` and an animated image `person_blocked.gif` like the one at the beginning of this README. 35 | 36 | ## Examples 37 | 38 | ```shell 39 | python3 person_blocker.py -i images/img1.jpg 40 | ``` 41 | 42 | ![img1](example_output/img1_blocked.png) 43 | 44 | ```shell 45 | python3 person_blocker.py -i images/img2.jpg -c '#c0392b' -o 'giraffe' 46 | ``` 47 | 48 | ![img2](example_output/img2_blocked.png) 49 | 50 | ```shell 51 | python3 person_blocker.py -i images/img3.jpg -c '(128, 128, 128)' -o 'bus' 'truck' 52 | ``` 53 | 54 | ![img3](example_output/img3_blocked.png) 55 | 56 | Blocking specific object(s) requires 2 steps: running in inference mode to get the object IDs for each object, and then blocking those object IDs. 57 | 58 | ```shell 59 | python3 person_blocker.py -i images/img4.jpg -l 60 | ``` 61 | 62 | ![img4 labels](example_output/img4_labels.png) 63 | 64 | ```shell 65 | python3 person_blocker.py -i images/img4.jpg -o 1 66 | ``` 67 | 68 | ![img4](example_output/img4_blocked.png) 69 | 70 | ## Requirements 71 | 72 | The same requirements as Mask R-CNN: 73 | * Python 3.4+ 74 | * TensorFlow 1.3+ 75 | * Keras 2.0.8+ 76 | * Numpy, skimage, scipy, Pillow, cython, h5py 77 | 78 | plus matplotlib and imageio 79 | 80 | ## Maintainer 81 | 82 | Max Woolf ([@minimaxir](http://minimaxir.com)) 83 | 84 | *Max's open-source projects are supported by his [Patreon](https://www.patreon.com/minimaxir). If you found this project helpful, any monetary contributions to the Patreon are appreciated and will be put to good creative use.* 85 | 86 | ## License 87 | 88 | MIT 89 | 90 | Code used from Mask R-CNN by Matterport, Inc. (MIT-Licensed), with minor alterations and copyright notices retained. 91 | -------------------------------------------------------------------------------- /classes.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import coco 3 | 4 | 5 | def get_class_names(): 6 | return np.array(['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 7 | 'bus', 'train', 'truck', 'boat', 'traffic light', 8 | 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 9 | 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 10 | 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 11 | 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 12 | 'kite', 'baseball bat', 'baseball glove', 'skateboard', 13 | 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 14 | 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 15 | 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 16 | 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 17 | 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 18 | 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 19 | 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 20 | 'teddy bear', 'hair drier', 'toothbrush']) 21 | 22 | 23 | class InferenceConfig(coco.CocoConfig): 24 | GPU_COUNT = 1 25 | IMAGES_PER_GPU = 1 26 | -------------------------------------------------------------------------------- /coco.py: -------------------------------------------------------------------------------- 1 | """ 2 | Mask R-CNN 3 | Configurations and data loading code for MS COCO. 4 | 5 | Copyright (c) 2017 Matterport, Inc. 6 | Licensed under the MIT License (see LICENSE for details) 7 | Written by Waleed Abdulla 8 | 9 | ------------------------------------------------------------ 10 | 11 | Usage: import the module (see Jupyter notebooks for examples), or run from 12 | the command line as such: 13 | 14 | # Train a new model starting from pre-trained COCO weights 15 | python3 coco.py train --dataset=/path/to/coco/ --model=coco 16 | 17 | # Train a new model starting from ImageNet weights 18 | python3 coco.py train --dataset=/path/to/coco/ --model=imagenet 19 | 20 | # Continue training a model that you had trained earlier 21 | python3 coco.py train --dataset=/path/to/coco/ --model=/path/to/weights.h5 22 | 23 | # Continue training the last model you trained 24 | python3 coco.py train --dataset=/path/to/coco/ --model=last 25 | 26 | # Run COCO evaluatoin on the last model you trained 27 | python3 coco.py evaluate --dataset=/path/to/coco/ --model=last 28 | """ 29 | 30 | import os 31 | import time 32 | import numpy as np 33 | 34 | # Download and install the Python COCO tools from https://github.com/waleedka/coco 35 | # That's a fork from the original https://github.com/pdollar/coco with a bug 36 | # fix for Python 3. 37 | # I submitted a pull request https://github.com/cocodataset/cocoapi/pull/50 38 | # If the PR is merged then use the original repo. 39 | # Note: Edit PythonAPI/Makefile and replace "python" with "python3". 40 | # from pycocotools.coco import COCO 41 | # from pycocotools.cocoeval import COCOeval 42 | # from pycocotools import mask as maskUtils 43 | 44 | import zipfile 45 | import urllib.request 46 | import shutil 47 | 48 | from config import Config 49 | import utils 50 | import model as modellib 51 | 52 | # Root directory of the project 53 | ROOT_DIR = os.getcwd() 54 | 55 | # Path to trained weights file 56 | COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5") 57 | 58 | # Directory to save logs and model checkpoints, if not provided 59 | # through the command line argument --logs 60 | DEFAULT_LOGS_DIR = os.path.join(ROOT_DIR, "logs") 61 | DEFAULT_DATASET_YEAR = "2014" 62 | 63 | ############################################################ 64 | # Configurations 65 | ############################################################ 66 | 67 | 68 | class CocoConfig(Config): 69 | """Configuration for training on MS COCO. 70 | Derives from the base Config class and overrides values specific 71 | to the COCO dataset. 72 | """ 73 | # Give the configuration a recognizable name 74 | NAME = "coco" 75 | 76 | # We use a GPU with 12GB memory, which can fit two images. 77 | # Adjust down if you use a smaller GPU. 78 | IMAGES_PER_GPU = 2 79 | 80 | # Uncomment to train on 8 GPUs (default is 1) 81 | # GPU_COUNT = 8 82 | 83 | # Number of classes (including background) 84 | NUM_CLASSES = 1 + 80 # COCO has 80 classes 85 | 86 | 87 | ############################################################ 88 | # Dataset 89 | ############################################################ 90 | 91 | class CocoDataset(utils.Dataset): 92 | def load_coco(self, dataset_dir, subset, year=DEFAULT_DATASET_YEAR, class_ids=None, 93 | class_map=None, return_coco=False, auto_download=False): 94 | """Load a subset of the COCO dataset. 95 | dataset_dir: The root directory of the COCO dataset. 96 | subset: What to load (train, val, minival, valminusminival) 97 | year: What dataset year to load (2014, 2017) as a string, not an integer 98 | class_ids: If provided, only loads images that have the given classes. 99 | class_map: TODO: Not implemented yet. Supports maping classes from 100 | different datasets to the same class ID. 101 | return_coco: If True, returns the COCO object. 102 | auto_download: Automatically download and unzip MS-COCO images and annotations 103 | """ 104 | 105 | if auto_download is True: 106 | self.auto_download(dataset_dir, subset, year) 107 | 108 | coco = COCO("{}/annotations/instances_{}{}.json".format(dataset_dir, subset, year)) 109 | if subset == "minival" or subset == "valminusminival": 110 | subset = "val" 111 | image_dir = "{}/{}{}".format(dataset_dir, subset, year) 112 | 113 | # Load all classes or a subset? 114 | if not class_ids: 115 | # All classes 116 | class_ids = sorted(coco.getCatIds()) 117 | 118 | # All images or a subset? 119 | if class_ids: 120 | image_ids = [] 121 | for id in class_ids: 122 | image_ids.extend(list(coco.getImgIds(catIds=[id]))) 123 | # Remove duplicates 124 | image_ids = list(set(image_ids)) 125 | else: 126 | # All images 127 | image_ids = list(coco.imgs.keys()) 128 | 129 | # Add classes 130 | for i in class_ids: 131 | self.add_class("coco", i, coco.loadCats(i)[0]["name"]) 132 | 133 | # Add images 134 | for i in image_ids: 135 | self.add_image( 136 | "coco", image_id=i, 137 | path=os.path.join(image_dir, coco.imgs[i]['file_name']), 138 | width=coco.imgs[i]["width"], 139 | height=coco.imgs[i]["height"], 140 | annotations=coco.loadAnns(coco.getAnnIds( 141 | imgIds=[i], catIds=class_ids, iscrowd=None))) 142 | if return_coco: 143 | return coco 144 | 145 | def auto_download(self, dataDir, dataType, dataYear): 146 | """Download the COCO dataset/annotations if requested. 147 | dataDir: The root directory of the COCO dataset. 148 | dataType: What to load (train, val, minival, valminusminival) 149 | dataYear: What dataset year to load (2014, 2017) as a string, not an integer 150 | Note: 151 | For 2014, use "train", "val", "minival", or "valminusminival" 152 | For 2017, only "train" and "val" annotations are available 153 | """ 154 | 155 | # Setup paths and file names 156 | if dataType == "minival" or dataType == "valminusminival": 157 | imgDir = "{}/{}{}".format(dataDir, "val", dataYear) 158 | imgZipFile = "{}/{}{}.zip".format(dataDir, "val", dataYear) 159 | imgURL = "http://images.cocodataset.org/zips/{}{}.zip".format("val", dataYear) 160 | else: 161 | imgDir = "{}/{}{}".format(dataDir, dataType, dataYear) 162 | imgZipFile = "{}/{}{}.zip".format(dataDir, dataType, dataYear) 163 | imgURL = "http://images.cocodataset.org/zips/{}{}.zip".format(dataType, dataYear) 164 | # print("Image paths:"); print(imgDir); print(imgZipFile); print(imgURL) 165 | 166 | # Create main folder if it doesn't exist yet 167 | if not os.path.exists(dataDir): 168 | os.makedirs(dataDir) 169 | 170 | # Download images if not available locally 171 | if not os.path.exists(imgDir): 172 | os.makedirs(imgDir) 173 | print("Downloading images to " + imgZipFile + " ...") 174 | with urllib.request.urlopen(imgURL) as resp, open(imgZipFile, 'wb') as out: 175 | shutil.copyfileobj(resp, out) 176 | print("... done downloading.") 177 | print("Unzipping " + imgZipFile) 178 | with zipfile.ZipFile(imgZipFile, "r") as zip_ref: 179 | zip_ref.extractall(dataDir) 180 | print("... done unzipping") 181 | print("Will use images in " + imgDir) 182 | 183 | # Setup annotations data paths 184 | annDir = "{}/annotations".format(dataDir) 185 | if dataType == "minival": 186 | annZipFile = "{}/instances_minival2014.json.zip".format(dataDir) 187 | annFile = "{}/instances_minival2014.json".format(annDir) 188 | annURL = "https://dl.dropboxusercontent.com/s/o43o90bna78omob/instances_minival2014.json.zip?dl=0" 189 | unZipDir = annDir 190 | elif dataType == "valminusminival": 191 | annZipFile = "{}/instances_valminusminival2014.json.zip".format(dataDir) 192 | annFile = "{}/instances_valminusminival2014.json".format(annDir) 193 | annURL = "https://dl.dropboxusercontent.com/s/s3tw5zcg7395368/instances_valminusminival2014.json.zip?dl=0" 194 | unZipDir = annDir 195 | else: 196 | annZipFile = "{}/annotations_trainval{}.zip".format(dataDir, dataYear) 197 | annFile = "{}/instances_{}{}.json".format(annDir, dataType, dataYear) 198 | annURL = "http://images.cocodataset.org/annotations/annotations_trainval{}.zip".format(dataYear) 199 | unZipDir = dataDir 200 | # print("Annotations paths:"); print(annDir); print(annFile); print(annZipFile); print(annURL) 201 | 202 | # Download annotations if not available locally 203 | if not os.path.exists(annDir): 204 | os.makedirs(annDir) 205 | if not os.path.exists(annFile): 206 | if not os.path.exists(annZipFile): 207 | print("Downloading zipped annotations to " + annZipFile + " ...") 208 | with urllib.request.urlopen(annURL) as resp, open(annZipFile, 'wb') as out: 209 | shutil.copyfileobj(resp, out) 210 | print("... done downloading.") 211 | print("Unzipping " + annZipFile) 212 | with zipfile.ZipFile(annZipFile, "r") as zip_ref: 213 | zip_ref.extractall(unZipDir) 214 | print("... done unzipping") 215 | print("Will use annotations in " + annFile) 216 | 217 | def load_mask(self, image_id): 218 | """Load instance masks for the given image. 219 | 220 | Different datasets use different ways to store masks. This 221 | function converts the different mask format to one format 222 | in the form of a bitmap [height, width, instances]. 223 | 224 | Returns: 225 | masks: A bool array of shape [height, width, instance count] with 226 | one mask per instance. 227 | class_ids: a 1D array of class IDs of the instance masks. 228 | """ 229 | # If not a COCO image, delegate to parent class. 230 | image_info = self.image_info[image_id] 231 | if image_info["source"] != "coco": 232 | return super(CocoDataset, self).load_mask(image_id) 233 | 234 | instance_masks = [] 235 | class_ids = [] 236 | annotations = self.image_info[image_id]["annotations"] 237 | # Build mask of shape [height, width, instance_count] and list 238 | # of class IDs that correspond to each channel of the mask. 239 | for annotation in annotations: 240 | class_id = self.map_source_class_id( 241 | "coco.{}".format(annotation['category_id'])) 242 | if class_id: 243 | m = self.annToMask(annotation, image_info["height"], 244 | image_info["width"]) 245 | # Some objects are so small that they're less than 1 pixel area 246 | # and end up rounded out. Skip those objects. 247 | if m.max() < 1: 248 | continue 249 | # Is it a crowd? If so, use a negative class ID. 250 | if annotation['iscrowd']: 251 | # Use negative class ID for crowds 252 | class_id *= -1 253 | # For crowd masks, annToMask() sometimes returns a mask 254 | # smaller than the given dimensions. If so, resize it. 255 | if m.shape[0] != image_info["height"] or m.shape[1] != image_info["width"]: 256 | m = np.ones([image_info["height"], image_info["width"]], dtype=bool) 257 | instance_masks.append(m) 258 | class_ids.append(class_id) 259 | 260 | # Pack instance masks into an array 261 | if class_ids: 262 | mask = np.stack(instance_masks, axis=2) 263 | class_ids = np.array(class_ids, dtype=np.int32) 264 | return mask, class_ids 265 | else: 266 | # Call super class to return an empty mask 267 | return super(CocoDataset, self).load_mask(image_id) 268 | 269 | def image_reference(self, image_id): 270 | """Return a link to the image in the COCO Website.""" 271 | info = self.image_info[image_id] 272 | if info["source"] == "coco": 273 | return "http://cocodataset.org/#explore?id={}".format(info["id"]) 274 | else: 275 | super(CocoDataset, self).image_reference(image_id) 276 | 277 | # The following two functions are from pycocotools with a few changes. 278 | 279 | def annToRLE(self, ann, height, width): 280 | """ 281 | Convert annotation which can be polygons, uncompressed RLE to RLE. 282 | :return: binary mask (numpy 2D array) 283 | """ 284 | segm = ann['segmentation'] 285 | if isinstance(segm, list): 286 | # polygon -- a single object might consist of multiple parts 287 | # we merge all parts into one mask rle code 288 | rles = maskUtils.frPyObjects(segm, height, width) 289 | rle = maskUtils.merge(rles) 290 | elif isinstance(segm['counts'], list): 291 | # uncompressed RLE 292 | rle = maskUtils.frPyObjects(segm, height, width) 293 | else: 294 | # rle 295 | rle = ann['segmentation'] 296 | return rle 297 | 298 | def annToMask(self, ann, height, width): 299 | """ 300 | Convert annotation which can be polygons, uncompressed RLE, or RLE to binary mask. 301 | :return: binary mask (numpy 2D array) 302 | """ 303 | rle = self.annToRLE(ann, height, width) 304 | m = maskUtils.decode(rle) 305 | return m 306 | 307 | 308 | ############################################################ 309 | # COCO Evaluation 310 | ############################################################ 311 | 312 | def build_coco_results(dataset, image_ids, rois, class_ids, scores, masks): 313 | """Arrange resutls to match COCO specs in http://cocodataset.org/#format 314 | """ 315 | # If no results, return an empty list 316 | if rois is None: 317 | return [] 318 | 319 | results = [] 320 | for image_id in image_ids: 321 | # Loop through detections 322 | for i in range(rois.shape[0]): 323 | class_id = class_ids[i] 324 | score = scores[i] 325 | bbox = np.around(rois[i], 1) 326 | mask = masks[:, :, i] 327 | 328 | result = { 329 | "image_id": image_id, 330 | "category_id": dataset.get_source_class_id(class_id, "coco"), 331 | "bbox": [bbox[1], bbox[0], bbox[3] - bbox[1], bbox[2] - bbox[0]], 332 | "score": score, 333 | "segmentation": maskUtils.encode(np.asfortranarray(mask)) 334 | } 335 | results.append(result) 336 | return results 337 | 338 | 339 | def evaluate_coco(model, dataset, coco, eval_type="bbox", limit=0, image_ids=None): 340 | """Runs official COCO evaluation. 341 | dataset: A Dataset object with valiadtion data 342 | eval_type: "bbox" or "segm" for bounding box or segmentation evaluation 343 | limit: if not 0, it's the number of images to use for evaluation 344 | """ 345 | # Pick COCO images from the dataset 346 | image_ids = image_ids or dataset.image_ids 347 | 348 | # Limit to a subset 349 | if limit: 350 | image_ids = image_ids[:limit] 351 | 352 | # Get corresponding COCO image IDs. 353 | coco_image_ids = [dataset.image_info[id]["id"] for id in image_ids] 354 | 355 | t_prediction = 0 356 | t_start = time.time() 357 | 358 | results = [] 359 | for i, image_id in enumerate(image_ids): 360 | # Load image 361 | image = dataset.load_image(image_id) 362 | 363 | # Run detection 364 | t = time.time() 365 | r = model.detect([image], verbose=0)[0] 366 | t_prediction += (time.time() - t) 367 | 368 | # Convert results to COCO format 369 | image_results = build_coco_results(dataset, coco_image_ids[i:i + 1], 370 | r["rois"], r["class_ids"], 371 | r["scores"], r["masks"]) 372 | results.extend(image_results) 373 | 374 | # Load results. This modifies results with additional attributes. 375 | coco_results = coco.loadRes(results) 376 | 377 | # Evaluate 378 | cocoEval = COCOeval(coco, coco_results, eval_type) 379 | cocoEval.params.imgIds = coco_image_ids 380 | cocoEval.evaluate() 381 | cocoEval.accumulate() 382 | cocoEval.summarize() 383 | 384 | print("Prediction time: {}. Average {}/image".format( 385 | t_prediction, t_prediction / len(image_ids))) 386 | print("Total time: ", time.time() - t_start) 387 | 388 | 389 | ############################################################ 390 | # Training 391 | ############################################################ 392 | 393 | 394 | if __name__ == '__main__': 395 | import argparse 396 | 397 | # Parse command line arguments 398 | parser = argparse.ArgumentParser( 399 | description='Train Mask R-CNN on MS COCO.') 400 | parser.add_argument("command", 401 | metavar="", 402 | help="'train' or 'evaluate' on MS COCO") 403 | parser.add_argument('--dataset', required=True, 404 | metavar="/path/to/coco/", 405 | help='Directory of the MS-COCO dataset') 406 | parser.add_argument('--year', required=False, 407 | default=DEFAULT_DATASET_YEAR, 408 | metavar="", 409 | help='Year of the MS-COCO dataset (2014 or 2017) (default=2014)') 410 | parser.add_argument('--model', required=True, 411 | metavar="/path/to/weights.h5", 412 | help="Path to weights .h5 file or 'coco'") 413 | parser.add_argument('--logs', required=False, 414 | default=DEFAULT_LOGS_DIR, 415 | metavar="/path/to/logs/", 416 | help='Logs and checkpoints directory (default=logs/)') 417 | parser.add_argument('--limit', required=False, 418 | default=500, 419 | metavar="", 420 | help='Images to use for evaluation (default=500)') 421 | parser.add_argument('--download', required=False, 422 | default=False, 423 | metavar="", 424 | help='Automatically download and unzip MS-COCO files (default=False)', 425 | type=bool) 426 | args = parser.parse_args() 427 | print("Command: ", args.command) 428 | print("Model: ", args.model) 429 | print("Dataset: ", args.dataset) 430 | print("Year: ", args.year) 431 | print("Logs: ", args.logs) 432 | print("Auto Download: ", args.download) 433 | 434 | # Configurations 435 | if args.command == "train": 436 | config = CocoConfig() 437 | else: 438 | class InferenceConfig(CocoConfig): 439 | # Set batch size to 1 since we'll be running inference on 440 | # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU 441 | GPU_COUNT = 1 442 | IMAGES_PER_GPU = 1 443 | DETECTION_MIN_CONFIDENCE = 0 444 | config = InferenceConfig() 445 | config.display() 446 | 447 | # Create model 448 | if args.command == "train": 449 | model = modellib.MaskRCNN(mode="training", config=config, 450 | model_dir=args.logs) 451 | else: 452 | model = modellib.MaskRCNN(mode="inference", config=config, 453 | model_dir=args.logs) 454 | 455 | # Select weights file to load 456 | if args.model.lower() == "coco": 457 | model_path = COCO_MODEL_PATH 458 | elif args.model.lower() == "last": 459 | # Find last trained weights 460 | model_path = model.find_last()[1] 461 | elif args.model.lower() == "imagenet": 462 | # Start from ImageNet trained weights 463 | model_path = model.get_imagenet_weights() 464 | else: 465 | model_path = args.model 466 | 467 | # Load weights 468 | print("Loading weights ", model_path) 469 | model.load_weights(model_path, by_name=True) 470 | 471 | # Train or evaluate 472 | if args.command == "train": 473 | # Training dataset. Use the training set and 35K from the 474 | # validation set, as as in the Mask RCNN paper. 475 | dataset_train = CocoDataset() 476 | dataset_train.load_coco(args.dataset, "train", year=args.year, auto_download=args.download) 477 | dataset_train.load_coco(args.dataset, "valminusminival", year=args.year, auto_download=args.download) 478 | dataset_train.prepare() 479 | 480 | # Validation dataset 481 | dataset_val = CocoDataset() 482 | dataset_val.load_coco(args.dataset, "minival", year=args.year, auto_download=args.download) 483 | dataset_val.prepare() 484 | 485 | # *** This training schedule is an example. Update to your needs *** 486 | 487 | # Training - Stage 1 488 | print("Training network heads") 489 | model.train(dataset_train, dataset_val, 490 | learning_rate=config.LEARNING_RATE, 491 | epochs=40, 492 | layers='heads') 493 | 494 | # Training - Stage 2 495 | # Finetune layers from ResNet stage 4 and up 496 | print("Fine tune Resnet stage 4 and up") 497 | model.train(dataset_train, dataset_val, 498 | learning_rate=config.LEARNING_RATE, 499 | epochs=120, 500 | layers='4+') 501 | 502 | # Training - Stage 3 503 | # Fine tune all layers 504 | print("Fine tune all layers") 505 | model.train(dataset_train, dataset_val, 506 | learning_rate=config.LEARNING_RATE / 10, 507 | epochs=160, 508 | layers='all') 509 | 510 | elif args.command == "evaluate": 511 | # Validation dataset 512 | dataset_val = CocoDataset() 513 | coco = dataset_val.load_coco(args.dataset, "minival", year=args.year, return_coco=True, auto_download=args.download) 514 | dataset_val.prepare() 515 | print("Running COCO evaluation on {} images.".format(args.limit)) 516 | evaluate_coco(model, dataset_val, coco, "bbox", limit=int(args.limit)) 517 | else: 518 | print("'{}' is not recognized. " 519 | "Use 'train' or 'evaluate'".format(args.command)) 520 | -------------------------------------------------------------------------------- /config.py: -------------------------------------------------------------------------------- 1 | """ 2 | Mask R-CNN 3 | Base Configurations class. 4 | 5 | Copyright (c) 2017 Matterport, Inc. 6 | Licensed under the MIT License (see LICENSE for details) 7 | Written by Waleed Abdulla 8 | """ 9 | 10 | import math 11 | import numpy as np 12 | 13 | 14 | # Base Configuration Class 15 | # Don't use this class directly. Instead, sub-class it and override 16 | # the configurations you need to change. 17 | 18 | class Config(object): 19 | """Base configuration class. For custom configurations, create a 20 | sub-class that inherits from this one and override properties 21 | that need to be changed. 22 | """ 23 | # Name the configurations. For example, 'COCO', 'Experiment 3', ...etc. 24 | # Useful if your code needs to do things differently depending on which 25 | # experiment is running. 26 | NAME = None # Override in sub-classes 27 | 28 | # NUMBER OF GPUs to use. For CPU training, use 1 29 | GPU_COUNT = 1 30 | 31 | # Number of images to train with on each GPU. A 12GB GPU can typically 32 | # handle 2 images of 1024x1024px. 33 | # Adjust based on your GPU memory and image sizes. Use the highest 34 | # number that your GPU can handle for best performance. 35 | IMAGES_PER_GPU = 2 36 | 37 | # Number of training steps per epoch 38 | # This doesn't need to match the size of the training set. Tensorboard 39 | # updates are saved at the end of each epoch, so setting this to a 40 | # smaller number means getting more frequent TensorBoard updates. 41 | # Validation stats are also calculated at each epoch end and they 42 | # might take a while, so don't set this too small to avoid spending 43 | # a lot of time on validation stats. 44 | STEPS_PER_EPOCH = 1000 45 | 46 | # Number of validation steps to run at the end of every training epoch. 47 | # A bigger number improves accuracy of validation stats, but slows 48 | # down the training. 49 | VALIDATION_STEPS = 50 50 | 51 | # Backbone network architecture 52 | # Supported values are: resnet50, resnet101 53 | BACKBONE = "resnet101" 54 | 55 | # The strides of each layer of the FPN Pyramid. These values 56 | # are based on a Resnet101 backbone. 57 | BACKBONE_STRIDES = [4, 8, 16, 32, 64] 58 | 59 | # Number of classification classes (including background) 60 | NUM_CLASSES = 1 # Override in sub-classes 61 | 62 | # Length of square anchor side in pixels 63 | RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512) 64 | 65 | # Ratios of anchors at each cell (width/height) 66 | # A value of 1 represents a square anchor, and 0.5 is a wide anchor 67 | RPN_ANCHOR_RATIOS = [0.5, 1, 2] 68 | 69 | # Anchor stride 70 | # If 1 then anchors are created for each cell in the backbone feature map. 71 | # If 2, then anchors are created for every other cell, and so on. 72 | RPN_ANCHOR_STRIDE = 1 73 | 74 | # Non-max suppression threshold to filter RPN proposals. 75 | # You can reduce this during training to generate more propsals. 76 | RPN_NMS_THRESHOLD = 0.7 77 | 78 | # How many anchors per image to use for RPN training 79 | RPN_TRAIN_ANCHORS_PER_IMAGE = 256 80 | 81 | # ROIs kept after non-maximum supression (training and inference) 82 | POST_NMS_ROIS_TRAINING = 2000 83 | POST_NMS_ROIS_INFERENCE = 1000 84 | 85 | # If enabled, resizes instance masks to a smaller size to reduce 86 | # memory load. Recommended when using high-resolution images. 87 | USE_MINI_MASK = True 88 | MINI_MASK_SHAPE = (56, 56) # (height, width) of the mini-mask 89 | 90 | # Input image resing 91 | # Images are resized such that the smallest side is >= IMAGE_MIN_DIM and 92 | # the longest side is <= IMAGE_MAX_DIM. In case both conditions can't 93 | # be satisfied together the IMAGE_MAX_DIM is enforced. 94 | IMAGE_MIN_DIM = 800 95 | IMAGE_MAX_DIM = 1024 96 | # If True, pad images with zeros such that they're (max_dim by max_dim) 97 | IMAGE_PADDING = True # currently, the False option is not supported 98 | 99 | # Image mean (RGB) 100 | MEAN_PIXEL = np.array([123.7, 116.8, 103.9]) 101 | 102 | # Number of ROIs per image to feed to classifier/mask heads 103 | # The Mask RCNN paper uses 512 but often the RPN doesn't generate 104 | # enough positive proposals to fill this and keep a positive:negative 105 | # ratio of 1:3. You can increase the number of proposals by adjusting 106 | # the RPN NMS threshold. 107 | TRAIN_ROIS_PER_IMAGE = 200 108 | 109 | # Percent of positive ROIs used to train classifier/mask heads 110 | ROI_POSITIVE_RATIO = 0.33 111 | 112 | # Pooled ROIs 113 | POOL_SIZE = 7 114 | MASK_POOL_SIZE = 14 115 | MASK_SHAPE = [28, 28] 116 | 117 | # Maximum number of ground truth instances to use in one image 118 | MAX_GT_INSTANCES = 100 119 | 120 | # Bounding box refinement standard deviation for RPN and final detections. 121 | RPN_BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2]) 122 | BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2]) 123 | 124 | # Max number of final detections 125 | DETECTION_MAX_INSTANCES = 100 126 | 127 | # Minimum probability value to accept a detected instance 128 | # ROIs below this threshold are skipped 129 | DETECTION_MIN_CONFIDENCE = 0.7 130 | 131 | # Non-maximum suppression threshold for detection 132 | DETECTION_NMS_THRESHOLD = 0.3 133 | 134 | # Learning rate and momentum 135 | # The Mask RCNN paper uses lr=0.02, but on TensorFlow it causes 136 | # weights to explode. Likely due to differences in optimzer 137 | # implementation. 138 | LEARNING_RATE = 0.001 139 | LEARNING_MOMENTUM = 0.9 140 | 141 | # Weight decay regularization 142 | WEIGHT_DECAY = 0.0001 143 | 144 | # Use RPN ROIs or externally generated ROIs for training 145 | # Keep this True for most situations. Set to False if you want to train 146 | # the head branches on ROI generated by code rather than the ROIs from 147 | # the RPN. For example, to debug the classifier head without having to 148 | # train the RPN. 149 | USE_RPN_ROIS = True 150 | 151 | def __init__(self): 152 | """Set values of computed attributes.""" 153 | # Effective batch size 154 | self.BATCH_SIZE = self.IMAGES_PER_GPU * self.GPU_COUNT 155 | 156 | # Input image size 157 | self.IMAGE_SHAPE = np.array( 158 | [self.IMAGE_MAX_DIM, self.IMAGE_MAX_DIM, 3]) 159 | 160 | # Compute backbone size from input image size 161 | self.BACKBONE_SHAPES = np.array( 162 | [[int(math.ceil(self.IMAGE_SHAPE[0] / stride)), 163 | int(math.ceil(self.IMAGE_SHAPE[1] / stride))] 164 | for stride in self.BACKBONE_STRIDES]) 165 | 166 | def display(self): 167 | """Display Configuration values.""" 168 | print("\nConfigurations:") 169 | for a in dir(self): 170 | if not a.startswith("__") and not callable(getattr(self, a)): 171 | print("{:30} {}".format(a, getattr(self, a))) 172 | print("\n") 173 | -------------------------------------------------------------------------------- /docker-entrypoint.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | set -e 3 | xvfb-run python -W ignore /app/person_blocker.py -m /app/mask_rcnn_coco.h5 "$@" 4 | -------------------------------------------------------------------------------- /example_output/img1_blocked.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/minimaxir/person-blocker/82cc1bab629ff9faf610861bf94660d0131c38ec/example_output/img1_blocked.png -------------------------------------------------------------------------------- /example_output/img2_blocked.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/minimaxir/person-blocker/82cc1bab629ff9faf610861bf94660d0131c38ec/example_output/img2_blocked.png -------------------------------------------------------------------------------- /example_output/img3_blocked.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/minimaxir/person-blocker/82cc1bab629ff9faf610861bf94660d0131c38ec/example_output/img3_blocked.png -------------------------------------------------------------------------------- /example_output/img4_blocked.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/minimaxir/person-blocker/82cc1bab629ff9faf610861bf94660d0131c38ec/example_output/img4_blocked.gif -------------------------------------------------------------------------------- /example_output/img4_blocked.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/minimaxir/person-blocker/82cc1bab629ff9faf610861bf94660d0131c38ec/example_output/img4_blocked.png -------------------------------------------------------------------------------- /example_output/img4_labels.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/minimaxir/person-blocker/82cc1bab629ff9faf610861bf94660d0131c38ec/example_output/img4_labels.png -------------------------------------------------------------------------------- /images/img1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/minimaxir/person-blocker/82cc1bab629ff9faf610861bf94660d0131c38ec/images/img1.jpg -------------------------------------------------------------------------------- /images/img2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/minimaxir/person-blocker/82cc1bab629ff9faf610861bf94660d0131c38ec/images/img2.jpg -------------------------------------------------------------------------------- /images/img3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/minimaxir/person-blocker/82cc1bab629ff9faf610861bf94660d0131c38ec/images/img3.jpg -------------------------------------------------------------------------------- /images/img4.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/minimaxir/person-blocker/82cc1bab629ff9faf610861bf94660d0131c38ec/images/img4.jpg -------------------------------------------------------------------------------- /model.py: -------------------------------------------------------------------------------- 1 | """ 2 | Mask R-CNN 3 | The main Mask R-CNN model implemenetation. 4 | 5 | Copyright (c) 2017 Matterport, Inc. 6 | Licensed under the MIT License (see LICENSE for details) 7 | Written by Waleed Abdulla 8 | """ 9 | 10 | import os 11 | import sys 12 | import glob 13 | import random 14 | import math 15 | import datetime 16 | import itertools 17 | import json 18 | import re 19 | import logging 20 | from collections import OrderedDict 21 | import numpy as np 22 | import scipy.misc 23 | import tensorflow as tf 24 | import keras 25 | import keras.backend as K 26 | import keras.layers as KL 27 | import keras.initializers as KI 28 | import keras.engine as KE 29 | import keras.models as KM 30 | 31 | import utils 32 | 33 | # Requires TensorFlow 1.3+ and Keras 2.0.8+. 34 | from distutils.version import LooseVersion 35 | assert LooseVersion(tf.__version__) >= LooseVersion("1.3") 36 | assert LooseVersion(keras.__version__) >= LooseVersion('2.0.8') 37 | 38 | 39 | ############################################################ 40 | # Utility Functions 41 | ############################################################ 42 | 43 | def log(text, array=None): 44 | """Prints a text message. And, optionally, if a Numpy array is provided it 45 | prints it's shape, min, and max values. 46 | """ 47 | if array is not None: 48 | text = text.ljust(25) 49 | text += ("shape: {:20} min: {:10.5f} max: {:10.5f}".format( 50 | str(array.shape), 51 | array.min() if array.size else "", 52 | array.max() if array.size else "")) 53 | print(text) 54 | 55 | 56 | class BatchNorm(KL.BatchNormalization): 57 | """Batch Normalization class. Subclasses the Keras BN class and 58 | hardcodes training=False so the BN layer doesn't update 59 | during training. 60 | 61 | Batch normalization has a negative effect on training if batches are small 62 | so we disable it here. 63 | """ 64 | 65 | def call(self, inputs, training=None): 66 | return super(self.__class__, self).call(inputs, training=False) 67 | 68 | 69 | ############################################################ 70 | # Resnet Graph 71 | ############################################################ 72 | 73 | # Code adopted from: 74 | # https://github.com/fchollet/deep-learning-models/blob/master/resnet50.py 75 | 76 | def identity_block(input_tensor, kernel_size, filters, stage, block, 77 | use_bias=True): 78 | """The identity_block is the block that has no conv layer at shortcut 79 | # Arguments 80 | input_tensor: input tensor 81 | kernel_size: defualt 3, the kernel size of middle conv layer at main path 82 | filters: list of integers, the nb_filters of 3 conv layer at main path 83 | stage: integer, current stage label, used for generating layer names 84 | block: 'a','b'..., current block label, used for generating layer names 85 | """ 86 | nb_filter1, nb_filter2, nb_filter3 = filters 87 | conv_name_base = 'res' + str(stage) + block + '_branch' 88 | bn_name_base = 'bn' + str(stage) + block + '_branch' 89 | 90 | x = KL.Conv2D(nb_filter1, (1, 1), name=conv_name_base + '2a', 91 | use_bias=use_bias)(input_tensor) 92 | x = BatchNorm(axis=3, name=bn_name_base + '2a')(x) 93 | x = KL.Activation('relu')(x) 94 | 95 | x = KL.Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same', 96 | name=conv_name_base + '2b', use_bias=use_bias)(x) 97 | x = BatchNorm(axis=3, name=bn_name_base + '2b')(x) 98 | x = KL.Activation('relu')(x) 99 | 100 | x = KL.Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c', 101 | use_bias=use_bias)(x) 102 | x = BatchNorm(axis=3, name=bn_name_base + '2c')(x) 103 | 104 | x = KL.Add()([x, input_tensor]) 105 | x = KL.Activation('relu', name='res' + str(stage) + block + '_out')(x) 106 | return x 107 | 108 | 109 | def conv_block(input_tensor, kernel_size, filters, stage, block, 110 | strides=(2, 2), use_bias=True): 111 | """conv_block is the block that has a conv layer at shortcut 112 | # Arguments 113 | input_tensor: input tensor 114 | kernel_size: defualt 3, the kernel size of middle conv layer at main path 115 | filters: list of integers, the nb_filters of 3 conv layer at main path 116 | stage: integer, current stage label, used for generating layer names 117 | block: 'a','b'..., current block label, used for generating layer names 118 | Note that from stage 3, the first conv layer at main path is with subsample=(2,2) 119 | And the shortcut should have subsample=(2,2) as well 120 | """ 121 | nb_filter1, nb_filter2, nb_filter3 = filters 122 | conv_name_base = 'res' + str(stage) + block + '_branch' 123 | bn_name_base = 'bn' + str(stage) + block + '_branch' 124 | 125 | x = KL.Conv2D(nb_filter1, (1, 1), strides=strides, 126 | name=conv_name_base + '2a', use_bias=use_bias)(input_tensor) 127 | x = BatchNorm(axis=3, name=bn_name_base + '2a')(x) 128 | x = KL.Activation('relu')(x) 129 | 130 | x = KL.Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same', 131 | name=conv_name_base + '2b', use_bias=use_bias)(x) 132 | x = BatchNorm(axis=3, name=bn_name_base + '2b')(x) 133 | x = KL.Activation('relu')(x) 134 | 135 | x = KL.Conv2D(nb_filter3, (1, 1), name=conv_name_base + 136 | '2c', use_bias=use_bias)(x) 137 | x = BatchNorm(axis=3, name=bn_name_base + '2c')(x) 138 | 139 | shortcut = KL.Conv2D(nb_filter3, (1, 1), strides=strides, 140 | name=conv_name_base + '1', use_bias=use_bias)(input_tensor) 141 | shortcut = BatchNorm(axis=3, name=bn_name_base + '1')(shortcut) 142 | 143 | x = KL.Add()([x, shortcut]) 144 | x = KL.Activation('relu', name='res' + str(stage) + block + '_out')(x) 145 | return x 146 | 147 | 148 | def resnet_graph(input_image, architecture, stage5=False): 149 | assert architecture in ["resnet50", "resnet101"] 150 | # Stage 1 151 | x = KL.ZeroPadding2D((3, 3))(input_image) 152 | x = KL.Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=True)(x) 153 | x = BatchNorm(axis=3, name='bn_conv1')(x) 154 | x = KL.Activation('relu')(x) 155 | C1 = x = KL.MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x) 156 | # Stage 2 157 | x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1)) 158 | x = identity_block(x, 3, [64, 64, 256], stage=2, block='b') 159 | C2 = x = identity_block(x, 3, [64, 64, 256], stage=2, block='c') 160 | # Stage 3 161 | x = conv_block(x, 3, [128, 128, 512], stage=3, block='a') 162 | x = identity_block(x, 3, [128, 128, 512], stage=3, block='b') 163 | x = identity_block(x, 3, [128, 128, 512], stage=3, block='c') 164 | C3 = x = identity_block(x, 3, [128, 128, 512], stage=3, block='d') 165 | # Stage 4 166 | x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a') 167 | block_count = {"resnet50": 5, "resnet101": 22}[architecture] 168 | for i in range(block_count): 169 | x = identity_block(x, 3, [256, 256, 1024], stage=4, block=chr(98 + i)) 170 | C4 = x 171 | # Stage 5 172 | if stage5: 173 | x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a') 174 | x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b') 175 | C5 = x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c') 176 | else: 177 | C5 = None 178 | return [C1, C2, C3, C4, C5] 179 | 180 | 181 | ############################################################ 182 | # Proposal Layer 183 | ############################################################ 184 | 185 | def apply_box_deltas_graph(boxes, deltas): 186 | """Applies the given deltas to the given boxes. 187 | boxes: [N, 4] where each row is y1, x1, y2, x2 188 | deltas: [N, 4] where each row is [dy, dx, log(dh), log(dw)] 189 | """ 190 | # Convert to y, x, h, w 191 | height = boxes[:, 2] - boxes[:, 0] 192 | width = boxes[:, 3] - boxes[:, 1] 193 | center_y = boxes[:, 0] + 0.5 * height 194 | center_x = boxes[:, 1] + 0.5 * width 195 | # Apply deltas 196 | center_y += deltas[:, 0] * height 197 | center_x += deltas[:, 1] * width 198 | height *= tf.exp(deltas[:, 2]) 199 | width *= tf.exp(deltas[:, 3]) 200 | # Convert back to y1, x1, y2, x2 201 | y1 = center_y - 0.5 * height 202 | x1 = center_x - 0.5 * width 203 | y2 = y1 + height 204 | x2 = x1 + width 205 | result = tf.stack([y1, x1, y2, x2], axis=1, name="apply_box_deltas_out") 206 | return result 207 | 208 | 209 | def clip_boxes_graph(boxes, window): 210 | """ 211 | boxes: [N, 4] each row is y1, x1, y2, x2 212 | window: [4] in the form y1, x1, y2, x2 213 | """ 214 | # Split corners 215 | wy1, wx1, wy2, wx2 = tf.split(window, 4) 216 | y1, x1, y2, x2 = tf.split(boxes, 4, axis=1) 217 | # Clip 218 | y1 = tf.maximum(tf.minimum(y1, wy2), wy1) 219 | x1 = tf.maximum(tf.minimum(x1, wx2), wx1) 220 | y2 = tf.maximum(tf.minimum(y2, wy2), wy1) 221 | x2 = tf.maximum(tf.minimum(x2, wx2), wx1) 222 | clipped = tf.concat([y1, x1, y2, x2], axis=1, name="clipped_boxes") 223 | clipped.set_shape((clipped.shape[0], 4)) 224 | return clipped 225 | 226 | 227 | class ProposalLayer(KE.Layer): 228 | """Receives anchor scores and selects a subset to pass as proposals 229 | to the second stage. Filtering is done based on anchor scores and 230 | non-max suppression to remove overlaps. It also applies bounding 231 | box refinement deltas to anchors. 232 | 233 | Inputs: 234 | rpn_probs: [batch, anchors, (bg prob, fg prob)] 235 | rpn_bbox: [batch, anchors, (dy, dx, log(dh), log(dw))] 236 | 237 | Returns: 238 | Proposals in normalized coordinates [batch, rois, (y1, x1, y2, x2)] 239 | """ 240 | 241 | def __init__(self, proposal_count, nms_threshold, anchors, 242 | config=None, **kwargs): 243 | """ 244 | anchors: [N, (y1, x1, y2, x2)] anchors defined in image coordinates 245 | """ 246 | super(ProposalLayer, self).__init__(**kwargs) 247 | self.config = config 248 | self.proposal_count = proposal_count 249 | self.nms_threshold = nms_threshold 250 | self.anchors = anchors.astype(np.float32) 251 | 252 | def call(self, inputs): 253 | # Box Scores. Use the foreground class confidence. [Batch, num_rois, 1] 254 | scores = inputs[0][:, :, 1] 255 | # Box deltas [batch, num_rois, 4] 256 | deltas = inputs[1] 257 | deltas = deltas * np.reshape(self.config.RPN_BBOX_STD_DEV, [1, 1, 4]) 258 | # Base anchors 259 | anchors = self.anchors 260 | 261 | # Improve performance by trimming to top anchors by score 262 | # and doing the rest on the smaller subset. 263 | pre_nms_limit = min(6000, self.anchors.shape[0]) 264 | ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True, 265 | name="top_anchors").indices 266 | scores = utils.batch_slice([scores, ix], lambda x, y: tf.gather(x, y), 267 | self.config.IMAGES_PER_GPU) 268 | deltas = utils.batch_slice([deltas, ix], lambda x, y: tf.gather(x, y), 269 | self.config.IMAGES_PER_GPU) 270 | anchors = utils.batch_slice(ix, lambda x: tf.gather(anchors, x), 271 | self.config.IMAGES_PER_GPU, 272 | names=["pre_nms_anchors"]) 273 | 274 | # Apply deltas to anchors to get refined anchors. 275 | # [batch, N, (y1, x1, y2, x2)] 276 | boxes = utils.batch_slice([anchors, deltas], 277 | lambda x, y: apply_box_deltas_graph(x, y), 278 | self.config.IMAGES_PER_GPU, 279 | names=["refined_anchors"]) 280 | 281 | # Clip to image boundaries. [batch, N, (y1, x1, y2, x2)] 282 | height, width = self.config.IMAGE_SHAPE[:2] 283 | window = np.array([0, 0, height, width]).astype(np.float32) 284 | boxes = utils.batch_slice(boxes, 285 | lambda x: clip_boxes_graph(x, window), 286 | self.config.IMAGES_PER_GPU, 287 | names=["refined_anchors_clipped"]) 288 | 289 | # Filter out small boxes 290 | # According to Xinlei Chen's paper, this reduces detection accuracy 291 | # for small objects, so we're skipping it. 292 | 293 | # Normalize dimensions to range of 0 to 1. 294 | normalized_boxes = boxes / np.array([[height, width, height, width]]) 295 | 296 | # Non-max suppression 297 | def nms(normalized_boxes, scores): 298 | indices = tf.image.non_max_suppression( 299 | normalized_boxes, scores, self.proposal_count, 300 | self.nms_threshold, name="rpn_non_max_suppression") 301 | proposals = tf.gather(normalized_boxes, indices) 302 | # Pad if needed 303 | padding = tf.maximum(self.proposal_count - tf.shape(proposals)[0], 0) 304 | proposals = tf.pad(proposals, [(0, padding), (0, 0)]) 305 | return proposals 306 | proposals = utils.batch_slice([normalized_boxes, scores], nms, 307 | self.config.IMAGES_PER_GPU) 308 | return proposals 309 | 310 | def compute_output_shape(self, input_shape): 311 | return (None, self.proposal_count, 4) 312 | 313 | 314 | ############################################################ 315 | # ROIAlign Layer 316 | ############################################################ 317 | 318 | def log2_graph(x): 319 | """Implementatin of Log2. TF doesn't have a native implemenation.""" 320 | return tf.log(x) / tf.log(2.0) 321 | 322 | 323 | class PyramidROIAlign(KE.Layer): 324 | """Implements ROI Pooling on multiple levels of the feature pyramid. 325 | 326 | Params: 327 | - pool_shape: [height, width] of the output pooled regions. Usually [7, 7] 328 | - image_shape: [height, width, channels]. Shape of input image in pixels 329 | 330 | Inputs: 331 | - boxes: [batch, num_boxes, (y1, x1, y2, x2)] in normalized 332 | coordinates. Possibly padded with zeros if not enough 333 | boxes to fill the array. 334 | - Feature maps: List of feature maps from different levels of the pyramid. 335 | Each is [batch, height, width, channels] 336 | 337 | Output: 338 | Pooled regions in the shape: [batch, num_boxes, height, width, channels]. 339 | The width and height are those specific in the pool_shape in the layer 340 | constructor. 341 | """ 342 | 343 | def __init__(self, pool_shape, image_shape, **kwargs): 344 | super(PyramidROIAlign, self).__init__(**kwargs) 345 | self.pool_shape = tuple(pool_shape) 346 | self.image_shape = tuple(image_shape) 347 | 348 | def call(self, inputs): 349 | # Crop boxes [batch, num_boxes, (y1, x1, y2, x2)] in normalized coords 350 | boxes = inputs[0] 351 | 352 | # Feature Maps. List of feature maps from different level of the 353 | # feature pyramid. Each is [batch, height, width, channels] 354 | feature_maps = inputs[1:] 355 | 356 | # Assign each ROI to a level in the pyramid based on the ROI area. 357 | y1, x1, y2, x2 = tf.split(boxes, 4, axis=2) 358 | h = y2 - y1 359 | w = x2 - x1 360 | # Equation 1 in the Feature Pyramid Networks paper. Account for 361 | # the fact that our coordinates are normalized here. 362 | # e.g. a 224x224 ROI (in pixels) maps to P4 363 | image_area = tf.cast( 364 | self.image_shape[0] * self.image_shape[1], tf.float32) 365 | roi_level = log2_graph(tf.sqrt(h * w) / (224.0 / tf.sqrt(image_area))) 366 | roi_level = tf.minimum(5, tf.maximum( 367 | 2, 4 + tf.cast(tf.round(roi_level), tf.int32))) 368 | roi_level = tf.squeeze(roi_level, 2) 369 | 370 | # Loop through levels and apply ROI pooling to each. P2 to P5. 371 | pooled = [] 372 | box_to_level = [] 373 | for i, level in enumerate(range(2, 6)): 374 | ix = tf.where(tf.equal(roi_level, level)) 375 | level_boxes = tf.gather_nd(boxes, ix) 376 | 377 | # Box indicies for crop_and_resize. 378 | box_indices = tf.cast(ix[:, 0], tf.int32) 379 | 380 | # Keep track of which box is mapped to which level 381 | box_to_level.append(ix) 382 | 383 | # Stop gradient propogation to ROI proposals 384 | level_boxes = tf.stop_gradient(level_boxes) 385 | box_indices = tf.stop_gradient(box_indices) 386 | 387 | # Crop and Resize 388 | # From Mask R-CNN paper: "We sample four regular locations, so 389 | # that we can evaluate either max or average pooling. In fact, 390 | # interpolating only a single value at each bin center (without 391 | # pooling) is nearly as effective." 392 | # 393 | # Here we use the simplified approach of a single value per bin, 394 | # which is how it's done in tf.crop_and_resize() 395 | # Result: [batch * num_boxes, pool_height, pool_width, channels] 396 | pooled.append(tf.image.crop_and_resize( 397 | feature_maps[i], level_boxes, box_indices, self.pool_shape, 398 | method="bilinear")) 399 | 400 | # Pack pooled features into one tensor 401 | pooled = tf.concat(pooled, axis=0) 402 | 403 | # Pack box_to_level mapping into one array and add another 404 | # column representing the order of pooled boxes 405 | box_to_level = tf.concat(box_to_level, axis=0) 406 | box_range = tf.expand_dims(tf.range(tf.shape(box_to_level)[0]), 1) 407 | box_to_level = tf.concat([tf.cast(box_to_level, tf.int32), box_range], 408 | axis=1) 409 | 410 | # Rearrange pooled features to match the order of the original boxes 411 | # Sort box_to_level by batch then box index 412 | # TF doesn't have a way to sort by two columns, so merge them and sort. 413 | sorting_tensor = box_to_level[:, 0] * 100000 + box_to_level[:, 1] 414 | ix = tf.nn.top_k(sorting_tensor, k=tf.shape( 415 | box_to_level)[0]).indices[::-1] 416 | ix = tf.gather(box_to_level[:, 2], ix) 417 | pooled = tf.gather(pooled, ix) 418 | 419 | # Re-add the batch dimension 420 | pooled = tf.expand_dims(pooled, 0) 421 | return pooled 422 | 423 | def compute_output_shape(self, input_shape): 424 | return input_shape[0][:2] + self.pool_shape + (input_shape[1][-1], ) 425 | 426 | 427 | ############################################################ 428 | # Detection Target Layer 429 | ############################################################ 430 | 431 | def overlaps_graph(boxes1, boxes2): 432 | """Computes IoU overlaps between two sets of boxes. 433 | boxes1, boxes2: [N, (y1, x1, y2, x2)]. 434 | """ 435 | # 1. Tile boxes2 and repeate boxes1. This allows us to compare 436 | # every boxes1 against every boxes2 without loops. 437 | # TF doesn't have an equivalent to np.repeate() so simulate it 438 | # using tf.tile() and tf.reshape. 439 | b1 = tf.reshape(tf.tile(tf.expand_dims(boxes1, 1), 440 | [1, 1, tf.shape(boxes2)[0]]), [-1, 4]) 441 | b2 = tf.tile(boxes2, [tf.shape(boxes1)[0], 1]) 442 | # 2. Compute intersections 443 | b1_y1, b1_x1, b1_y2, b1_x2 = tf.split(b1, 4, axis=1) 444 | b2_y1, b2_x1, b2_y2, b2_x2 = tf.split(b2, 4, axis=1) 445 | y1 = tf.maximum(b1_y1, b2_y1) 446 | x1 = tf.maximum(b1_x1, b2_x1) 447 | y2 = tf.minimum(b1_y2, b2_y2) 448 | x2 = tf.minimum(b1_x2, b2_x2) 449 | intersection = tf.maximum(x2 - x1, 0) * tf.maximum(y2 - y1, 0) 450 | # 3. Compute unions 451 | b1_area = (b1_y2 - b1_y1) * (b1_x2 - b1_x1) 452 | b2_area = (b2_y2 - b2_y1) * (b2_x2 - b2_x1) 453 | union = b1_area + b2_area - intersection 454 | # 4. Compute IoU and reshape to [boxes1, boxes2] 455 | iou = intersection / union 456 | overlaps = tf.reshape(iou, [tf.shape(boxes1)[0], tf.shape(boxes2)[0]]) 457 | return overlaps 458 | 459 | 460 | def detection_targets_graph(proposals, gt_class_ids, gt_boxes, gt_masks, config): 461 | """Generates detection targets for one image. Subsamples proposals and 462 | generates target class IDs, bounding box deltas, and masks for each. 463 | 464 | Inputs: 465 | proposals: [N, (y1, x1, y2, x2)] in normalized coordinates. Might 466 | be zero padded if there are not enough proposals. 467 | gt_class_ids: [MAX_GT_INSTANCES] int class IDs 468 | gt_boxes: [MAX_GT_INSTANCES, (y1, x1, y2, x2)] in normalized coordinates. 469 | gt_masks: [height, width, MAX_GT_INSTANCES] of boolean type. 470 | 471 | Returns: Target ROIs and corresponding class IDs, bounding box shifts, 472 | and masks. 473 | rois: [TRAIN_ROIS_PER_IMAGE, (y1, x1, y2, x2)] in normalized coordinates 474 | class_ids: [TRAIN_ROIS_PER_IMAGE]. Integer class IDs. Zero padded. 475 | deltas: [TRAIN_ROIS_PER_IMAGE, NUM_CLASSES, (dy, dx, log(dh), log(dw))] 476 | Class-specific bbox refinements. 477 | masks: [TRAIN_ROIS_PER_IMAGE, height, width). Masks cropped to bbox 478 | boundaries and resized to neural network output size. 479 | 480 | Note: Returned arrays might be zero padded if not enough target ROIs. 481 | """ 482 | # Assertions 483 | asserts = [ 484 | tf.Assert(tf.greater(tf.shape(proposals)[0], 0), [proposals], 485 | name="roi_assertion"), 486 | ] 487 | with tf.control_dependencies(asserts): 488 | proposals = tf.identity(proposals) 489 | 490 | # Remove zero padding 491 | proposals, _ = trim_zeros_graph(proposals, name="trim_proposals") 492 | gt_boxes, non_zeros = trim_zeros_graph(gt_boxes, name="trim_gt_boxes") 493 | gt_class_ids = tf.boolean_mask(gt_class_ids, non_zeros, 494 | name="trim_gt_class_ids") 495 | gt_masks = tf.gather(gt_masks, tf.where(non_zeros)[:, 0], axis=2, 496 | name="trim_gt_masks") 497 | 498 | # Handle COCO crowds 499 | # A crowd box in COCO is a bounding box around several instances. Exclude 500 | # them from training. A crowd box is given a negative class ID. 501 | crowd_ix = tf.where(gt_class_ids < 0)[:, 0] 502 | non_crowd_ix = tf.where(gt_class_ids > 0)[:, 0] 503 | crowd_boxes = tf.gather(gt_boxes, crowd_ix) 504 | crowd_masks = tf.gather(gt_masks, crowd_ix, axis=2) 505 | gt_class_ids = tf.gather(gt_class_ids, non_crowd_ix) 506 | gt_boxes = tf.gather(gt_boxes, non_crowd_ix) 507 | gt_masks = tf.gather(gt_masks, non_crowd_ix, axis=2) 508 | 509 | # Compute overlaps matrix [proposals, gt_boxes] 510 | overlaps = overlaps_graph(proposals, gt_boxes) 511 | 512 | # Compute overlaps with crowd boxes [anchors, crowds] 513 | crowd_overlaps = overlaps_graph(proposals, crowd_boxes) 514 | crowd_iou_max = tf.reduce_max(crowd_overlaps, axis=1) 515 | no_crowd_bool = (crowd_iou_max < 0.001) 516 | 517 | # Determine postive and negative ROIs 518 | roi_iou_max = tf.reduce_max(overlaps, axis=1) 519 | # 1. Positive ROIs are those with >= 0.5 IoU with a GT box 520 | positive_roi_bool = (roi_iou_max >= 0.5) 521 | positive_indices = tf.where(positive_roi_bool)[:, 0] 522 | # 2. Negative ROIs are those with < 0.5 with every GT box. Skip crowds. 523 | negative_indices = tf.where(tf.logical_and(roi_iou_max < 0.5, no_crowd_bool))[:, 0] 524 | 525 | # Subsample ROIs. Aim for 33% positive 526 | # Positive ROIs 527 | positive_count = int(config.TRAIN_ROIS_PER_IMAGE * 528 | config.ROI_POSITIVE_RATIO) 529 | positive_indices = tf.random_shuffle(positive_indices)[:positive_count] 530 | positive_count = tf.shape(positive_indices)[0] 531 | # Negative ROIs. Add enough to maintain positive:negative ratio. 532 | r = 1.0 / config.ROI_POSITIVE_RATIO 533 | negative_count = tf.cast(r * tf.cast(positive_count, tf.float32), tf.int32) - positive_count 534 | negative_indices = tf.random_shuffle(negative_indices)[:negative_count] 535 | # Gather selected ROIs 536 | positive_rois = tf.gather(proposals, positive_indices) 537 | negative_rois = tf.gather(proposals, negative_indices) 538 | 539 | # Assign positive ROIs to GT boxes. 540 | positive_overlaps = tf.gather(overlaps, positive_indices) 541 | roi_gt_box_assignment = tf.argmax(positive_overlaps, axis=1) 542 | roi_gt_boxes = tf.gather(gt_boxes, roi_gt_box_assignment) 543 | roi_gt_class_ids = tf.gather(gt_class_ids, roi_gt_box_assignment) 544 | 545 | # Compute bbox refinement for positive ROIs 546 | deltas = utils.box_refinement_graph(positive_rois, roi_gt_boxes) 547 | deltas /= config.BBOX_STD_DEV 548 | 549 | # Assign positive ROIs to GT masks 550 | # Permute masks to [N, height, width, 1] 551 | transposed_masks = tf.expand_dims(tf.transpose(gt_masks, [2, 0, 1]), -1) 552 | # Pick the right mask for each ROI 553 | roi_masks = tf.gather(transposed_masks, roi_gt_box_assignment) 554 | 555 | # Compute mask targets 556 | boxes = positive_rois 557 | if config.USE_MINI_MASK: 558 | # Transform ROI corrdinates from normalized image space 559 | # to normalized mini-mask space. 560 | y1, x1, y2, x2 = tf.split(positive_rois, 4, axis=1) 561 | gt_y1, gt_x1, gt_y2, gt_x2 = tf.split(roi_gt_boxes, 4, axis=1) 562 | gt_h = gt_y2 - gt_y1 563 | gt_w = gt_x2 - gt_x1 564 | y1 = (y1 - gt_y1) / gt_h 565 | x1 = (x1 - gt_x1) / gt_w 566 | y2 = (y2 - gt_y1) / gt_h 567 | x2 = (x2 - gt_x1) / gt_w 568 | boxes = tf.concat([y1, x1, y2, x2], 1) 569 | box_ids = tf.range(0, tf.shape(roi_masks)[0]) 570 | masks = tf.image.crop_and_resize(tf.cast(roi_masks, tf.float32), boxes, 571 | box_ids, 572 | config.MASK_SHAPE) 573 | # Remove the extra dimension from masks. 574 | masks = tf.squeeze(masks, axis=3) 575 | 576 | # Threshold mask pixels at 0.5 to have GT masks be 0 or 1 to use with 577 | # binary cross entropy loss. 578 | masks = tf.round(masks) 579 | 580 | # Append negative ROIs and pad bbox deltas and masks that 581 | # are not used for negative ROIs with zeros. 582 | rois = tf.concat([positive_rois, negative_rois], axis=0) 583 | N = tf.shape(negative_rois)[0] 584 | P = tf.maximum(config.TRAIN_ROIS_PER_IMAGE - tf.shape(rois)[0], 0) 585 | rois = tf.pad(rois, [(0, P), (0, 0)]) 586 | roi_gt_boxes = tf.pad(roi_gt_boxes, [(0, N + P), (0, 0)]) 587 | roi_gt_class_ids = tf.pad(roi_gt_class_ids, [(0, N + P)]) 588 | deltas = tf.pad(deltas, [(0, N + P), (0, 0)]) 589 | masks = tf.pad(masks, [[0, N + P], (0, 0), (0, 0)]) 590 | 591 | return rois, roi_gt_class_ids, deltas, masks 592 | 593 | 594 | class DetectionTargetLayer(KE.Layer): 595 | """Subsamples proposals and generates target box refinement, class_ids, 596 | and masks for each. 597 | 598 | Inputs: 599 | proposals: [batch, N, (y1, x1, y2, x2)] in normalized coordinates. Might 600 | be zero padded if there are not enough proposals. 601 | gt_class_ids: [batch, MAX_GT_INSTANCES] Integer class IDs. 602 | gt_boxes: [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in normalized 603 | coordinates. 604 | gt_masks: [batch, height, width, MAX_GT_INSTANCES] of boolean type 605 | 606 | Returns: Target ROIs and corresponding class IDs, bounding box shifts, 607 | and masks. 608 | rois: [batch, TRAIN_ROIS_PER_IMAGE, (y1, x1, y2, x2)] in normalized 609 | coordinates 610 | target_class_ids: [batch, TRAIN_ROIS_PER_IMAGE]. Integer class IDs. 611 | target_deltas: [batch, TRAIN_ROIS_PER_IMAGE, NUM_CLASSES, 612 | (dy, dx, log(dh), log(dw), class_id)] 613 | Class-specific bbox refinements. 614 | target_mask: [batch, TRAIN_ROIS_PER_IMAGE, height, width) 615 | Masks cropped to bbox boundaries and resized to neural 616 | network output size. 617 | 618 | Note: Returned arrays might be zero padded if not enough target ROIs. 619 | """ 620 | 621 | def __init__(self, config, **kwargs): 622 | super(DetectionTargetLayer, self).__init__(**kwargs) 623 | self.config = config 624 | 625 | def call(self, inputs): 626 | proposals = inputs[0] 627 | gt_class_ids = inputs[1] 628 | gt_boxes = inputs[2] 629 | gt_masks = inputs[3] 630 | 631 | # Slice the batch and run a graph for each slice 632 | # TODO: Rename target_bbox to target_deltas for clarity 633 | names = ["rois", "target_class_ids", "target_bbox", "target_mask"] 634 | outputs = utils.batch_slice( 635 | [proposals, gt_class_ids, gt_boxes, gt_masks], 636 | lambda w, x, y, z: detection_targets_graph( 637 | w, x, y, z, self.config), 638 | self.config.IMAGES_PER_GPU, names=names) 639 | return outputs 640 | 641 | def compute_output_shape(self, input_shape): 642 | return [ 643 | (None, self.config.TRAIN_ROIS_PER_IMAGE, 4), # rois 644 | (None, 1), # class_ids 645 | (None, self.config.TRAIN_ROIS_PER_IMAGE, 4), # deltas 646 | (None, self.config.TRAIN_ROIS_PER_IMAGE, self.config.MASK_SHAPE[0], 647 | self.config.MASK_SHAPE[1]) # masks 648 | ] 649 | 650 | def compute_mask(self, inputs, mask=None): 651 | return [None, None, None, None] 652 | 653 | 654 | ############################################################ 655 | # Detection Layer 656 | ############################################################ 657 | 658 | def clip_to_window(window, boxes): 659 | """ 660 | window: (y1, x1, y2, x2). The window in the image we want to clip to. 661 | boxes: [N, (y1, x1, y2, x2)] 662 | """ 663 | boxes[:, 0] = np.maximum(np.minimum(boxes[:, 0], window[2]), window[0]) 664 | boxes[:, 1] = np.maximum(np.minimum(boxes[:, 1], window[3]), window[1]) 665 | boxes[:, 2] = np.maximum(np.minimum(boxes[:, 2], window[2]), window[0]) 666 | boxes[:, 3] = np.maximum(np.minimum(boxes[:, 3], window[3]), window[1]) 667 | return boxes 668 | 669 | 670 | def refine_detections_graph(rois, probs, deltas, window, config): 671 | """Refine classified proposals and filter overlaps and return final 672 | detections. 673 | 674 | Inputs: 675 | rois: [N, (y1, x1, y2, x2)] in normalized coordinates 676 | probs: [N, num_classes]. Class probabilities. 677 | deltas: [N, num_classes, (dy, dx, log(dh), log(dw))]. Class-specific 678 | bounding box deltas. 679 | window: (y1, x1, y2, x2) in image coordinates. The part of the image 680 | that contains the image excluding the padding. 681 | 682 | Returns detections shaped: [N, (y1, x1, y2, x2, class_id, score)] where 683 | coordinates are in image domain. 684 | """ 685 | # Class IDs per ROI 686 | class_ids = tf.argmax(probs, axis=1, output_type=tf.int32) 687 | # Class probability of the top class of each ROI 688 | indices = tf.stack([tf.range(probs.shape[0]), class_ids], axis=1) 689 | class_scores = tf.gather_nd(probs, indices) 690 | # Class-specific bounding box deltas 691 | deltas_specific = tf.gather_nd(deltas, indices) 692 | # Apply bounding box deltas 693 | # Shape: [boxes, (y1, x1, y2, x2)] in normalized coordinates 694 | refined_rois = apply_box_deltas_graph( 695 | rois, deltas_specific * config.BBOX_STD_DEV) 696 | # Convert coordiates to image domain 697 | # TODO: better to keep them normalized until later 698 | height, width = config.IMAGE_SHAPE[:2] 699 | refined_rois *= tf.constant([height, width, height, width], dtype=tf.float32) 700 | # Clip boxes to image window 701 | refined_rois = clip_boxes_graph(refined_rois, window) 702 | # Round and cast to int since we're deadling with pixels now 703 | refined_rois = tf.to_int32(tf.rint(refined_rois)) 704 | 705 | # TODO: Filter out boxes with zero area 706 | 707 | # Filter out background boxes 708 | keep = tf.where(class_ids > 0)[:, 0] 709 | # Filter out low confidence boxes 710 | if config.DETECTION_MIN_CONFIDENCE: 711 | conf_keep = tf.where(class_scores >= config.DETECTION_MIN_CONFIDENCE)[:, 0] 712 | keep = tf.sets.set_intersection(tf.expand_dims(keep, 0), 713 | tf.expand_dims(conf_keep, 0)) 714 | keep = tf.sparse_tensor_to_dense(keep)[0] 715 | 716 | # Apply per-class NMS 717 | # 1. Prepare variables 718 | pre_nms_class_ids = tf.gather(class_ids, keep) 719 | pre_nms_scores = tf.gather(class_scores, keep) 720 | pre_nms_rois = tf.gather(refined_rois, keep) 721 | unique_pre_nms_class_ids = tf.unique(pre_nms_class_ids)[0] 722 | 723 | def nms_keep_map(class_id): 724 | """Apply Non-Maximum Suppression on ROIs of the given class.""" 725 | # Indices of ROIs of the given class 726 | ixs = tf.where(tf.equal(pre_nms_class_ids, class_id))[:, 0] 727 | # Apply NMS 728 | class_keep = tf.image.non_max_suppression( 729 | tf.to_float(tf.gather(pre_nms_rois, ixs)), 730 | tf.gather(pre_nms_scores, ixs), 731 | max_output_size=config.DETECTION_MAX_INSTANCES, 732 | iou_threshold=config.DETECTION_NMS_THRESHOLD) 733 | # Map indicies 734 | class_keep = tf.gather(keep, tf.gather(ixs, class_keep)) 735 | # Pad with -1 so returned tensors have the same shape 736 | gap = config.DETECTION_MAX_INSTANCES - tf.shape(class_keep)[0] 737 | class_keep = tf.pad(class_keep, [(0, gap)], 738 | mode='CONSTANT', constant_values=-1) 739 | # Set shape so map_fn() can infer result shape 740 | class_keep.set_shape([config.DETECTION_MAX_INSTANCES]) 741 | return class_keep 742 | 743 | # 2. Map over class IDs 744 | nms_keep = tf.map_fn(nms_keep_map, unique_pre_nms_class_ids, 745 | dtype=tf.int64) 746 | # 3. Merge results into one list, and remove -1 padding 747 | nms_keep = tf.reshape(nms_keep, [-1]) 748 | nms_keep = tf.gather(nms_keep, tf.where(nms_keep > -1)[:, 0]) 749 | # 4. Compute intersection between keep and nms_keep 750 | keep = tf.sets.set_intersection(tf.expand_dims(keep, 0), 751 | tf.expand_dims(nms_keep, 0)) 752 | keep = tf.sparse_tensor_to_dense(keep)[0] 753 | # Keep top detections 754 | roi_count = config.DETECTION_MAX_INSTANCES 755 | class_scores_keep = tf.gather(class_scores, keep) 756 | num_keep = tf.minimum(tf.shape(class_scores_keep)[0], roi_count) 757 | top_ids = tf.nn.top_k(class_scores_keep, k=num_keep, sorted=True)[1] 758 | keep = tf.gather(keep, top_ids) 759 | 760 | # Arrange output as [N, (y1, x1, y2, x2, class_id, score)] 761 | # Coordinates are in image domain. 762 | detections = tf.concat([ 763 | tf.to_float(tf.gather(refined_rois, keep)), 764 | tf.to_float(tf.gather(class_ids, keep))[..., tf.newaxis], 765 | tf.gather(class_scores, keep)[..., tf.newaxis] 766 | ], axis=1) 767 | 768 | # Pad with zeros if detections < DETECTION_MAX_INSTANCES 769 | gap = config.DETECTION_MAX_INSTANCES - tf.shape(detections)[0] 770 | detections = tf.pad(detections, [(0, gap), (0, 0)], "CONSTANT") 771 | return detections 772 | 773 | 774 | class DetectionLayer(KE.Layer): 775 | """Takes classified proposal boxes and their bounding box deltas and 776 | returns the final detection boxes. 777 | 778 | Returns: 779 | [batch, num_detections, (y1, x1, y2, x2, class_id, class_score)] where 780 | coordinates are in image domain 781 | """ 782 | 783 | def __init__(self, config=None, **kwargs): 784 | super(DetectionLayer, self).__init__(**kwargs) 785 | self.config = config 786 | 787 | def call(self, inputs): 788 | rois = inputs[0] 789 | mrcnn_class = inputs[1] 790 | mrcnn_bbox = inputs[2] 791 | image_meta = inputs[3] 792 | 793 | # Run detection refinement graph on each item in the batch 794 | _, _, window, _ = parse_image_meta_graph(image_meta) 795 | detections_batch = utils.batch_slice( 796 | [rois, mrcnn_class, mrcnn_bbox, window], 797 | lambda x, y, w, z: refine_detections_graph(x, y, w, z, self.config), 798 | self.config.IMAGES_PER_GPU) 799 | 800 | # Reshape output 801 | # [batch, num_detections, (y1, x1, y2, x2, class_score)] in pixels 802 | return tf.reshape( 803 | detections_batch, 804 | [self.config.BATCH_SIZE, self.config.DETECTION_MAX_INSTANCES, 6]) 805 | 806 | def compute_output_shape(self, input_shape): 807 | return (None, self.config.DETECTION_MAX_INSTANCES, 6) 808 | 809 | 810 | # Region Proposal Network (RPN) 811 | 812 | def rpn_graph(feature_map, anchors_per_location, anchor_stride): 813 | """Builds the computation graph of Region Proposal Network. 814 | 815 | feature_map: backbone features [batch, height, width, depth] 816 | anchors_per_location: number of anchors per pixel in the feature map 817 | anchor_stride: Controls the density of anchors. Typically 1 (anchors for 818 | every pixel in the feature map), or 2 (every other pixel). 819 | 820 | Returns: 821 | rpn_logits: [batch, H, W, 2] Anchor classifier logits (before softmax) 822 | rpn_probs: [batch, H, W, 2] Anchor classifier probabilities. 823 | rpn_bbox: [batch, H, W, (dy, dx, log(dh), log(dw))] Deltas to be 824 | applied to anchors. 825 | """ 826 | # TODO: check if stride of 2 causes alignment issues if the featuremap 827 | # is not even. 828 | # Shared convolutional base of the RPN 829 | shared = KL.Conv2D(512, (3, 3), padding='same', activation='relu', 830 | strides=anchor_stride, 831 | name='rpn_conv_shared')(feature_map) 832 | 833 | # Anchor Score. [batch, height, width, anchors per location * 2]. 834 | x = KL.Conv2D(2 * anchors_per_location, (1, 1), padding='valid', 835 | activation='linear', name='rpn_class_raw')(shared) 836 | 837 | # Reshape to [batch, anchors, 2] 838 | rpn_class_logits = KL.Lambda( 839 | lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 2]))(x) 840 | 841 | # Softmax on last dimension of BG/FG. 842 | rpn_probs = KL.Activation( 843 | "softmax", name="rpn_class_xxx")(rpn_class_logits) 844 | 845 | # Bounding box refinement. [batch, H, W, anchors per location, depth] 846 | # where depth is [x, y, log(w), log(h)] 847 | x = KL.Conv2D(anchors_per_location * 4, (1, 1), padding="valid", 848 | activation='linear', name='rpn_bbox_pred')(shared) 849 | 850 | # Reshape to [batch, anchors, 4] 851 | rpn_bbox = KL.Lambda(lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 4]))(x) 852 | 853 | return [rpn_class_logits, rpn_probs, rpn_bbox] 854 | 855 | 856 | def build_rpn_model(anchor_stride, anchors_per_location, depth): 857 | """Builds a Keras model of the Region Proposal Network. 858 | It wraps the RPN graph so it can be used multiple times with shared 859 | weights. 860 | 861 | anchors_per_location: number of anchors per pixel in the feature map 862 | anchor_stride: Controls the density of anchors. Typically 1 (anchors for 863 | every pixel in the feature map), or 2 (every other pixel). 864 | depth: Depth of the backbone feature map. 865 | 866 | Returns a Keras Model object. The model outputs, when called, are: 867 | rpn_logits: [batch, H, W, 2] Anchor classifier logits (before softmax) 868 | rpn_probs: [batch, W, W, 2] Anchor classifier probabilities. 869 | rpn_bbox: [batch, H, W, (dy, dx, log(dh), log(dw))] Deltas to be 870 | applied to anchors. 871 | """ 872 | input_feature_map = KL.Input(shape=[None, None, depth], 873 | name="input_rpn_feature_map") 874 | outputs = rpn_graph(input_feature_map, anchors_per_location, anchor_stride) 875 | return KM.Model([input_feature_map], outputs, name="rpn_model") 876 | 877 | 878 | ############################################################ 879 | # Feature Pyramid Network Heads 880 | ############################################################ 881 | 882 | def fpn_classifier_graph(rois, feature_maps, 883 | image_shape, pool_size, num_classes): 884 | """Builds the computation graph of the feature pyramid network classifier 885 | and regressor heads. 886 | 887 | rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized 888 | coordinates. 889 | feature_maps: List of feature maps from diffent layers of the pyramid, 890 | [P2, P3, P4, P5]. Each has a different resolution. 891 | image_shape: [height, width, depth] 892 | pool_size: The width of the square feature map generated from ROI Pooling. 893 | num_classes: number of classes, which determines the depth of the results 894 | 895 | Returns: 896 | logits: [N, NUM_CLASSES] classifier logits (before softmax) 897 | probs: [N, NUM_CLASSES] classifier probabilities 898 | bbox_deltas: [N, (dy, dx, log(dh), log(dw))] Deltas to apply to 899 | proposal boxes 900 | """ 901 | # ROI Pooling 902 | # Shape: [batch, num_boxes, pool_height, pool_width, channels] 903 | x = PyramidROIAlign([pool_size, pool_size], image_shape, 904 | name="roi_align_classifier")([rois] + feature_maps) 905 | # Two 1024 FC layers (implemented with Conv2D for consistency) 906 | x = KL.TimeDistributed(KL.Conv2D(1024, (pool_size, pool_size), padding="valid"), 907 | name="mrcnn_class_conv1")(x) 908 | x = KL.TimeDistributed(BatchNorm(axis=3), name='mrcnn_class_bn1')(x) 909 | x = KL.Activation('relu')(x) 910 | x = KL.TimeDistributed(KL.Conv2D(1024, (1, 1)), 911 | name="mrcnn_class_conv2")(x) 912 | x = KL.TimeDistributed(BatchNorm(axis=3), 913 | name='mrcnn_class_bn2')(x) 914 | x = KL.Activation('relu')(x) 915 | 916 | shared = KL.Lambda(lambda x: K.squeeze(K.squeeze(x, 3), 2), 917 | name="pool_squeeze")(x) 918 | 919 | # Classifier head 920 | mrcnn_class_logits = KL.TimeDistributed(KL.Dense(num_classes), 921 | name='mrcnn_class_logits')(shared) 922 | mrcnn_probs = KL.TimeDistributed(KL.Activation("softmax"), 923 | name="mrcnn_class")(mrcnn_class_logits) 924 | 925 | # BBox head 926 | # [batch, boxes, num_classes * (dy, dx, log(dh), log(dw))] 927 | x = KL.TimeDistributed(KL.Dense(num_classes * 4, activation='linear'), 928 | name='mrcnn_bbox_fc')(shared) 929 | # Reshape to [batch, boxes, num_classes, (dy, dx, log(dh), log(dw))] 930 | s = K.int_shape(x) 931 | mrcnn_bbox = KL.Reshape((s[1], num_classes, 4), name="mrcnn_bbox")(x) 932 | 933 | return mrcnn_class_logits, mrcnn_probs, mrcnn_bbox 934 | 935 | 936 | def build_fpn_mask_graph(rois, feature_maps, 937 | image_shape, pool_size, num_classes): 938 | """Builds the computation graph of the mask head of Feature Pyramid Network. 939 | 940 | rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized 941 | coordinates. 942 | feature_maps: List of feature maps from diffent layers of the pyramid, 943 | [P2, P3, P4, P5]. Each has a different resolution. 944 | image_shape: [height, width, depth] 945 | pool_size: The width of the square feature map generated from ROI Pooling. 946 | num_classes: number of classes, which determines the depth of the results 947 | 948 | Returns: Masks [batch, roi_count, height, width, num_classes] 949 | """ 950 | # ROI Pooling 951 | # Shape: [batch, boxes, pool_height, pool_width, channels] 952 | x = PyramidROIAlign([pool_size, pool_size], image_shape, 953 | name="roi_align_mask")([rois] + feature_maps) 954 | 955 | # Conv layers 956 | x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"), 957 | name="mrcnn_mask_conv1")(x) 958 | x = KL.TimeDistributed(BatchNorm(axis=3), 959 | name='mrcnn_mask_bn1')(x) 960 | x = KL.Activation('relu')(x) 961 | 962 | x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"), 963 | name="mrcnn_mask_conv2")(x) 964 | x = KL.TimeDistributed(BatchNorm(axis=3), 965 | name='mrcnn_mask_bn2')(x) 966 | x = KL.Activation('relu')(x) 967 | 968 | x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"), 969 | name="mrcnn_mask_conv3")(x) 970 | x = KL.TimeDistributed(BatchNorm(axis=3), 971 | name='mrcnn_mask_bn3')(x) 972 | x = KL.Activation('relu')(x) 973 | 974 | x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"), 975 | name="mrcnn_mask_conv4")(x) 976 | x = KL.TimeDistributed(BatchNorm(axis=3), 977 | name='mrcnn_mask_bn4')(x) 978 | x = KL.Activation('relu')(x) 979 | 980 | x = KL.TimeDistributed(KL.Conv2DTranspose(256, (2, 2), strides=2, activation="relu"), 981 | name="mrcnn_mask_deconv")(x) 982 | x = KL.TimeDistributed(KL.Conv2D(num_classes, (1, 1), strides=1, activation="sigmoid"), 983 | name="mrcnn_mask")(x) 984 | return x 985 | 986 | 987 | ############################################################ 988 | # Loss Functions 989 | ############################################################ 990 | 991 | def smooth_l1_loss(y_true, y_pred): 992 | """Implements Smooth-L1 loss. 993 | y_true and y_pred are typicallly: [N, 4], but could be any shape. 994 | """ 995 | diff = K.abs(y_true - y_pred) 996 | less_than_one = K.cast(K.less(diff, 1.0), "float32") 997 | loss = (less_than_one * 0.5 * diff**2) + (1 - less_than_one) * (diff - 0.5) 998 | return loss 999 | 1000 | 1001 | def rpn_class_loss_graph(rpn_match, rpn_class_logits): 1002 | """RPN anchor classifier loss. 1003 | 1004 | rpn_match: [batch, anchors, 1]. Anchor match type. 1=positive, 1005 | -1=negative, 0=neutral anchor. 1006 | rpn_class_logits: [batch, anchors, 2]. RPN classifier logits for FG/BG. 1007 | """ 1008 | # Squeeze last dim to simplify 1009 | rpn_match = tf.squeeze(rpn_match, -1) 1010 | # Get anchor classes. Convert the -1/+1 match to 0/1 values. 1011 | anchor_class = K.cast(K.equal(rpn_match, 1), tf.int32) 1012 | # Positive and Negative anchors contribute to the loss, 1013 | # but neutral anchors (match value = 0) don't. 1014 | indices = tf.where(K.not_equal(rpn_match, 0)) 1015 | # Pick rows that contribute to the loss and filter out the rest. 1016 | rpn_class_logits = tf.gather_nd(rpn_class_logits, indices) 1017 | anchor_class = tf.gather_nd(anchor_class, indices) 1018 | # Crossentropy loss 1019 | loss = K.sparse_categorical_crossentropy(target=anchor_class, 1020 | output=rpn_class_logits, 1021 | from_logits=True) 1022 | loss = K.switch(tf.size(loss) > 0, K.mean(loss), tf.constant(0.0)) 1023 | return loss 1024 | 1025 | 1026 | def rpn_bbox_loss_graph(config, target_bbox, rpn_match, rpn_bbox): 1027 | """Return the RPN bounding box loss graph. 1028 | 1029 | config: the model config object. 1030 | target_bbox: [batch, max positive anchors, (dy, dx, log(dh), log(dw))]. 1031 | Uses 0 padding to fill in unsed bbox deltas. 1032 | rpn_match: [batch, anchors, 1]. Anchor match type. 1=positive, 1033 | -1=negative, 0=neutral anchor. 1034 | rpn_bbox: [batch, anchors, (dy, dx, log(dh), log(dw))] 1035 | """ 1036 | # Positive anchors contribute to the loss, but negative and 1037 | # neutral anchors (match value of 0 or -1) don't. 1038 | rpn_match = K.squeeze(rpn_match, -1) 1039 | indices = tf.where(K.equal(rpn_match, 1)) 1040 | 1041 | # Pick bbox deltas that contribute to the loss 1042 | rpn_bbox = tf.gather_nd(rpn_bbox, indices) 1043 | 1044 | # Trim target bounding box deltas to the same length as rpn_bbox. 1045 | batch_counts = K.sum(K.cast(K.equal(rpn_match, 1), tf.int32), axis=1) 1046 | target_bbox = batch_pack_graph(target_bbox, batch_counts, 1047 | config.IMAGES_PER_GPU) 1048 | 1049 | # TODO: use smooth_l1_loss() rather than reimplementing here 1050 | # to reduce code duplication 1051 | diff = K.abs(target_bbox - rpn_bbox) 1052 | less_than_one = K.cast(K.less(diff, 1.0), "float32") 1053 | loss = (less_than_one * 0.5 * diff**2) + (1 - less_than_one) * (diff - 0.5) 1054 | 1055 | loss = K.switch(tf.size(loss) > 0, K.mean(loss), tf.constant(0.0)) 1056 | return loss 1057 | 1058 | 1059 | def mrcnn_class_loss_graph(target_class_ids, pred_class_logits, 1060 | active_class_ids): 1061 | """Loss for the classifier head of Mask RCNN. 1062 | 1063 | target_class_ids: [batch, num_rois]. Integer class IDs. Uses zero 1064 | padding to fill in the array. 1065 | pred_class_logits: [batch, num_rois, num_classes] 1066 | active_class_ids: [batch, num_classes]. Has a value of 1 for 1067 | classes that are in the dataset of the image, and 0 1068 | for classes that are not in the dataset. 1069 | """ 1070 | target_class_ids = tf.cast(target_class_ids, 'int64') 1071 | 1072 | # Find predictions of classes that are not in the dataset. 1073 | pred_class_ids = tf.argmax(pred_class_logits, axis=2) 1074 | # TODO: Update this line to work with batch > 1. Right now it assumes all 1075 | # images in a batch have the same active_class_ids 1076 | pred_active = tf.gather(active_class_ids[0], pred_class_ids) 1077 | 1078 | # Loss 1079 | loss = tf.nn.sparse_softmax_cross_entropy_with_logits( 1080 | labels=target_class_ids, logits=pred_class_logits) 1081 | 1082 | # Erase losses of predictions of classes that are not in the active 1083 | # classes of the image. 1084 | loss = loss * pred_active 1085 | 1086 | # Computer loss mean. Use only predictions that contribute 1087 | # to the loss to get a correct mean. 1088 | loss = tf.reduce_sum(loss) / tf.reduce_sum(pred_active) 1089 | return loss 1090 | 1091 | 1092 | def mrcnn_bbox_loss_graph(target_bbox, target_class_ids, pred_bbox): 1093 | """Loss for Mask R-CNN bounding box refinement. 1094 | 1095 | target_bbox: [batch, num_rois, (dy, dx, log(dh), log(dw))] 1096 | target_class_ids: [batch, num_rois]. Integer class IDs. 1097 | pred_bbox: [batch, num_rois, num_classes, (dy, dx, log(dh), log(dw))] 1098 | """ 1099 | # Reshape to merge batch and roi dimensions for simplicity. 1100 | target_class_ids = K.reshape(target_class_ids, (-1,)) 1101 | target_bbox = K.reshape(target_bbox, (-1, 4)) 1102 | pred_bbox = K.reshape(pred_bbox, (-1, K.int_shape(pred_bbox)[2], 4)) 1103 | 1104 | # Only positive ROIs contribute to the loss. And only 1105 | # the right class_id of each ROI. Get their indicies. 1106 | positive_roi_ix = tf.where(target_class_ids > 0)[:, 0] 1107 | positive_roi_class_ids = tf.cast( 1108 | tf.gather(target_class_ids, positive_roi_ix), tf.int64) 1109 | indices = tf.stack([positive_roi_ix, positive_roi_class_ids], axis=1) 1110 | 1111 | # Gather the deltas (predicted and true) that contribute to loss 1112 | target_bbox = tf.gather(target_bbox, positive_roi_ix) 1113 | pred_bbox = tf.gather_nd(pred_bbox, indices) 1114 | 1115 | # Smooth-L1 Loss 1116 | loss = K.switch(tf.size(target_bbox) > 0, 1117 | smooth_l1_loss(y_true=target_bbox, y_pred=pred_bbox), 1118 | tf.constant(0.0)) 1119 | loss = K.mean(loss) 1120 | loss = K.reshape(loss, [1, 1]) 1121 | return loss 1122 | 1123 | 1124 | def mrcnn_mask_loss_graph(target_masks, target_class_ids, pred_masks): 1125 | """Mask binary cross-entropy loss for the masks head. 1126 | 1127 | target_masks: [batch, num_rois, height, width]. 1128 | A float32 tensor of values 0 or 1. Uses zero padding to fill array. 1129 | target_class_ids: [batch, num_rois]. Integer class IDs. Zero padded. 1130 | pred_masks: [batch, proposals, height, width, num_classes] float32 tensor 1131 | with values from 0 to 1. 1132 | """ 1133 | # Reshape for simplicity. Merge first two dimensions into one. 1134 | target_class_ids = K.reshape(target_class_ids, (-1,)) 1135 | mask_shape = tf.shape(target_masks) 1136 | target_masks = K.reshape(target_masks, (-1, mask_shape[2], mask_shape[3])) 1137 | pred_shape = tf.shape(pred_masks) 1138 | pred_masks = K.reshape(pred_masks, 1139 | (-1, pred_shape[2], pred_shape[3], pred_shape[4])) 1140 | # Permute predicted masks to [N, num_classes, height, width] 1141 | pred_masks = tf.transpose(pred_masks, [0, 3, 1, 2]) 1142 | 1143 | # Only positive ROIs contribute to the loss. And only 1144 | # the class specific mask of each ROI. 1145 | positive_ix = tf.where(target_class_ids > 0)[:, 0] 1146 | positive_class_ids = tf.cast( 1147 | tf.gather(target_class_ids, positive_ix), tf.int64) 1148 | indices = tf.stack([positive_ix, positive_class_ids], axis=1) 1149 | 1150 | # Gather the masks (predicted and true) that contribute to loss 1151 | y_true = tf.gather(target_masks, positive_ix) 1152 | y_pred = tf.gather_nd(pred_masks, indices) 1153 | 1154 | # Compute binary cross entropy. If no positive ROIs, then return 0. 1155 | # shape: [batch, roi, num_classes] 1156 | loss = K.switch(tf.size(y_true) > 0, 1157 | K.binary_crossentropy(target=y_true, output=y_pred), 1158 | tf.constant(0.0)) 1159 | loss = K.mean(loss) 1160 | loss = K.reshape(loss, [1, 1]) 1161 | return loss 1162 | 1163 | 1164 | ############################################################ 1165 | # Data Generator 1166 | ############################################################ 1167 | 1168 | def load_image_gt(dataset, config, image_id, augment=False, 1169 | use_mini_mask=False): 1170 | """Load and return ground truth data for an image (image, mask, bounding boxes). 1171 | 1172 | augment: If true, apply random image augmentation. Currently, only 1173 | horizontal flipping is offered. 1174 | use_mini_mask: If False, returns full-size masks that are the same height 1175 | and width as the original image. These can be big, for example 1176 | 1024x1024x100 (for 100 instances). Mini masks are smaller, typically, 1177 | 224x224 and are generated by extracting the bounding box of the 1178 | object and resizing it to MINI_MASK_SHAPE. 1179 | 1180 | Returns: 1181 | image: [height, width, 3] 1182 | shape: the original shape of the image before resizing and cropping. 1183 | class_ids: [instance_count] Integer class IDs 1184 | bbox: [instance_count, (y1, x1, y2, x2)] 1185 | mask: [height, width, instance_count]. The height and width are those 1186 | of the image unless use_mini_mask is True, in which case they are 1187 | defined in MINI_MASK_SHAPE. 1188 | """ 1189 | # Load image and mask 1190 | image = dataset.load_image(image_id) 1191 | mask, class_ids = dataset.load_mask(image_id) 1192 | shape = image.shape 1193 | image, window, scale, padding = utils.resize_image( 1194 | image, 1195 | min_dim=config.IMAGE_MIN_DIM, 1196 | max_dim=config.IMAGE_MAX_DIM, 1197 | padding=config.IMAGE_PADDING) 1198 | mask = utils.resize_mask(mask, scale, padding) 1199 | 1200 | # Random horizontal flips. 1201 | if augment: 1202 | if random.randint(0, 1): 1203 | image = np.fliplr(image) 1204 | mask = np.fliplr(mask) 1205 | 1206 | # Note that some boxes might be all zeros if the corresponding mask got cropped out. 1207 | # and here is to filter them out 1208 | _idx = np.sum(mask, axis=(0, 1)) > 0 1209 | mask = mask[:, :, _idx] 1210 | class_ids = class_ids[_idx] 1211 | # Bounding boxes. Note that some boxes might be all zeros 1212 | # if the corresponding mask got cropped out. 1213 | # bbox: [num_instances, (y1, x1, y2, x2)] 1214 | bbox = utils.extract_bboxes(mask) 1215 | 1216 | # Active classes 1217 | # Different datasets have different classes, so track the 1218 | # classes supported in the dataset of this image. 1219 | active_class_ids = np.zeros([dataset.num_classes], dtype=np.int32) 1220 | source_class_ids = dataset.source_class_ids[dataset.image_info[image_id]["source"]] 1221 | active_class_ids[source_class_ids] = 1 1222 | 1223 | # Resize masks to smaller size to reduce memory usage 1224 | if use_mini_mask: 1225 | mask = utils.minimize_mask(bbox, mask, config.MINI_MASK_SHAPE) 1226 | 1227 | # Image meta data 1228 | image_meta = compose_image_meta(image_id, shape, window, active_class_ids) 1229 | 1230 | return image, image_meta, class_ids, bbox, mask 1231 | 1232 | 1233 | def build_detection_targets(rpn_rois, gt_class_ids, gt_boxes, gt_masks, config): 1234 | """Generate targets for training Stage 2 classifier and mask heads. 1235 | This is not used in normal training. It's useful for debugging or to train 1236 | the Mask RCNN heads without using the RPN head. 1237 | 1238 | Inputs: 1239 | rpn_rois: [N, (y1, x1, y2, x2)] proposal boxes. 1240 | gt_class_ids: [instance count] Integer class IDs 1241 | gt_boxes: [instance count, (y1, x1, y2, x2)] 1242 | gt_masks: [height, width, instance count] Grund truth masks. Can be full 1243 | size or mini-masks. 1244 | 1245 | Returns: 1246 | rois: [TRAIN_ROIS_PER_IMAGE, (y1, x1, y2, x2)] 1247 | class_ids: [TRAIN_ROIS_PER_IMAGE]. Integer class IDs. 1248 | bboxes: [TRAIN_ROIS_PER_IMAGE, NUM_CLASSES, (y, x, log(h), log(w))]. Class-specific 1249 | bbox refinements. 1250 | masks: [TRAIN_ROIS_PER_IMAGE, height, width, NUM_CLASSES). Class specific masks cropped 1251 | to bbox boundaries and resized to neural network output size. 1252 | """ 1253 | assert rpn_rois.shape[0] > 0 1254 | assert gt_class_ids.dtype == np.int32, "Expected int but got {}".format( 1255 | gt_class_ids.dtype) 1256 | assert gt_boxes.dtype == np.int32, "Expected int but got {}".format( 1257 | gt_boxes.dtype) 1258 | assert gt_masks.dtype == np.bool_, "Expected bool but got {}".format( 1259 | gt_masks.dtype) 1260 | 1261 | # It's common to add GT Boxes to ROIs but we don't do that here because 1262 | # according to XinLei Chen's paper, it doesn't help. 1263 | 1264 | # Trim empty padding in gt_boxes and gt_masks parts 1265 | instance_ids = np.where(gt_class_ids > 0)[0] 1266 | assert instance_ids.shape[0] > 0, "Image must contain instances." 1267 | gt_class_ids = gt_class_ids[instance_ids] 1268 | gt_boxes = gt_boxes[instance_ids] 1269 | gt_masks = gt_masks[:, :, instance_ids] 1270 | 1271 | # Compute areas of ROIs and ground truth boxes. 1272 | rpn_roi_area = (rpn_rois[:, 2] - rpn_rois[:, 0]) * \ 1273 | (rpn_rois[:, 3] - rpn_rois[:, 1]) 1274 | gt_box_area = (gt_boxes[:, 2] - gt_boxes[:, 0]) * \ 1275 | (gt_boxes[:, 3] - gt_boxes[:, 1]) 1276 | 1277 | # Compute overlaps [rpn_rois, gt_boxes] 1278 | overlaps = np.zeros((rpn_rois.shape[0], gt_boxes.shape[0])) 1279 | for i in range(overlaps.shape[1]): 1280 | gt = gt_boxes[i] 1281 | overlaps[:, i] = utils.compute_iou( 1282 | gt, rpn_rois, gt_box_area[i], rpn_roi_area) 1283 | 1284 | # Assign ROIs to GT boxes 1285 | rpn_roi_iou_argmax = np.argmax(overlaps, axis=1) 1286 | rpn_roi_iou_max = overlaps[np.arange( 1287 | overlaps.shape[0]), rpn_roi_iou_argmax] 1288 | # GT box assigned to each ROI 1289 | rpn_roi_gt_boxes = gt_boxes[rpn_roi_iou_argmax] 1290 | rpn_roi_gt_class_ids = gt_class_ids[rpn_roi_iou_argmax] 1291 | 1292 | # Positive ROIs are those with >= 0.5 IoU with a GT box. 1293 | fg_ids = np.where(rpn_roi_iou_max > 0.5)[0] 1294 | 1295 | # Negative ROIs are those with max IoU 0.1-0.5 (hard example mining) 1296 | # TODO: To hard example mine or not to hard example mine, that's the question 1297 | # bg_ids = np.where((rpn_roi_iou_max >= 0.1) & (rpn_roi_iou_max < 0.5))[0] 1298 | bg_ids = np.where(rpn_roi_iou_max < 0.5)[0] 1299 | 1300 | # Subsample ROIs. Aim for 33% foreground. 1301 | # FG 1302 | fg_roi_count = int(config.TRAIN_ROIS_PER_IMAGE * config.ROI_POSITIVE_RATIO) 1303 | if fg_ids.shape[0] > fg_roi_count: 1304 | keep_fg_ids = np.random.choice(fg_ids, fg_roi_count, replace=False) 1305 | else: 1306 | keep_fg_ids = fg_ids 1307 | # BG 1308 | remaining = config.TRAIN_ROIS_PER_IMAGE - keep_fg_ids.shape[0] 1309 | if bg_ids.shape[0] > remaining: 1310 | keep_bg_ids = np.random.choice(bg_ids, remaining, replace=False) 1311 | else: 1312 | keep_bg_ids = bg_ids 1313 | # Combine indicies of ROIs to keep 1314 | keep = np.concatenate([keep_fg_ids, keep_bg_ids]) 1315 | # Need more? 1316 | remaining = config.TRAIN_ROIS_PER_IMAGE - keep.shape[0] 1317 | if remaining > 0: 1318 | # Looks like we don't have enough samples to maintain the desired 1319 | # balance. Reduce requirements and fill in the rest. This is 1320 | # likely different from the Mask RCNN paper. 1321 | 1322 | # There is a small chance we have neither fg nor bg samples. 1323 | if keep.shape[0] == 0: 1324 | # Pick bg regions with easier IoU threshold 1325 | bg_ids = np.where(rpn_roi_iou_max < 0.5)[0] 1326 | assert bg_ids.shape[0] >= remaining 1327 | keep_bg_ids = np.random.choice(bg_ids, remaining, replace=False) 1328 | assert keep_bg_ids.shape[0] == remaining 1329 | keep = np.concatenate([keep, keep_bg_ids]) 1330 | else: 1331 | # Fill the rest with repeated bg rois. 1332 | keep_extra_ids = np.random.choice( 1333 | keep_bg_ids, remaining, replace=True) 1334 | keep = np.concatenate([keep, keep_extra_ids]) 1335 | assert keep.shape[0] == config.TRAIN_ROIS_PER_IMAGE, \ 1336 | "keep doesn't match ROI batch size {}, {}".format( 1337 | keep.shape[0], config.TRAIN_ROIS_PER_IMAGE) 1338 | 1339 | # Reset the gt boxes assigned to BG ROIs. 1340 | rpn_roi_gt_boxes[keep_bg_ids, :] = 0 1341 | rpn_roi_gt_class_ids[keep_bg_ids] = 0 1342 | 1343 | # For each kept ROI, assign a class_id, and for FG ROIs also add bbox refinement. 1344 | rois = rpn_rois[keep] 1345 | roi_gt_boxes = rpn_roi_gt_boxes[keep] 1346 | roi_gt_class_ids = rpn_roi_gt_class_ids[keep] 1347 | roi_gt_assignment = rpn_roi_iou_argmax[keep] 1348 | 1349 | # Class-aware bbox deltas. [y, x, log(h), log(w)] 1350 | bboxes = np.zeros((config.TRAIN_ROIS_PER_IMAGE, 1351 | config.NUM_CLASSES, 4), dtype=np.float32) 1352 | pos_ids = np.where(roi_gt_class_ids > 0)[0] 1353 | bboxes[pos_ids, roi_gt_class_ids[pos_ids]] = utils.box_refinement( 1354 | rois[pos_ids], roi_gt_boxes[pos_ids, :4]) 1355 | # Normalize bbox refinements 1356 | bboxes /= config.BBOX_STD_DEV 1357 | 1358 | # Generate class-specific target masks. 1359 | masks = np.zeros((config.TRAIN_ROIS_PER_IMAGE, config.MASK_SHAPE[0], config.MASK_SHAPE[1], config.NUM_CLASSES), 1360 | dtype=np.float32) 1361 | for i in pos_ids: 1362 | class_id = roi_gt_class_ids[i] 1363 | assert class_id > 0, "class id must be greater than 0" 1364 | gt_id = roi_gt_assignment[i] 1365 | class_mask = gt_masks[:, :, gt_id] 1366 | 1367 | if config.USE_MINI_MASK: 1368 | # Create a mask placeholder, the size of the image 1369 | placeholder = np.zeros(config.IMAGE_SHAPE[:2], dtype=bool) 1370 | # GT box 1371 | gt_y1, gt_x1, gt_y2, gt_x2 = gt_boxes[gt_id] 1372 | gt_w = gt_x2 - gt_x1 1373 | gt_h = gt_y2 - gt_y1 1374 | # Resize mini mask to size of GT box 1375 | placeholder[gt_y1:gt_y2, gt_x1:gt_x2] = \ 1376 | np.round(scipy.misc.imresize(class_mask.astype(float), (gt_h, gt_w), 1377 | interp='nearest') / 255.0).astype(bool) 1378 | # Place the mini batch in the placeholder 1379 | class_mask = placeholder 1380 | 1381 | # Pick part of the mask and resize it 1382 | y1, x1, y2, x2 = rois[i].astype(np.int32) 1383 | m = class_mask[y1:y2, x1:x2] 1384 | mask = scipy.misc.imresize( 1385 | m.astype(float), config.MASK_SHAPE, interp='nearest') / 255.0 1386 | masks[i, :, :, class_id] = mask 1387 | 1388 | return rois, roi_gt_class_ids, bboxes, masks 1389 | 1390 | 1391 | def build_rpn_targets(image_shape, anchors, gt_class_ids, gt_boxes, config): 1392 | """Given the anchors and GT boxes, compute overlaps and identify positive 1393 | anchors and deltas to refine them to match their corresponding GT boxes. 1394 | 1395 | anchors: [num_anchors, (y1, x1, y2, x2)] 1396 | gt_class_ids: [num_gt_boxes] Integer class IDs. 1397 | gt_boxes: [num_gt_boxes, (y1, x1, y2, x2)] 1398 | 1399 | Returns: 1400 | rpn_match: [N] (int32) matches between anchors and GT boxes. 1401 | 1 = positive anchor, -1 = negative anchor, 0 = neutral 1402 | rpn_bbox: [N, (dy, dx, log(dh), log(dw))] Anchor bbox deltas. 1403 | """ 1404 | # RPN Match: 1 = positive anchor, -1 = negative anchor, 0 = neutral 1405 | rpn_match = np.zeros([anchors.shape[0]], dtype=np.int32) 1406 | # RPN bounding boxes: [max anchors per image, (dy, dx, log(dh), log(dw))] 1407 | rpn_bbox = np.zeros((config.RPN_TRAIN_ANCHORS_PER_IMAGE, 4)) 1408 | 1409 | # Handle COCO crowds 1410 | # A crowd box in COCO is a bounding box around several instances. Exclude 1411 | # them from training. A crowd box is given a negative class ID. 1412 | crowd_ix = np.where(gt_class_ids < 0)[0] 1413 | if crowd_ix.shape[0] > 0: 1414 | # Filter out crowds from ground truth class IDs and boxes 1415 | non_crowd_ix = np.where(gt_class_ids > 0)[0] 1416 | crowd_boxes = gt_boxes[crowd_ix] 1417 | gt_class_ids = gt_class_ids[non_crowd_ix] 1418 | gt_boxes = gt_boxes[non_crowd_ix] 1419 | # Compute overlaps with crowd boxes [anchors, crowds] 1420 | crowd_overlaps = utils.compute_overlaps(anchors, crowd_boxes) 1421 | crowd_iou_max = np.amax(crowd_overlaps, axis=1) 1422 | no_crowd_bool = (crowd_iou_max < 0.001) 1423 | else: 1424 | # All anchors don't intersect a crowd 1425 | no_crowd_bool = np.ones([anchors.shape[0]], dtype=bool) 1426 | 1427 | # Compute overlaps [num_anchors, num_gt_boxes] 1428 | overlaps = utils.compute_overlaps(anchors, gt_boxes) 1429 | 1430 | # Match anchors to GT Boxes 1431 | # If an anchor overlaps a GT box with IoU >= 0.7 then it's positive. 1432 | # If an anchor overlaps a GT box with IoU < 0.3 then it's negative. 1433 | # Neutral anchors are those that don't match the conditions above, 1434 | # and they don't influence the loss function. 1435 | # However, don't keep any GT box unmatched (rare, but happens). Instead, 1436 | # match it to the closest anchor (even if its max IoU is < 0.3). 1437 | # 1438 | # 1. Set negative anchors first. They get overwritten below if a GT box is 1439 | # matched to them. Skip boxes in crowd areas. 1440 | anchor_iou_argmax = np.argmax(overlaps, axis=1) 1441 | anchor_iou_max = overlaps[np.arange(overlaps.shape[0]), anchor_iou_argmax] 1442 | rpn_match[(anchor_iou_max < 0.3) & (no_crowd_bool)] = -1 1443 | # 2. Set an anchor for each GT box (regardless of IoU value). 1444 | # TODO: If multiple anchors have the same IoU match all of them 1445 | gt_iou_argmax = np.argmax(overlaps, axis=0) 1446 | rpn_match[gt_iou_argmax] = 1 1447 | # 3. Set anchors with high overlap as positive. 1448 | rpn_match[anchor_iou_max >= 0.7] = 1 1449 | 1450 | # Subsample to balance positive and negative anchors 1451 | # Don't let positives be more than half the anchors 1452 | ids = np.where(rpn_match == 1)[0] 1453 | extra = len(ids) - (config.RPN_TRAIN_ANCHORS_PER_IMAGE // 2) 1454 | if extra > 0: 1455 | # Reset the extra ones to neutral 1456 | ids = np.random.choice(ids, extra, replace=False) 1457 | rpn_match[ids] = 0 1458 | # Same for negative proposals 1459 | ids = np.where(rpn_match == -1)[0] 1460 | extra = len(ids) - (config.RPN_TRAIN_ANCHORS_PER_IMAGE - 1461 | np.sum(rpn_match == 1)) 1462 | if extra > 0: 1463 | # Rest the extra ones to neutral 1464 | ids = np.random.choice(ids, extra, replace=False) 1465 | rpn_match[ids] = 0 1466 | 1467 | # For positive anchors, compute shift and scale needed to transform them 1468 | # to match the corresponding GT boxes. 1469 | ids = np.where(rpn_match == 1)[0] 1470 | ix = 0 # index into rpn_bbox 1471 | # TODO: use box_refinement() rather than duplicating the code here 1472 | for i, a in zip(ids, anchors[ids]): 1473 | # Closest gt box (it might have IoU < 0.7) 1474 | gt = gt_boxes[anchor_iou_argmax[i]] 1475 | 1476 | # Convert coordinates to center plus width/height. 1477 | # GT Box 1478 | gt_h = gt[2] - gt[0] 1479 | gt_w = gt[3] - gt[1] 1480 | gt_center_y = gt[0] + 0.5 * gt_h 1481 | gt_center_x = gt[1] + 0.5 * gt_w 1482 | # Anchor 1483 | a_h = a[2] - a[0] 1484 | a_w = a[3] - a[1] 1485 | a_center_y = a[0] + 0.5 * a_h 1486 | a_center_x = a[1] + 0.5 * a_w 1487 | 1488 | # Compute the bbox refinement that the RPN should predict. 1489 | rpn_bbox[ix] = [ 1490 | (gt_center_y - a_center_y) / a_h, 1491 | (gt_center_x - a_center_x) / a_w, 1492 | np.log(gt_h / a_h), 1493 | np.log(gt_w / a_w), 1494 | ] 1495 | # Normalize 1496 | rpn_bbox[ix] /= config.RPN_BBOX_STD_DEV 1497 | ix += 1 1498 | 1499 | return rpn_match, rpn_bbox 1500 | 1501 | 1502 | def generate_random_rois(image_shape, count, gt_class_ids, gt_boxes): 1503 | """Generates ROI proposals similar to what a region proposal network 1504 | would generate. 1505 | 1506 | image_shape: [Height, Width, Depth] 1507 | count: Number of ROIs to generate 1508 | gt_class_ids: [N] Integer ground truth class IDs 1509 | gt_boxes: [N, (y1, x1, y2, x2)] Ground truth boxes in pixels. 1510 | 1511 | Returns: [count, (y1, x1, y2, x2)] ROI boxes in pixels. 1512 | """ 1513 | # placeholder 1514 | rois = np.zeros((count, 4), dtype=np.int32) 1515 | 1516 | # Generate random ROIs around GT boxes (90% of count) 1517 | rois_per_box = int(0.9 * count / gt_boxes.shape[0]) 1518 | for i in range(gt_boxes.shape[0]): 1519 | gt_y1, gt_x1, gt_y2, gt_x2 = gt_boxes[i] 1520 | h = gt_y2 - gt_y1 1521 | w = gt_x2 - gt_x1 1522 | # random boundaries 1523 | r_y1 = max(gt_y1 - h, 0) 1524 | r_y2 = min(gt_y2 + h, image_shape[0]) 1525 | r_x1 = max(gt_x1 - w, 0) 1526 | r_x2 = min(gt_x2 + w, image_shape[1]) 1527 | 1528 | # To avoid generating boxes with zero area, we generate double what 1529 | # we need and filter out the extra. If we get fewer valid boxes 1530 | # than we need, we loop and try again. 1531 | while True: 1532 | y1y2 = np.random.randint(r_y1, r_y2, (rois_per_box * 2, 2)) 1533 | x1x2 = np.random.randint(r_x1, r_x2, (rois_per_box * 2, 2)) 1534 | # Filter out zero area boxes 1535 | threshold = 1 1536 | y1y2 = y1y2[np.abs(y1y2[:, 0] - y1y2[:, 1]) >= 1537 | threshold][:rois_per_box] 1538 | x1x2 = x1x2[np.abs(x1x2[:, 0] - x1x2[:, 1]) >= 1539 | threshold][:rois_per_box] 1540 | if y1y2.shape[0] == rois_per_box and x1x2.shape[0] == rois_per_box: 1541 | break 1542 | 1543 | # Sort on axis 1 to ensure x1 <= x2 and y1 <= y2 and then reshape 1544 | # into x1, y1, x2, y2 order 1545 | x1, x2 = np.split(np.sort(x1x2, axis=1), 2, axis=1) 1546 | y1, y2 = np.split(np.sort(y1y2, axis=1), 2, axis=1) 1547 | box_rois = np.hstack([y1, x1, y2, x2]) 1548 | rois[rois_per_box * i:rois_per_box * (i + 1)] = box_rois 1549 | 1550 | # Generate random ROIs anywhere in the image (10% of count) 1551 | remaining_count = count - (rois_per_box * gt_boxes.shape[0]) 1552 | # To avoid generating boxes with zero area, we generate double what 1553 | # we need and filter out the extra. If we get fewer valid boxes 1554 | # than we need, we loop and try again. 1555 | while True: 1556 | y1y2 = np.random.randint(0, image_shape[0], (remaining_count * 2, 2)) 1557 | x1x2 = np.random.randint(0, image_shape[1], (remaining_count * 2, 2)) 1558 | # Filter out zero area boxes 1559 | threshold = 1 1560 | y1y2 = y1y2[np.abs(y1y2[:, 0] - y1y2[:, 1]) >= 1561 | threshold][:remaining_count] 1562 | x1x2 = x1x2[np.abs(x1x2[:, 0] - x1x2[:, 1]) >= 1563 | threshold][:remaining_count] 1564 | if y1y2.shape[0] == remaining_count and x1x2.shape[0] == remaining_count: 1565 | break 1566 | 1567 | # Sort on axis 1 to ensure x1 <= x2 and y1 <= y2 and then reshape 1568 | # into x1, y1, x2, y2 order 1569 | x1, x2 = np.split(np.sort(x1x2, axis=1), 2, axis=1) 1570 | y1, y2 = np.split(np.sort(y1y2, axis=1), 2, axis=1) 1571 | global_rois = np.hstack([y1, x1, y2, x2]) 1572 | rois[-remaining_count:] = global_rois 1573 | return rois 1574 | 1575 | 1576 | def data_generator(dataset, config, shuffle=True, augment=True, random_rois=0, 1577 | batch_size=1, detection_targets=False): 1578 | """A generator that returns images and corresponding target class ids, 1579 | bounding box deltas, and masks. 1580 | 1581 | dataset: The Dataset object to pick data from 1582 | config: The model config object 1583 | shuffle: If True, shuffles the samples before every epoch 1584 | augment: If True, applies image augmentation to images (currently only 1585 | horizontal flips are supported) 1586 | random_rois: If > 0 then generate proposals to be used to train the 1587 | network classifier and mask heads. Useful if training 1588 | the Mask RCNN part without the RPN. 1589 | batch_size: How many images to return in each call 1590 | detection_targets: If True, generate detection targets (class IDs, bbox 1591 | deltas, and masks). Typically for debugging or visualizations because 1592 | in trainig detection targets are generated by DetectionTargetLayer. 1593 | 1594 | Returns a Python generator. Upon calling next() on it, the 1595 | generator returns two lists, inputs and outputs. The containtes 1596 | of the lists differs depending on the received arguments: 1597 | inputs list: 1598 | - images: [batch, H, W, C] 1599 | - image_meta: [batch, size of image meta] 1600 | - rpn_match: [batch, N] Integer (1=positive anchor, -1=negative, 0=neutral) 1601 | - rpn_bbox: [batch, N, (dy, dx, log(dh), log(dw))] Anchor bbox deltas. 1602 | - gt_class_ids: [batch, MAX_GT_INSTANCES] Integer class IDs 1603 | - gt_boxes: [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] 1604 | - gt_masks: [batch, height, width, MAX_GT_INSTANCES]. The height and width 1605 | are those of the image unless use_mini_mask is True, in which 1606 | case they are defined in MINI_MASK_SHAPE. 1607 | 1608 | outputs list: Usually empty in regular training. But if detection_targets 1609 | is True then the outputs list contains target class_ids, bbox deltas, 1610 | and masks. 1611 | """ 1612 | b = 0 # batch item index 1613 | image_index = -1 1614 | image_ids = np.copy(dataset.image_ids) 1615 | error_count = 0 1616 | 1617 | # Anchors 1618 | # [anchor_count, (y1, x1, y2, x2)] 1619 | anchors = utils.generate_pyramid_anchors(config.RPN_ANCHOR_SCALES, 1620 | config.RPN_ANCHOR_RATIOS, 1621 | config.BACKBONE_SHAPES, 1622 | config.BACKBONE_STRIDES, 1623 | config.RPN_ANCHOR_STRIDE) 1624 | 1625 | # Keras requires a generator to run indefinately. 1626 | while True: 1627 | try: 1628 | # Increment index to pick next image. Shuffle if at the start of an epoch. 1629 | image_index = (image_index + 1) % len(image_ids) 1630 | if shuffle and image_index == 0: 1631 | np.random.shuffle(image_ids) 1632 | 1633 | # Get GT bounding boxes and masks for image. 1634 | image_id = image_ids[image_index] 1635 | image, image_meta, gt_class_ids, gt_boxes, gt_masks = \ 1636 | load_image_gt(dataset, config, image_id, augment=augment, 1637 | use_mini_mask=config.USE_MINI_MASK) 1638 | 1639 | # Skip images that have no instances. This can happen in cases 1640 | # where we train on a subset of classes and the image doesn't 1641 | # have any of the classes we care about. 1642 | if not np.any(gt_class_ids > 0): 1643 | continue 1644 | 1645 | # RPN Targets 1646 | rpn_match, rpn_bbox = build_rpn_targets(image.shape, anchors, 1647 | gt_class_ids, gt_boxes, config) 1648 | 1649 | # Mask R-CNN Targets 1650 | if random_rois: 1651 | rpn_rois = generate_random_rois( 1652 | image.shape, random_rois, gt_class_ids, gt_boxes) 1653 | if detection_targets: 1654 | rois, mrcnn_class_ids, mrcnn_bbox, mrcnn_mask =\ 1655 | build_detection_targets( 1656 | rpn_rois, gt_class_ids, gt_boxes, gt_masks, config) 1657 | 1658 | # Init batch arrays 1659 | if b == 0: 1660 | batch_image_meta = np.zeros( 1661 | (batch_size,) + image_meta.shape, dtype=image_meta.dtype) 1662 | batch_rpn_match = np.zeros( 1663 | [batch_size, anchors.shape[0], 1], dtype=rpn_match.dtype) 1664 | batch_rpn_bbox = np.zeros( 1665 | [batch_size, config.RPN_TRAIN_ANCHORS_PER_IMAGE, 4], dtype=rpn_bbox.dtype) 1666 | batch_images = np.zeros( 1667 | (batch_size,) + image.shape, dtype=np.float32) 1668 | batch_gt_class_ids = np.zeros( 1669 | (batch_size, config.MAX_GT_INSTANCES), dtype=np.int32) 1670 | batch_gt_boxes = np.zeros( 1671 | (batch_size, config.MAX_GT_INSTANCES, 4), dtype=np.int32) 1672 | if config.USE_MINI_MASK: 1673 | batch_gt_masks = np.zeros((batch_size, config.MINI_MASK_SHAPE[0], config.MINI_MASK_SHAPE[1], 1674 | config.MAX_GT_INSTANCES)) 1675 | else: 1676 | batch_gt_masks = np.zeros( 1677 | (batch_size, image.shape[0], image.shape[1], config.MAX_GT_INSTANCES)) 1678 | if random_rois: 1679 | batch_rpn_rois = np.zeros( 1680 | (batch_size, rpn_rois.shape[0], 4), dtype=rpn_rois.dtype) 1681 | if detection_targets: 1682 | batch_rois = np.zeros( 1683 | (batch_size,) + rois.shape, dtype=rois.dtype) 1684 | batch_mrcnn_class_ids = np.zeros( 1685 | (batch_size,) + mrcnn_class_ids.shape, dtype=mrcnn_class_ids.dtype) 1686 | batch_mrcnn_bbox = np.zeros( 1687 | (batch_size,) + mrcnn_bbox.shape, dtype=mrcnn_bbox.dtype) 1688 | batch_mrcnn_mask = np.zeros( 1689 | (batch_size,) + mrcnn_mask.shape, dtype=mrcnn_mask.dtype) 1690 | 1691 | # If more instances than fits in the array, sub-sample from them. 1692 | if gt_boxes.shape[0] > config.MAX_GT_INSTANCES: 1693 | ids = np.random.choice( 1694 | np.arange(gt_boxes.shape[0]), config.MAX_GT_INSTANCES, replace=False) 1695 | gt_class_ids = gt_class_ids[ids] 1696 | gt_boxes = gt_boxes[ids] 1697 | gt_masks = gt_masks[:, :, ids] 1698 | 1699 | # Add to batch 1700 | batch_image_meta[b] = image_meta 1701 | batch_rpn_match[b] = rpn_match[:, np.newaxis] 1702 | batch_rpn_bbox[b] = rpn_bbox 1703 | batch_images[b] = mold_image(image.astype(np.float32), config) 1704 | batch_gt_class_ids[b, :gt_class_ids.shape[0]] = gt_class_ids 1705 | batch_gt_boxes[b, :gt_boxes.shape[0]] = gt_boxes 1706 | batch_gt_masks[b, :, :, :gt_masks.shape[-1]] = gt_masks 1707 | if random_rois: 1708 | batch_rpn_rois[b] = rpn_rois 1709 | if detection_targets: 1710 | batch_rois[b] = rois 1711 | batch_mrcnn_class_ids[b] = mrcnn_class_ids 1712 | batch_mrcnn_bbox[b] = mrcnn_bbox 1713 | batch_mrcnn_mask[b] = mrcnn_mask 1714 | b += 1 1715 | 1716 | # Batch full? 1717 | if b >= batch_size: 1718 | inputs = [batch_images, batch_image_meta, batch_rpn_match, batch_rpn_bbox, 1719 | batch_gt_class_ids, batch_gt_boxes, batch_gt_masks] 1720 | outputs = [] 1721 | 1722 | if random_rois: 1723 | inputs.extend([batch_rpn_rois]) 1724 | if detection_targets: 1725 | inputs.extend([batch_rois]) 1726 | # Keras requires that output and targets have the same number of dimensions 1727 | batch_mrcnn_class_ids = np.expand_dims( 1728 | batch_mrcnn_class_ids, -1) 1729 | outputs.extend( 1730 | [batch_mrcnn_class_ids, batch_mrcnn_bbox, batch_mrcnn_mask]) 1731 | 1732 | yield inputs, outputs 1733 | 1734 | # start a new batch 1735 | b = 0 1736 | except (GeneratorExit, KeyboardInterrupt): 1737 | raise 1738 | except: 1739 | # Log it and skip the image 1740 | logging.exception("Error processing image {}".format( 1741 | dataset.image_info[image_id])) 1742 | error_count += 1 1743 | if error_count > 5: 1744 | raise 1745 | 1746 | 1747 | ############################################################ 1748 | # MaskRCNN Class 1749 | ############################################################ 1750 | 1751 | class MaskRCNN(): 1752 | """Encapsulates the Mask RCNN model functionality. 1753 | 1754 | The actual Keras model is in the keras_model property. 1755 | """ 1756 | 1757 | def __init__(self, mode, config, model_dir): 1758 | """ 1759 | mode: Either "training" or "inference" 1760 | config: A Sub-class of the Config class 1761 | model_dir: Directory to save training logs and trained weights 1762 | """ 1763 | assert mode in ['training', 'inference'] 1764 | self.mode = mode 1765 | self.config = config 1766 | self.model_dir = model_dir 1767 | self.set_log_dir() 1768 | self.keras_model = self.build(mode=mode, config=config) 1769 | 1770 | def build(self, mode, config): 1771 | """Build Mask R-CNN architecture. 1772 | input_shape: The shape of the input image. 1773 | mode: Either "training" or "inference". The inputs and 1774 | outputs of the model differ accordingly. 1775 | """ 1776 | assert mode in ['training', 'inference'] 1777 | 1778 | # Image size must be dividable by 2 multiple times 1779 | h, w = config.IMAGE_SHAPE[:2] 1780 | if h / 2**6 != int(h / 2**6) or w / 2**6 != int(w / 2**6): 1781 | raise Exception("Image size must be dividable by 2 at least 6 times " 1782 | "to avoid fractions when downscaling and upscaling." 1783 | "For example, use 256, 320, 384, 448, 512, ... etc. ") 1784 | 1785 | # Inputs 1786 | input_image = KL.Input( 1787 | shape=config.IMAGE_SHAPE.tolist(), name="input_image") 1788 | input_image_meta = KL.Input(shape=[None], name="input_image_meta") 1789 | if mode == "training": 1790 | # RPN GT 1791 | input_rpn_match = KL.Input( 1792 | shape=[None, 1], name="input_rpn_match", dtype=tf.int32) 1793 | input_rpn_bbox = KL.Input( 1794 | shape=[None, 4], name="input_rpn_bbox", dtype=tf.float32) 1795 | 1796 | # Detection GT (class IDs, bounding boxes, and masks) 1797 | # 1. GT Class IDs (zero padded) 1798 | input_gt_class_ids = KL.Input( 1799 | shape=[None], name="input_gt_class_ids", dtype=tf.int32) 1800 | # 2. GT Boxes in pixels (zero padded) 1801 | # [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in image coordinates 1802 | input_gt_boxes = KL.Input( 1803 | shape=[None, 4], name="input_gt_boxes", dtype=tf.float32) 1804 | # Normalize coordinates 1805 | h, w = K.shape(input_image)[1], K.shape(input_image)[2] 1806 | image_scale = K.cast(K.stack([h, w, h, w], axis=0), tf.float32) 1807 | gt_boxes = KL.Lambda(lambda x: x / image_scale)(input_gt_boxes) 1808 | # 3. GT Masks (zero padded) 1809 | # [batch, height, width, MAX_GT_INSTANCES] 1810 | if config.USE_MINI_MASK: 1811 | input_gt_masks = KL.Input( 1812 | shape=[config.MINI_MASK_SHAPE[0], 1813 | config.MINI_MASK_SHAPE[1], None], 1814 | name="input_gt_masks", dtype=bool) 1815 | else: 1816 | input_gt_masks = KL.Input( 1817 | shape=[config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1], None], 1818 | name="input_gt_masks", dtype=bool) 1819 | 1820 | # Build the shared convolutional layers. 1821 | # Bottom-up Layers 1822 | # Returns a list of the last layers of each stage, 5 in total. 1823 | # Don't create the thead (stage 5), so we pick the 4th item in the list. 1824 | _, C2, C3, C4, C5 = resnet_graph(input_image, config.BACKBONE, stage5=True) 1825 | # Top-down Layers 1826 | # TODO: add assert to varify feature map sizes match what's in config 1827 | P5 = KL.Conv2D(256, (1, 1), name='fpn_c5p5')(C5) 1828 | P4 = KL.Add(name="fpn_p4add")([ 1829 | KL.UpSampling2D(size=(2, 2), name="fpn_p5upsampled")(P5), 1830 | KL.Conv2D(256, (1, 1), name='fpn_c4p4')(C4)]) 1831 | P3 = KL.Add(name="fpn_p3add")([ 1832 | KL.UpSampling2D(size=(2, 2), name="fpn_p4upsampled")(P4), 1833 | KL.Conv2D(256, (1, 1), name='fpn_c3p3')(C3)]) 1834 | P2 = KL.Add(name="fpn_p2add")([ 1835 | KL.UpSampling2D(size=(2, 2), name="fpn_p3upsampled")(P3), 1836 | KL.Conv2D(256, (1, 1), name='fpn_c2p2')(C2)]) 1837 | # Attach 3x3 conv to all P layers to get the final feature maps. 1838 | P2 = KL.Conv2D(256, (3, 3), padding="SAME", name="fpn_p2")(P2) 1839 | P3 = KL.Conv2D(256, (3, 3), padding="SAME", name="fpn_p3")(P3) 1840 | P4 = KL.Conv2D(256, (3, 3), padding="SAME", name="fpn_p4")(P4) 1841 | P5 = KL.Conv2D(256, (3, 3), padding="SAME", name="fpn_p5")(P5) 1842 | # P6 is used for the 5th anchor scale in RPN. Generated by 1843 | # subsampling from P5 with stride of 2. 1844 | P6 = KL.MaxPooling2D(pool_size=(1, 1), strides=2, name="fpn_p6")(P5) 1845 | 1846 | # Note that P6 is used in RPN, but not in the classifier heads. 1847 | rpn_feature_maps = [P2, P3, P4, P5, P6] 1848 | mrcnn_feature_maps = [P2, P3, P4, P5] 1849 | 1850 | # Generate Anchors 1851 | self.anchors = utils.generate_pyramid_anchors(config.RPN_ANCHOR_SCALES, 1852 | config.RPN_ANCHOR_RATIOS, 1853 | config.BACKBONE_SHAPES, 1854 | config.BACKBONE_STRIDES, 1855 | config.RPN_ANCHOR_STRIDE) 1856 | 1857 | # RPN Model 1858 | rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE, 1859 | len(config.RPN_ANCHOR_RATIOS), 256) 1860 | # Loop through pyramid layers 1861 | layer_outputs = [] # list of lists 1862 | for p in rpn_feature_maps: 1863 | layer_outputs.append(rpn([p])) 1864 | # Concatenate layer outputs 1865 | # Convert from list of lists of level outputs to list of lists 1866 | # of outputs across levels. 1867 | # e.g. [[a1, b1, c1], [a2, b2, c2]] => [[a1, a2], [b1, b2], [c1, c2]] 1868 | output_names = ["rpn_class_logits", "rpn_class", "rpn_bbox"] 1869 | outputs = list(zip(*layer_outputs)) 1870 | outputs = [KL.Concatenate(axis=1, name=n)(list(o)) 1871 | for o, n in zip(outputs, output_names)] 1872 | 1873 | rpn_class_logits, rpn_class, rpn_bbox = outputs 1874 | 1875 | # Generate proposals 1876 | # Proposals are [batch, N, (y1, x1, y2, x2)] in normalized coordinates 1877 | # and zero padded. 1878 | proposal_count = config.POST_NMS_ROIS_TRAINING if mode == "training"\ 1879 | else config.POST_NMS_ROIS_INFERENCE 1880 | rpn_rois = ProposalLayer(proposal_count=proposal_count, 1881 | nms_threshold=config.RPN_NMS_THRESHOLD, 1882 | name="ROI", 1883 | anchors=self.anchors, 1884 | config=config)([rpn_class, rpn_bbox]) 1885 | 1886 | if mode == "training": 1887 | # Class ID mask to mark class IDs supported by the dataset the image 1888 | # came from. 1889 | _, _, _, active_class_ids = KL.Lambda(lambda x: parse_image_meta_graph(x), 1890 | mask=[None, None, None, None])(input_image_meta) 1891 | 1892 | if not config.USE_RPN_ROIS: 1893 | # Ignore predicted ROIs and use ROIs provided as an input. 1894 | input_rois = KL.Input(shape=[config.POST_NMS_ROIS_TRAINING, 4], 1895 | name="input_roi", dtype=np.int32) 1896 | # Normalize coordinates to 0-1 range. 1897 | target_rois = KL.Lambda(lambda x: K.cast( 1898 | x, tf.float32) / image_scale[:4])(input_rois) 1899 | else: 1900 | target_rois = rpn_rois 1901 | 1902 | # Generate detection targets 1903 | # Subsamples proposals and generates target outputs for training 1904 | # Note that proposal class IDs, gt_boxes, and gt_masks are zero 1905 | # padded. Equally, returned rois and targets are zero padded. 1906 | rois, target_class_ids, target_bbox, target_mask =\ 1907 | DetectionTargetLayer(config, name="proposal_targets")([ 1908 | target_rois, input_gt_class_ids, gt_boxes, input_gt_masks]) 1909 | 1910 | # Network Heads 1911 | # TODO: verify that this handles zero padded ROIs 1912 | mrcnn_class_logits, mrcnn_class, mrcnn_bbox =\ 1913 | fpn_classifier_graph(rois, mrcnn_feature_maps, config.IMAGE_SHAPE, 1914 | config.POOL_SIZE, config.NUM_CLASSES) 1915 | 1916 | mrcnn_mask = build_fpn_mask_graph(rois, mrcnn_feature_maps, 1917 | config.IMAGE_SHAPE, 1918 | config.MASK_POOL_SIZE, 1919 | config.NUM_CLASSES) 1920 | 1921 | # TODO: clean up (use tf.identify if necessary) 1922 | output_rois = KL.Lambda(lambda x: x * 1, name="output_rois")(rois) 1923 | 1924 | # Losses 1925 | rpn_class_loss = KL.Lambda(lambda x: rpn_class_loss_graph(*x), name="rpn_class_loss")( 1926 | [input_rpn_match, rpn_class_logits]) 1927 | rpn_bbox_loss = KL.Lambda(lambda x: rpn_bbox_loss_graph(config, *x), name="rpn_bbox_loss")( 1928 | [input_rpn_bbox, input_rpn_match, rpn_bbox]) 1929 | class_loss = KL.Lambda(lambda x: mrcnn_class_loss_graph(*x), name="mrcnn_class_loss")( 1930 | [target_class_ids, mrcnn_class_logits, active_class_ids]) 1931 | bbox_loss = KL.Lambda(lambda x: mrcnn_bbox_loss_graph(*x), name="mrcnn_bbox_loss")( 1932 | [target_bbox, target_class_ids, mrcnn_bbox]) 1933 | mask_loss = KL.Lambda(lambda x: mrcnn_mask_loss_graph(*x), name="mrcnn_mask_loss")( 1934 | [target_mask, target_class_ids, mrcnn_mask]) 1935 | 1936 | # Model 1937 | inputs = [input_image, input_image_meta, 1938 | input_rpn_match, input_rpn_bbox, input_gt_class_ids, input_gt_boxes, input_gt_masks] 1939 | if not config.USE_RPN_ROIS: 1940 | inputs.append(input_rois) 1941 | outputs = [rpn_class_logits, rpn_class, rpn_bbox, 1942 | mrcnn_class_logits, mrcnn_class, mrcnn_bbox, mrcnn_mask, 1943 | rpn_rois, output_rois, 1944 | rpn_class_loss, rpn_bbox_loss, class_loss, bbox_loss, mask_loss] 1945 | model = KM.Model(inputs, outputs, name='mask_rcnn') 1946 | else: 1947 | # Network Heads 1948 | # Proposal classifier and BBox regressor heads 1949 | mrcnn_class_logits, mrcnn_class, mrcnn_bbox =\ 1950 | fpn_classifier_graph(rpn_rois, mrcnn_feature_maps, config.IMAGE_SHAPE, 1951 | config.POOL_SIZE, config.NUM_CLASSES) 1952 | 1953 | # Detections 1954 | # output is [batch, num_detections, (y1, x1, y2, x2, class_id, score)] in image coordinates 1955 | detections = DetectionLayer(config, name="mrcnn_detection")( 1956 | [rpn_rois, mrcnn_class, mrcnn_bbox, input_image_meta]) 1957 | 1958 | # Convert boxes to normalized coordinates 1959 | # TODO: let DetectionLayer return normalized coordinates to avoid 1960 | # unnecessary conversions 1961 | h, w = config.IMAGE_SHAPE[:2] 1962 | detection_boxes = KL.Lambda( 1963 | lambda x: x[..., :4] / np.array([h, w, h, w]))(detections) 1964 | 1965 | # Create masks for detections 1966 | mrcnn_mask = build_fpn_mask_graph(detection_boxes, mrcnn_feature_maps, 1967 | config.IMAGE_SHAPE, 1968 | config.MASK_POOL_SIZE, 1969 | config.NUM_CLASSES) 1970 | 1971 | model = KM.Model([input_image, input_image_meta], 1972 | [detections, mrcnn_class, mrcnn_bbox, 1973 | mrcnn_mask, rpn_rois, rpn_class, rpn_bbox], 1974 | name='mask_rcnn') 1975 | 1976 | # Add multi-GPU support. 1977 | if config.GPU_COUNT > 1: 1978 | from parallel_model import ParallelModel 1979 | model = ParallelModel(model, config.GPU_COUNT) 1980 | 1981 | return model 1982 | 1983 | def find_last(self): 1984 | """Finds the last checkpoint file of the last trained model in the 1985 | model directory. 1986 | Returns: 1987 | log_dir: The directory where events and weights are saved 1988 | checkpoint_path: the path to the last checkpoint file 1989 | """ 1990 | # Get directory names. Each directory corresponds to a model 1991 | dir_names = next(os.walk(self.model_dir))[1] 1992 | key = self.config.NAME.lower() 1993 | dir_names = filter(lambda f: f.startswith(key), dir_names) 1994 | dir_names = sorted(dir_names) 1995 | if not dir_names: 1996 | return None, None 1997 | # Pick last directory 1998 | dir_name = os.path.join(self.model_dir, dir_names[-1]) 1999 | # Find the last checkpoint 2000 | checkpoints = next(os.walk(dir_name))[2] 2001 | checkpoints = filter(lambda f: f.startswith("mask_rcnn"), checkpoints) 2002 | checkpoints = sorted(checkpoints) 2003 | if not checkpoints: 2004 | return dir_name, None 2005 | checkpoint = os.path.join(dir_name, checkpoints[-1]) 2006 | return dir_name, checkpoint 2007 | 2008 | def load_weights(self, filepath, by_name=False, exclude=None): 2009 | """Modified version of the correspoding Keras function with 2010 | the addition of multi-GPU support and the ability to exclude 2011 | some layers from loading. 2012 | exlude: list of layer names to excluce 2013 | """ 2014 | import h5py 2015 | from keras.engine import topology 2016 | 2017 | if exclude: 2018 | by_name = True 2019 | 2020 | if h5py is None: 2021 | raise ImportError('`load_weights` requires h5py.') 2022 | f = h5py.File(filepath, mode='r') 2023 | if 'layer_names' not in f.attrs and 'model_weights' in f: 2024 | f = f['model_weights'] 2025 | 2026 | # In multi-GPU training, we wrap the model. Get layers 2027 | # of the inner model because they have the weights. 2028 | keras_model = self.keras_model 2029 | layers = keras_model.inner_model.layers if hasattr(keras_model, "inner_model")\ 2030 | else keras_model.layers 2031 | 2032 | # Exclude some layers 2033 | if exclude: 2034 | layers = filter(lambda l: l.name not in exclude, layers) 2035 | 2036 | if by_name: 2037 | topology.load_weights_from_hdf5_group_by_name(f, layers) 2038 | else: 2039 | topology.load_weights_from_hdf5_group(f, layers) 2040 | if hasattr(f, 'close'): 2041 | f.close() 2042 | 2043 | # Update the log directory 2044 | self.set_log_dir(filepath) 2045 | 2046 | def get_imagenet_weights(self): 2047 | """Downloads ImageNet trained weights from Keras. 2048 | Returns path to weights file. 2049 | """ 2050 | from keras.utils.data_utils import get_file 2051 | TF_WEIGHTS_PATH_NO_TOP = 'https://github.com/fchollet/deep-learning-models/'\ 2052 | 'releases/download/v0.2/'\ 2053 | 'resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5' 2054 | weights_path = get_file('resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5', 2055 | TF_WEIGHTS_PATH_NO_TOP, 2056 | cache_subdir='models', 2057 | md5_hash='a268eb855778b3df3c7506639542a6af') 2058 | return weights_path 2059 | 2060 | def compile(self, learning_rate, momentum): 2061 | """Gets the model ready for training. Adds losses, regularization, and 2062 | metrics. Then calls the Keras compile() function. 2063 | """ 2064 | # Optimizer object 2065 | optimizer = keras.optimizers.SGD(lr=learning_rate, momentum=momentum, 2066 | clipnorm=5.0) 2067 | # Add Losses 2068 | # First, clear previously set losses to avoid duplication 2069 | self.keras_model._losses = [] 2070 | self.keras_model._per_input_losses = {} 2071 | loss_names = ["rpn_class_loss", "rpn_bbox_loss", 2072 | "mrcnn_class_loss", "mrcnn_bbox_loss", "mrcnn_mask_loss"] 2073 | for name in loss_names: 2074 | layer = self.keras_model.get_layer(name) 2075 | if layer.output in self.keras_model.losses: 2076 | continue 2077 | self.keras_model.add_loss( 2078 | tf.reduce_mean(layer.output, keep_dims=True)) 2079 | 2080 | # Add L2 Regularization 2081 | # Skip gamma and beta weights of batch normalization layers. 2082 | reg_losses = [keras.regularizers.l2(self.config.WEIGHT_DECAY)(w) / tf.cast(tf.size(w), tf.float32) 2083 | for w in self.keras_model.trainable_weights 2084 | if 'gamma' not in w.name and 'beta' not in w.name] 2085 | self.keras_model.add_loss(tf.add_n(reg_losses)) 2086 | 2087 | # Compile 2088 | self.keras_model.compile(optimizer=optimizer, loss=[ 2089 | None] * len(self.keras_model.outputs)) 2090 | 2091 | # Add metrics for losses 2092 | for name in loss_names: 2093 | if name in self.keras_model.metrics_names: 2094 | continue 2095 | layer = self.keras_model.get_layer(name) 2096 | self.keras_model.metrics_names.append(name) 2097 | self.keras_model.metrics_tensors.append(tf.reduce_mean( 2098 | layer.output, keep_dims=True)) 2099 | 2100 | def set_trainable(self, layer_regex, keras_model=None, indent=0, verbose=1): 2101 | """Sets model layers as trainable if their names match 2102 | the given regular expression. 2103 | """ 2104 | # Print message on the first call (but not on recursive calls) 2105 | if verbose > 0 and keras_model is None: 2106 | log("Selecting layers to train") 2107 | 2108 | keras_model = keras_model or self.keras_model 2109 | 2110 | # In multi-GPU training, we wrap the model. Get layers 2111 | # of the inner model because they have the weights. 2112 | layers = keras_model.inner_model.layers if hasattr(keras_model, "inner_model")\ 2113 | else keras_model.layers 2114 | 2115 | for layer in layers: 2116 | # Is the layer a model? 2117 | if layer.__class__.__name__ == 'Model': 2118 | print("In model: ", layer.name) 2119 | self.set_trainable( 2120 | layer_regex, keras_model=layer, indent=indent + 4) 2121 | continue 2122 | 2123 | if not layer.weights: 2124 | continue 2125 | # Is it trainable? 2126 | trainable = bool(re.fullmatch(layer_regex, layer.name)) 2127 | # Update layer. If layer is a container, update inner layer. 2128 | if layer.__class__.__name__ == 'TimeDistributed': 2129 | layer.layer.trainable = trainable 2130 | else: 2131 | layer.trainable = trainable 2132 | # Print trainble layer names 2133 | if trainable and verbose > 0: 2134 | log("{}{:20} ({})".format(" " * indent, layer.name, 2135 | layer.__class__.__name__)) 2136 | 2137 | def set_log_dir(self, model_path=None): 2138 | """Sets the model log directory and epoch counter. 2139 | 2140 | model_path: If None, or a format different from what this code uses 2141 | then set a new log directory and start epochs from 0. Otherwise, 2142 | extract the log directory and the epoch counter from the file 2143 | name. 2144 | """ 2145 | # Set date and epoch counter as if starting a new model 2146 | self.epoch = 0 2147 | now = datetime.datetime.now() 2148 | 2149 | # If we have a model path with date and epochs use them 2150 | if model_path: 2151 | # Continue from we left of. Get epoch and date from the file name 2152 | # A sample model path might look like: 2153 | # /path/to/logs/coco20171029T2315/mask_rcnn_coco_0001.h5 2154 | regex = r".*/\w+(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})/mask\_rcnn\_\w+(\d{4})\.h5" 2155 | m = re.match(regex, model_path) 2156 | if m: 2157 | now = datetime.datetime(int(m.group(1)), int(m.group(2)), int(m.group(3)), 2158 | int(m.group(4)), int(m.group(5))) 2159 | self.epoch = int(m.group(6)) + 1 2160 | 2161 | # Directory for training logs 2162 | self.log_dir = os.path.join(self.model_dir, "{}{:%Y%m%dT%H%M}".format( 2163 | self.config.NAME.lower(), now)) 2164 | 2165 | # Path to save after each epoch. Include placeholders that get filled by Keras. 2166 | self.checkpoint_path = os.path.join(self.log_dir, "mask_rcnn_{}_*epoch*.h5".format( 2167 | self.config.NAME.lower())) 2168 | self.checkpoint_path = self.checkpoint_path.replace( 2169 | "*epoch*", "{epoch:04d}") 2170 | 2171 | def train(self, train_dataset, val_dataset, learning_rate, epochs, layers): 2172 | """Train the model. 2173 | train_dataset, val_dataset: Training and validation Dataset objects. 2174 | learning_rate: The learning rate to train with 2175 | epochs: Number of training epochs. Note that previous training epochs 2176 | are considered to be done alreay, so this actually determines 2177 | the epochs to train in total rather than in this particaular 2178 | call. 2179 | layers: Allows selecting wich layers to train. It can be: 2180 | - A regular expression to match layer names to train 2181 | - One of these predefined values: 2182 | heaads: The RPN, classifier and mask heads of the network 2183 | all: All the layers 2184 | 3+: Train Resnet stage 3 and up 2185 | 4+: Train Resnet stage 4 and up 2186 | 5+: Train Resnet stage 5 and up 2187 | """ 2188 | assert self.mode == "training", "Create model in training mode." 2189 | 2190 | # Pre-defined layer regular expressions 2191 | layer_regex = { 2192 | # all layers but the backbone 2193 | "heads": r"(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)", 2194 | # From a specific Resnet stage and up 2195 | "3+": r"(res3.*)|(bn3.*)|(res4.*)|(bn4.*)|(res5.*)|(bn5.*)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)", 2196 | "4+": r"(res4.*)|(bn4.*)|(res5.*)|(bn5.*)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)", 2197 | "5+": r"(res5.*)|(bn5.*)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)", 2198 | # All layers 2199 | "all": ".*", 2200 | } 2201 | if layers in layer_regex.keys(): 2202 | layers = layer_regex[layers] 2203 | 2204 | # Data generators 2205 | train_generator = data_generator(train_dataset, self.config, shuffle=True, 2206 | batch_size=self.config.BATCH_SIZE) 2207 | val_generator = data_generator(val_dataset, self.config, shuffle=True, 2208 | batch_size=self.config.BATCH_SIZE, 2209 | augment=False) 2210 | 2211 | # Callbacks 2212 | callbacks = [ 2213 | keras.callbacks.TensorBoard(log_dir=self.log_dir, 2214 | histogram_freq=0, write_graph=True, write_images=False), 2215 | keras.callbacks.ModelCheckpoint(self.checkpoint_path, 2216 | verbose=0, save_weights_only=True), 2217 | ] 2218 | 2219 | # Train 2220 | log("\nStarting at epoch {}. LR={}\n".format(self.epoch, learning_rate)) 2221 | log("Checkpoint Path: {}".format(self.checkpoint_path)) 2222 | self.set_trainable(layers) 2223 | self.compile(learning_rate, self.config.LEARNING_MOMENTUM) 2224 | 2225 | # Work-around for Windows: Keras fails on Windows when using 2226 | # multiprocessing workers. See discussion here: 2227 | # https://github.com/matterport/Mask_RCNN/issues/13#issuecomment-353124009 2228 | if os.name is 'nt': 2229 | workers = 0 2230 | else: 2231 | workers = max(self.config.BATCH_SIZE // 2, 2) 2232 | 2233 | self.keras_model.fit_generator( 2234 | train_generator, 2235 | initial_epoch=self.epoch, 2236 | epochs=epochs, 2237 | steps_per_epoch=self.config.STEPS_PER_EPOCH, 2238 | callbacks=callbacks, 2239 | validation_data=next(val_generator), 2240 | validation_steps=self.config.VALIDATION_STEPS, 2241 | max_queue_size=100, 2242 | workers=workers, 2243 | use_multiprocessing=True, 2244 | ) 2245 | self.epoch = max(self.epoch, epochs) 2246 | 2247 | def mold_inputs(self, images): 2248 | """Takes a list of images and modifies them to the format expected 2249 | as an input to the neural network. 2250 | images: List of image matricies [height,width,depth]. Images can have 2251 | different sizes. 2252 | 2253 | Returns 3 Numpy matricies: 2254 | molded_images: [N, h, w, 3]. Images resized and normalized. 2255 | image_metas: [N, length of meta data]. Details about each image. 2256 | windows: [N, (y1, x1, y2, x2)]. The portion of the image that has the 2257 | original image (padding excluded). 2258 | """ 2259 | molded_images = [] 2260 | image_metas = [] 2261 | windows = [] 2262 | for image in images: 2263 | # Resize image to fit the model expected size 2264 | # TODO: move resizing to mold_image() 2265 | molded_image, window, scale, padding = utils.resize_image( 2266 | image, 2267 | min_dim=self.config.IMAGE_MIN_DIM, 2268 | max_dim=self.config.IMAGE_MAX_DIM, 2269 | padding=self.config.IMAGE_PADDING) 2270 | molded_image = mold_image(molded_image, self.config) 2271 | # Build image_meta 2272 | image_meta = compose_image_meta( 2273 | 0, image.shape, window, 2274 | np.zeros([self.config.NUM_CLASSES], dtype=np.int32)) 2275 | # Append 2276 | molded_images.append(molded_image) 2277 | windows.append(window) 2278 | image_metas.append(image_meta) 2279 | # Pack into arrays 2280 | molded_images = np.stack(molded_images) 2281 | image_metas = np.stack(image_metas) 2282 | windows = np.stack(windows) 2283 | return molded_images, image_metas, windows 2284 | 2285 | def unmold_detections(self, detections, mrcnn_mask, image_shape, window): 2286 | """Reformats the detections of one image from the format of the neural 2287 | network output to a format suitable for use in the rest of the 2288 | application. 2289 | 2290 | detections: [N, (y1, x1, y2, x2, class_id, score)] 2291 | mrcnn_mask: [N, height, width, num_classes] 2292 | image_shape: [height, width, depth] Original size of the image before resizing 2293 | window: [y1, x1, y2, x2] Box in the image where the real image is 2294 | excluding the padding. 2295 | 2296 | Returns: 2297 | boxes: [N, (y1, x1, y2, x2)] Bounding boxes in pixels 2298 | class_ids: [N] Integer class IDs for each bounding box 2299 | scores: [N] Float probability scores of the class_id 2300 | masks: [height, width, num_instances] Instance masks 2301 | """ 2302 | # How many detections do we have? 2303 | # Detections array is padded with zeros. Find the first class_id == 0. 2304 | zero_ix = np.where(detections[:, 4] == 0)[0] 2305 | N = zero_ix[0] if zero_ix.shape[0] > 0 else detections.shape[0] 2306 | 2307 | # Extract boxes, class_ids, scores, and class-specific masks 2308 | boxes = detections[:N, :4] 2309 | class_ids = detections[:N, 4].astype(np.int32) 2310 | scores = detections[:N, 5] 2311 | masks = mrcnn_mask[np.arange(N), :, :, class_ids] 2312 | 2313 | # Compute scale and shift to translate coordinates to image domain. 2314 | h_scale = image_shape[0] / (window[2] - window[0]) 2315 | w_scale = image_shape[1] / (window[3] - window[1]) 2316 | scale = min(h_scale, w_scale) 2317 | shift = window[:2] # y, x 2318 | scales = np.array([scale, scale, scale, scale]) 2319 | shifts = np.array([shift[0], shift[1], shift[0], shift[1]]) 2320 | 2321 | # Translate bounding boxes to image domain 2322 | boxes = np.multiply(boxes - shifts, scales).astype(np.int32) 2323 | 2324 | # Filter out detections with zero area. Often only happens in early 2325 | # stages of training when the network weights are still a bit random. 2326 | exclude_ix = np.where( 2327 | (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1]) <= 0)[0] 2328 | if exclude_ix.shape[0] > 0: 2329 | boxes = np.delete(boxes, exclude_ix, axis=0) 2330 | class_ids = np.delete(class_ids, exclude_ix, axis=0) 2331 | scores = np.delete(scores, exclude_ix, axis=0) 2332 | masks = np.delete(masks, exclude_ix, axis=0) 2333 | N = class_ids.shape[0] 2334 | 2335 | # Resize masks to original image size and set boundary threshold. 2336 | full_masks = [] 2337 | for i in range(N): 2338 | # Convert neural network mask to full size mask 2339 | full_mask = utils.unmold_mask(masks[i], boxes[i], image_shape) 2340 | full_masks.append(full_mask) 2341 | full_masks = np.stack(full_masks, axis=-1)\ 2342 | if full_masks else np.empty((0,) + masks.shape[1:3]) 2343 | 2344 | return boxes, class_ids, scores, full_masks 2345 | 2346 | def detect(self, images, verbose=0): 2347 | """Runs the detection pipeline. 2348 | 2349 | images: List of images, potentially of different sizes. 2350 | 2351 | Returns a list of dicts, one dict per image. The dict contains: 2352 | rois: [N, (y1, x1, y2, x2)] detection bounding boxes 2353 | class_ids: [N] int class IDs 2354 | scores: [N] float probability scores for the class IDs 2355 | masks: [H, W, N] instance binary masks 2356 | """ 2357 | assert self.mode == "inference", "Create model in inference mode." 2358 | assert len( 2359 | images) == self.config.BATCH_SIZE, "len(images) must be equal to BATCH_SIZE" 2360 | 2361 | if verbose: 2362 | log("Processing {} images".format(len(images))) 2363 | for image in images: 2364 | log("image", image) 2365 | # Mold inputs to format expected by the neural network 2366 | molded_images, image_metas, windows = self.mold_inputs(images) 2367 | if verbose: 2368 | log("molded_images", molded_images) 2369 | log("image_metas", image_metas) 2370 | # Run object detection 2371 | detections, mrcnn_class, mrcnn_bbox, mrcnn_mask, \ 2372 | rois, rpn_class, rpn_bbox =\ 2373 | self.keras_model.predict([molded_images, image_metas], verbose=0) 2374 | # Process detections 2375 | results = [] 2376 | for i, image in enumerate(images): 2377 | final_rois, final_class_ids, final_scores, final_masks =\ 2378 | self.unmold_detections(detections[i], mrcnn_mask[i], 2379 | image.shape, windows[i]) 2380 | results.append({ 2381 | "rois": final_rois, 2382 | "class_ids": final_class_ids, 2383 | "scores": final_scores, 2384 | "masks": final_masks, 2385 | }) 2386 | return results 2387 | 2388 | def ancestor(self, tensor, name, checked=None): 2389 | """Finds the ancestor of a TF tensor in the computation graph. 2390 | tensor: TensorFlow symbolic tensor. 2391 | name: Name of ancestor tensor to find 2392 | checked: For internal use. A list of tensors that were already 2393 | searched to avoid loops in traversing the graph. 2394 | """ 2395 | checked = checked if checked is not None else [] 2396 | # Put a limit on how deep we go to avoid very long loops 2397 | if len(checked) > 500: 2398 | return None 2399 | # Convert name to a regex and allow matching a number prefix 2400 | # because Keras adds them automatically 2401 | if isinstance(name, str): 2402 | name = re.compile(name.replace("/", r"(\_\d+)*/")) 2403 | 2404 | parents = tensor.op.inputs 2405 | for p in parents: 2406 | if p in checked: 2407 | continue 2408 | if bool(re.fullmatch(name, p.name)): 2409 | return p 2410 | checked.append(p) 2411 | a = self.ancestor(p, name, checked) 2412 | if a is not None: 2413 | return a 2414 | return None 2415 | 2416 | def find_trainable_layer(self, layer): 2417 | """If a layer is encapsulated by another layer, this function 2418 | digs through the encapsulation and returns the layer that holds 2419 | the weights. 2420 | """ 2421 | if layer.__class__.__name__ == 'TimeDistributed': 2422 | return self.find_trainable_layer(layer.layer) 2423 | return layer 2424 | 2425 | def get_trainable_layers(self): 2426 | """Returns a list of layers that have weights.""" 2427 | layers = [] 2428 | # Loop through all layers 2429 | for l in self.keras_model.layers: 2430 | # If layer is a wrapper, find inner trainable layer 2431 | l = self.find_trainable_layer(l) 2432 | # Include layer if it has weights 2433 | if l.get_weights(): 2434 | layers.append(l) 2435 | return layers 2436 | 2437 | def run_graph(self, images, outputs): 2438 | """Runs a sub-set of the computation graph that computes the given 2439 | outputs. 2440 | 2441 | outputs: List of tuples (name, tensor) to compute. The tensors are 2442 | symbolic TensorFlow tensors and the names are for easy tracking. 2443 | 2444 | Returns an ordered dict of results. Keys are the names received in the 2445 | input and values are Numpy arrays. 2446 | """ 2447 | model = self.keras_model 2448 | 2449 | # Organize desired outputs into an ordered dict 2450 | outputs = OrderedDict(outputs) 2451 | for o in outputs.values(): 2452 | assert o is not None 2453 | 2454 | # Build a Keras function to run parts of the computation graph 2455 | inputs = model.inputs 2456 | if model.uses_learning_phase and not isinstance(K.learning_phase(), int): 2457 | inputs += [K.learning_phase()] 2458 | kf = K.function(model.inputs, list(outputs.values())) 2459 | 2460 | # Run inference 2461 | molded_images, image_metas, windows = self.mold_inputs(images) 2462 | # TODO: support training mode? 2463 | # if TEST_MODE == "training": 2464 | # model_in = [molded_images, image_metas, 2465 | # target_rpn_match, target_rpn_bbox, 2466 | # gt_boxes, gt_masks] 2467 | # if not config.USE_RPN_ROIS: 2468 | # model_in.append(target_rois) 2469 | # if model.uses_learning_phase and not isinstance(K.learning_phase(), int): 2470 | # model_in.append(1.) 2471 | # outputs_np = kf(model_in) 2472 | # else: 2473 | 2474 | model_in = [molded_images, image_metas] 2475 | if model.uses_learning_phase and not isinstance(K.learning_phase(), int): 2476 | model_in.append(0.) 2477 | outputs_np = kf(model_in) 2478 | 2479 | # Pack the generated Numpy arrays into a a dict and log the results. 2480 | outputs_np = OrderedDict([(k, v) 2481 | for k, v in zip(outputs.keys(), outputs_np)]) 2482 | for k, v in outputs_np.items(): 2483 | log(k, v) 2484 | return outputs_np 2485 | 2486 | 2487 | ############################################################ 2488 | # Data Formatting 2489 | ############################################################ 2490 | 2491 | def compose_image_meta(image_id, image_shape, window, active_class_ids): 2492 | """Takes attributes of an image and puts them in one 1D array. 2493 | 2494 | image_id: An int ID of the image. Useful for debugging. 2495 | image_shape: [height, width, channels] 2496 | window: (y1, x1, y2, x2) in pixels. The area of the image where the real 2497 | image is (excluding the padding) 2498 | active_class_ids: List of class_ids available in the dataset from which 2499 | the image came. Useful if training on images from multiple datasets 2500 | where not all classes are present in all datasets. 2501 | """ 2502 | meta = np.array( 2503 | [image_id] + # size=1 2504 | list(image_shape) + # size=3 2505 | list(window) + # size=4 (y1, x1, y2, x2) in image cooredinates 2506 | list(active_class_ids) # size=num_classes 2507 | ) 2508 | return meta 2509 | 2510 | 2511 | def parse_image_meta_graph(meta): 2512 | """Parses a tensor that contains image attributes to its components. 2513 | See compose_image_meta() for more details. 2514 | 2515 | meta: [batch, meta length] where meta length depends on NUM_CLASSES 2516 | """ 2517 | image_id = meta[:, 0] 2518 | image_shape = meta[:, 1:4] 2519 | window = meta[:, 4:8] # (y1, x1, y2, x2) window of image in in pixels 2520 | active_class_ids = meta[:, 8:] 2521 | return [image_id, image_shape, window, active_class_ids] 2522 | 2523 | 2524 | def mold_image(images, config): 2525 | """Takes RGB images with 0-255 values and subtraces 2526 | the mean pixel and converts it to float. Expects image 2527 | colors in RGB order. 2528 | """ 2529 | return images.astype(np.float32) - config.MEAN_PIXEL 2530 | 2531 | 2532 | def unmold_image(normalized_images, config): 2533 | """Takes a image normalized with mold() and returns the original.""" 2534 | return (normalized_images + config.MEAN_PIXEL).astype(np.uint8) 2535 | 2536 | 2537 | ############################################################ 2538 | # Miscellenous Graph Functions 2539 | ############################################################ 2540 | 2541 | def trim_zeros_graph(boxes, name=None): 2542 | """Often boxes are represented with matricies of shape [N, 4] and 2543 | are padded with zeros. This removes zero boxes. 2544 | 2545 | boxes: [N, 4] matrix of boxes. 2546 | non_zeros: [N] a 1D boolean mask identifying the rows to keep 2547 | """ 2548 | non_zeros = tf.cast(tf.reduce_sum(tf.abs(boxes), axis=1), tf.bool) 2549 | boxes = tf.boolean_mask(boxes, non_zeros, name=name) 2550 | return boxes, non_zeros 2551 | 2552 | 2553 | def batch_pack_graph(x, counts, num_rows): 2554 | """Picks different number of values from each row 2555 | in x depending on the values in counts. 2556 | """ 2557 | outputs = [] 2558 | for i in range(num_rows): 2559 | outputs.append(x[i, :counts[i]]) 2560 | return tf.concat(outputs, axis=0) 2561 | -------------------------------------------------------------------------------- /person_blocker.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import argparse 4 | import numpy as np 5 | import coco 6 | import utils 7 | import model as modellib 8 | from classes import get_class_names, InferenceConfig 9 | from ast import literal_eval as make_tuple 10 | import imageio 11 | import visualize 12 | 13 | # Creates a color layer and adds Gaussian noise. 14 | # For each pixel, the same noise value is added to each channel 15 | # to mitigate hue shfting. 16 | 17 | 18 | def create_noisy_color(image, color): 19 | color_mask = np.full(shape=(image.shape[0], image.shape[1], 3), 20 | fill_value=color) 21 | 22 | noise = np.random.normal(0, 25, (image.shape[0], image.shape[1])) 23 | noise = np.repeat(np.expand_dims(noise, axis=2), repeats=3, axis=2) 24 | mask_noise = np.clip(color_mask + noise, 0., 255.) 25 | return mask_noise 26 | 27 | 28 | # Helper function to allow both RGB triplet + hex CL input 29 | 30 | def string_to_rgb_triplet(triplet): 31 | 32 | if '#' in triplet: 33 | # http://stackoverflow.com/a/4296727 34 | triplet = triplet.lstrip('#') 35 | _NUMERALS = '0123456789abcdefABCDEF' 36 | _HEXDEC = {v: int(v, 16) 37 | for v in (x + y for x in _NUMERALS for y in _NUMERALS)} 38 | return (_HEXDEC[triplet[0:2]], _HEXDEC[triplet[2:4]], 39 | _HEXDEC[triplet[4:6]]) 40 | 41 | else: 42 | # https://stackoverflow.com/a/9763133 43 | triplet = make_tuple(triplet) 44 | return triplet 45 | 46 | 47 | def person_blocker(args): 48 | 49 | # Required to load model, but otherwise unused 50 | ROOT_DIR = os.getcwd() 51 | COCO_MODEL_PATH = args.model or os.path.join(ROOT_DIR, "mask_rcnn_coco.h5") 52 | 53 | MODEL_DIR = os.path.join(ROOT_DIR, "logs") # Required to load model 54 | 55 | if not os.path.exists(COCO_MODEL_PATH): 56 | utils.download_trained_weights(COCO_MODEL_PATH) 57 | 58 | # Load model and config 59 | config = InferenceConfig() 60 | model = modellib.MaskRCNN(mode="inference", 61 | model_dir=MODEL_DIR, config=config) 62 | model.load_weights(COCO_MODEL_PATH, by_name=True) 63 | 64 | image = imageio.imread(args.image) 65 | 66 | # Create masks for all objects 67 | results = model.detect([image], verbose=0) 68 | r = results[0] 69 | 70 | if args.labeled: 71 | position_ids = ['[{}]'.format(x) 72 | for x in range(r['class_ids'].shape[0])] 73 | visualize.display_instances(image, r['rois'], 74 | r['masks'], r['class_ids'], 75 | get_class_names(), position_ids) 76 | sys.exit() 77 | 78 | # Filter masks to only the selected objects 79 | objects = np.array(args.objects) 80 | 81 | # Object IDs: 82 | if np.all(np.chararray.isnumeric(objects)): 83 | object_indices = objects.astype(int) 84 | # Types of objects: 85 | else: 86 | selected_class_ids = np.flatnonzero(np.in1d(get_class_names(), 87 | objects)) 88 | object_indices = np.flatnonzero( 89 | np.in1d(r['class_ids'], selected_class_ids)) 90 | 91 | mask_selected = np.sum(r['masks'][:, :, object_indices], axis=2) 92 | 93 | # Replace object masks with noise 94 | mask_color = string_to_rgb_triplet(args.color) 95 | image_masked = image.copy() 96 | noisy_color = create_noisy_color(image, mask_color) 97 | image_masked[mask_selected > 0] = noisy_color[mask_selected > 0] 98 | 99 | imageio.imwrite('person_blocked.png', image_masked) 100 | 101 | # Create GIF. The noise will be random for each frame, 102 | # which creates a "static" effect 103 | 104 | images = [image_masked] 105 | num_images = 10 # should be a divisor of 30 106 | 107 | for _ in range(num_images - 1): 108 | new_image = image.copy() 109 | noisy_color = create_noisy_color(image, mask_color) 110 | new_image[mask_selected > 0] = noisy_color[mask_selected > 0] 111 | images.append(new_image) 112 | 113 | imageio.mimsave('person_blocked.gif', images, fps=30., subrectangles=True) 114 | 115 | 116 | if __name__ == '__main__': 117 | parser = argparse.ArgumentParser( 118 | description='Person Blocker - Automatically "block" people ' 119 | 'in images using a neural network.') 120 | parser.add_argument('-i', '--image', help='Image file name.', 121 | required=False) 122 | parser.add_argument( 123 | '-m', '--model', help='path to COCO model', default=None) 124 | parser.add_argument('-o', 125 | '--objects', nargs='+', 126 | help='object(s)/object ID(s) to block. ' + 127 | 'Use the -names flag to print a list of ' + 128 | 'valid objects', 129 | default='person') 130 | parser.add_argument('-c', 131 | '--color', nargs='?', default='(255, 255, 255)', 132 | help='color of the "block"') 133 | parser.add_argument('-l', 134 | '--labeled', dest='labeled', 135 | action='store_true', 136 | help='generate labeled image instead') 137 | parser.add_argument('-n', 138 | '--names', dest='names', 139 | action='store_true', 140 | help='prints class names and exits.') 141 | parser.set_defaults(labeled=False, names=False) 142 | args = parser.parse_args() 143 | 144 | if args.names: 145 | print(get_class_names()) 146 | sys.exit() 147 | 148 | person_blocker(args) 149 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | h5py 2 | imageio 3 | ipython 4 | keras 5 | scipy 6 | scikit-image 7 | tensorflow 8 | -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | """ 2 | Mask R-CNN 3 | Common utility functions and classes. 4 | 5 | Copyright (c) 2017 Matterport, Inc. 6 | Licensed under the MIT License (see LICENSE for details) 7 | Written by Waleed Abdulla 8 | """ 9 | 10 | import sys 11 | import os 12 | import math 13 | import random 14 | import numpy as np 15 | import tensorflow as tf 16 | import scipy.misc 17 | import skimage.color 18 | import skimage.io 19 | import urllib.request 20 | import shutil 21 | 22 | # URL from which to download the latest COCO trained weights 23 | COCO_MODEL_URL = "https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5" 24 | 25 | 26 | ############################################################ 27 | # Bounding Boxes 28 | ############################################################ 29 | 30 | def extract_bboxes(mask): 31 | """Compute bounding boxes from masks. 32 | mask: [height, width, num_instances]. Mask pixels are either 1 or 0. 33 | 34 | Returns: bbox array [num_instances, (y1, x1, y2, x2)]. 35 | """ 36 | boxes = np.zeros([mask.shape[-1], 4], dtype=np.int32) 37 | for i in range(mask.shape[-1]): 38 | m = mask[:, :, i] 39 | # Bounding box. 40 | horizontal_indicies = np.where(np.any(m, axis=0))[0] 41 | vertical_indicies = np.where(np.any(m, axis=1))[0] 42 | if horizontal_indicies.shape[0]: 43 | x1, x2 = horizontal_indicies[[0, -1]] 44 | y1, y2 = vertical_indicies[[0, -1]] 45 | # x2 and y2 should not be part of the box. Increment by 1. 46 | x2 += 1 47 | y2 += 1 48 | else: 49 | # No mask for this instance. Might happen due to 50 | # resizing or cropping. Set bbox to zeros 51 | x1, x2, y1, y2 = 0, 0, 0, 0 52 | boxes[i] = np.array([y1, x1, y2, x2]) 53 | return boxes.astype(np.int32) 54 | 55 | 56 | def compute_iou(box, boxes, box_area, boxes_area): 57 | """Calculates IoU of the given box with the array of the given boxes. 58 | box: 1D vector [y1, x1, y2, x2] 59 | boxes: [boxes_count, (y1, x1, y2, x2)] 60 | box_area: float. the area of 'box' 61 | boxes_area: array of length boxes_count. 62 | 63 | Note: the areas are passed in rather than calculated here for 64 | efficency. Calculate once in the caller to avoid duplicate work. 65 | """ 66 | # Calculate intersection areas 67 | y1 = np.maximum(box[0], boxes[:, 0]) 68 | y2 = np.minimum(box[2], boxes[:, 2]) 69 | x1 = np.maximum(box[1], boxes[:, 1]) 70 | x2 = np.minimum(box[3], boxes[:, 3]) 71 | intersection = np.maximum(x2 - x1, 0) * np.maximum(y2 - y1, 0) 72 | union = box_area + boxes_area[:] - intersection[:] 73 | iou = intersection / union 74 | return iou 75 | 76 | 77 | def compute_overlaps(boxes1, boxes2): 78 | """Computes IoU overlaps between two sets of boxes. 79 | boxes1, boxes2: [N, (y1, x1, y2, x2)]. 80 | 81 | For better performance, pass the largest set first and the smaller second. 82 | """ 83 | # Areas of anchors and GT boxes 84 | area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1]) 85 | area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1]) 86 | 87 | # Compute overlaps to generate matrix [boxes1 count, boxes2 count] 88 | # Each cell contains the IoU value. 89 | overlaps = np.zeros((boxes1.shape[0], boxes2.shape[0])) 90 | for i in range(overlaps.shape[1]): 91 | box2 = boxes2[i] 92 | overlaps[:, i] = compute_iou(box2, boxes1, area2[i], area1) 93 | return overlaps 94 | 95 | 96 | def compute_overlaps_masks(masks1, masks2): 97 | '''Computes IoU overlaps between two sets of masks. 98 | masks1, masks2: [Height, Width, instances] 99 | ''' 100 | # flatten masks 101 | masks1 = np.reshape(masks1 > .5, (-1, masks1.shape[-1])).astype(np.float32) 102 | masks2 = np.reshape(masks2 > .5, (-1, masks2.shape[-1])).astype(np.float32) 103 | area1 = np.sum(masks1, axis=0) 104 | area2 = np.sum(masks2, axis=0) 105 | 106 | # intersections and union 107 | intersections = np.dot(masks1.T, masks2) 108 | union = area1[:, None] + area2[None, :] - intersections 109 | overlaps = intersections / union 110 | 111 | return overlaps 112 | 113 | 114 | def non_max_suppression(boxes, scores, threshold): 115 | """Performs non-maximum supression and returns indicies of kept boxes. 116 | boxes: [N, (y1, x1, y2, x2)]. Notice that (y2, x2) lays outside the box. 117 | scores: 1-D array of box scores. 118 | threshold: Float. IoU threshold to use for filtering. 119 | """ 120 | assert boxes.shape[0] > 0 121 | if boxes.dtype.kind != "f": 122 | boxes = boxes.astype(np.float32) 123 | 124 | # Compute box areas 125 | y1 = boxes[:, 0] 126 | x1 = boxes[:, 1] 127 | y2 = boxes[:, 2] 128 | x2 = boxes[:, 3] 129 | area = (y2 - y1) * (x2 - x1) 130 | 131 | # Get indicies of boxes sorted by scores (highest first) 132 | ixs = scores.argsort()[::-1] 133 | 134 | pick = [] 135 | while len(ixs) > 0: 136 | # Pick top box and add its index to the list 137 | i = ixs[0] 138 | pick.append(i) 139 | # Compute IoU of the picked box with the rest 140 | iou = compute_iou(boxes[i], boxes[ixs[1:]], area[i], area[ixs[1:]]) 141 | # Identify boxes with IoU over the threshold. This 142 | # returns indicies into ixs[1:], so add 1 to get 143 | # indicies into ixs. 144 | remove_ixs = np.where(iou > threshold)[0] + 1 145 | # Remove indicies of the picked and overlapped boxes. 146 | ixs = np.delete(ixs, remove_ixs) 147 | ixs = np.delete(ixs, 0) 148 | return np.array(pick, dtype=np.int32) 149 | 150 | 151 | def apply_box_deltas(boxes, deltas): 152 | """Applies the given deltas to the given boxes. 153 | boxes: [N, (y1, x1, y2, x2)]. Note that (y2, x2) is outside the box. 154 | deltas: [N, (dy, dx, log(dh), log(dw))] 155 | """ 156 | boxes = boxes.astype(np.float32) 157 | # Convert to y, x, h, w 158 | height = boxes[:, 2] - boxes[:, 0] 159 | width = boxes[:, 3] - boxes[:, 1] 160 | center_y = boxes[:, 0] + 0.5 * height 161 | center_x = boxes[:, 1] + 0.5 * width 162 | # Apply deltas 163 | center_y += deltas[:, 0] * height 164 | center_x += deltas[:, 1] * width 165 | height *= np.exp(deltas[:, 2]) 166 | width *= np.exp(deltas[:, 3]) 167 | # Convert back to y1, x1, y2, x2 168 | y1 = center_y - 0.5 * height 169 | x1 = center_x - 0.5 * width 170 | y2 = y1 + height 171 | x2 = x1 + width 172 | return np.stack([y1, x1, y2, x2], axis=1) 173 | 174 | 175 | def box_refinement_graph(box, gt_box): 176 | """Compute refinement needed to transform box to gt_box. 177 | box and gt_box are [N, (y1, x1, y2, x2)] 178 | """ 179 | box = tf.cast(box, tf.float32) 180 | gt_box = tf.cast(gt_box, tf.float32) 181 | 182 | height = box[:, 2] - box[:, 0] 183 | width = box[:, 3] - box[:, 1] 184 | center_y = box[:, 0] + 0.5 * height 185 | center_x = box[:, 1] + 0.5 * width 186 | 187 | gt_height = gt_box[:, 2] - gt_box[:, 0] 188 | gt_width = gt_box[:, 3] - gt_box[:, 1] 189 | gt_center_y = gt_box[:, 0] + 0.5 * gt_height 190 | gt_center_x = gt_box[:, 1] + 0.5 * gt_width 191 | 192 | dy = (gt_center_y - center_y) / height 193 | dx = (gt_center_x - center_x) / width 194 | dh = tf.log(gt_height / height) 195 | dw = tf.log(gt_width / width) 196 | 197 | result = tf.stack([dy, dx, dh, dw], axis=1) 198 | return result 199 | 200 | 201 | def box_refinement(box, gt_box): 202 | """Compute refinement needed to transform box to gt_box. 203 | box and gt_box are [N, (y1, x1, y2, x2)]. (y2, x2) is 204 | assumed to be outside the box. 205 | """ 206 | box = box.astype(np.float32) 207 | gt_box = gt_box.astype(np.float32) 208 | 209 | height = box[:, 2] - box[:, 0] 210 | width = box[:, 3] - box[:, 1] 211 | center_y = box[:, 0] + 0.5 * height 212 | center_x = box[:, 1] + 0.5 * width 213 | 214 | gt_height = gt_box[:, 2] - gt_box[:, 0] 215 | gt_width = gt_box[:, 3] - gt_box[:, 1] 216 | gt_center_y = gt_box[:, 0] + 0.5 * gt_height 217 | gt_center_x = gt_box[:, 1] + 0.5 * gt_width 218 | 219 | dy = (gt_center_y - center_y) / height 220 | dx = (gt_center_x - center_x) / width 221 | dh = np.log(gt_height / height) 222 | dw = np.log(gt_width / width) 223 | 224 | return np.stack([dy, dx, dh, dw], axis=1) 225 | 226 | 227 | ############################################################ 228 | # Dataset 229 | ############################################################ 230 | 231 | class Dataset(object): 232 | """The base class for dataset classes. 233 | To use it, create a new class that adds functions specific to the dataset 234 | you want to use. For example: 235 | 236 | class CatsAndDogsDataset(Dataset): 237 | def load_cats_and_dogs(self): 238 | ... 239 | def load_mask(self, image_id): 240 | ... 241 | def image_reference(self, image_id): 242 | ... 243 | 244 | See COCODataset and ShapesDataset as examples. 245 | """ 246 | 247 | def __init__(self, class_map=None): 248 | self._image_ids = [] 249 | self.image_info = [] 250 | # Background is always the first class 251 | self.class_info = [{"source": "", "id": 0, "name": "BG"}] 252 | self.source_class_ids = {} 253 | 254 | def add_class(self, source, class_id, class_name): 255 | assert "." not in source, "Source name cannot contain a dot" 256 | # Does the class exist already? 257 | for info in self.class_info: 258 | if info['source'] == source and info["id"] == class_id: 259 | # source.class_id combination already available, skip 260 | return 261 | # Add the class 262 | self.class_info.append({ 263 | "source": source, 264 | "id": class_id, 265 | "name": class_name, 266 | }) 267 | 268 | def add_image(self, source, image_id, path, **kwargs): 269 | image_info = { 270 | "id": image_id, 271 | "source": source, 272 | "path": path, 273 | } 274 | image_info.update(kwargs) 275 | self.image_info.append(image_info) 276 | 277 | def image_reference(self, image_id): 278 | """Return a link to the image in its source Website or details about 279 | the image that help looking it up or debugging it. 280 | 281 | Override for your dataset, but pass to this function 282 | if you encounter images not in your dataset. 283 | """ 284 | return "" 285 | 286 | def prepare(self, class_map=None): 287 | """Prepares the Dataset class for use. 288 | 289 | TODO: class map is not supported yet. When done, it should handle mapping 290 | classes from different datasets to the same class ID. 291 | """ 292 | 293 | def clean_name(name): 294 | """Returns a shorter version of object names for cleaner display.""" 295 | return ",".join(name.split(",")[:1]) 296 | 297 | # Build (or rebuild) everything else from the info dicts. 298 | self.num_classes = len(self.class_info) 299 | self.class_ids = np.arange(self.num_classes) 300 | self.class_names = [clean_name(c["name"]) for c in self.class_info] 301 | self.num_images = len(self.image_info) 302 | self._image_ids = np.arange(self.num_images) 303 | 304 | self.class_from_source_map = {"{}.{}".format(info['source'], info['id']): id 305 | for info, id in zip(self.class_info, self.class_ids)} 306 | 307 | # Map sources to class_ids they support 308 | self.sources = list(set([i['source'] for i in self.class_info])) 309 | self.source_class_ids = {} 310 | # Loop over datasets 311 | for source in self.sources: 312 | self.source_class_ids[source] = [] 313 | # Find classes that belong to this dataset 314 | for i, info in enumerate(self.class_info): 315 | # Include BG class in all datasets 316 | if i == 0 or source == info['source']: 317 | self.source_class_ids[source].append(i) 318 | 319 | def map_source_class_id(self, source_class_id): 320 | """Takes a source class ID and returns the int class ID assigned to it. 321 | 322 | For example: 323 | dataset.map_source_class_id("coco.12") -> 23 324 | """ 325 | return self.class_from_source_map[source_class_id] 326 | 327 | def get_source_class_id(self, class_id, source): 328 | """Map an internal class ID to the corresponding class ID in the source dataset.""" 329 | info = self.class_info[class_id] 330 | assert info['source'] == source 331 | return info['id'] 332 | 333 | def append_data(self, class_info, image_info): 334 | self.external_to_class_id = {} 335 | for i, c in enumerate(self.class_info): 336 | for ds, id in c["map"]: 337 | self.external_to_class_id[ds + str(id)] = i 338 | 339 | # Map external image IDs to internal ones. 340 | self.external_to_image_id = {} 341 | for i, info in enumerate(self.image_info): 342 | self.external_to_image_id[info["ds"] + str(info["id"])] = i 343 | 344 | @property 345 | def image_ids(self): 346 | return self._image_ids 347 | 348 | def source_image_link(self, image_id): 349 | """Returns the path or URL to the image. 350 | Override this to return a URL to the image if it's availble online for easy 351 | debugging. 352 | """ 353 | return self.image_info[image_id]["path"] 354 | 355 | def load_image(self, image_id): 356 | """Load the specified image and return a [H,W,3] Numpy array. 357 | """ 358 | # Load image 359 | image = skimage.io.imread(self.image_info[image_id]['path']) 360 | # If grayscale. Convert to RGB for consistency. 361 | if image.ndim != 3: 362 | image = skimage.color.gray2rgb(image) 363 | return image 364 | 365 | def load_mask(self, image_id): 366 | """Load instance masks for the given image. 367 | 368 | Different datasets use different ways to store masks. Override this 369 | method to load instance masks and return them in the form of am 370 | array of binary masks of shape [height, width, instances]. 371 | 372 | Returns: 373 | masks: A bool array of shape [height, width, instance count] with 374 | a binary mask per instance. 375 | class_ids: a 1D array of class IDs of the instance masks. 376 | """ 377 | # Override this function to load a mask from your dataset. 378 | # Otherwise, it returns an empty mask. 379 | mask = np.empty([0, 0, 0]) 380 | class_ids = np.empty([0], np.int32) 381 | return mask, class_ids 382 | 383 | 384 | def resize_image(image, min_dim=None, max_dim=None, padding=False): 385 | """ 386 | Resizes an image keeping the aspect ratio. 387 | 388 | min_dim: if provided, resizes the image such that it's smaller 389 | dimension == min_dim 390 | max_dim: if provided, ensures that the image longest side doesn't 391 | exceed this value. 392 | padding: If true, pads image with zeros so it's size is max_dim x max_dim 393 | 394 | Returns: 395 | image: the resized image 396 | window: (y1, x1, y2, x2). If max_dim is provided, padding might 397 | be inserted in the returned image. If so, this window is the 398 | coordinates of the image part of the full image (excluding 399 | the padding). The x2, y2 pixels are not included. 400 | scale: The scale factor used to resize the image 401 | padding: Padding added to the image [(top, bottom), (left, right), (0, 0)] 402 | """ 403 | # Default window (y1, x1, y2, x2) and default scale == 1. 404 | h, w = image.shape[:2] 405 | window = (0, 0, h, w) 406 | scale = 1 407 | 408 | # Scale? 409 | if min_dim: 410 | # Scale up but not down 411 | scale = max(1, min_dim / min(h, w)) 412 | # Does it exceed max dim? 413 | if max_dim: 414 | image_max = max(h, w) 415 | if round(image_max * scale) > max_dim: 416 | scale = max_dim / image_max 417 | # Resize image and mask 418 | if scale != 1: 419 | image = scipy.misc.imresize( 420 | image, (round(h * scale), round(w * scale))) 421 | # Need padding? 422 | if padding: 423 | # Get new height and width 424 | h, w = image.shape[:2] 425 | top_pad = (max_dim - h) // 2 426 | bottom_pad = max_dim - h - top_pad 427 | left_pad = (max_dim - w) // 2 428 | right_pad = max_dim - w - left_pad 429 | padding = [(top_pad, bottom_pad), (left_pad, right_pad), (0, 0)] 430 | image = np.pad(image, padding, mode='constant', constant_values=0) 431 | window = (top_pad, left_pad, h + top_pad, w + left_pad) 432 | return image, window, scale, padding 433 | 434 | 435 | def resize_mask(mask, scale, padding): 436 | """Resizes a mask using the given scale and padding. 437 | Typically, you get the scale and padding from resize_image() to 438 | ensure both, the image and the mask, are resized consistently. 439 | 440 | scale: mask scaling factor 441 | padding: Padding to add to the mask in the form 442 | [(top, bottom), (left, right), (0, 0)] 443 | """ 444 | h, w = mask.shape[:2] 445 | mask = scipy.ndimage.zoom(mask, zoom=[scale, scale, 1], order=0) 446 | mask = np.pad(mask, padding, mode='constant', constant_values=0) 447 | return mask 448 | 449 | 450 | def minimize_mask(bbox, mask, mini_shape): 451 | """Resize masks to a smaller version to cut memory load. 452 | Mini-masks can then resized back to image scale using expand_masks() 453 | 454 | See inspect_data.ipynb notebook for more details. 455 | """ 456 | mini_mask = np.zeros(mini_shape + (mask.shape[-1],), dtype=bool) 457 | for i in range(mask.shape[-1]): 458 | m = mask[:, :, i] 459 | y1, x1, y2, x2 = bbox[i][:4] 460 | m = m[y1:y2, x1:x2] 461 | if m.size == 0: 462 | raise Exception("Invalid bounding box with area of zero") 463 | m = scipy.misc.imresize(m.astype(float), mini_shape, interp='bilinear') 464 | mini_mask[:, :, i] = np.where(m >= 128, 1, 0) 465 | return mini_mask 466 | 467 | 468 | def expand_mask(bbox, mini_mask, image_shape): 469 | """Resizes mini masks back to image size. Reverses the change 470 | of minimize_mask(). 471 | 472 | See inspect_data.ipynb notebook for more details. 473 | """ 474 | mask = np.zeros(image_shape[:2] + (mini_mask.shape[-1],), dtype=bool) 475 | for i in range(mask.shape[-1]): 476 | m = mini_mask[:, :, i] 477 | y1, x1, y2, x2 = bbox[i][:4] 478 | h = y2 - y1 479 | w = x2 - x1 480 | m = scipy.misc.imresize(m.astype(float), (h, w), interp='bilinear') 481 | mask[y1:y2, x1:x2, i] = np.where(m >= 128, 1, 0) 482 | return mask 483 | 484 | 485 | # TODO: Build and use this function to reduce code duplication 486 | def mold_mask(mask, config): 487 | pass 488 | 489 | 490 | def unmold_mask(mask, bbox, image_shape): 491 | """Converts a mask generated by the neural network into a format similar 492 | to it's original shape. 493 | mask: [height, width] of type float. A small, typically 28x28 mask. 494 | bbox: [y1, x1, y2, x2]. The box to fit the mask in. 495 | 496 | Returns a binary mask with the same size as the original image. 497 | """ 498 | threshold = 0.5 499 | y1, x1, y2, x2 = bbox 500 | mask = scipy.misc.imresize( 501 | mask, (y2 - y1, x2 - x1), interp='bilinear').astype(np.float32) / 255.0 502 | mask = np.where(mask >= threshold, 1, 0).astype(np.uint8) 503 | 504 | # Put the mask in the right location. 505 | full_mask = np.zeros(image_shape[:2], dtype=np.uint8) 506 | full_mask[y1:y2, x1:x2] = mask 507 | return full_mask 508 | 509 | 510 | ############################################################ 511 | # Anchors 512 | ############################################################ 513 | 514 | def generate_anchors(scales, ratios, shape, feature_stride, anchor_stride): 515 | """ 516 | scales: 1D array of anchor sizes in pixels. Example: [32, 64, 128] 517 | ratios: 1D array of anchor ratios of width/height. Example: [0.5, 1, 2] 518 | shape: [height, width] spatial shape of the feature map over which 519 | to generate anchors. 520 | feature_stride: Stride of the feature map relative to the image in pixels. 521 | anchor_stride: Stride of anchors on the feature map. For example, if the 522 | value is 2 then generate anchors for every other feature map pixel. 523 | """ 524 | # Get all combinations of scales and ratios 525 | scales, ratios = np.meshgrid(np.array(scales), np.array(ratios)) 526 | scales = scales.flatten() 527 | ratios = ratios.flatten() 528 | 529 | # Enumerate heights and widths from scales and ratios 530 | heights = scales / np.sqrt(ratios) 531 | widths = scales * np.sqrt(ratios) 532 | 533 | # Enumerate shifts in feature space 534 | shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride 535 | shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride 536 | shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y) 537 | 538 | # Enumerate combinations of shifts, widths, and heights 539 | box_widths, box_centers_x = np.meshgrid(widths, shifts_x) 540 | box_heights, box_centers_y = np.meshgrid(heights, shifts_y) 541 | 542 | # Reshape to get a list of (y, x) and a list of (h, w) 543 | box_centers = np.stack( 544 | [box_centers_y, box_centers_x], axis=2).reshape([-1, 2]) 545 | box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2]) 546 | 547 | # Convert to corner coordinates (y1, x1, y2, x2) 548 | boxes = np.concatenate([box_centers - 0.5 * box_sizes, 549 | box_centers + 0.5 * box_sizes], axis=1) 550 | return boxes 551 | 552 | 553 | def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides, 554 | anchor_stride): 555 | """Generate anchors at different levels of a feature pyramid. Each scale 556 | is associated with a level of the pyramid, but each ratio is used in 557 | all levels of the pyramid. 558 | 559 | Returns: 560 | anchors: [N, (y1, x1, y2, x2)]. All generated anchors in one array. Sorted 561 | with the same order of the given scales. So, anchors of scale[0] come 562 | first, then anchors of scale[1], and so on. 563 | """ 564 | # Anchors 565 | # [anchor_count, (y1, x1, y2, x2)] 566 | anchors = [] 567 | for i in range(len(scales)): 568 | anchors.append(generate_anchors(scales[i], ratios, feature_shapes[i], 569 | feature_strides[i], anchor_stride)) 570 | return np.concatenate(anchors, axis=0) 571 | 572 | 573 | ############################################################ 574 | # Miscellaneous 575 | ############################################################ 576 | 577 | def trim_zeros(x): 578 | """It's common to have tensors larger than the available data and 579 | pad with zeros. This function removes rows that are all zeros. 580 | 581 | x: [rows, columns]. 582 | """ 583 | assert len(x.shape) == 2 584 | return x[~np.all(x == 0, axis=1)] 585 | 586 | 587 | def compute_ap(gt_boxes, gt_class_ids, gt_masks, 588 | pred_boxes, pred_class_ids, pred_scores, pred_masks, 589 | iou_threshold=0.5): 590 | """Compute Average Precision at a set IoU threshold (default 0.5). 591 | 592 | Returns: 593 | mAP: Mean Average Precision 594 | precisions: List of precisions at different class score thresholds. 595 | recalls: List of recall values at different class score thresholds. 596 | overlaps: [pred_boxes, gt_boxes] IoU overlaps. 597 | """ 598 | # Trim zero padding and sort predictions by score from high to low 599 | # TODO: cleaner to do zero unpadding upstream 600 | gt_boxes = trim_zeros(gt_boxes) 601 | gt_masks = gt_masks[..., :gt_boxes.shape[0]] 602 | pred_boxes = trim_zeros(pred_boxes) 603 | pred_scores = pred_scores[:pred_boxes.shape[0]] 604 | indices = np.argsort(pred_scores)[::-1] 605 | pred_boxes = pred_boxes[indices] 606 | pred_class_ids = pred_class_ids[indices] 607 | pred_scores = pred_scores[indices] 608 | pred_masks = pred_masks[..., indices] 609 | 610 | # Compute IoU overlaps [pred_masks, gt_masks] 611 | overlaps = compute_overlaps_masks(pred_masks, gt_masks) 612 | 613 | # Loop through ground truth boxes and find matching predictions 614 | match_count = 0 615 | pred_match = np.zeros([pred_boxes.shape[0]]) 616 | gt_match = np.zeros([gt_boxes.shape[0]]) 617 | for i in range(len(pred_boxes)): 618 | # Find best matching ground truth box 619 | sorted_ixs = np.argsort(overlaps[i])[::-1] 620 | for j in sorted_ixs: 621 | # If ground truth box is already matched, go to next one 622 | if gt_match[j] == 1: 623 | continue 624 | # If we reach IoU smaller than the threshold, end the loop 625 | iou = overlaps[i, j] 626 | if iou < iou_threshold: 627 | break 628 | # Do we have a match? 629 | if pred_class_ids[i] == gt_class_ids[j]: 630 | match_count += 1 631 | gt_match[j] = 1 632 | pred_match[i] = 1 633 | break 634 | 635 | # Compute precision and recall at each prediction box step 636 | precisions = np.cumsum(pred_match) / (np.arange(len(pred_match)) + 1) 637 | recalls = np.cumsum(pred_match).astype(np.float32) / len(gt_match) 638 | 639 | # Pad with start and end values to simplify the math 640 | precisions = np.concatenate([[0], precisions, [0]]) 641 | recalls = np.concatenate([[0], recalls, [1]]) 642 | 643 | # Ensure precision values decrease but don't increase. This way, the 644 | # precision value at each recall threshold is the maximum it can be 645 | # for all following recall thresholds, as specified by the VOC paper. 646 | for i in range(len(precisions) - 2, -1, -1): 647 | precisions[i] = np.maximum(precisions[i], precisions[i + 1]) 648 | 649 | # Compute mean AP over recall range 650 | indices = np.where(recalls[:-1] != recalls[1:])[0] + 1 651 | mAP = np.sum((recalls[indices] - recalls[indices - 1]) * 652 | precisions[indices]) 653 | 654 | return mAP, precisions, recalls, overlaps 655 | 656 | 657 | def compute_recall(pred_boxes, gt_boxes, iou): 658 | """Compute the recall at the given IoU threshold. It's an indication 659 | of how many GT boxes were found by the given prediction boxes. 660 | 661 | pred_boxes: [N, (y1, x1, y2, x2)] in image coordinates 662 | gt_boxes: [N, (y1, x1, y2, x2)] in image coordinates 663 | """ 664 | # Measure overlaps 665 | overlaps = compute_overlaps(pred_boxes, gt_boxes) 666 | iou_max = np.max(overlaps, axis=1) 667 | iou_argmax = np.argmax(overlaps, axis=1) 668 | positive_ids = np.where(iou_max >= iou)[0] 669 | matched_gt_boxes = iou_argmax[positive_ids] 670 | 671 | recall = len(set(matched_gt_boxes)) / gt_boxes.shape[0] 672 | return recall, positive_ids 673 | 674 | 675 | # ## Batch Slicing 676 | # Some custom layers support a batch size of 1 only, and require a lot of work 677 | # to support batches greater than 1. This function slices an input tensor 678 | # across the batch dimension and feeds batches of size 1. Effectively, 679 | # an easy way to support batches > 1 quickly with little code modification. 680 | # In the long run, it's more efficient to modify the code to support large 681 | # batches and getting rid of this function. Consider this a temporary solution 682 | def batch_slice(inputs, graph_fn, batch_size, names=None): 683 | """Splits inputs into slices and feeds each slice to a copy of the given 684 | computation graph and then combines the results. It allows you to run a 685 | graph on a batch of inputs even if the graph is written to support one 686 | instance only. 687 | 688 | inputs: list of tensors. All must have the same first dimension length 689 | graph_fn: A function that returns a TF tensor that's part of a graph. 690 | batch_size: number of slices to divide the data into. 691 | names: If provided, assigns names to the resulting tensors. 692 | """ 693 | if not isinstance(inputs, list): 694 | inputs = [inputs] 695 | 696 | outputs = [] 697 | for i in range(batch_size): 698 | inputs_slice = [x[i] for x in inputs] 699 | output_slice = graph_fn(*inputs_slice) 700 | if not isinstance(output_slice, (tuple, list)): 701 | output_slice = [output_slice] 702 | outputs.append(output_slice) 703 | # Change outputs from a list of slices where each is 704 | # a list of outputs to a list of outputs and each has 705 | # a list of slices 706 | outputs = list(zip(*outputs)) 707 | 708 | if names is None: 709 | names = [None] * len(outputs) 710 | 711 | result = [tf.stack(o, axis=0, name=n) 712 | for o, n in zip(outputs, names)] 713 | if len(result) == 1: 714 | result = result[0] 715 | 716 | return result 717 | 718 | 719 | def download_trained_weights(coco_model_path, verbose=1): 720 | """Download COCO trained weights from Releases. 721 | 722 | coco_model_path: local path of COCO trained weights 723 | """ 724 | if verbose > 0: 725 | print("Downloading pretrained model to " + coco_model_path + " ...") 726 | with urllib.request.urlopen(COCO_MODEL_URL) as resp, open(coco_model_path, 'wb') as out: 727 | shutil.copyfileobj(resp, out) 728 | if verbose > 0: 729 | print("... done downloading pretrained model!") 730 | -------------------------------------------------------------------------------- /visualize.py: -------------------------------------------------------------------------------- 1 | """ 2 | Mask R-CNN 3 | Display and Visualization Functions. 4 | 5 | Copyright (c) 2017 Matterport, Inc. 6 | Licensed under the MIT License (see LICENSE for details) 7 | Written by Waleed Abdulla 8 | """ 9 | 10 | import random 11 | import itertools 12 | import colorsys 13 | import numpy as np 14 | from skimage.measure import find_contours 15 | import matplotlib.pyplot as plt 16 | import matplotlib.patches as patches 17 | import matplotlib.lines as lines 18 | from matplotlib.patches import Polygon 19 | import IPython.display 20 | 21 | import utils 22 | 23 | 24 | ############################################################ 25 | # Visualization 26 | ############################################################ 27 | 28 | def display_images(images, titles=None, cols=4, cmap=None, norm=None, 29 | interpolation=None): 30 | """Display the given set of images, optionally with titles. 31 | images: list or array of image tensors in HWC format. 32 | titles: optional. A list of titles to display with each image. 33 | cols: number of images per row 34 | cmap: Optional. Color map to use. For example, "Blues". 35 | norm: Optional. A Normalize instance to map values to colors. 36 | interpolation: Optional. Image interporlation to use for display. 37 | """ 38 | titles = titles if titles is not None else [""] * len(images) 39 | rows = len(images) // cols + 1 40 | plt.figure(figsize=(14, 14 * rows // cols)) 41 | i = 1 42 | for image, title in zip(images, titles): 43 | plt.subplot(rows, cols, i) 44 | plt.title(title, fontsize=9) 45 | plt.axis('off') 46 | plt.imshow(image.astype(np.uint8), cmap=cmap, 47 | norm=norm, interpolation=interpolation) 48 | i += 1 49 | plt.show() 50 | 51 | 52 | def random_colors(N, bright=True): 53 | """ 54 | Generate random colors. 55 | To get visually distinct colors, generate them in HSV space then 56 | convert to RGB. 57 | """ 58 | brightness = 1.0 if bright else 0.7 59 | hsv = [(i / N, 1, brightness) for i in range(N)] 60 | colors = list(map(lambda c: colorsys.hsv_to_rgb(*c), hsv)) 61 | random.shuffle(colors) 62 | return colors 63 | 64 | 65 | def apply_mask(image, mask, color, alpha=0.5): 66 | """Apply the given mask to the image. 67 | """ 68 | for c in range(3): 69 | image[:, :, c] = np.where(mask == 1, 70 | image[:, :, c] * 71 | (1 - alpha) + alpha * color[c] * 255, 72 | image[:, :, c]) 73 | return image 74 | 75 | 76 | def display_instances(image, boxes, masks, class_ids, class_names, 77 | scores=None, title="", 78 | figsize=(16, 16), ax=None): 79 | """ 80 | boxes: [num_instance, (y1, x1, y2, x2, class_id)] in image coordinates. 81 | masks: [height, width, num_instances] 82 | class_ids: [num_instances] 83 | class_names: list of class names of the dataset 84 | scores: (optional) confidence scores for each box 85 | figsize: (optional) the size of the image. 86 | """ 87 | # Number of instances 88 | N = boxes.shape[0] 89 | if not N: 90 | print("\n*** No instances to display *** \n") 91 | else: 92 | assert boxes.shape[0] == masks.shape[-1] == class_ids.shape[0] 93 | 94 | if not ax: 95 | _, ax = plt.subplots(1, figsize=figsize) 96 | 97 | # Generate random colors 98 | colors = random_colors(N) 99 | 100 | # Show area outside image boundaries. 101 | height, width = image.shape[:2] 102 | ax.set_ylim(height + 10, -10) 103 | ax.set_xlim(-10, width + 10) 104 | ax.axis('off') 105 | ax.set_title(title) 106 | 107 | masked_image = image.astype(np.uint32).copy() 108 | for i in range(N): 109 | color = colors[i] 110 | 111 | # Bounding box 112 | if not np.any(boxes[i]): 113 | # Skip this instance. Has no bbox. Likely lost in image cropping. 114 | continue 115 | y1, x1, y2, x2 = boxes[i] 116 | p = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2, 117 | alpha=0.7, linestyle="dashed", 118 | edgecolor=color, facecolor='none') 119 | ax.add_patch(p) 120 | 121 | # Label 122 | class_id = class_ids[i] 123 | score = scores[i] if scores is not None else None 124 | label = class_names[class_id] 125 | x = random.randint(x1, (x1 + x2) // 2) 126 | caption = "{} {}".format(label, score) if score else label 127 | ax.text(x1, y1 + 8, caption, 128 | color='w', size=11, backgroundcolor="none") 129 | 130 | # Mask 131 | mask = masks[:, :, i] 132 | masked_image = apply_mask(masked_image, mask, color) 133 | 134 | # Mask Polygon 135 | # Pad to ensure proper polygons for masks that touch image edges. 136 | padded_mask = np.zeros( 137 | (mask.shape[0] + 2, mask.shape[1] + 2), dtype=np.uint8) 138 | padded_mask[1:-1, 1:-1] = mask 139 | contours = find_contours(padded_mask, 0.5) 140 | for verts in contours: 141 | # Subtract the padding and flip (y, x) to (x, y) 142 | verts = np.fliplr(verts) - 1 143 | p = Polygon(verts, facecolor="none", edgecolor=color) 144 | ax.add_patch(p) 145 | 146 | ax.imshow(masked_image.astype(np.uint8)) 147 | #plt.show() 148 | plt.savefig('person_blocked_labels.png', bbox_inches='tight') 149 | 150 | 151 | def draw_rois(image, rois, refined_rois, mask, class_ids, class_names, limit=10): 152 | """ 153 | anchors: [n, (y1, x1, y2, x2)] list of anchors in image coordinates. 154 | proposals: [n, 4] the same anchors but refined to fit objects better. 155 | """ 156 | masked_image = image.copy() 157 | 158 | # Pick random anchors in case there are too many. 159 | ids = np.arange(rois.shape[0], dtype=np.int32) 160 | ids = np.random.choice( 161 | ids, limit, replace=False) if ids.shape[0] > limit else ids 162 | 163 | fig, ax = plt.subplots(1, figsize=(12, 12)) 164 | if rois.shape[0] > limit: 165 | plt.title("Showing {} random ROIs out of {}".format( 166 | len(ids), rois.shape[0])) 167 | else: 168 | plt.title("{} ROIs".format(len(ids))) 169 | 170 | # Show area outside image boundaries. 171 | ax.set_ylim(image.shape[0] + 20, -20) 172 | ax.set_xlim(-50, image.shape[1] + 20) 173 | ax.axis('off') 174 | 175 | for i, id in enumerate(ids): 176 | color = np.random.rand(3) 177 | class_id = class_ids[id] 178 | # ROI 179 | y1, x1, y2, x2 = rois[id] 180 | p = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2, 181 | edgecolor=color if class_id else "gray", 182 | facecolor='none', linestyle="dashed") 183 | ax.add_patch(p) 184 | # Refined ROI 185 | if class_id: 186 | ry1, rx1, ry2, rx2 = refined_rois[id] 187 | p = patches.Rectangle((rx1, ry1), rx2 - rx1, ry2 - ry1, linewidth=2, 188 | edgecolor=color, facecolor='none') 189 | ax.add_patch(p) 190 | # Connect the top-left corners of the anchor and proposal for easy visualization 191 | ax.add_line(lines.Line2D([x1, rx1], [y1, ry1], color=color)) 192 | 193 | # Label 194 | label = class_names[class_id] 195 | ax.text(rx1, ry1 + 8, "{}".format(label), 196 | color='w', size=11, backgroundcolor="none") 197 | 198 | # Mask 199 | m = utils.unmold_mask(mask[id], rois[id] 200 | [:4].astype(np.int32), image.shape) 201 | masked_image = apply_mask(masked_image, m, color) 202 | 203 | ax.imshow(masked_image) 204 | 205 | # Print stats 206 | print("Positive ROIs: ", class_ids[class_ids > 0].shape[0]) 207 | print("Negative ROIs: ", class_ids[class_ids == 0].shape[0]) 208 | print("Positive Ratio: {:.2f}".format( 209 | class_ids[class_ids > 0].shape[0] / class_ids.shape[0])) 210 | 211 | 212 | # TODO: Replace with matplotlib equivalent? 213 | def draw_box(image, box, color): 214 | """Draw 3-pixel width bounding boxes on the given image array. 215 | color: list of 3 int values for RGB. 216 | """ 217 | y1, x1, y2, x2 = box 218 | image[y1:y1 + 2, x1:x2] = color 219 | image[y2:y2 + 2, x1:x2] = color 220 | image[y1:y2, x1:x1 + 2] = color 221 | image[y1:y2, x2:x2 + 2] = color 222 | return image 223 | 224 | 225 | def display_top_masks(image, mask, class_ids, class_names, limit=4): 226 | """Display the given image and the top few class masks.""" 227 | to_display = [] 228 | titles = [] 229 | to_display.append(image) 230 | titles.append("H x W={}x{}".format(image.shape[0], image.shape[1])) 231 | # Pick top prominent classes in this image 232 | unique_class_ids = np.unique(class_ids) 233 | mask_area = [np.sum(mask[:, :, np.where(class_ids == i)[0]]) 234 | for i in unique_class_ids] 235 | top_ids = [v[0] for v in sorted(zip(unique_class_ids, mask_area), 236 | key=lambda r: r[1], reverse=True) if v[1] > 0] 237 | # Generate images and titles 238 | for i in range(limit): 239 | class_id = top_ids[i] if i < len(top_ids) else -1 240 | # Pull masks of instances belonging to the same class. 241 | m = mask[:, :, np.where(class_ids == class_id)[0]] 242 | m = np.sum(m * np.arange(1, m.shape[-1] + 1), -1) 243 | to_display.append(m) 244 | titles.append(class_names[class_id] if class_id != -1 else "-") 245 | display_images(to_display, titles=titles, cols=limit + 1, cmap="Blues_r") 246 | 247 | 248 | def plot_precision_recall(AP, precisions, recalls): 249 | """Draw the precision-recall curve. 250 | 251 | AP: Average precision at IoU >= 0.5 252 | precisions: list of precision values 253 | recalls: list of recall values 254 | """ 255 | # Plot the Precision-Recall curve 256 | _, ax = plt.subplots(1) 257 | ax.set_title("Precision-Recall Curve. AP@50 = {:.3f}".format(AP)) 258 | ax.set_ylim(0, 1.1) 259 | ax.set_xlim(0, 1.1) 260 | _ = ax.plot(recalls, precisions) 261 | 262 | 263 | def plot_overlaps(gt_class_ids, pred_class_ids, pred_scores, 264 | overlaps, class_names, threshold=0.5): 265 | """Draw a grid showing how ground truth objects are classified. 266 | gt_class_ids: [N] int. Ground truth class IDs 267 | pred_class_id: [N] int. Predicted class IDs 268 | pred_scores: [N] float. The probability scores of predicted classes 269 | overlaps: [pred_boxes, gt_boxes] IoU overlaps of predictins and GT boxes. 270 | class_names: list of all class names in the dataset 271 | threshold: Float. The prediction probability required to predict a class 272 | """ 273 | gt_class_ids = gt_class_ids[gt_class_ids != 0] 274 | pred_class_ids = pred_class_ids[pred_class_ids != 0] 275 | 276 | plt.figure(figsize=(12, 10)) 277 | plt.imshow(overlaps, interpolation='nearest', cmap=plt.cm.Blues) 278 | plt.yticks(np.arange(len(pred_class_ids)), 279 | ["{} ({:.2f})".format(class_names[int(id)], pred_scores[i]) 280 | for i, id in enumerate(pred_class_ids)]) 281 | plt.xticks(np.arange(len(gt_class_ids)), 282 | [class_names[int(id)] for id in gt_class_ids], rotation=90) 283 | 284 | thresh = overlaps.max() / 2. 285 | for i, j in itertools.product(range(overlaps.shape[0]), 286 | range(overlaps.shape[1])): 287 | text = "" 288 | if overlaps[i, j] > threshold: 289 | text = "match" if gt_class_ids[j] == pred_class_ids[i] else "wrong" 290 | color = ("white" if overlaps[i, j] > thresh 291 | else "black" if overlaps[i, j] > 0 292 | else "grey") 293 | plt.text(j, i, "{:.3f}\n{}".format(overlaps[i, j], text), 294 | horizontalalignment="center", verticalalignment="center", 295 | fontsize=9, color=color) 296 | 297 | plt.tight_layout() 298 | plt.xlabel("Ground Truth") 299 | plt.ylabel("Predictions") 300 | 301 | 302 | def draw_boxes(image, boxes=None, refined_boxes=None, 303 | masks=None, captions=None, visibilities=None, 304 | title="", ax=None): 305 | """Draw bounding boxes and segmentation masks with differnt 306 | customizations. 307 | 308 | boxes: [N, (y1, x1, y2, x2, class_id)] in image coordinates. 309 | refined_boxes: Like boxes, but draw with solid lines to show 310 | that they're the result of refining 'boxes'. 311 | masks: [N, height, width] 312 | captions: List of N titles to display on each box 313 | visibilities: (optional) List of values of 0, 1, or 2. Determine how 314 | prominant each bounding box should be. 315 | title: An optional title to show over the image 316 | ax: (optional) Matplotlib axis to draw on. 317 | """ 318 | # Number of boxes 319 | assert boxes is not None or refined_boxes is not None 320 | N = boxes.shape[0] if boxes is not None else refined_boxes.shape[0] 321 | 322 | # Matplotlib Axis 323 | if not ax: 324 | _, ax = plt.subplots(1, figsize=(12, 12)) 325 | 326 | # Generate random colors 327 | colors = random_colors(N) 328 | 329 | # Show area outside image boundaries. 330 | margin = image.shape[0] // 10 331 | ax.set_ylim(image.shape[0] + margin, -margin) 332 | ax.set_xlim(-margin, image.shape[1] + margin) 333 | ax.axis('off') 334 | 335 | ax.set_title(title) 336 | 337 | masked_image = image.astype(np.uint32).copy() 338 | for i in range(N): 339 | # Box visibility 340 | visibility = visibilities[i] if visibilities is not None else 1 341 | if visibility == 0: 342 | color = "gray" 343 | style = "dotted" 344 | alpha = 0.5 345 | elif visibility == 1: 346 | color = colors[i] 347 | style = "dotted" 348 | alpha = 1 349 | elif visibility == 2: 350 | color = colors[i] 351 | style = "solid" 352 | alpha = 1 353 | 354 | # Boxes 355 | if boxes is not None: 356 | if not np.any(boxes[i]): 357 | # Skip this instance. Has no bbox. Likely lost in cropping. 358 | continue 359 | y1, x1, y2, x2 = boxes[i] 360 | p = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2, 361 | alpha=alpha, linestyle=style, 362 | edgecolor=color, facecolor='none') 363 | ax.add_patch(p) 364 | 365 | # Refined boxes 366 | if refined_boxes is not None and visibility > 0: 367 | ry1, rx1, ry2, rx2 = refined_boxes[i].astype(np.int32) 368 | p = patches.Rectangle((rx1, ry1), rx2 - rx1, ry2 - ry1, linewidth=2, 369 | edgecolor=color, facecolor='none') 370 | ax.add_patch(p) 371 | # Connect the top-left corners of the anchor and proposal 372 | if boxes is not None: 373 | ax.add_line(lines.Line2D([x1, rx1], [y1, ry1], color=color)) 374 | 375 | # Captions 376 | if captions is not None: 377 | caption = captions[i] 378 | # If there are refined boxes, display captions on them 379 | if refined_boxes is not None: 380 | y1, x1, y2, x2 = ry1, rx1, ry2, rx2 381 | x = random.randint(x1, (x1 + x2) // 2) 382 | ax.text(x1, y1, caption, size=11, verticalalignment='top', 383 | color='w', backgroundcolor="none", 384 | bbox={'facecolor': color, 'alpha': 0.5, 385 | 'pad': 2, 'edgecolor': 'none'}) 386 | 387 | # Masks 388 | if masks is not None: 389 | mask = masks[:, :, i] 390 | masked_image = apply_mask(masked_image, mask, color) 391 | # Mask Polygon 392 | # Pad to ensure proper polygons for masks that touch image edges. 393 | padded_mask = np.zeros( 394 | (mask.shape[0] + 2, mask.shape[1] + 2), dtype=np.uint8) 395 | padded_mask[1:-1, 1:-1] = mask 396 | contours = find_contours(padded_mask, 0.5) 397 | for verts in contours: 398 | # Subtract the padding and flip (y, x) to (x, y) 399 | verts = np.fliplr(verts) - 1 400 | p = Polygon(verts, facecolor="none", edgecolor=color) 401 | ax.add_patch(p) 402 | ax.imshow(masked_image.astype(np.uint8)) 403 | 404 | 405 | def display_table(table): 406 | """Display values in a table format. 407 | table: an iterable of rows, and each row is an iterable of values. 408 | """ 409 | html = "" 410 | for row in table: 411 | row_html = "" 412 | for col in row: 413 | row_html += "{:40}".format(str(col)) 414 | html += "" + row_html + "" 415 | html = "" + html + "
" 416 | IPython.display.display(IPython.display.HTML(html)) 417 | 418 | 419 | def display_weight_stats(model): 420 | """Scans all the weights in the model and returns a list of tuples 421 | that contain stats about each weight. 422 | """ 423 | layers = model.get_trainable_layers() 424 | table = [["WEIGHT NAME", "SHAPE", "MIN", "MAX", "STD"]] 425 | for l in layers: 426 | weight_values = l.get_weights() # list of Numpy arrays 427 | weight_tensors = l.weights # list of TF tensors 428 | for i, w in enumerate(weight_values): 429 | weight_name = weight_tensors[i].name 430 | # Detect problematic layers. Exclude biases of conv layers. 431 | alert = "" 432 | if w.min() == w.max() and not (l.__class__.__name__ == "Conv2D" and i == 1): 433 | alert += "*** dead?" 434 | if np.abs(w.min()) > 1000 or np.abs(w.max()) > 1000: 435 | alert += "*** Overflow?" 436 | # Add row 437 | table.append([ 438 | weight_name + alert, 439 | str(w.shape), 440 | "{:+9.4f}".format(w.min()), 441 | "{:+10.4f}".format(w.max()), 442 | "{:+9.4f}".format(w.std()), 443 | ]) 444 | display_table(table) 445 | --------------------------------------------------------------------------------