├── .gitignore ├── README.md ├── _config.yml ├── assets └── images │ ├── Untitled presentation.jpg │ ├── Webp.net-resizeimage-50.png │ ├── Webp.net-resizeimage-75.png │ ├── Webp.net-resizeimage.png │ ├── change.png │ ├── combined-intro.png │ ├── del.jpg │ ├── dh-govandi.png │ ├── dharavi.png │ ├── intro-min.jpg │ ├── intro.jpg │ ├── intro.png │ ├── intro_2.jpg │ ├── kurla-result.png │ ├── kurla-result_2.png │ ├── kurla.jpg │ ├── results_github_2.jpg │ ├── slum.png │ └── slum_480.gif ├── index.md ├── intro.jpg ├── mrcnn ├── __init__.py ├── config.py ├── model.py ├── parallel_model.py ├── utils.py └── visualize.py ├── requirements.txt ├── setup.py └── slums ├── README.md ├── change_det ├── 1_raw.jpg ├── 2_raw.jpg ├── change.png ├── mask_1.png └── mask_2.png ├── change_detection.py ├── slum.py ├── test_images └── bhandup_1.jpg ├── test_outputs └── pred_0.jpg └── testing.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | share/python-wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | MANIFEST 28 | 29 | # PyInstaller 30 | # Usually these files are written by a python script from a template 31 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 32 | *.manifest 33 | *.spec 34 | 35 | # Installer logs 36 | pip-log.txt 37 | pip-delete-this-directory.txt 38 | 39 | # Unit test / coverage reports 40 | htmlcov/ 41 | .tox/ 42 | .nox/ 43 | .coverage 44 | .coverage.* 45 | .cache 46 | nosetests.xml 47 | coverage.xml 48 | *.cover 49 | .hypothesis/ 50 | .pytest_cache/ 51 | 52 | # Translations 53 | *.mo 54 | *.pot 55 | 56 | # Django stuff: 57 | *.log 58 | local_settings.py 59 | db.sqlite3 60 | 61 | # Flask stuff: 62 | instance/ 63 | .webassets-cache 64 | 65 | # Scrapy stuff: 66 | .scrapy 67 | 68 | # Sphinx documentation 69 | docs/_build/ 70 | 71 | # PyBuilder 72 | target/ 73 | 74 | # Jupyter Notebook 75 | .ipynb_checkpoints 76 | 77 | # IPython 78 | profile_default/ 79 | ipython_config.py 80 | 81 | # pyenv 82 | .python-version 83 | 84 | # celery beat schedule file 85 | celerybeat-schedule 86 | 87 | # SageMath parsed files 88 | *.sage.py 89 | 90 | # Environments 91 | .env 92 | .venv 93 | env/ 94 | venv/ 95 | ENV/ 96 | env.bak/ 97 | venv.bak/ 98 | 99 | # Spyder project settings 100 | .spyderproject 101 | .spyproject 102 | 103 | # Rope project settings 104 | .ropeproject 105 | 106 | # mkdocs documentation 107 | /site 108 | 109 | # mypy 110 | .mypy_cache/ 111 | .dmypy.json 112 | dmypy.json 113 | 114 | # Pyre type checker 115 | .pyre/ 116 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Mumbai slum segmentation 2 | 3 | More than one billion people live in slums around the world. In some developing 4 | countries, slum residents make up for more than half of the population and lack 5 | reliable sanitation services, clean water, electricity, other basic services. We wanted to help. We built a deep learning model to map and and monitor slum growth over time. 
Check out our [project site](https://cbsudux.github.io/Mumbai-slum-segmentation/) for more information and to find out how you can contribute. 6 | 7 | 8 | 

9 | 10 |

11 | 12 | ## Mumbai Slums 13 | 14 | Mumbai is one of the most populous and wealthiest cities in India. However, it is also home to some of the world’s biggest slums -- **Dharavi, the Mankhurd-Govandi belt, the Kurla-Ghatkopar belt, Dindoshi and the Bhandup-Mulund slums**. The number of slum-dwellers in Mumbai is estimated to be around 9 million, up from 6 million in 2001; that is, 62% of Mumbai's population lives in informal slums. 15 | 16 | ![dharavi-govandi](/assets/images/dh-govandi.png) 17 | 18 | ![kurla](/assets/images/kurla.jpg) 19 | 20 | When we spoke to local slum dwellers, we realised that the situation was worse than we expected. Most of them lack access to clean water, basic sanitation and any form of reliable healthcare. 21 | 22 | We wanted to help. 23 | 24 | ## What did we do? 25 | 26 | Any initiative on slum rehabilitation and improvement relies heavily on **slum mapping** and **monitoring**. When we spoke to the relevant authorities, we found out that they mapped slums manually (using human annotators), which takes a substantial amount of time. We realised we could automate this, and used a deep learning approach to **segment and map individual slums from satellite imagery**. In addition, we also wrote code to **perform change detection and monitor slum change over time**. Slum change detection is an important task, and analysing the increase/decrease of a slum can provide valuable insights. 27 | 28 | ## How did we go about it? 29 | 30 | We curated a **dataset** of 3-band (RGB) satellite imagery at 65 cm per pixel resolution, 31 | collected from Google Earth. Each image is 1280x720 pixels. The satellite imagery covers most of 32 | Mumbai, and we include images from 2002 to 2018 to analyze slum change. We used 513 images for training and 97 images for testing. (Unfortunately, we cannot redistribute the dataset, due to Google policy.) 33 | 34 | For **slum segmentation and mapping**, we trained a Mask R-CNN on our custom dataset. Check our [github readme](https://github.com/cbsudux/Mumbai-slum-segmentation/tree/master/slums) for our training and testing approaches, and our [paper](https://arxiv.org/abs/1811.07896) for more details. 35 | 36 | ![kurla result](/assets/images/kurla-result_2.png) 37 | 38 | For **slum change detection**, we took a pair of satellite images representing the same location at different points in time. We predicted masks for both images and then subtracted the masks to obtain a percentage increase/decrease. The images below show a change of +35.25% in the same slum between 2005 (bottom row) and 2018 (top row). 39 | 40 | ![change result](/assets/images/change.png) 41 | 42 | ## Training and Testing 43 | 44 | Read [this](https://github.com/cbsudux/Mumbai-slum-segmentation/tree/master/slums) for training and testing instructions, and for how to prepare your own satellite dataset. 45 | 46 | 47 | ## Contributors 48 | 49 | - [Sudharshan Chandra Babu](http://github.com/cbsudux) 50 | - [Shishira R Maiya](https://github.com/abhyantrika) 51 | 52 | 53 | ## Acknowledgements 54 | 55 | We would like to thank the Slum Rehabilitation Authority of Mumbai for their data. 56 | 57 | ## Citing 58 | 59 | We published our work in the NeurIPS (NIPS) 2018 ML4D workshop.
If you'd like to use our research, please cite: 60 | ``` 61 | @article{maiya2018slum, 62 | title={Slum Segmentation and Change Detection: A Deep Learning Approach}, 63 | author={Maiya, Shishira R and Babu, Sudharshan Chandra}, 64 | journal={arXiv preprint arXiv:1811.07896}, 65 | year={2018} 66 | } 67 | ``` 68 | 69 | ## License 70 | 71 | Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License. 72 | 73 | 74 | 75 | -------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | theme: jekyll-theme-cayman -------------------------------------------------------------------------------- /assets/images/Untitled presentation.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/Untitled presentation.jpg -------------------------------------------------------------------------------- /assets/images/Webp.net-resizeimage-50.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/Webp.net-resizeimage-50.png -------------------------------------------------------------------------------- /assets/images/Webp.net-resizeimage-75.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/Webp.net-resizeimage-75.png -------------------------------------------------------------------------------- /assets/images/Webp.net-resizeimage.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/Webp.net-resizeimage.png -------------------------------------------------------------------------------- /assets/images/change.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/change.png -------------------------------------------------------------------------------- /assets/images/combined-intro.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/combined-intro.png -------------------------------------------------------------------------------- /assets/images/del.jpg: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /assets/images/dh-govandi.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/dh-govandi.png -------------------------------------------------------------------------------- /assets/images/dharavi.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/dharavi.png -------------------------------------------------------------------------------- /assets/images/intro-min.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/intro-min.jpg -------------------------------------------------------------------------------- /assets/images/intro.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/intro.jpg -------------------------------------------------------------------------------- /assets/images/intro.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/intro.png -------------------------------------------------------------------------------- /assets/images/intro_2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/intro_2.jpg -------------------------------------------------------------------------------- /assets/images/kurla-result.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/kurla-result.png -------------------------------------------------------------------------------- /assets/images/kurla-result_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/kurla-result_2.png -------------------------------------------------------------------------------- /assets/images/kurla.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/kurla.jpg -------------------------------------------------------------------------------- /assets/images/results_github_2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/results_github_2.jpg -------------------------------------------------------------------------------- /assets/images/slum.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/slum.png -------------------------------------------------------------------------------- /assets/images/slum_480.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/slum_480.gif -------------------------------------------------------------------------------- /index.md: -------------------------------------------------------------------------------- 1 | # Mumbai slum segmentation 2 | 3 | More than one billion people live in slums around the world. In some developing countries, slum residents make up for more than half of the population and lack reliable sanitation services, clean water, electricity, other basic services. We wanted to help. 
4 | 5 | ![intro-pic](/assets/images/combined-intro.png) 6 | 7 | 8 | ## Mumbai Slums 9 | 10 | Mumbai is one of the most populous and wealthiest cities in India. However, it is also home to some of the world’s biggest slums -- **Dharavi, the Mankhurd-Govandi belt, the Kurla-Ghatkopar belt, Dindoshi and the Bhandup-Mulund slums**. The number of slum-dwellers in Mumbai is estimated to be around 9 million, up from 6 million in 2001; that is, 62% of Mumbai's population lives in informal slums. 11 | 12 | ![dharavi-govandi](/assets/images/dh-govandi.png) 13 | 14 | ![kurla](/assets/images/kurla.jpg) 15 | 16 | When we spoke to local slum dwellers, we realised that the situation was worse than we expected. Most of them lack access to clean water, basic sanitation and any form of reliable healthcare. 17 | 18 | We wanted to help. 19 | 20 | 21 | ## What did we do? 22 | 23 | Any initiative on slum rehabilitation and improvement relies heavily on **slum mapping** and **monitoring**. When we spoke to the relevant authorities, we found out that they mapped slums manually (using human annotators), which takes a substantial amount of time. We realised we could automate this, and used a deep learning approach to **segment and map individual slums from satellite imagery**. In addition, we also wrote code to **perform change detection and monitor slum change over time**. Slum change detection is an important task, and analysing the increase/decrease of a slum can provide valuable insights. 24 | 25 | ## How did we go about it? 26 | 27 | We curated a **dataset** of 3-band (RGB) satellite imagery at 65 cm per pixel resolution, 28 | collected from Google Earth. Each image is 1280x720 pixels. The satellite imagery covers most of 29 | Mumbai, and we include images from 2002 to 2018 to analyze slum change. We used 513 images for training and 97 images for testing. (Unfortunately, we cannot redistribute the dataset, due to Google policy.) 30 | 31 | For **slum segmentation and mapping**, we trained a Mask R-CNN on our custom dataset. Check our [github readme](https://github.com/cbsudux/Mumbai-slum-segmentation/tree/master/slums) for our training and testing approaches, and our [paper](https://arxiv.org/abs/1811.07896) for more details. 32 | 33 | ![kurla result](/assets/images/kurla-result_2.png) 34 | 35 | The Kurla-Ghatkopar slums (above) are one of the first things you see when you land in Mumbai, given their proximity to the Chhatrapati Shivaji Maharaj International Airport. 36 |
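If you want to try the trained model on your own tiles, the sketch below shows the rough shape of inference with the `mrcnn` package bundled in this repository (the Matterport-style Mask R-CNN in `mrcnn/model.py`). It is a minimal sketch, not the exact code in `slums/slum.py` or `slums/testing.py`: the config values, the weights filename and the coverage calculation are illustrative assumptions.

```python
import numpy as np
import skimage.io
import mrcnn.model as modellib
from mrcnn.config import Config

class SlumInferenceConfig(Config):
    """Minimal inference config (assumed values, not the repo's exact settings)."""
    NAME = "slum"
    NUM_CLASSES = 1 + 1        # background + slum
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1         # one tile at a time
    DETECTION_MIN_CONFIDENCE = 0.7

config = SlumInferenceConfig()

# Build the model in inference mode and load trained weights.
# "mask_rcnn_slum.h5" is a placeholder -- point this at your own checkpoint.
model = modellib.MaskRCNN(mode="inference", config=config, model_dir="logs")
model.load_weights("mask_rcnn_slum.h5", by_name=True)

# Predict instance masks for a single satellite tile.
image = skimage.io.imread("slums/test_images/bhandup_1.jpg")
r = model.detect([image], verbose=0)[0]   # keys: 'rois', 'class_ids', 'scores', 'masks'

# Fraction of the tile covered by predicted slum masks. Comparing this value for
# two tiles of the same location at different dates gives the kind of percentage
# change figure discussed in the change-detection section below.
slum_pixels = np.count_nonzero(r["masks"].any(axis=-1))
coverage = slum_pixels / float(image.shape[0] * image.shape[1])
print("Slum coverage: {:.2%}".format(coverage))
```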
37 |
38 |
39 | 40 |
41 |
42 |
43 | 44 | Here's a short video (above) of our model mapping the Govandi slums. 45 | 46 | For **slum change detection**, we took a pair of satellite images, representing the same location at different points of time. We predicted masks for both these images and then subtracted the masks to obtain a percentage icrease/decrease. The following images (below) show a change of +35.25% between 2018 (top row) and 2005 (bottom row) of the same slum. 47 | 48 | ![change result](/assets/images/change.png) 49 | 50 | 51 | ## Contributors 52 | 53 | - [Sudharshan Chandra Babu](http://github.com/cbsudux) 54 | - [Shishira R Maiya](https://github.com/abhyantrika) 55 | 56 | ## How can you help? 57 | 58 | Quite a lot of NGOs work towards slum rehabilitation in Mumbai. You can volunteer (or) donate. 59 | 60 | ### NGOs 61 | 62 | - [Slum Aid](http://slumaid.org/) 63 | - [Red Boys Foundation](http://www.redboysfoundation.com/) 64 | - [SAKHI](http://sakhiforgirlseducation.org/) 65 | - [Society for Nutrition, Education & Health Action (SNEHA)](http://snehamumbai.org/) 66 | 67 | ## Acknowledgements 68 | 69 | We would like to thank the Slum Rehabiliation Authority of Mumbai for their data. 70 | 71 | ## Citing 72 | 73 | We published our work in the NeurIPS (NIPS) 2018 ML4D workshop. If you'd like to use our research, please cite using - 74 | ``` 75 | @article{maiya2018slum, 76 | title={Slum Segmentation and Change Detection: A Deep Learning Approach}, 77 | author={Maiya, Shishira R and Babu, Sudharshan Chandra}, 78 | journal={arXiv preprint arXiv:1811.07896}, 79 | year={2018} 80 | } 81 | ``` 82 | 83 | 84 | 85 | 86 | -------------------------------------------------------------------------------- /intro.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/intro.jpg -------------------------------------------------------------------------------- /mrcnn/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /mrcnn/config.py: -------------------------------------------------------------------------------- 1 | """ 2 | Mask R-CNN 3 | Base Configurations class. 4 | 5 | Copyright (c) 2017 Matterport, Inc. 6 | Licensed under the MIT License (see LICENSE for details) 7 | Written by Waleed Abdulla 8 | """ 9 | 10 | import numpy as np 11 | 12 | 13 | # Base Configuration Class 14 | # Don't use this class directly. Instead, sub-class it and override 15 | # the configurations you need to change. 16 | 17 | class Config(object): 18 | """Base configuration class. For custom configurations, create a 19 | sub-class that inherits from this one and override properties 20 | that need to be changed. 21 | """ 22 | # Name the configurations. For example, 'COCO', 'Experiment 3', ...etc. 23 | # Useful if your code needs to do things differently depending on which 24 | # experiment is running. 25 | NAME = None # Override in sub-classes 26 | 27 | # NUMBER OF GPUs to use. When using only a CPU, this needs to be set to 1. 28 | GPU_COUNT = 1 29 | 30 | # Number of images to train with on each GPU. A 12GB GPU can typically 31 | # handle 2 images of 1024x1024px. 32 | # Adjust based on your GPU memory and image sizes. Use the highest 33 | # number that your GPU can handle for best performance. 
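    # Note: the effective batch size seen by the data generator is
    # IMAGES_PER_GPU * GPU_COUNT; it is computed as BATCH_SIZE in __init__ below.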
34 | IMAGES_PER_GPU = 2 35 | 36 | # Number of training steps per epoch 37 | # This doesn't need to match the size of the training set. Tensorboard 38 | # updates are saved at the end of each epoch, so setting this to a 39 | # smaller number means getting more frequent TensorBoard updates. 40 | # Validation stats are also calculated at each epoch end and they 41 | # might take a while, so don't set this too small to avoid spending 42 | # a lot of time on validation stats. 43 | STEPS_PER_EPOCH = 1000 44 | 45 | # Number of validation steps to run at the end of every training epoch. 46 | # A bigger number improves accuracy of validation stats, but slows 47 | # down the training. 48 | VALIDATION_STEPS = 50 49 | 50 | # Backbone network architecture 51 | # Supported values are: resnet50, resnet101. 52 | # You can also provide a callable that should have the signature 53 | # of model.resnet_graph. If you do so, you need to supply a callable 54 | # to COMPUTE_BACKBONE_SHAPE as well 55 | BACKBONE = "resnet101" 56 | 57 | # Only useful if you supply a callable to BACKBONE. Should compute 58 | # the shape of each layer of the FPN Pyramid. 59 | # See model.compute_backbone_shapes 60 | COMPUTE_BACKBONE_SHAPE = None 61 | 62 | # The strides of each layer of the FPN Pyramid. These values 63 | # are based on a Resnet101 backbone. 64 | BACKBONE_STRIDES = [4, 8, 16, 32, 64] 65 | 66 | # Size of the fully-connected layers in the classification graph 67 | FPN_CLASSIF_FC_LAYERS_SIZE = 1024 68 | 69 | # Size of the top-down layers used to build the feature pyramid 70 | TOP_DOWN_PYRAMID_SIZE = 256 71 | 72 | # Number of classification classes (including background) 73 | NUM_CLASSES = 1 # Override in sub-classes 74 | 75 | # Length of square anchor side in pixels 76 | RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512) 77 | 78 | # Ratios of anchors at each cell (width/height) 79 | # A value of 1 represents a square anchor, and 0.5 is a wide anchor 80 | RPN_ANCHOR_RATIOS = [0.5, 1, 2] 81 | 82 | # Anchor stride 83 | # If 1 then anchors are created for each cell in the backbone feature map. 84 | # If 2, then anchors are created for every other cell, and so on. 85 | RPN_ANCHOR_STRIDE = 1 86 | 87 | # Non-max suppression threshold to filter RPN proposals. 88 | # You can increase this during training to generate more propsals. 89 | RPN_NMS_THRESHOLD = 0.7 90 | 91 | # How many anchors per image to use for RPN training 92 | RPN_TRAIN_ANCHORS_PER_IMAGE = 256 93 | 94 | # ROIs kept after tf.nn.top_k and before non-maximum suppression 95 | PRE_NMS_LIMIT = 6000 96 | 97 | # ROIs kept after non-maximum suppression (training and inference) 98 | POST_NMS_ROIS_TRAINING = 2000 99 | POST_NMS_ROIS_INFERENCE = 1000 100 | 101 | # If enabled, resizes instance masks to a smaller size to reduce 102 | # memory load. Recommended when using high-resolution images. 103 | USE_MINI_MASK = True 104 | MINI_MASK_SHAPE = (56, 56) # (height, width) of the mini-mask 105 | 106 | # Input image resizing 107 | # Generally, use the "square" resizing mode for training and predicting 108 | # and it should work well in most cases. In this mode, images are scaled 109 | # up such that the small side is = IMAGE_MIN_DIM, but ensuring that the 110 | # scaling doesn't make the long side > IMAGE_MAX_DIM. Then the image is 111 | # padded with zeros to make it a square so multiple images can be put 112 | # in one batch. 113 | # Available resizing modes: 114 | # none: No resizing or padding. Return the image unchanged. 
115 | # square: Resize and pad with zeros to get a square image 116 | # of size [max_dim, max_dim]. 117 | # pad64: Pads width and height with zeros to make them multiples of 64. 118 | # If IMAGE_MIN_DIM or IMAGE_MIN_SCALE are not None, then it scales 119 | # up before padding. IMAGE_MAX_DIM is ignored in this mode. 120 | # The multiple of 64 is needed to ensure smooth scaling of feature 121 | # maps up and down the 6 levels of the FPN pyramid (2**6=64). 122 | # crop: Picks random crops from the image. First, scales the image based 123 | # on IMAGE_MIN_DIM and IMAGE_MIN_SCALE, then picks a random crop of 124 | # size IMAGE_MIN_DIM x IMAGE_MIN_DIM. Can be used in training only. 125 | # IMAGE_MAX_DIM is not used in this mode. 126 | IMAGE_RESIZE_MODE = "square" 127 | IMAGE_MIN_DIM = 800 128 | IMAGE_MAX_DIM = 1024 129 | # Minimum scaling ratio. Checked after MIN_IMAGE_DIM and can force further 130 | # up scaling. For example, if set to 2 then images are scaled up to double 131 | # the width and height, or more, even if MIN_IMAGE_DIM doesn't require it. 132 | # Howver, in 'square' mode, it can be overruled by IMAGE_MAX_DIM. 133 | IMAGE_MIN_SCALE = 0 134 | # Number of color channels per image. RGB = 3, grayscale = 1, RGB-D = 4 135 | # Changing this requires other changes in the code. See the WIKI for more 136 | # details: https://github.com/matterport/Mask_RCNN/wiki 137 | IMAGE_CHANNEL_COUNT = 3 138 | 139 | # Image mean (RGB) 140 | MEAN_PIXEL = np.array([123.7, 116.8, 103.9]) 141 | 142 | # Number of ROIs per image to feed to classifier/mask heads 143 | # The Mask RCNN paper uses 512 but often the RPN doesn't generate 144 | # enough positive proposals to fill this and keep a positive:negative 145 | # ratio of 1:3. You can increase the number of proposals by adjusting 146 | # the RPN NMS threshold. 147 | TRAIN_ROIS_PER_IMAGE = 200 148 | 149 | # Percent of positive ROIs used to train classifier/mask heads 150 | ROI_POSITIVE_RATIO = 0.33 151 | 152 | # Pooled ROIs 153 | POOL_SIZE = 7 154 | MASK_POOL_SIZE = 14 155 | 156 | # Shape of output mask 157 | # To change this you also need to change the neural network mask branch 158 | MASK_SHAPE = [28, 28] 159 | 160 | # Maximum number of ground truth instances to use in one image 161 | MAX_GT_INSTANCES = 100 162 | 163 | # Bounding box refinement standard deviation for RPN and final detections. 164 | RPN_BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2]) 165 | BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2]) 166 | 167 | # Max number of final detections 168 | DETECTION_MAX_INSTANCES = 100 169 | 170 | # Minimum probability value to accept a detected instance 171 | # ROIs below this threshold are skipped 172 | DETECTION_MIN_CONFIDENCE = 0.7 173 | 174 | # Non-maximum suppression threshold for detection 175 | DETECTION_NMS_THRESHOLD = 0.3 176 | 177 | # Learning rate and momentum 178 | # The Mask RCNN paper uses lr=0.02, but on TensorFlow it causes 179 | # weights to explode. Likely due to differences in optimizer 180 | # implementation. 181 | 182 | #For SGD 183 | #LEARNING_RATE = 0.001 184 | 185 | #For ADAM 186 | LEARNING_RATE = 0.0001 187 | 188 | LEARNING_MOMENTUM = 0.9 189 | 190 | # Weight decay regularization 191 | WEIGHT_DECAY = 0.0001 192 | 193 | # Loss weights for more precise optimization. 194 | # Can be used for R-CNN training setup. 195 | LOSS_WEIGHTS = { 196 | "rpn_class_loss": 1., 197 | "rpn_bbox_loss": 1., 198 | "mrcnn_class_loss": 1., 199 | "mrcnn_bbox_loss": 1., 200 | "mrcnn_mask_loss": 1. 
201 | } 202 | 203 | # Use RPN ROIs or externally generated ROIs for training 204 | # Keep this True for most situations. Set to False if you want to train 205 | # the head branches on ROI generated by code rather than the ROIs from 206 | # the RPN. For example, to debug the classifier head without having to 207 | # train the RPN. 208 | USE_RPN_ROIS = True 209 | 210 | # Train or freeze batch normalization layers 211 | # None: Train BN layers. This is the normal mode 212 | # False: Freeze BN layers. Good when using a small batch size 213 | # True: (don't use). Set layer in training mode even when predicting 214 | TRAIN_BN = False # Defaulting to False since batch size is often small 215 | 216 | # Gradient norm clipping 217 | GRADIENT_CLIP_NORM = 5.0 218 | 219 | def __init__(self): 220 | """Set values of computed attributes.""" 221 | # Effective batch size 222 | self.BATCH_SIZE = self.IMAGES_PER_GPU * self.GPU_COUNT 223 | 224 | # Input image size 225 | if self.IMAGE_RESIZE_MODE == "crop": 226 | self.IMAGE_SHAPE = np.array([self.IMAGE_MIN_DIM, self.IMAGE_MIN_DIM, 227 | self.IMAGE_CHANNEL_COUNT]) 228 | else: 229 | self.IMAGE_SHAPE = np.array([self.IMAGE_MAX_DIM, self.IMAGE_MAX_DIM, 230 | self.IMAGE_CHANNEL_COUNT]) 231 | 232 | # Image meta data length 233 | # See compose_image_meta() for details 234 | self.IMAGE_META_SIZE = 1 + 3 + 3 + 4 + 1 + self.NUM_CLASSES 235 | 236 | def display(self): 237 | """Display Configuration values.""" 238 | print("\nConfigurations:") 239 | for a in dir(self): 240 | if not a.startswith("__") and not callable(getattr(self, a)): 241 | print("{:30} {}".format(a, getattr(self, a))) 242 | print("\n") 243 | -------------------------------------------------------------------------------- /mrcnn/parallel_model.py: -------------------------------------------------------------------------------- 1 | """ 2 | Mask R-CNN 3 | Multi-GPU Support for Keras. 4 | 5 | Copyright (c) 2017 Matterport, Inc. 6 | Licensed under the MIT License (see LICENSE for details) 7 | Written by Waleed Abdulla 8 | 9 | Ideas and a small code snippets from these sources: 10 | https://github.com/fchollet/keras/issues/2436 11 | https://medium.com/@kuza55/transparent-multi-gpu-training-on-tensorflow-with-keras-8b0016fd9012 12 | https://github.com/avolkov1/keras_experiments/blob/master/keras_exp/multigpu/ 13 | https://github.com/fchollet/keras/blob/master/keras/utils/training_utils.py 14 | """ 15 | 16 | import tensorflow as tf 17 | import keras.backend as K 18 | import keras.layers as KL 19 | import keras.models as KM 20 | 21 | 22 | class ParallelModel(KM.Model): 23 | """Subclasses the standard Keras Model and adds multi-GPU support. 24 | It works by creating a copy of the model on each GPU. Then it slices 25 | the inputs and sends a slice to each copy of the model, and then 26 | merges the outputs together and applies the loss on the combined 27 | outputs. 28 | """ 29 | 30 | def __init__(self, keras_model, gpu_count): 31 | """Class constructor. 32 | keras_model: The Keras model to parallelize 33 | gpu_count: Number of GPUs. Must be > 1 34 | """ 35 | self.inner_model = keras_model 36 | self.gpu_count = gpu_count 37 | merged_outputs = self.make_parallel() 38 | super(ParallelModel, self).__init__(inputs=self.inner_model.inputs, 39 | outputs=merged_outputs) 40 | 41 | def __getattribute__(self, attrname): 42 | """Redirect loading and saving methods to the inner model. 
That's where 43 | the weights are stored.""" 44 | if 'load' in attrname or 'save' in attrname: 45 | return getattr(self.inner_model, attrname) 46 | return super(ParallelModel, self).__getattribute__(attrname) 47 | 48 | def summary(self, *args, **kwargs): 49 | """Override summary() to display summaries of both, the wrapper 50 | and inner models.""" 51 | super(ParallelModel, self).summary(*args, **kwargs) 52 | self.inner_model.summary(*args, **kwargs) 53 | 54 | def make_parallel(self): 55 | """Creates a new wrapper model that consists of multiple replicas of 56 | the original model placed on different GPUs. 57 | """ 58 | # Slice inputs. Slice inputs on the CPU to avoid sending a copy 59 | # of the full inputs to all GPUs. Saves on bandwidth and memory. 60 | input_slices = {name: tf.split(x, self.gpu_count) 61 | for name, x in zip(self.inner_model.input_names, 62 | self.inner_model.inputs)} 63 | 64 | output_names = self.inner_model.output_names 65 | outputs_all = [] 66 | for i in range(len(self.inner_model.outputs)): 67 | outputs_all.append([]) 68 | 69 | # Run the model call() on each GPU to place the ops there 70 | for i in range(self.gpu_count): 71 | with tf.device('/gpu:%d' % i): 72 | with tf.name_scope('tower_%d' % i): 73 | # Run a slice of inputs through this replica 74 | zipped_inputs = zip(self.inner_model.input_names, 75 | self.inner_model.inputs) 76 | inputs = [ 77 | KL.Lambda(lambda s: input_slices[name][i], 78 | output_shape=lambda s: (None,) + s[1:])(tensor) 79 | for name, tensor in zipped_inputs] 80 | # Create the model replica and get the outputs 81 | outputs = self.inner_model(inputs) 82 | if not isinstance(outputs, list): 83 | outputs = [outputs] 84 | # Save the outputs for merging back together later 85 | for l, o in enumerate(outputs): 86 | outputs_all[l].append(o) 87 | 88 | # Merge outputs on CPU 89 | with tf.device('/cpu:0'): 90 | merged = [] 91 | for outputs, name in zip(outputs_all, output_names): 92 | # Concatenate or average outputs? 93 | # Outputs usually have a batch dimension and we concatenate 94 | # across it. If they don't, then the output is likely a loss 95 | # or a metric value that gets averaged across the batch. 96 | # Keras expects losses and metrics to be scalars. 97 | if K.int_shape(outputs[0]) == (): 98 | # Average 99 | m = KL.Lambda(lambda o: tf.add_n(o) / len(outputs), name=name)(outputs) 100 | else: 101 | # Concatenate 102 | m = KL.Concatenate(axis=0, name=name)(outputs) 103 | merged.append(m) 104 | return merged 105 | 106 | 107 | if __name__ == "__main__": 108 | # Testing code below. It creates a simple model to train on MNIST and 109 | # tries to run it on 2 GPUs. It saves the graph so it can be viewed 110 | # in TensorBoard. Run it as: 111 | # 112 | # python3 parallel_model.py 113 | 114 | import os 115 | import numpy as np 116 | import keras.optimizers 117 | from keras.datasets import mnist 118 | from keras.preprocessing.image import ImageDataGenerator 119 | 120 | GPU_COUNT = 2 121 | 122 | # Root directory of the project 123 | ROOT_DIR = os.path.abspath("../") 124 | 125 | # Directory to save logs and trained model 126 | MODEL_DIR = os.path.join(ROOT_DIR, "logs") 127 | 128 | def build_model(x_train, num_classes): 129 | # Reset default graph. Keras leaves old ops in the graph, 130 | # which are ignored for execution but clutter graph 131 | # visualization in TensorBoard. 
132 | tf.reset_default_graph() 133 | 134 | inputs = KL.Input(shape=x_train.shape[1:], name="input_image") 135 | x = KL.Conv2D(32, (3, 3), activation='relu', padding="same", 136 | name="conv1")(inputs) 137 | x = KL.Conv2D(64, (3, 3), activation='relu', padding="same", 138 | name="conv2")(x) 139 | x = KL.MaxPooling2D(pool_size=(2, 2), name="pool1")(x) 140 | x = KL.Flatten(name="flat1")(x) 141 | x = KL.Dense(128, activation='relu', name="dense1")(x) 142 | x = KL.Dense(num_classes, activation='softmax', name="dense2")(x) 143 | 144 | return KM.Model(inputs, x, "digit_classifier_model") 145 | 146 | # Load MNIST Data 147 | (x_train, y_train), (x_test, y_test) = mnist.load_data() 148 | x_train = np.expand_dims(x_train, -1).astype('float32') / 255 149 | x_test = np.expand_dims(x_test, -1).astype('float32') / 255 150 | 151 | print('x_train shape:', x_train.shape) 152 | print('x_test shape:', x_test.shape) 153 | 154 | # Build data generator and model 155 | datagen = ImageDataGenerator() 156 | model = build_model(x_train, 10) 157 | 158 | # Add multi-GPU support. 159 | model = ParallelModel(model, GPU_COUNT) 160 | 161 | optimizer = keras.optimizers.SGD(lr=0.01, momentum=0.9, clipnorm=5.0) 162 | 163 | model.compile(loss='sparse_categorical_crossentropy', 164 | optimizer=optimizer, metrics=['accuracy']) 165 | 166 | model.summary() 167 | 168 | # Train 169 | model.fit_generator( 170 | datagen.flow(x_train, y_train, batch_size=64), 171 | steps_per_epoch=50, epochs=10, verbose=1, 172 | validation_data=(x_test, y_test), 173 | callbacks=[keras.callbacks.TensorBoard(log_dir=MODEL_DIR, 174 | write_graph=True)] 175 | ) 176 | -------------------------------------------------------------------------------- /mrcnn/utils.py: -------------------------------------------------------------------------------- 1 | """ 2 | Mask R-CNN 3 | Common utility functions and classes. 4 | 5 | Copyright (c) 2017 Matterport, Inc. 6 | Licensed under the MIT License (see LICENSE for details) 7 | Written by Waleed Abdulla 8 | """ 9 | 10 | import sys 11 | import os 12 | import math 13 | import random 14 | import numpy as np 15 | import tensorflow as tf 16 | import scipy 17 | import skimage.color 18 | import skimage.io 19 | import skimage.transform 20 | import urllib.request 21 | import shutil 22 | import warnings 23 | from distutils.version import LooseVersion 24 | 25 | # URL from which to download the latest COCO trained weights 26 | COCO_MODEL_URL = "https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5" 27 | 28 | 29 | ############################################################ 30 | # Bounding Boxes 31 | ############################################################ 32 | 33 | def extract_bboxes(mask): 34 | """Compute bounding boxes from masks. 35 | mask: [height, width, num_instances]. Mask pixels are either 1 or 0. 36 | 37 | Returns: bbox array [num_instances, (y1, x1, y2, x2)]. 38 | """ 39 | boxes = np.zeros([mask.shape[-1], 4], dtype=np.int32) 40 | for i in range(mask.shape[-1]): 41 | m = mask[:, :, i] 42 | # Bounding box. 43 | horizontal_indicies = np.where(np.any(m, axis=0))[0] 44 | vertical_indicies = np.where(np.any(m, axis=1))[0] 45 | if horizontal_indicies.shape[0]: 46 | x1, x2 = horizontal_indicies[[0, -1]] 47 | y1, y2 = vertical_indicies[[0, -1]] 48 | # x2 and y2 should not be part of the box. Increment by 1. 49 | x2 += 1 50 | y2 += 1 51 | else: 52 | # No mask for this instance. Might happen due to 53 | # resizing or cropping. 
Set bbox to zeros 54 | x1, x2, y1, y2 = 0, 0, 0, 0 55 | boxes[i] = np.array([y1, x1, y2, x2]) 56 | return boxes.astype(np.int32) 57 | 58 | 59 | def compute_iou(box, boxes, box_area, boxes_area): 60 | """Calculates IoU of the given box with the array of the given boxes. 61 | box: 1D vector [y1, x1, y2, x2] 62 | boxes: [boxes_count, (y1, x1, y2, x2)] 63 | box_area: float. the area of 'box' 64 | boxes_area: array of length boxes_count. 65 | 66 | Note: the areas are passed in rather than calculated here for 67 | efficiency. Calculate once in the caller to avoid duplicate work. 68 | """ 69 | # Calculate intersection areas 70 | y1 = np.maximum(box[0], boxes[:, 0]) 71 | y2 = np.minimum(box[2], boxes[:, 2]) 72 | x1 = np.maximum(box[1], boxes[:, 1]) 73 | x2 = np.minimum(box[3], boxes[:, 3]) 74 | intersection = np.maximum(x2 - x1, 0) * np.maximum(y2 - y1, 0) 75 | union = box_area + boxes_area[:] - intersection[:] 76 | iou = intersection / union 77 | return iou 78 | 79 | 80 | def compute_overlaps(boxes1, boxes2): 81 | """Computes IoU overlaps between two sets of boxes. 82 | boxes1, boxes2: [N, (y1, x1, y2, x2)]. 83 | 84 | For better performance, pass the largest set first and the smaller second. 85 | """ 86 | # Areas of anchors and GT boxes 87 | area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1]) 88 | area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1]) 89 | 90 | # Compute overlaps to generate matrix [boxes1 count, boxes2 count] 91 | # Each cell contains the IoU value. 92 | overlaps = np.zeros((boxes1.shape[0], boxes2.shape[0])) 93 | for i in range(overlaps.shape[1]): 94 | box2 = boxes2[i] 95 | overlaps[:, i] = compute_iou(box2, boxes1, area2[i], area1) 96 | return overlaps 97 | 98 | 99 | def compute_overlaps_masks(masks1, masks2): 100 | """Computes IoU overlaps between two sets of masks. 101 | masks1, masks2: [Height, Width, instances] 102 | """ 103 | 104 | # If either set of masks is empty return empty result 105 | if masks1.shape[0] == 0 or masks2.shape[0] == 0: 106 | return np.zeros((masks1.shape[0], masks2.shape[-1])) 107 | # flatten masks and compute their areas 108 | masks1 = np.reshape(masks1 > .5, (-1, masks1.shape[-1])).astype(np.float32) 109 | masks2 = np.reshape(masks2 > .5, (-1, masks2.shape[-1])).astype(np.float32) 110 | area1 = np.sum(masks1, axis=0) 111 | area2 = np.sum(masks2, axis=0) 112 | 113 | # intersections and union 114 | intersections = np.dot(masks1.T, masks2) 115 | union = area1[:, None] + area2[None, :] - intersections 116 | overlaps = intersections / union 117 | 118 | return overlaps 119 | 120 | 121 | def non_max_suppression(boxes, scores, threshold): 122 | """Performs non-maximum suppression and returns indices of kept boxes. 123 | boxes: [N, (y1, x1, y2, x2)]. Notice that (y2, x2) lays outside the box. 124 | scores: 1-D array of box scores. 125 | threshold: Float. IoU threshold to use for filtering. 
126 | """ 127 | assert boxes.shape[0] > 0 128 | if boxes.dtype.kind != "f": 129 | boxes = boxes.astype(np.float32) 130 | 131 | # Compute box areas 132 | y1 = boxes[:, 0] 133 | x1 = boxes[:, 1] 134 | y2 = boxes[:, 2] 135 | x2 = boxes[:, 3] 136 | area = (y2 - y1) * (x2 - x1) 137 | 138 | # Get indicies of boxes sorted by scores (highest first) 139 | ixs = scores.argsort()[::-1] 140 | 141 | pick = [] 142 | while len(ixs) > 0: 143 | # Pick top box and add its index to the list 144 | i = ixs[0] 145 | pick.append(i) 146 | # Compute IoU of the picked box with the rest 147 | iou = compute_iou(boxes[i], boxes[ixs[1:]], area[i], area[ixs[1:]]) 148 | # Identify boxes with IoU over the threshold. This 149 | # returns indices into ixs[1:], so add 1 to get 150 | # indices into ixs. 151 | remove_ixs = np.where(iou > threshold)[0] + 1 152 | # Remove indices of the picked and overlapped boxes. 153 | ixs = np.delete(ixs, remove_ixs) 154 | ixs = np.delete(ixs, 0) 155 | return np.array(pick, dtype=np.int32) 156 | 157 | 158 | def apply_box_deltas(boxes, deltas): 159 | """Applies the given deltas to the given boxes. 160 | boxes: [N, (y1, x1, y2, x2)]. Note that (y2, x2) is outside the box. 161 | deltas: [N, (dy, dx, log(dh), log(dw))] 162 | """ 163 | boxes = boxes.astype(np.float32) 164 | # Convert to y, x, h, w 165 | height = boxes[:, 2] - boxes[:, 0] 166 | width = boxes[:, 3] - boxes[:, 1] 167 | center_y = boxes[:, 0] + 0.5 * height 168 | center_x = boxes[:, 1] + 0.5 * width 169 | # Apply deltas 170 | center_y += deltas[:, 0] * height 171 | center_x += deltas[:, 1] * width 172 | height *= np.exp(deltas[:, 2]) 173 | width *= np.exp(deltas[:, 3]) 174 | # Convert back to y1, x1, y2, x2 175 | y1 = center_y - 0.5 * height 176 | x1 = center_x - 0.5 * width 177 | y2 = y1 + height 178 | x2 = x1 + width 179 | return np.stack([y1, x1, y2, x2], axis=1) 180 | 181 | 182 | def box_refinement_graph(box, gt_box): 183 | """Compute refinement needed to transform box to gt_box. 184 | box and gt_box are [N, (y1, x1, y2, x2)] 185 | """ 186 | box = tf.cast(box, tf.float32) 187 | gt_box = tf.cast(gt_box, tf.float32) 188 | 189 | height = box[:, 2] - box[:, 0] 190 | width = box[:, 3] - box[:, 1] 191 | center_y = box[:, 0] + 0.5 * height 192 | center_x = box[:, 1] + 0.5 * width 193 | 194 | gt_height = gt_box[:, 2] - gt_box[:, 0] 195 | gt_width = gt_box[:, 3] - gt_box[:, 1] 196 | gt_center_y = gt_box[:, 0] + 0.5 * gt_height 197 | gt_center_x = gt_box[:, 1] + 0.5 * gt_width 198 | 199 | dy = (gt_center_y - center_y) / height 200 | dx = (gt_center_x - center_x) / width 201 | dh = tf.log(gt_height / height) 202 | dw = tf.log(gt_width / width) 203 | 204 | result = tf.stack([dy, dx, dh, dw], axis=1) 205 | return result 206 | 207 | 208 | def box_refinement(box, gt_box): 209 | """Compute refinement needed to transform box to gt_box. 210 | box and gt_box are [N, (y1, x1, y2, x2)]. (y2, x2) is 211 | assumed to be outside the box. 
212 | """ 213 | box = box.astype(np.float32) 214 | gt_box = gt_box.astype(np.float32) 215 | 216 | height = box[:, 2] - box[:, 0] 217 | width = box[:, 3] - box[:, 1] 218 | center_y = box[:, 0] + 0.5 * height 219 | center_x = box[:, 1] + 0.5 * width 220 | 221 | gt_height = gt_box[:, 2] - gt_box[:, 0] 222 | gt_width = gt_box[:, 3] - gt_box[:, 1] 223 | gt_center_y = gt_box[:, 0] + 0.5 * gt_height 224 | gt_center_x = gt_box[:, 1] + 0.5 * gt_width 225 | 226 | dy = (gt_center_y - center_y) / height 227 | dx = (gt_center_x - center_x) / width 228 | dh = np.log(gt_height / height) 229 | dw = np.log(gt_width / width) 230 | 231 | return np.stack([dy, dx, dh, dw], axis=1) 232 | 233 | 234 | ############################################################ 235 | # Dataset 236 | ############################################################ 237 | 238 | class Dataset(object): 239 | """The base class for dataset classes. 240 | To use it, create a new class that adds functions specific to the dataset 241 | you want to use. For example: 242 | 243 | class CatsAndDogsDataset(Dataset): 244 | def load_cats_and_dogs(self): 245 | ... 246 | def load_mask(self, image_id): 247 | ... 248 | def image_reference(self, image_id): 249 | ... 250 | 251 | See COCODataset and ShapesDataset as examples. 252 | """ 253 | 254 | def __init__(self, class_map=None): 255 | self._image_ids = [] 256 | self.image_info = [] 257 | # Background is always the first class 258 | self.class_info = [{"source": "", "id": 0, "name": "BG"}] 259 | self.source_class_ids = {} 260 | 261 | def add_class(self, source, class_id, class_name): 262 | assert "." not in source, "Source name cannot contain a dot" 263 | # Does the class exist already? 264 | for info in self.class_info: 265 | if info['source'] == source and info["id"] == class_id: 266 | # source.class_id combination already available, skip 267 | return 268 | # Add the class 269 | self.class_info.append({ 270 | "source": source, 271 | "id": class_id, 272 | "name": class_name, 273 | }) 274 | 275 | def add_image(self, source, image_id, path, **kwargs): 276 | image_info = { 277 | "id": image_id, 278 | "source": source, 279 | "path": path, 280 | } 281 | image_info.update(kwargs) 282 | self.image_info.append(image_info) 283 | 284 | def image_reference(self, image_id): 285 | """Return a link to the image in its source Website or details about 286 | the image that help looking it up or debugging it. 287 | 288 | Override for your dataset, but pass to this function 289 | if you encounter images not in your dataset. 290 | """ 291 | return "" 292 | 293 | def prepare(self, class_map=None): 294 | """Prepares the Dataset class for use. 295 | 296 | TODO: class map is not supported yet. When done, it should handle mapping 297 | classes from different datasets to the same class ID. 298 | """ 299 | 300 | def clean_name(name): 301 | """Returns a shorter version of object names for cleaner display.""" 302 | return ",".join(name.split(",")[:1]) 303 | 304 | # Build (or rebuild) everything else from the info dicts. 
305 | self.num_classes = len(self.class_info) 306 | self.class_ids = np.arange(self.num_classes) 307 | self.class_names = [clean_name(c["name"]) for c in self.class_info] 308 | self.num_images = len(self.image_info) 309 | self._image_ids = np.arange(self.num_images) 310 | 311 | # Mapping from source class and image IDs to internal IDs 312 | self.class_from_source_map = {"{}.{}".format(info['source'], info['id']): id 313 | for info, id in zip(self.class_info, self.class_ids)} 314 | self.image_from_source_map = {"{}.{}".format(info['source'], info['id']): id 315 | for info, id in zip(self.image_info, self.image_ids)} 316 | 317 | # Map sources to class_ids they support 318 | self.sources = list(set([i['source'] for i in self.class_info])) 319 | self.source_class_ids = {} 320 | # Loop over datasets 321 | for source in self.sources: 322 | self.source_class_ids[source] = [] 323 | # Find classes that belong to this dataset 324 | for i, info in enumerate(self.class_info): 325 | # Include BG class in all datasets 326 | if i == 0 or source == info['source']: 327 | self.source_class_ids[source].append(i) 328 | 329 | def map_source_class_id(self, source_class_id): 330 | """Takes a source class ID and returns the int class ID assigned to it. 331 | 332 | For example: 333 | dataset.map_source_class_id("coco.12") -> 23 334 | """ 335 | return self.class_from_source_map[source_class_id] 336 | 337 | def get_source_class_id(self, class_id, source): 338 | """Map an internal class ID to the corresponding class ID in the source dataset.""" 339 | info = self.class_info[class_id] 340 | assert info['source'] == source 341 | return info['id'] 342 | 343 | @property 344 | def image_ids(self): 345 | return self._image_ids 346 | 347 | def source_image_link(self, image_id): 348 | """Returns the path or URL to the image. 349 | Override this to return a URL to the image if it's available online for easy 350 | debugging. 351 | """ 352 | return self.image_info[image_id]["path"] 353 | 354 | def load_image(self, image_id): 355 | """Load the specified image and return a [H,W,3] Numpy array. 356 | """ 357 | # Load image 358 | image = skimage.io.imread(self.image_info[image_id]['path']) 359 | # If grayscale. Convert to RGB for consistency. 360 | if image.ndim != 3: 361 | image = skimage.color.gray2rgb(image) 362 | # If has an alpha channel, remove it for consistency 363 | if image.shape[-1] == 4: 364 | image = image[..., :3] 365 | return image 366 | 367 | def load_mask(self, image_id): 368 | """Load instance masks for the given image. 369 | 370 | Different datasets use different ways to store masks. Override this 371 | method to load instance masks and return them in the form of am 372 | array of binary masks of shape [height, width, instances]. 373 | 374 | Returns: 375 | masks: A bool array of shape [height, width, instance count] with 376 | a binary mask per instance. 377 | class_ids: a 1D array of class IDs of the instance masks. 378 | """ 379 | # Override this function to load a mask from your dataset. 380 | # Otherwise, it returns an empty mask. 381 | mask = np.empty([0, 0, 0]) 382 | class_ids = np.empty([0], np.int32) 383 | return mask, class_ids 384 | 385 | 386 | def resize_image(image, min_dim=None, max_dim=None, min_scale=None, mode="square"): 387 | """Resizes an image keeping the aspect ratio unchanged. 388 | 389 | min_dim: if provided, resizes the image such that it's smaller 390 | dimension == min_dim 391 | max_dim: if provided, ensures that the image longest side doesn't 392 | exceed this value. 
393 | min_scale: if provided, ensure that the image is scaled up by at least 394 | this percent even if min_dim doesn't require it. 395 | mode: Resizing mode. 396 | none: No resizing. Return the image unchanged. 397 | square: Resize and pad with zeros to get a square image 398 | of size [max_dim, max_dim]. 399 | pad64: Pads width and height with zeros to make them multiples of 64. 400 | If min_dim or min_scale are provided, it scales the image up 401 | before padding. max_dim is ignored in this mode. 402 | The multiple of 64 is needed to ensure smooth scaling of feature 403 | maps up and down the 6 levels of the FPN pyramid (2**6=64). 404 | crop: Picks random crops from the image. First, scales the image based 405 | on min_dim and min_scale, then picks a random crop of 406 | size min_dim x min_dim. Can be used in training only. 407 | max_dim is not used in this mode. 408 | 409 | Returns: 410 | image: the resized image 411 | window: (y1, x1, y2, x2). If max_dim is provided, padding might 412 | be inserted in the returned image. If so, this window is the 413 | coordinates of the image part of the full image (excluding 414 | the padding). The x2, y2 pixels are not included. 415 | scale: The scale factor used to resize the image 416 | padding: Padding added to the image [(top, bottom), (left, right), (0, 0)] 417 | """ 418 | # Keep track of image dtype and return results in the same dtype 419 | image_dtype = image.dtype 420 | # Default window (y1, x1, y2, x2) and default scale == 1. 421 | h, w = image.shape[:2] 422 | window = (0, 0, h, w) 423 | scale = 1 424 | padding = [(0, 0), (0, 0), (0, 0)] 425 | crop = None 426 | 427 | if mode == "none": 428 | return image, window, scale, padding, crop 429 | 430 | # Scale? 431 | if min_dim: 432 | # Scale up but not down 433 | scale = max(1, min_dim / min(h, w)) 434 | if min_scale and scale < min_scale: 435 | scale = min_scale 436 | 437 | # Does it exceed max dim? 438 | if max_dim and mode == "square": 439 | image_max = max(h, w) 440 | if round(image_max * scale) > max_dim: 441 | scale = max_dim / image_max 442 | 443 | # Resize image using bilinear interpolation 444 | if scale != 1: 445 | image = resize(image, (round(h * scale), round(w * scale)), 446 | preserve_range=True) 447 | 448 | # Need padding or cropping? 
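    # In "square" mode the (possibly rescaled) image is centred on a zero-padded
    # max_dim x max_dim canvas; "window" records where the real pixels sit inside
    # that canvas so results can later be related back to the original image.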
449 | if mode == "square": 450 | # Get new height and width 451 | h, w = image.shape[:2] 452 | top_pad = (max_dim - h) // 2 453 | bottom_pad = max_dim - h - top_pad 454 | left_pad = (max_dim - w) // 2 455 | right_pad = max_dim - w - left_pad 456 | padding = [(top_pad, bottom_pad), (left_pad, right_pad), (0, 0)] 457 | image = np.pad(image, padding, mode='constant', constant_values=0) 458 | window = (top_pad, left_pad, h + top_pad, w + left_pad) 459 | elif mode == "pad64": 460 | h, w = image.shape[:2] 461 | # Both sides must be divisible by 64 462 | assert min_dim % 64 == 0, "Minimum dimension must be a multiple of 64" 463 | # Height 464 | if h % 64 > 0: 465 | max_h = h - (h % 64) + 64 466 | top_pad = (max_h - h) // 2 467 | bottom_pad = max_h - h - top_pad 468 | else: 469 | top_pad = bottom_pad = 0 470 | # Width 471 | if w % 64 > 0: 472 | max_w = w - (w % 64) + 64 473 | left_pad = (max_w - w) // 2 474 | right_pad = max_w - w - left_pad 475 | else: 476 | left_pad = right_pad = 0 477 | padding = [(top_pad, bottom_pad), (left_pad, right_pad), (0, 0)] 478 | image = np.pad(image, padding, mode='constant', constant_values=0) 479 | window = (top_pad, left_pad, h + top_pad, w + left_pad) 480 | elif mode == "crop": 481 | # Pick a random crop 482 | h, w = image.shape[:2] 483 | y = random.randint(0, (h - min_dim)) 484 | x = random.randint(0, (w - min_dim)) 485 | crop = (y, x, min_dim, min_dim) 486 | image = image[y:y + min_dim, x:x + min_dim] 487 | window = (0, 0, min_dim, min_dim) 488 | else: 489 | raise Exception("Mode {} not supported".format(mode)) 490 | return image.astype(image_dtype), window, scale, padding, crop 491 | 492 | 493 | def resize_mask(mask, scale, padding, crop=None): 494 | """Resizes a mask using the given scale and padding. 495 | Typically, you get the scale and padding from resize_image() to 496 | ensure both, the image and the mask, are resized consistently. 497 | 498 | scale: mask scaling factor 499 | padding: Padding to add to the mask in the form 500 | [(top, bottom), (left, right), (0, 0)] 501 | """ 502 | # Suppress warning from scipy 0.13.0, the output shape of zoom() is 503 | # calculated with round() instead of int() 504 | with warnings.catch_warnings(): 505 | warnings.simplefilter("ignore") 506 | mask = scipy.ndimage.zoom(mask, zoom=[scale, scale, 1], order=0) 507 | if crop is not None: 508 | y, x, h, w = crop 509 | mask = mask[y:y + h, x:x + w] 510 | else: 511 | mask = np.pad(mask, padding, mode='constant', constant_values=0) 512 | return mask 513 | 514 | 515 | def minimize_mask(bbox, mask, mini_shape): 516 | """Resize masks to a smaller version to reduce memory load. 517 | Mini-masks can be resized back to image scale using expand_masks() 518 | 519 | See inspect_data.ipynb notebook for more details. 520 | """ 521 | mini_mask = np.zeros(mini_shape + (mask.shape[-1],), dtype=bool) 522 | for i in range(mask.shape[-1]): 523 | # Pick slice and cast to bool in case load_mask() returned wrong dtype 524 | m = mask[:, :, i].astype(bool) 525 | y1, x1, y2, x2 = bbox[i][:4] 526 | m = m[y1:y2, x1:x2] 527 | if m.size == 0: 528 | raise Exception("Invalid bounding box with area of zero") 529 | # Resize with bilinear interpolation 530 | m = resize(m, mini_shape) 531 | mini_mask[:, :, i] = np.around(m).astype(np.bool) 532 | return mini_mask 533 | 534 | 535 | def expand_mask(bbox, mini_mask, image_shape): 536 | """Resizes mini masks back to image size. Reverses the change 537 | of minimize_mask(). 538 | 539 | See inspect_data.ipynb notebook for more details. 
540 | """ 541 | mask = np.zeros(image_shape[:2] + (mini_mask.shape[-1],), dtype=bool) 542 | for i in range(mask.shape[-1]): 543 | m = mini_mask[:, :, i] 544 | y1, x1, y2, x2 = bbox[i][:4] 545 | h = y2 - y1 546 | w = x2 - x1 547 | # Resize with bilinear interpolation 548 | m = resize(m, (h, w)) 549 | mask[y1:y2, x1:x2, i] = np.around(m).astype(np.bool) 550 | return mask 551 | 552 | 553 | # TODO: Build and use this function to reduce code duplication 554 | def mold_mask(mask, config): 555 | pass 556 | 557 | 558 | def unmold_mask(mask, bbox, image_shape): 559 | """Converts a mask generated by the neural network to a format similar 560 | to its original shape. 561 | mask: [height, width] of type float. A small, typically 28x28 mask. 562 | bbox: [y1, x1, y2, x2]. The box to fit the mask in. 563 | 564 | Returns a binary mask with the same size as the original image. 565 | """ 566 | threshold = 0.5 567 | y1, x1, y2, x2 = bbox 568 | mask = resize(mask, (y2 - y1, x2 - x1)) 569 | mask = np.where(mask >= threshold, 1, 0).astype(np.bool) 570 | 571 | # Put the mask in the right location. 572 | full_mask = np.zeros(image_shape[:2], dtype=np.bool) 573 | full_mask[y1:y2, x1:x2] = mask 574 | return full_mask 575 | 576 | 577 | ############################################################ 578 | # Anchors 579 | ############################################################ 580 | 581 | def generate_anchors(scales, ratios, shape, feature_stride, anchor_stride): 582 | """ 583 | scales: 1D array of anchor sizes in pixels. Example: [32, 64, 128] 584 | ratios: 1D array of anchor ratios of width/height. Example: [0.5, 1, 2] 585 | shape: [height, width] spatial shape of the feature map over which 586 | to generate anchors. 587 | feature_stride: Stride of the feature map relative to the image in pixels. 588 | anchor_stride: Stride of anchors on the feature map. For example, if the 589 | value is 2 then generate anchors for every other feature map pixel. 590 | """ 591 | # Get all combinations of scales and ratios 592 | scales, ratios = np.meshgrid(np.array(scales), np.array(ratios)) 593 | scales = scales.flatten() 594 | ratios = ratios.flatten() 595 | 596 | # Enumerate heights and widths from scales and ratios 597 | heights = scales / np.sqrt(ratios) 598 | widths = scales * np.sqrt(ratios) 599 | 600 | # Enumerate shifts in feature space 601 | shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride 602 | shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride 603 | shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y) 604 | 605 | # Enumerate combinations of shifts, widths, and heights 606 | box_widths, box_centers_x = np.meshgrid(widths, shifts_x) 607 | box_heights, box_centers_y = np.meshgrid(heights, shifts_y) 608 | 609 | # Reshape to get a list of (y, x) and a list of (h, w) 610 | box_centers = np.stack( 611 | [box_centers_y, box_centers_x], axis=2).reshape([-1, 2]) 612 | box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2]) 613 | 614 | # Convert to corner coordinates (y1, x1, y2, x2) 615 | boxes = np.concatenate([box_centers - 0.5 * box_sizes, 616 | box_centers + 0.5 * box_sizes], axis=1) 617 | return boxes 618 | 619 | 620 | def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides, 621 | anchor_stride): 622 | """Generate anchors at different levels of a feature pyramid. Each scale 623 | is associated with a level of the pyramid, but each ratio is used in 624 | all levels of the pyramid. 625 | 626 | Returns: 627 | anchors: [N, (y1, x1, y2, x2)]. 
All generated anchors in one array. Sorted 628 | with the same order of the given scales. So, anchors of scale[0] come 629 | first, then anchors of scale[1], and so on. 630 | """ 631 | # Anchors 632 | # [anchor_count, (y1, x1, y2, x2)] 633 | anchors = [] 634 | for i in range(len(scales)): 635 | anchors.append(generate_anchors(scales[i], ratios, feature_shapes[i], 636 | feature_strides[i], anchor_stride)) 637 | return np.concatenate(anchors, axis=0) 638 | 639 | 640 | ############################################################ 641 | # Miscellaneous 642 | ############################################################ 643 | 644 | def trim_zeros(x): 645 | """It's common to have tensors larger than the available data and 646 | pad with zeros. This function removes rows that are all zeros. 647 | 648 | x: [rows, columns]. 649 | """ 650 | assert len(x.shape) == 2 651 | return x[~np.all(x == 0, axis=1)] 652 | 653 | 654 | def compute_matches(gt_boxes, gt_class_ids, gt_masks, 655 | pred_boxes, pred_class_ids, pred_scores, pred_masks, 656 | iou_threshold=0.5, score_threshold=0.0): 657 | """Finds matches between prediction and ground truth instances. 658 | 659 | Returns: 660 | gt_match: 1-D array. For each GT box it has the index of the matched 661 | predicted box. 662 | pred_match: 1-D array. For each predicted box, it has the index of 663 | the matched ground truth box. 664 | overlaps: [pred_boxes, gt_boxes] IoU overlaps. 665 | """ 666 | # Trim zero padding 667 | # TODO: cleaner to do zero unpadding upstream 668 | gt_boxes = trim_zeros(gt_boxes) 669 | gt_masks = gt_masks[..., :gt_boxes.shape[0]] 670 | pred_boxes = trim_zeros(pred_boxes) 671 | pred_scores = pred_scores[:pred_boxes.shape[0]] 672 | # Sort predictions by score from high to low 673 | indices = np.argsort(pred_scores)[::-1] 674 | pred_boxes = pred_boxes[indices] 675 | pred_class_ids = pred_class_ids[indices] 676 | pred_scores = pred_scores[indices] 677 | pred_masks = pred_masks[..., indices] 678 | 679 | # Compute IoU overlaps [pred_masks, gt_masks] 680 | overlaps = compute_overlaps_masks(pred_masks, gt_masks) 681 | 682 | # Loop through predictions and find matching ground truth boxes 683 | match_count = 0 684 | pred_match = -1 * np.ones([pred_boxes.shape[0]]) 685 | gt_match = -1 * np.ones([gt_boxes.shape[0]]) 686 | for i in range(len(pred_boxes)): 687 | # Find best matching ground truth box 688 | # 1. Sort matches by score 689 | sorted_ixs = np.argsort(overlaps[i])[::-1] 690 | # 2. Remove low scores 691 | low_score_idx = np.where(overlaps[i, sorted_ixs] < score_threshold)[0] 692 | if low_score_idx.size > 0: 693 | sorted_ixs = sorted_ixs[:low_score_idx[0]] 694 | # 3. Find the match 695 | for j in sorted_ixs: 696 | # If ground truth box is already matched, go to next one 697 | if gt_match[j] > 0: 698 | continue 699 | # If we reach IoU smaller than the threshold, end the loop 700 | iou = overlaps[i, j] 701 | if iou < iou_threshold: 702 | break 703 | # Do we have a match? 704 | if pred_class_ids[i] == gt_class_ids[j]: 705 | match_count += 1 706 | gt_match[j] = i 707 | pred_match[i] = j 708 | break 709 | 710 | return gt_match, pred_match, overlaps 711 | 712 | 713 | def compute_ap(gt_boxes, gt_class_ids, gt_masks, 714 | pred_boxes, pred_class_ids, pred_scores, pred_masks, 715 | iou_threshold=0.5): 716 | """Compute Average Precision at a set IoU threshold (default 0.5). 717 | 718 | Returns: 719 | mAP: Mean Average Precision 720 | precisions: List of precisions at different class score thresholds. 
721 | recalls: List of recall values at different class score thresholds. 722 | overlaps: [pred_boxes, gt_boxes] IoU overlaps. 723 | """ 724 | # Get matches and overlaps 725 | gt_match, pred_match, overlaps = compute_matches( 726 | gt_boxes, gt_class_ids, gt_masks, 727 | pred_boxes, pred_class_ids, pred_scores, pred_masks, 728 | iou_threshold) 729 | 730 | # Compute precision and recall at each prediction box step 731 | precisions = np.cumsum(pred_match > -1) / (np.arange(len(pred_match)) + 1) 732 | recalls = np.cumsum(pred_match > -1).astype(np.float32) / len(gt_match) 733 | 734 | # Pad with start and end values to simplify the math 735 | precisions = np.concatenate([[0], precisions, [0]]) 736 | recalls = np.concatenate([[0], recalls, [1]]) 737 | 738 | # Ensure precision values decrease but don't increase. This way, the 739 | # precision value at each recall threshold is the maximum it can be 740 | # for all following recall thresholds, as specified by the VOC paper. 741 | for i in range(len(precisions) - 2, -1, -1): 742 | precisions[i] = np.maximum(precisions[i], precisions[i + 1]) 743 | 744 | # Compute mean AP over recall range 745 | indices = np.where(recalls[:-1] != recalls[1:])[0] + 1 746 | mAP = np.sum((recalls[indices] - recalls[indices - 1]) * 747 | precisions[indices]) 748 | 749 | return mAP, precisions, recalls, overlaps 750 | 751 | 752 | def compute_ap_range(gt_box, gt_class_id, gt_mask, 753 | pred_box, pred_class_id, pred_score, pred_mask, 754 | iou_thresholds=None, verbose=1): 755 | """Compute AP over a range or IoU thresholds. Default range is 0.5-0.95.""" 756 | # Default is 0.5 to 0.95 with increments of 0.05 757 | if iou_thresholds is None: 758 | iou_thresholds = np.arange(0.5, 1.0, 0.05) 759 | 760 | #iou_thresholds = iou_thresholds or np.arange(0.5, 1.0, 0.05) 761 | 762 | # Compute AP over range of IoU thresholds 763 | AP = [] 764 | for iou_threshold in iou_thresholds: 765 | ap, precisions, recalls, overlaps =\ 766 | compute_ap(gt_box, gt_class_id, gt_mask, 767 | pred_box, pred_class_id, pred_score, pred_mask, 768 | iou_threshold=iou_threshold) 769 | if verbose: 770 | print("AP @{:.2f}:\t {:.3f}".format(iou_threshold, ap)) 771 | AP.append(ap) 772 | AP = np.array(AP).mean() 773 | if verbose: 774 | print("AP @{:.2f}-{:.2f}:\t {:.3f}".format( 775 | iou_thresholds[0], iou_thresholds[-1], AP)) 776 | return AP 777 | 778 | 779 | def compute_recall(pred_boxes, gt_boxes, iou): 780 | """Compute the recall at the given IoU threshold. It's an indication 781 | of how many GT boxes were found by the given prediction boxes. 782 | 783 | pred_boxes: [N, (y1, x1, y2, x2)] in image coordinates 784 | gt_boxes: [N, (y1, x1, y2, x2)] in image coordinates 785 | """ 786 | # Measure overlaps 787 | overlaps = compute_overlaps(pred_boxes, gt_boxes) 788 | iou_max = np.max(overlaps, axis=1) 789 | iou_argmax = np.argmax(overlaps, axis=1) 790 | positive_ids = np.where(iou_max >= iou)[0] 791 | matched_gt_boxes = iou_argmax[positive_ids] 792 | 793 | recall = len(set(matched_gt_boxes)) / gt_boxes.shape[0] 794 | return recall, positive_ids 795 | 796 | 797 | # ## Batch Slicing 798 | # Some custom layers support a batch size of 1 only, and require a lot of work 799 | # to support batches greater than 1. This function slices an input tensor 800 | # across the batch dimension and feeds batches of size 1. Effectively, 801 | # an easy way to support batches > 1 quickly with little code modification. 
802 | # In the long run, it's more efficient to modify the code to support large 803 | # batches and getting rid of this function. Consider this a temporary solution 804 | def batch_slice(inputs, graph_fn, batch_size, names=None): 805 | """Splits inputs into slices and feeds each slice to a copy of the given 806 | computation graph and then combines the results. It allows you to run a 807 | graph on a batch of inputs even if the graph is written to support one 808 | instance only. 809 | 810 | inputs: list of tensors. All must have the same first dimension length 811 | graph_fn: A function that returns a TF tensor that's part of a graph. 812 | batch_size: number of slices to divide the data into. 813 | names: If provided, assigns names to the resulting tensors. 814 | """ 815 | if not isinstance(inputs, list): 816 | inputs = [inputs] 817 | 818 | outputs = [] 819 | for i in range(batch_size): 820 | inputs_slice = [x[i] for x in inputs] 821 | output_slice = graph_fn(*inputs_slice) 822 | if not isinstance(output_slice, (tuple, list)): 823 | output_slice = [output_slice] 824 | outputs.append(output_slice) 825 | # Change outputs from a list of slices where each is 826 | # a list of outputs to a list of outputs and each has 827 | # a list of slices 828 | outputs = list(zip(*outputs)) 829 | 830 | if names is None: 831 | names = [None] * len(outputs) 832 | 833 | result = [tf.stack(o, axis=0, name=n) 834 | for o, n in zip(outputs, names)] 835 | if len(result) == 1: 836 | result = result[0] 837 | 838 | return result 839 | 840 | 841 | def download_trained_weights(coco_model_path, verbose=1): 842 | """Download COCO trained weights from Releases. 843 | 844 | coco_model_path: local path of COCO trained weights 845 | """ 846 | if verbose > 0: 847 | print("Downloading pretrained model to " + coco_model_path + " ...") 848 | with urllib.request.urlopen(COCO_MODEL_URL) as resp, open(coco_model_path, 'wb') as out: 849 | shutil.copyfileobj(resp, out) 850 | if verbose > 0: 851 | print("... done downloading pretrained model!") 852 | 853 | 854 | def norm_boxes(boxes, shape): 855 | """Converts boxes from pixel coordinates to normalized coordinates. 856 | boxes: [N, (y1, x1, y2, x2)] in pixel coordinates 857 | shape: [..., (height, width)] in pixels 858 | 859 | Note: In pixel coordinates (y2, x2) is outside the box. But in normalized 860 | coordinates it's inside the box. 861 | 862 | Returns: 863 | [N, (y1, x1, y2, x2)] in normalized coordinates 864 | """ 865 | h, w = shape 866 | scale = np.array([h - 1, w - 1, h - 1, w - 1]) 867 | shift = np.array([0, 0, 1, 1]) 868 | return np.divide((boxes - shift), scale).astype(np.float32) 869 | 870 | 871 | def denorm_boxes(boxes, shape): 872 | """Converts boxes from normalized coordinates to pixel coordinates. 873 | boxes: [N, (y1, x1, y2, x2)] in normalized coordinates 874 | shape: [..., (height, width)] in pixels 875 | 876 | Note: In pixel coordinates (y2, x2) is outside the box. But in normalized 877 | coordinates it's inside the box. 878 | 879 | Returns: 880 | [N, (y1, x1, y2, x2)] in pixel coordinates 881 | """ 882 | h, w = shape 883 | scale = np.array([h - 1, w - 1, h - 1, w - 1]) 884 | shift = np.array([0, 0, 1, 1]) 885 | return np.around(np.multiply(boxes, scale) + shift).astype(np.int32) 886 | 887 | 888 | def resize(image, output_shape, order=1, mode='constant', cval=0, clip=True, 889 | preserve_range=False, anti_aliasing=False, anti_aliasing_sigma=None): 890 | """A wrapper for Scikit-Image resize(). 
891 | 892 | Scikit-Image generates warnings on every call to resize() if it doesn't 893 | receive the right parameters. The right parameters depend on the version 894 | of skimage. This solves the problem by using different parameters per 895 | version. And it provides a central place to control resizing defaults. 896 | """ 897 | if LooseVersion(skimage.__version__) >= LooseVersion("0.14"): 898 | # New in 0.14: anti_aliasing. Default it to False for backward 899 | # compatibility with skimage 0.13. 900 | return skimage.transform.resize( 901 | image, output_shape, 902 | order=order, mode=mode, cval=cval, clip=clip, 903 | preserve_range=preserve_range, anti_aliasing=anti_aliasing, 904 | anti_aliasing_sigma=anti_aliasing_sigma) 905 | else: 906 | return skimage.transform.resize( 907 | image, output_shape, 908 | order=order, mode=mode, cval=cval, clip=clip, 909 | preserve_range=preserve_range) 910 | -------------------------------------------------------------------------------- /mrcnn/visualize.py: -------------------------------------------------------------------------------- 1 | """ 2 | Mask R-CNN 3 | Display and Visualization Functions. 4 | 5 | Copyright (c) 2017 Matterport, Inc. 6 | Licensed under the MIT License (see LICENSE for details) 7 | Written by Waleed Abdulla 8 | """ 9 | 10 | import os 11 | import sys 12 | import random 13 | import itertools 14 | import colorsys 15 | 16 | import numpy as np 17 | from skimage.measure import find_contours 18 | import matplotlib.pyplot as plt 19 | from matplotlib import patches, lines 20 | from matplotlib.patches import Polygon 21 | import IPython.display 22 | 23 | # Root directory of the project 24 | ROOT_DIR = os.path.abspath("../") 25 | 26 | # Import Mask RCNN 27 | sys.path.append(ROOT_DIR) # To find local version of the library 28 | from mrcnn import utils 29 | 30 | 31 | ############################################################ 32 | # Visualization 33 | ############################################################ 34 | 35 | def display_images(images, titles=None, cols=4, cmap=None, norm=None, 36 | interpolation=None): 37 | """Display the given set of images, optionally with titles. 38 | images: list or array of image tensors in HWC format. 39 | titles: optional. A list of titles to display with each image. 40 | cols: number of images per row 41 | cmap: Optional. Color map to use. For example, "Blues". 42 | norm: Optional. A Normalize instance to map values to colors. 43 | interpolation: Optional. Image interpolation to use for display. 44 | """ 45 | titles = titles if titles is not None else [""] * len(images) 46 | rows = len(images) // cols + 1 47 | plt.figure(figsize=(14, 14 * rows // cols)) 48 | i = 1 49 | for image, title in zip(images, titles): 50 | plt.subplot(rows, cols, i) 51 | plt.title(title, fontsize=9) 52 | plt.axis('off') 53 | plt.imshow(image.astype(np.uint8), cmap=cmap, 54 | norm=norm, interpolation=interpolation) 55 | i += 1 56 | plt.show() 57 | 58 | 59 | def random_colors(N, bright=True): 60 | """ 61 | Generate random colors. 62 | To get visually distinct colors, generate them in HSV space then 63 | convert to RGB. 64 | """ 65 | brightness = 1.0 if bright else 0.7 66 | hsv = [(i / N, 1, brightness) for i in range(N)] 67 | colors = list(map(lambda c: colorsys.hsv_to_rgb(*c), hsv)) 68 | random.shuffle(colors) 69 | return colors 70 | 71 | 72 | def apply_mask(image, mask, color, alpha=0.5): 73 | """Apply the given mask to the image. 
74 | """ 75 | for c in range(3): 76 | image[:, :, c] = np.where(mask == 1, 77 | image[:, :, c] * 78 | (1 - alpha) + alpha * color[c] * 255, 79 | image[:, :, c]) 80 | return image 81 | 82 | def save_instances(image, boxes, masks, class_ids, class_names,target_filename=None, 83 | scores=None, title="", 84 | figsize=(16, 16), ax=None, 85 | show_mask=True, show_bbox=True, 86 | colors=None, captions=None,return_instance=None): 87 | """ 88 | boxes: [num_instance, (y1, x1, y2, x2, class_id)] in image coordinates. 89 | masks: [height, width, num_instances] 90 | class_ids: [num_instances] 91 | class_names: list of class names of the dataset 92 | target_folder: Saves the instances in target folder. 93 | scores: (optional) confidence scores for each box 94 | title: (optional) Figure title 95 | show_mask, show_bbox: To show masks and bounding boxes or not 96 | figsize: (optional) the size of the image 97 | colors: (optional) An array or colors to use with each object 98 | captions: (optional) A list of strings to use as captions for each object 99 | """ 100 | # Number of instances 101 | N = boxes.shape[0] 102 | if not N: 103 | print("\n*** No instances to display *** \n") 104 | else: 105 | assert boxes.shape[0] == masks.shape[-1] == class_ids.shape[0] 106 | 107 | # If no axis is passed, create one and automatically call show() 108 | auto_show = False 109 | if not ax: 110 | _, ax = plt.subplots(1, figsize=figsize) 111 | auto_show = True 112 | 113 | # Generate random colors 114 | colors = colors or random_colors(N) 115 | 116 | # Show area outside image boundaries. 117 | height, width = image.shape[:2] 118 | ax.set_ylim(height + 10, -10) 119 | ax.set_xlim(-10, width + 10) 120 | ax.axis('off') 121 | ax.set_title(title) 122 | 123 | masked_image = image.astype(np.uint32).copy() 124 | for i in range(N): 125 | color = colors[i] 126 | 127 | # Bounding box 128 | if not np.any(boxes[i]): 129 | # Skip this instance. Has no bbox. Likely lost in image cropping. 130 | continue 131 | y1, x1, y2, x2 = boxes[i] 132 | if show_bbox: 133 | p = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2, 134 | alpha=0.7, linestyle="dashed", 135 | edgecolor=color, facecolor='none') 136 | ax.add_patch(p) 137 | 138 | # Label 139 | if not captions: 140 | class_id = class_ids[i] 141 | score = scores[i] if scores is not None else None 142 | label = class_names[class_id] 143 | x = random.randint(x1, (x1 + x2) // 2) 144 | caption = "{} {:.3f}".format(label, score) if score else label 145 | else: 146 | caption = captions[i] 147 | ax.text(x1, y1 + 8, caption, 148 | color='w', size=11, backgroundcolor="none") 149 | 150 | # Mask 151 | mask = masks[:, :, i] 152 | if show_mask: 153 | masked_image = apply_mask(masked_image, mask, color) 154 | 155 | # Mask Polygon 156 | # Pad to ensure proper polygons for masks that touch image edges. 
157 | padded_mask = np.zeros( 158 | (mask.shape[0] + 2, mask.shape[1] + 2), dtype=np.uint8) 159 | padded_mask[1:-1, 1:-1] = mask 160 | contours = find_contours(padded_mask, 0.5) 161 | for verts in contours: 162 | # Subtract the padding and flip (y, x) to (x, y) 163 | verts = np.fliplr(verts) - 1 164 | p = Polygon(verts, facecolor="none", edgecolor=color) 165 | ax.add_patch(p) 166 | 167 | if target_filename is None: 168 | target_filename = 'temp_'+str(label)+'_'+str(score)+'.jpg' 169 | 170 | if return_instance is None: 171 | ax.imshow(masked_image.astype(np.uint8)) 172 | masked_image = masked_image.astype(np.uint8) 173 | plt.savefig(target_filename) 174 | else: 175 | #return ax.imshow(masked_image.astype(np.uint8)),masked_image 176 | return ax,masked_image 177 | 178 | 179 | #if auto_show: 180 | # plt.show() 181 | 182 | 183 | 184 | def display_instances(image, boxes, masks, class_ids, class_names, 185 | scores=None, title="", 186 | figsize=(16, 16), ax=None, 187 | show_mask=True, show_bbox=True, 188 | colors=None, captions=None): 189 | """ 190 | boxes: [num_instance, (y1, x1, y2, x2, class_id)] in image coordinates. 191 | masks: [height, width, num_instances] 192 | class_ids: [num_instances] 193 | class_names: list of class names of the dataset 194 | scores: (optional) confidence scores for each box 195 | title: (optional) Figure title 196 | show_mask, show_bbox: To show masks and bounding boxes or not 197 | figsize: (optional) the size of the image 198 | colors: (optional) An array or colors to use with each object 199 | captions: (optional) A list of strings to use as captions for each object 200 | """ 201 | # Number of instances 202 | N = boxes.shape[0] 203 | if not N: 204 | print("\n*** No instances to display *** \n") 205 | else: 206 | assert boxes.shape[0] == masks.shape[-1] == class_ids.shape[0] 207 | 208 | # If no axis is passed, create one and automatically call show() 209 | auto_show = False 210 | if not ax: 211 | _, ax = plt.subplots(1, figsize=figsize) 212 | auto_show = True 213 | 214 | # Generate random colors 215 | colors = colors or random_colors(N) 216 | 217 | # Show area outside image boundaries. 218 | height, width = image.shape[:2] 219 | ax.set_ylim(height + 10, -10) 220 | ax.set_xlim(-10, width + 10) 221 | ax.axis('off') 222 | ax.set_title(title) 223 | 224 | masked_image = image.astype(np.uint32).copy() 225 | for i in range(N): 226 | color = colors[i] 227 | 228 | # Bounding box 229 | if not np.any(boxes[i]): 230 | # Skip this instance. Has no bbox. Likely lost in image cropping. 231 | continue 232 | y1, x1, y2, x2 = boxes[i] 233 | if show_bbox: 234 | p = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2, 235 | alpha=0.7, linestyle="dashed", 236 | edgecolor=color, facecolor='none') 237 | ax.add_patch(p) 238 | 239 | # Label 240 | if not captions: 241 | class_id = class_ids[i] 242 | score = scores[i] if scores is not None else None 243 | label = class_names[class_id] 244 | x = random.randint(x1, (x1 + x2) // 2) 245 | caption = "{} {:.3f}".format(label, score) if score else label 246 | else: 247 | caption = captions[i] 248 | ax.text(x1, y1 + 8, caption, 249 | color='w', size=11, backgroundcolor="none") 250 | 251 | # Mask 252 | mask = masks[:, :, i] 253 | if show_mask: 254 | masked_image = apply_mask(masked_image, mask, color) 255 | 256 | # Mask Polygon 257 | # Pad to ensure proper polygons for masks that touch image edges. 
258 | padded_mask = np.zeros( 259 | (mask.shape[0] + 2, mask.shape[1] + 2), dtype=np.uint8) 260 | padded_mask[1:-1, 1:-1] = mask 261 | contours = find_contours(padded_mask, 0.5) 262 | for verts in contours: 263 | # Subtract the padding and flip (y, x) to (x, y) 264 | verts = np.fliplr(verts) - 1 265 | p = Polygon(verts, facecolor="none", edgecolor=color) 266 | ax.add_patch(p) 267 | ax.imshow(masked_image.astype(np.uint8)) 268 | if auto_show: 269 | plt.show() 270 | 271 | 272 | def display_differences(image, 273 | gt_box, gt_class_id, gt_mask, 274 | pred_box, pred_class_id, pred_score, pred_mask, 275 | class_names, title="", ax=None, 276 | show_mask=True, show_box=True, 277 | iou_threshold=0.5, score_threshold=0.5): 278 | """Display ground truth and prediction instances on the same image.""" 279 | # Match predictions to ground truth 280 | gt_match, pred_match, overlaps = utils.compute_matches( 281 | gt_box, gt_class_id, gt_mask, 282 | pred_box, pred_class_id, pred_score, pred_mask, 283 | iou_threshold=iou_threshold, score_threshold=score_threshold) 284 | # Ground truth = green. Predictions = red 285 | colors = [(0, 1, 0, .8)] * len(gt_match)\ 286 | + [(1, 0, 0, 1)] * len(pred_match) 287 | # Concatenate GT and predictions 288 | class_ids = np.concatenate([gt_class_id, pred_class_id]) 289 | scores = np.concatenate([np.zeros([len(gt_match)]), pred_score]) 290 | boxes = np.concatenate([gt_box, pred_box]) 291 | masks = np.concatenate([gt_mask, pred_mask], axis=-1) 292 | # Captions per instance show score/IoU 293 | captions = ["" for m in gt_match] + ["{:.2f} / {:.2f}".format( 294 | pred_score[i], 295 | (overlaps[i, int(pred_match[i])] 296 | if pred_match[i] > -1 else overlaps[i].max())) 297 | for i in range(len(pred_match))] 298 | # Set title if not provided 299 | title = title or "Ground Truth and Detections\n GT=green, pred=red, captions: score/IoU" 300 | # Display 301 | display_instances( 302 | image, 303 | boxes, masks, class_ids, 304 | class_names, scores, ax=ax, 305 | show_bbox=show_box, show_mask=show_mask, 306 | colors=colors, captions=captions, 307 | title=title) 308 | 309 | 310 | def draw_rois(image, rois, refined_rois, mask, class_ids, class_names, limit=10): 311 | """ 312 | anchors: [n, (y1, x1, y2, x2)] list of anchors in image coordinates. 313 | proposals: [n, 4] the same anchors but refined to fit objects better. 314 | """ 315 | masked_image = image.copy() 316 | 317 | # Pick random anchors in case there are too many. 318 | ids = np.arange(rois.shape[0], dtype=np.int32) 319 | ids = np.random.choice( 320 | ids, limit, replace=False) if ids.shape[0] > limit else ids 321 | 322 | fig, ax = plt.subplots(1, figsize=(12, 12)) 323 | if rois.shape[0] > limit: 324 | plt.title("Showing {} random ROIs out of {}".format( 325 | len(ids), rois.shape[0])) 326 | else: 327 | plt.title("{} ROIs".format(len(ids))) 328 | 329 | # Show area outside image boundaries. 
330 | ax.set_ylim(image.shape[0] + 20, -20) 331 | ax.set_xlim(-50, image.shape[1] + 20) 332 | ax.axis('off') 333 | 334 | for i, id in enumerate(ids): 335 | color = np.random.rand(3) 336 | class_id = class_ids[id] 337 | # ROI 338 | y1, x1, y2, x2 = rois[id] 339 | p = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2, 340 | edgecolor=color if class_id else "gray", 341 | facecolor='none', linestyle="dashed") 342 | ax.add_patch(p) 343 | # Refined ROI 344 | if class_id: 345 | ry1, rx1, ry2, rx2 = refined_rois[id] 346 | p = patches.Rectangle((rx1, ry1), rx2 - rx1, ry2 - ry1, linewidth=2, 347 | edgecolor=color, facecolor='none') 348 | ax.add_patch(p) 349 | # Connect the top-left corners of the anchor and proposal for easy visualization 350 | ax.add_line(lines.Line2D([x1, rx1], [y1, ry1], color=color)) 351 | 352 | # Label 353 | label = class_names[class_id] 354 | ax.text(rx1, ry1 + 8, "{}".format(label), 355 | color='w', size=11, backgroundcolor="none") 356 | 357 | # Mask 358 | m = utils.unmold_mask(mask[id], rois[id] 359 | [:4].astype(np.int32), image.shape) 360 | masked_image = apply_mask(masked_image, m, color) 361 | 362 | ax.imshow(masked_image) 363 | 364 | # Print stats 365 | print("Positive ROIs: ", class_ids[class_ids > 0].shape[0]) 366 | print("Negative ROIs: ", class_ids[class_ids == 0].shape[0]) 367 | print("Positive Ratio: {:.2f}".format( 368 | class_ids[class_ids > 0].shape[0] / class_ids.shape[0])) 369 | 370 | 371 | # TODO: Replace with matplotlib equivalent? 372 | def draw_box(image, box, color): 373 | """Draw 3-pixel width bounding boxes on the given image array. 374 | color: list of 3 int values for RGB. 375 | """ 376 | y1, x1, y2, x2 = box 377 | image[y1:y1 + 2, x1:x2] = color 378 | image[y2:y2 + 2, x1:x2] = color 379 | image[y1:y2, x1:x1 + 2] = color 380 | image[y1:y2, x2:x2 + 2] = color 381 | return image 382 | 383 | 384 | def display_top_masks(image, mask, class_ids, class_names, limit=4): 385 | """Display the given image and the top few class masks.""" 386 | to_display = [] 387 | titles = [] 388 | to_display.append(image) 389 | titles.append("H x W={}x{}".format(image.shape[0], image.shape[1])) 390 | # Pick top prominent classes in this image 391 | unique_class_ids = np.unique(class_ids) 392 | mask_area = [np.sum(mask[:, :, np.where(class_ids == i)[0]]) 393 | for i in unique_class_ids] 394 | top_ids = [v[0] for v in sorted(zip(unique_class_ids, mask_area), 395 | key=lambda r: r[1], reverse=True) if v[1] > 0] 396 | # Generate images and titles 397 | for i in range(limit): 398 | class_id = top_ids[i] if i < len(top_ids) else -1 399 | # Pull masks of instances belonging to the same class. 400 | m = mask[:, :, np.where(class_ids == class_id)[0]] 401 | m = np.sum(m * np.arange(1, m.shape[-1] + 1), -1) 402 | to_display.append(m) 403 | titles.append(class_names[class_id] if class_id != -1 else "-") 404 | display_images(to_display, titles=titles, cols=limit + 1, cmap="Blues_r") 405 | 406 | 407 | def plot_precision_recall(AP, precisions, recalls): 408 | """Draw the precision-recall curve. 409 | 410 | AP: Average precision at IoU >= 0.5 411 | precisions: list of precision values 412 | recalls: list of recall values 413 | """ 414 | # Plot the Precision-Recall curve 415 | _, ax = plt.subplots(1) 416 | ax.set_title("Precision-Recall Curve. 
AP@50 = {:.3f}".format(AP)) 417 | ax.set_ylim(0, 1.1) 418 | ax.set_xlim(0, 1.1) 419 | _ = ax.plot(recalls, precisions) 420 | 421 | 422 | def plot_overlaps(gt_class_ids, pred_class_ids, pred_scores, 423 | overlaps, class_names, threshold=0.5): 424 | """Draw a grid showing how ground truth objects are classified. 425 | gt_class_ids: [N] int. Ground truth class IDs 426 | pred_class_id: [N] int. Predicted class IDs 427 | pred_scores: [N] float. The probability scores of predicted classes 428 | overlaps: [pred_boxes, gt_boxes] IoU overlaps of predictions and GT boxes. 429 | class_names: list of all class names in the dataset 430 | threshold: Float. The prediction probability required to predict a class 431 | """ 432 | gt_class_ids = gt_class_ids[gt_class_ids != 0] 433 | pred_class_ids = pred_class_ids[pred_class_ids != 0] 434 | 435 | plt.figure(figsize=(12, 10)) 436 | plt.imshow(overlaps, interpolation='nearest', cmap=plt.cm.Blues) 437 | plt.yticks(np.arange(len(pred_class_ids)), 438 | ["{} ({:.2f})".format(class_names[int(id)], pred_scores[i]) 439 | for i, id in enumerate(pred_class_ids)]) 440 | plt.xticks(np.arange(len(gt_class_ids)), 441 | [class_names[int(id)] for id in gt_class_ids], rotation=90) 442 | 443 | thresh = overlaps.max() / 2. 444 | for i, j in itertools.product(range(overlaps.shape[0]), 445 | range(overlaps.shape[1])): 446 | text = "" 447 | if overlaps[i, j] > threshold: 448 | text = "match" if gt_class_ids[j] == pred_class_ids[i] else "wrong" 449 | color = ("white" if overlaps[i, j] > thresh 450 | else "black" if overlaps[i, j] > 0 451 | else "grey") 452 | plt.text(j, i, "{:.3f}\n{}".format(overlaps[i, j], text), 453 | horizontalalignment="center", verticalalignment="center", 454 | fontsize=9, color=color) 455 | 456 | plt.tight_layout() 457 | plt.xlabel("Ground Truth") 458 | plt.ylabel("Predictions") 459 | 460 | 461 | def draw_boxes(image, boxes=None, refined_boxes=None, 462 | masks=None, captions=None, visibilities=None, 463 | title="", ax=None): 464 | """Draw bounding boxes and segmentation masks with different 465 | customizations. 466 | 467 | boxes: [N, (y1, x1, y2, x2, class_id)] in image coordinates. 468 | refined_boxes: Like boxes, but draw with solid lines to show 469 | that they're the result of refining 'boxes'. 470 | masks: [N, height, width] 471 | captions: List of N titles to display on each box 472 | visibilities: (optional) List of values of 0, 1, or 2. Determine how 473 | prominent each bounding box should be. 474 | title: An optional title to show over the image 475 | ax: (optional) Matplotlib axis to draw on. 476 | """ 477 | # Number of boxes 478 | assert boxes is not None or refined_boxes is not None 479 | N = boxes.shape[0] if boxes is not None else refined_boxes.shape[0] 480 | 481 | # Matplotlib Axis 482 | if not ax: 483 | _, ax = plt.subplots(1, figsize=(12, 12)) 484 | 485 | # Generate random colors 486 | colors = random_colors(N) 487 | 488 | # Show area outside image boundaries. 
489 | margin = image.shape[0] // 10 490 | ax.set_ylim(image.shape[0] + margin, -margin) 491 | ax.set_xlim(-margin, image.shape[1] + margin) 492 | ax.axis('off') 493 | 494 | ax.set_title(title) 495 | 496 | masked_image = image.astype(np.uint32).copy() 497 | for i in range(N): 498 | # Box visibility 499 | visibility = visibilities[i] if visibilities is not None else 1 500 | if visibility == 0: 501 | color = "gray" 502 | style = "dotted" 503 | alpha = 0.5 504 | elif visibility == 1: 505 | color = colors[i] 506 | style = "dotted" 507 | alpha = 1 508 | elif visibility == 2: 509 | color = colors[i] 510 | style = "solid" 511 | alpha = 1 512 | 513 | # Boxes 514 | if boxes is not None: 515 | if not np.any(boxes[i]): 516 | # Skip this instance. Has no bbox. Likely lost in cropping. 517 | continue 518 | y1, x1, y2, x2 = boxes[i] 519 | p = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2, 520 | alpha=alpha, linestyle=style, 521 | edgecolor=color, facecolor='none') 522 | ax.add_patch(p) 523 | 524 | # Refined boxes 525 | if refined_boxes is not None and visibility > 0: 526 | ry1, rx1, ry2, rx2 = refined_boxes[i].astype(np.int32) 527 | p = patches.Rectangle((rx1, ry1), rx2 - rx1, ry2 - ry1, linewidth=2, 528 | edgecolor=color, facecolor='none') 529 | ax.add_patch(p) 530 | # Connect the top-left corners of the anchor and proposal 531 | if boxes is not None: 532 | ax.add_line(lines.Line2D([x1, rx1], [y1, ry1], color=color)) 533 | 534 | # Captions 535 | if captions is not None: 536 | caption = captions[i] 537 | # If there are refined boxes, display captions on them 538 | if refined_boxes is not None: 539 | y1, x1, y2, x2 = ry1, rx1, ry2, rx2 540 | x = random.randint(x1, (x1 + x2) // 2) 541 | ax.text(x1, y1, caption, size=11, verticalalignment='top', 542 | color='w', backgroundcolor="none", 543 | bbox={'facecolor': color, 'alpha': 0.5, 544 | 'pad': 2, 'edgecolor': 'none'}) 545 | 546 | # Masks 547 | if masks is not None: 548 | mask = masks[:, :, i] 549 | masked_image = apply_mask(masked_image, mask, color) 550 | # Mask Polygon 551 | # Pad to ensure proper polygons for masks that touch image edges. 552 | padded_mask = np.zeros( 553 | (mask.shape[0] + 2, mask.shape[1] + 2), dtype=np.uint8) 554 | padded_mask[1:-1, 1:-1] = mask 555 | contours = find_contours(padded_mask, 0.5) 556 | for verts in contours: 557 | # Subtract the padding and flip (y, x) to (x, y) 558 | verts = np.fliplr(verts) - 1 559 | p = Polygon(verts, facecolor="none", edgecolor=color) 560 | ax.add_patch(p) 561 | ax.imshow(masked_image.astype(np.uint8)) 562 | 563 | 564 | def display_table(table): 565 | """Display values in a table format. 566 | table: an iterable of rows, and each row is an iterable of values. 567 | """ 568 | html = "" 569 | for row in table: 570 | row_html = "" 571 | for col in row: 572 | row_html += "{:40}".format(str(col)) 573 | html += "" + row_html + "" 574 | html = "" + html + "
" 575 | IPython.display.display(IPython.display.HTML(html)) 576 | 577 | 578 | def display_weight_stats(model): 579 | """Scans all the weights in the model and returns a list of tuples 580 | that contain stats about each weight. 581 | """ 582 | layers = model.get_trainable_layers() 583 | table = [["WEIGHT NAME", "SHAPE", "MIN", "MAX", "STD"]] 584 | for l in layers: 585 | weight_values = l.get_weights() # list of Numpy arrays 586 | weight_tensors = l.weights # list of TF tensors 587 | for i, w in enumerate(weight_values): 588 | weight_name = weight_tensors[i].name 589 | # Detect problematic layers. Exclude biases of conv layers. 590 | alert = "" 591 | if w.min() == w.max() and not (l.__class__.__name__ == "Conv2D" and i == 1): 592 | alert += "*** dead?" 593 | if np.abs(w.min()) > 1000 or np.abs(w.max()) > 1000: 594 | alert += "*** Overflow?" 595 | # Add row 596 | table.append([ 597 | weight_name + alert, 598 | str(w.shape), 599 | "{:+9.4f}".format(w.min()), 600 | "{:+10.4f}".format(w.max()), 601 | "{:+9.4f}".format(w.std()), 602 | ]) 603 | display_table(table) 604 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | numpy 2 | scipy 3 | Pillow 4 | cython 5 | matplotlib 6 | scikit-image 7 | tensorflow>=1.3.0 8 | keras>=2.0.8 9 | opencv-python 10 | h5py 11 | imgaug 12 | IPython[all] -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | """ 2 | The build/compilations setup 3 | 4 | >> pip install -r requirements.txt 5 | >> python setup.py install 6 | """ 7 | import pip 8 | import logging 9 | import pkg_resources 10 | try: 11 | from setuptools import setup 12 | except ImportError: 13 | from distutils.core import setup 14 | 15 | 16 | def _parse_requirements(file_path): 17 | pip_ver = pkg_resources.get_distribution('pip').version 18 | pip_version = list(map(int, pip_ver.split('.')[:2])) 19 | if pip_version >= [6, 0]: 20 | raw = pip.req.parse_requirements(file_path, 21 | session=pip.download.PipSession()) 22 | else: 23 | raw = pip.req.parse_requirements(file_path) 24 | return [str(i.req) for i in raw] 25 | 26 | 27 | # parse_requirements() returns generator of pip.req.InstallRequirement objects 28 | try: 29 | install_reqs = _parse_requirements("requirements.txt") 30 | except Exception: 31 | logging.warning('Fail load requirements file, so using default ones.') 32 | install_reqs = [] 33 | 34 | setup( 35 | name='mask-rcnn', 36 | version='2.1', 37 | url='https://github.com/matterport/Mask_RCNN', 38 | author='Matterport', 39 | author_email='waleed.abdulla@gmail.com', 40 | license='MIT', 41 | description='Mask R-CNN for object detection and instance segmentation', 42 | packages=["mrcnn"], 43 | install_requires=install_reqs, 44 | include_package_data=True, 45 | python_requires='>=3.4', 46 | long_description="""This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. 47 | The model generates bounding boxes and segmentation masks for each instance of an object in the image. 
48 | It's based on Feature Pyramid Network (FPN) and a ResNet101 backbone.""", 49 | classifiers=[ 50 | "Development Status :: 5 - Production/Stable", 51 | "Environment :: Console", 52 | "Intended Audience :: Developers", 53 | "Intended Audience :: Information Technology", 54 | "Intended Audience :: Education", 55 | "Intended Audience :: Science/Research", 56 | "License :: OSI Approved :: MIT License", 57 | "Natural Language :: English", 58 | "Operating System :: OS Independent", 59 | "Topic :: Scientific/Engineering :: Artificial Intelligence", 60 | "Topic :: Scientific/Engineering :: Image Recognition", 61 | "Topic :: Scientific/Engineering :: Visualization", 62 | "Topic :: Scientific/Engineering :: Image Segmentation", 63 | 'Programming Language :: Python :: 3.4', 64 | 'Programming Language :: Python :: 3.5', 65 | 'Programming Language :: Python :: 3.6', 66 | ], 67 | keywords="image instance segmentation object detection mask rcnn r-cnn tensorflow keras", 68 | ) 69 | -------------------------------------------------------------------------------- /slums/README.md: -------------------------------------------------------------------------------- 1 | # Training Details 2 | 3 | This file contains details about training and inference for slum segmentation using Mask R-CNN. 4 | 5 | 6 | 7 | ## Installation 8 | Use this Google Drive link to download the weights: 9 | * Download `mask_rcnn_slum_600_00128.h5` and save it in the root directory. 10 | * Link: https://drive.google.com/file/d/1IIMZLrdCZXY_dA540Ve9lSJplYHLnTY4/view?usp=sharing 11 | 12 | ## Dataset 13 | A dataset of satellite images can be created using Google Earth's desktop application. For our project, we used 720x1280 images at 1000m and 100m views of various Mumbai slums. Google's policy states that we cannot redistribute the dataset. 14 | 15 | We recommend using the VGG Image Annotator (VIA) tool for annotating the segmentation masks, as the data-loading code expects that format. The tool exports the annotations as a JSON file, which should be placed inside the dataset folder as follows: 16 | ``` 17 | dataset/ 18 | train/ 19 | all training images 20 | train.json 21 | val/ 22 | all val images 23 | val.json 24 | ``` 25 | 26 | Here are a few links to help you curate your own dataset (a minimal script for sanity-checking a VIA export is sketched after these links):
27 | https://productforums.google.com/forum/#!msg/maps/8KjNgwbBzwc/4kNMfXB6CAAJ
28 | https://support.google.com/earth/answer/148146?hl=en
29 | http://www.robots.ox.ac.uk/~vgg/software/via/
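
Before training, it can be useful to sanity-check the exported annotations. The snippet below is a minimal sketch that walks a VIA export and prints how many polygon regions each image has. It assumes the polygon format that `slums/slum.py` expects (`shape_attributes` with `all_points_x`/`all_points_y`); note that `slum.py` looks for the annotation file under the name `via_region_data.json` inside each split folder, so name your exports accordingly.

```python
# Minimal sketch: summarize a VIA annotation export before training.
# Assumes the standard VIA JSON layout: a dict keyed by image, where each
# entry holds "filename" and "regions" with polygon "shape_attributes".
import json
import os


def summarize_via_annotations(json_path):
    """Print how many annotated polygon regions each image has."""
    with open(json_path) as f:
        annotations = json.load(f)
    for entry in annotations.values():
        regions = entry.get("regions") or []
        # VIA 1.x stores regions as a dict, VIA 2.x as a list; handle both,
        # mirroring the check in load_slum() in slums/slum.py.
        if isinstance(regions, dict):
            regions = list(regions.values())
        polygons = [r["shape_attributes"] for r in regions]
        print("{}: {} slum region(s)".format(entry["filename"], len(polygons)))


if __name__ == "__main__":
    # Hypothetical path; point this at your own train/val split.
    summarize_via_annotations(os.path.join("dataset", "train", "via_region_data.json"))
```

Images with zero regions are skipped by the loader, so if most images report no regions, the export was probably made before the masks were drawn.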
30 | ## Training the model 31 | 32 | Train a new model starting from pre-trained COCO weights 33 | ``` 34 | python3 slum.py train --dataset=/path/to/slum/dataset --weights=coco 35 | ``` 36 | 37 | Resume training a model that you had trained earlier 38 | ``` 39 | python3 slum.py train --dataset=/path/to/slum/dataset --weights=last 40 | ``` 41 | 42 | Train a new model starting from ImageNet weights 43 | ``` 44 | python3 slum.py train --dataset=/path/to/slum/dataset --weights=imagenet 45 | ``` 46 | 47 | * The training details are specified inside slum.py code. 48 | * The model will save every checkpoint in root/logs folder. 49 | * The logs folders are timestamped according to start time and also have tensorboard visualizations. 50 | 51 | 52 | ## Inference 53 | Testing mode, where a segmentation mask is applied on the detected instances. Make sure to place the images inside ```test_images``` folder. 54 | 55 | ```bash 56 | python3 testing.py --weights=/path/to/mask_rcnn/mask_rcnn_slum.h5 57 | ``` 58 | This will save the detections (if any) for all the images in `test_images` and save it in `test_outputs`. 59 | 60 | Apply splash effect on a video. Requires OpenCV 3.2+: 61 | Segments out instances and applies masks on a video. 62 | ```bash 63 | python3 slum.py splash --weights=/path/to/mask_rcnn/mask_rcnn_slum.h5 --video= 64 | ``` 65 | ## Change Detection 66 | For detecting percentage change in masks, place the two images in ```change_det/ ``` folder and run: 67 | 68 | ```bash 69 | python3 change_detection.py --weights=/path/to/mask_rcnn/mask_rcnn_slum.h5 70 | ``` 71 | -------------------------------------------------------------------------------- /slums/change_det/1_raw.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/slums/change_det/1_raw.jpg -------------------------------------------------------------------------------- /slums/change_det/2_raw.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/slums/change_det/2_raw.jpg -------------------------------------------------------------------------------- /slums/change_det/change.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/slums/change_det/change.png -------------------------------------------------------------------------------- /slums/change_det/mask_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/slums/change_det/mask_1.png -------------------------------------------------------------------------------- /slums/change_det/mask_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/slums/change_det/mask_2.png -------------------------------------------------------------------------------- /slums/change_detection.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys,glob 3 | import random 4 | import math 5 | import re 6 | import time 7 | import numpy as np 8 | import 
tensorflow as tf 9 | import matplotlib 10 | import matplotlib.pyplot as plt 11 | import matplotlib.patches as patches 12 | import cv2 13 | # Root directory of the project 14 | ROOT_DIR = os.path.abspath("../") 15 | 16 | # Import Mask RCNN 17 | sys.path.append(ROOT_DIR) # To find local version of the library 18 | from mrcnn import utils 19 | from mrcnn import visualize 20 | from mrcnn.visualize import display_images 21 | import mrcnn.model as modellib 22 | from mrcnn.model import log 23 | 24 | from slums import slum 25 | 26 | import skimage.draw 27 | from skimage import measure 28 | from shapely.geometry.polygon import Polygon 29 | from skimage.measure import label 30 | 31 | 32 | import argparse 33 | 34 | #get largest conncected component in each mask 35 | def getLargestCC(segmentation): 36 | labels = label(segmentation) #Gives different integer value for each connected region 37 | #largest region is background. Ignore that and get second largest. 38 | largest_val = np.argmax(np.bincount(labels.flat)[1:]) + 1 #+1 as we ignore bg 39 | #print(np.bincount(labels.flat)[1:],' Largest ',largest_val) 40 | return labels==largest_val 41 | 42 | def merge_masks(masks): 43 | print('No of masks: ',masks.shape[2]) 44 | if masks.shape[2] <=1: 45 | return masks 46 | 47 | merged_mask_list = [] 48 | not_required = [] #list of indices not required as masks are merged 49 | for i in range(masks.shape[2]): 50 | m = masks[:,:,i] 51 | m = getLargestCC(m) 52 | m = np.expand_dims(m,axis=2) 53 | 54 | max_iou = -1 55 | max_mask = -1 56 | max_iou_index = -1 57 | 58 | #Calculate max_iou with other masks. 59 | for j in range(masks.shape[2]): 60 | #Same mask gives 1.0 ! 61 | if j!=i: 62 | n = masks[:,:,j] 63 | n = np.expand_dims(n,axis=2) 64 | intersection = np.logical_and(m,n) 65 | union = np.logical_or(m,n) 66 | iou_score = np.sum(intersection) / np.sum(union) 67 | #print(np.sum(intersection),np.sum(union)) 68 | #print(iou_score) 69 | if iou_score > max_iou: 70 | max_iou = iou_score 71 | max_mask = n 72 | max_iou_index = j 73 | 74 | #Need to merge if greater than 0.2 75 | if max_iou > 0.15: 76 | area_m = measure.regionprops(m[:,:,0].astype(np.uint8)) 77 | area_m = [prop.area for prop in area_m][0] 78 | #print(area_m,i) 79 | area_max_mask = measure.regionprops(max_mask[:,:,0].astype(np.uint8)) 80 | area_max_mask = [prop.area for prop in area_max_mask][0] 81 | #print(area_max_mask,max_iou_index) 82 | 83 | #print(area_m/(area_m + area_max_mask)) 84 | #print(area_max_mask/(area_m + area_max_mask)) 85 | 86 | if area_m >= area_max_mask: 87 | merged_mask_list.append(m) 88 | not_required.append(max_iou_index) 89 | else: 90 | merged_mask_list.append(max_mask) 91 | not_required.append(i) 92 | 93 | elif i not in not_required: 94 | merged_mask_list.append(m) 95 | 96 | #print('Matches: ',max_iou,i,max_iou_index) 97 | #print(not_required,len(merged_mask_list)) 98 | 99 | merged_mask_list = np.array(merged_mask_list) 100 | merged_mask_list = np.squeeze(merged_mask_list) 101 | merged_mask_list = np.transpose(merged_mask_list,(1,2,0)) 102 | 103 | return merged_mask_list 104 | 105 | 106 | def load_model(): 107 | with tf.device('/gpu:0'): 108 | model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR,config=config) 109 | weights_path = SLUM_WEIGHTS_PATH 110 | 111 | # Load weights 112 | print("Loading weights ", weights_path) 113 | model.load_weights(weights_path, by_name=True) 114 | return model 115 | 116 | 117 | def get_area(mask): 118 | area = measure.regionprops(mask.astype(np.uint8)) 119 | area = [prop.area for prop in area][0] 
120 | return area 121 | 122 | def cal_diff(mask_1,mask_2,files,image_1,image_2,results_1,results_2): 123 | len_1 = mask_1.shape[2] 124 | len_2 = mask_2.shape[2] 125 | 126 | #Number of detections might be unequal 127 | #combine mask channels. 128 | m1 = np.zeros((mask_1.shape[:2])) 129 | for i in range(len_1): 130 | m1 = np.logical_or(m1,mask_1[:,:,i]) 131 | 132 | m2 = np.zeros((mask_2.shape[:2])) 133 | for i in range(len_2): 134 | m2 = np.logical_or(m2,mask_2[:,:,i]) 135 | 136 | 137 | #Calculate total area covered by mask_1 138 | mask_1_area = get_area(m1) 139 | mask_2_area = get_area(m2) 140 | 141 | m1 = m1.astype(np.uint8) 142 | m2 = m2.astype(np.uint8) 143 | 144 | print(m1.shape) 145 | print(m2.shape) 146 | 147 | diff = cv2.absdiff(m1,m2) 148 | diff_area = get_area(diff) 149 | 150 | print("M1 area :",mask_1_area) 151 | print("M2 area :",mask_2_area) 152 | print("Diff in area :",diff_area) 153 | 154 | max_area = max(mask_1_area,mask_2_area) 155 | 156 | d = diff_area/max_area 157 | if mask_1_area > mask_2_area: 158 | print(files[0],' greater area') 159 | else: 160 | print(files[1],' greater area') 161 | 162 | print('Change ',d*100,'%') 163 | 164 | return m1,m2,diff 165 | 166 | if __name__ == '__main__': 167 | 168 | parser = argparse.ArgumentParser() 169 | parser.add_argument("--weights_path",type=str,required=True) 170 | args = parser.parse_args() 171 | 172 | SLUM_WEIGHTS_PATH = args.weights_path 173 | 174 | 175 | config = slum.slumConfig() 176 | class InferenceConfig(config.__class__): 177 | # Run detection on one image at a time 178 | GPU_COUNT = 1 179 | IMAGES_PER_GPU = 1 180 | config = InferenceConfig() 181 | config.display() 182 | 183 | 184 | MODEL_DIR = os.path.join(ROOT_DIR, "logs") 185 | model = load_model() 186 | 187 | 188 | files = glob.glob('change_det/*.jpg') 189 | 190 | image_1 = skimage.io.imread(files[0]) 191 | image_2 = skimage.io.imread(files[1]) 192 | 193 | results_1 = model.detect(image_1[np.newaxis],verbose=0) 194 | results_2 = model.detect(image_2[np.newaxis],verbose=0) 195 | 196 | mask_1 = results_1[0]['masks'] 197 | mask_2 = results_2[0]['masks'] 198 | 199 | mask_1,mask_2,diff =cal_diff(mask_1,mask_2,files,image_1,image_2,results_1,results_2) 200 | 201 | 202 | r = results_2[0] 203 | r['masks'] = merge_masks(r['masks']) 204 | class_names = ['slum']*(len(r['class_ids'])+1) 205 | 206 | visualize.display_instances(image_2, r['rois'], r['masks'], r['class_ids'], 207 | class_names, r['scores'], ax=None,show_bbox=False,show_mask=True, 208 | title=files[0]) 209 | 210 | 211 | r = results_1[0] 212 | r['masks'] = merge_masks(r['masks']) 213 | class_names = ['slum']*(len(r['class_ids'])+1) 214 | 215 | visualize.display_instances(image_1, r['rois'], r['masks'], r['class_ids'], 216 | class_names, r['scores'], ax=None,show_bbox=False,show_mask=True, 217 | title=files[1]) 218 | 219 | 220 | 221 | print(files,' FILES') 222 | 223 | plt.imshow(mask_1) 224 | plt.axis('off') 225 | plt.savefig('change_det/mask_1.png',bbox_inches='tight') 226 | #plt.show() 227 | 228 | plt.imshow(mask_2) 229 | plt.axis('off') 230 | plt.savefig('change_det/mask_2.png',bbox_inches='tight') 231 | #plt.show() 232 | 233 | plt.imshow(diff) 234 | plt.axis('off') 235 | plt.savefig('change_det/change.png',bbox_inches='tight') 236 | #plt.show() -------------------------------------------------------------------------------- /slums/slum.py: -------------------------------------------------------------------------------- 1 | """ 2 | Mask R-CNN 3 | Train on the toy slum dataset and implement color splash effect. 
4 | 5 | Copyright (c) 2018 Matterport, Inc. 6 | Licensed under the MIT License (see LICENSE for details) 7 | Written by Waleed Abdulla 8 | 9 | ------------------------------------------------------------ 10 | 11 | Usage: import the module (see Jupyter notebooks for examples), or run from 12 | the command line as such: 13 | 14 | # Train a new model starting from pre-trained COCO weights 15 | python3 slum.py train --dataset=/path/to/slum/dataset --weights=coco 16 | 17 | # Resume training a model that you had trained earlier 18 | python3 slum.py train --dataset=/path/to/slum/dataset --weights=last 19 | 20 | # Train a new model starting from ImageNet weights 21 | python3 slum.py train --dataset=/path/to/slum/dataset --weights=imagenet 22 | 23 | # Apply color splash to an image 24 | python3 slum.py splash --weights=/path/to/weights/file.h5 --image= 25 | 26 | # Apply color splash to video using the last weights you trained 27 | python3 slum.py splash --weights=last --video= 28 | """ 29 | 30 | import os 31 | import sys 32 | import json 33 | import datetime 34 | import numpy as np 35 | import skimage.draw 36 | 37 | """ 38 | Imgaug is an image augmentation library. 39 | """ 40 | from imgaug import augmenters as iaa 41 | from imgaug import parameters as iap 42 | import imgaug as ia 43 | 44 | 45 | # Root directory of the project 46 | ROOT_DIR = os.path.abspath("../") 47 | 48 | # Import Mask RCNN 49 | sys.path.append(ROOT_DIR) # To find local version of the library 50 | 51 | from mrcnn.config import Config 52 | from mrcnn import model as modellib, utils 53 | from mrcnn.visualize import random_colors,apply_mask 54 | # Path to trained weights file 55 | COCO_WEIGHTS_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5") 56 | 57 | # Directory to save logs and model checkpoints, if not provided 58 | # through the command line argument --logs 59 | DEFAULT_LOGS_DIR = os.path.join(ROOT_DIR, "logs") 60 | 61 | ############################################################ 62 | # Configurations 63 | ############################################################ 64 | 65 | 66 | class slumConfig(Config): 67 | """Configuration for training on the toy dataset. 68 | Derives from the base Config class and overrides some values. 69 | """ 70 | # Give the configuration a recognizable name 71 | NAME = "slum_100" 72 | 73 | # We use a GPU with 12GB memory, which can fit two images. 74 | # Adjust down if you use a smaller GPU. 75 | IMAGES_PER_GPU = 2 76 | 77 | # Number of classes (including background) 78 | NUM_CLASSES = 1 + 1 # Background + slum 79 | 80 | # Number of training steps per epoch 81 | STEPS_PER_EPOCH = 100 82 | 83 | # Skip detections with < 90% confidence 84 | DETECTION_MIN_CONFIDENCE = 0.9 85 | 86 | #Kills the machine 87 | #USE_MINI_MASK = False 88 | 89 | 90 | 91 | ############################################################ 92 | # Dataset 93 | ############################################################ 94 | 95 | class slumDataset(utils.Dataset): 96 | 97 | def load_slum(self, dataset_dir, subset): 98 | """Load a subset of the slum dataset. 99 | dataset_dir: Root directory of the dataset. 100 | subset: Subset to load: train or val 101 | """ 102 | # Add classes. We have only one class to add. 
103 | self.add_class("slum_100", 1, "slum_100") 104 | 105 | dataset_dir = os.path.join(dataset_dir, subset) 106 | print(dataset_dir) 107 | 108 | annotations = json.load(open(os.path.join(dataset_dir, "via_region_data.json"))) 109 | annotations = list(annotations.values()) # don't need the dict keys 110 | 111 | # The VIA tool saves images in the JSON even if they don't have any 112 | # annotations. Skip unannotated images. 113 | annotations = [annotations[a] for a in range(len(annotations)) if annotations[a]['regions']] 114 | 115 | # Add images 116 | for a in annotations: 117 | # Get the x, y coordinaets of points of the polygons that make up 118 | # the outline of each object instance. These are stores in the 119 | # shape_attributes (see json format above) 120 | # The if condition is needed to support VIA versions 1.x and 2.x. 121 | if type(a['regions']) is dict: 122 | polygons = [r['shape_attributes'] for r in a['regions'].values()] 123 | else: 124 | polygons = [r['shape_attributes'] for r in a['regions']] 125 | 126 | # load_mask() needs the image size to convert polygons to masks. 127 | # Unfortunately, VIA doesn't include it in JSON, so we must read 128 | # the image. This is only managable since the dataset is tiny. 129 | image_path = os.path.join(dataset_dir, a['filename']) 130 | image = skimage.io.imread(image_path) 131 | height, width = image.shape[:2] 132 | 133 | self.add_image( 134 | "slum_100", 135 | image_id=a['filename'], # use file name as a unique image id 136 | path=image_path, 137 | width=width, height=height, 138 | polygons=polygons) 139 | 140 | def load_mask(self, image_id): 141 | """Generate instance masks for an image. 142 | Returns: 143 | masks: A bool array of shape [height, width, instance count] with 144 | one mask per instance. 145 | class_ids: a 1D array of class IDs of the instance masks. 146 | """ 147 | # If not a slum dataset image, delegate to parent class. 148 | image_info = self.image_info[image_id] 149 | if image_info["source"] != "slum_100": 150 | return super(self.__class__, self).load_mask(image_id) 151 | 152 | # Convert polygons to a bitmap mask of shape 153 | # [height, width, instance_count] 154 | info = self.image_info[image_id] 155 | mask = np.zeros([info["height"], info["width"], len(info["polygons"])], 156 | dtype=np.uint8) 157 | for i, p in enumerate(info["polygons"]): 158 | # Get indexes of pixels inside the polygon and set them to 1 159 | rr, cc = skimage.draw.polygon(p['all_points_y'], p['all_points_x']) 160 | mask[rr, cc, i] = 1 161 | 162 | # Return mask, and array of class IDs of each instance. Since we have 163 | # one class ID only, we return an array of 1s 164 | return mask.astype(np.bool), np.ones([mask.shape[-1]], dtype=np.int32) 165 | 166 | def image_reference(self, image_id): 167 | """Return the path of the image.""" 168 | info = self.image_info[image_id] 169 | if info["source"] == "slum_100": 170 | return info["path"] 171 | else: 172 | super(self.__class__, self).image_reference(image_id) 173 | 174 | 175 | def train(model): 176 | """Train the model.""" 177 | # Training dataset. 
178 | sometimes = lambda aug: iaa.Sometimes(0.5, aug) 179 | seq = iaa.Sequential( 180 | [ 181 | # apply the following augmenters to most images 182 | iaa.Fliplr(0.5), # horizontally flip 50% of all images 183 | iaa.Flipud(0.2), # vertically flip 20% of all images 184 | # crop images by -5% to 10% of their height/width 185 | sometimes(iaa.CropAndPad( 186 | percent=(-0.05, 0.1), 187 | pad_mode=ia.ALL, 188 | pad_cval=(0, 255) 189 | )), 190 | sometimes(iaa.Affine( 191 | scale={"x": (0.8, 1.2), "y": (0.8, 1.2)}, # scale images to 80-120% of their size, individually per axis 192 | translate_percent={"x": (-0.2, 0.2), "y": (-0.2, 0.2)}, # translate by -20 to +20 percent (per axis) 193 | rotate=(-45, 45), # rotate by -45 to +45 degrees 194 | shear=(-16, 16), # shear by -16 to +16 degrees 195 | order=[0, 1], # use nearest neighbour or bilinear interpolation (fast) 196 | cval=(0, 255), # if mode is constant, use a cval between 0 and 255 197 | mode=ia.ALL # use any of scikit-image's warping modes (see 2nd image from the top for examples) 198 | )) 199 | ],random_order = True) 200 | 201 | 202 | dataset_train = slumDataset() 203 | dataset_train.load_slum(args.dataset, "train") 204 | dataset_train.prepare() 205 | 206 | # Validation dataset 207 | dataset_val = slumDataset() 208 | dataset_val.load_slum(args.dataset, "val") 209 | dataset_val.prepare() 210 | 211 | """ 212 | USING ADAM Optimizer. To change goto mrcnn/model.py, line no 2159 213 | For adam, we use a lesser learning rate. If you wish to use SGD, increase it 214 | in config.py 215 | """ 216 | 217 | # Training - Stage 1 218 | 219 | print("Training network heads") 220 | model.train(dataset_train, dataset_val, 221 | learning_rate=config.LEARNING_RATE, 222 | epochs=50, 223 | layers='heads', 224 | augmentation = seq 225 | ) 226 | 227 | 228 | # Training - Stage 2 229 | # Finetune layers from ResNet stage 4 and up 230 | print("Fine tune Resnet stage 4 and up") 231 | model.train(dataset_train, dataset_val, 232 | learning_rate=config.LEARNING_RATE/10, 233 | epochs=120, 234 | layers='4+', 235 | augmentation = seq 236 | ) 237 | 238 | # Training - Stage 3 239 | # Fine tune all layers. In this we stopped after 128 after recording no significant improvements. 240 | print("Fine tune all layers") 241 | model.train(dataset_train, dataset_val, 242 | learning_rate=config.LEARNING_RATE / 100, 243 | epochs=140, 244 | layers='all', 245 | augmentation = seq 246 | ) 247 | 248 | def color_splash(image, mask,color): 249 | """Apply color splash effect. 250 | image: RGB image [height, width, 3] 251 | mask: instance segmentation mask [height, width, instance count] 252 | 253 | Returns result image. 254 | """ 255 | mask = (np.sum(mask, -1, keepdims=True) >= 1) 256 | mask = np.squeeze(mask) 257 | splash = apply_mask(image,mask,color[0]) 258 | print(splash.shape) 259 | return splash 260 | 261 | 262 | def detect_and_color_splash(model, image_path=None, video_path=None): 263 | assert image_path or video_path 264 | 265 | # Image or video? 
266 | if image_path: 267 | # Run model detection and generate the color splash effect 268 | print("Running on {}".format(image_path)) 269 | # Read image 270 | image = skimage.io.imread(image_path) 271 | # Detect objects 272 | r = model.detect([image], verbose=1)[0] 273 | # Color splash (color_splash expects a list of colors; pick one at random) 274 | splash = color_splash(image, r['masks'], random_colors(1)) 275 | # Save output 276 | file_name = "splash_{:%Y%m%dT%H%M%S}.png".format(datetime.datetime.now()) 277 | skimage.io.imsave(file_name, splash) 278 | elif video_path: 279 | import cv2 280 | # Video capture 281 | vcapture = cv2.VideoCapture(video_path) 282 | width = int(vcapture.get(cv2.CAP_PROP_FRAME_WIDTH)) 283 | height = int(vcapture.get(cv2.CAP_PROP_FRAME_HEIGHT)) 284 | fps = vcapture.get(cv2.CAP_PROP_FPS) 285 | 286 | # Define codec and create video writer 287 | #file_name = "splash_{:%Y%m%dT%H%M%S}.m4v".format(datetime.datetime.now()) 288 | file_name = "splash_{:%Y%m%dT%H%M%S}.avi".format(datetime.datetime.now()) 289 | vwriter = cv2.VideoWriter(file_name, cv2.VideoWriter_fourcc(*'MJPG'), fps, (width, height)) 290 | 291 | count = 0 292 | success = True 293 | color = random_colors(5) 294 | while success: 295 | print("frame: ", count) 296 | # Read next image 297 | success, image = vcapture.read() 298 | if success: 299 | # OpenCV returns images as BGR, convert to RGB 300 | image = image[..., ::-1] 301 | # Detect objects 302 | r = model.detect([image], verbose=0)[0] 303 | # Color splash 304 | splash = color_splash(image, r['masks'], color) 305 | # RGB -> BGR to save image to video 306 | splash = splash[..., ::-1] 307 | # Add image to video writer 308 | vwriter.write(splash) 309 | count += 1 310 | vwriter.release() 311 | print("Saved to ", file_name) 312 | 313 | 314 | ############################################################ 315 | # Training 316 | ############################################################ 317 | 318 | if __name__ == '__main__': 319 | import argparse 320 | 321 | # Parse command line arguments 322 | parser = argparse.ArgumentParser( 323 | description='Train Mask R-CNN to detect slums.') 324 | parser.add_argument("command", 325 | metavar="<command>", 326 | help="'train' or 'splash'") 327 | parser.add_argument('--dataset', required=False, 328 | metavar="/path/to/slum/dataset/", 329 | help='Directory of the slum dataset') 330 | parser.add_argument('--weights', required=True, 331 | metavar="/path/to/weights.h5", 332 | help="Path to weights .h5 file or 'coco'") 333 | parser.add_argument('--logs', required=False, 334 | default=DEFAULT_LOGS_DIR, 335 | metavar="/path/to/logs/", 336 | help='Logs and checkpoints directory (default=logs/)') 337 | parser.add_argument('--image', required=False, 338 | metavar="path or URL to image", 339 | help='Image to apply the color splash effect on') 340 | parser.add_argument('--video', required=False, 341 | metavar="path or URL to video", 342 | help='Video to apply the color splash effect on') 343 | args = parser.parse_args() 344 | 345 | # Validate arguments 346 | if args.command == "train": 347 | assert args.dataset, "Argument --dataset is required for training" 348 | elif args.command == "splash": 349 | assert args.image or args.video,\ 350 | "Provide --image or --video to apply color splash" 351 | 352 | print("Weights: ", args.weights) 353 | print("Dataset: ", args.dataset) 354 | print("Logs: ", args.logs) 355 | 356 | # Configurations 357 | if args.command == "train": 358 | config = slumConfig() 359 | else: 360 | class InferenceConfig(slumConfig): 361 | # Set batch size to 1 since we'll be running inference on 362 | # one image at
a time. Batch size = GPU_COUNT * IMAGES_PER_GPU 363 | GPU_COUNT = 1 364 | IMAGES_PER_GPU = 1 365 | config = InferenceConfig() 366 | config.display() 367 | 368 | # Create model 369 | if args.command == "train": 370 | model = modellib.MaskRCNN(mode="training", config=config, 371 | model_dir=args.logs) 372 | else: 373 | model = modellib.MaskRCNN(mode="inference", config=config, 374 | model_dir=args.logs) 375 | 376 | # Select weights file to load 377 | if args.weights.lower() == "coco": 378 | weights_path = COCO_WEIGHTS_PATH 379 | # Download weights file 380 | if not os.path.exists(weights_path): 381 | utils.download_trained_weights(weights_path) 382 | elif args.weights.lower() == "last": 383 | # Find last trained weights 384 | weights_path = model.find_last() 385 | elif args.weights.lower() == "imagenet": 386 | # Start from ImageNet trained weights 387 | weights_path = model.get_imagenet_weights() 388 | else: 389 | weights_path = args.weights 390 | 391 | # Load weights 392 | print("Loading weights ", weights_path) 393 | if args.weights.lower() == "coco": 394 | # Exclude the last layers because they require a matching 395 | # number of classes 396 | model.load_weights(weights_path, by_name=True, exclude=[ 397 | "mrcnn_class_logits", "mrcnn_bbox_fc", 398 | "mrcnn_bbox", "mrcnn_mask"]) 399 | else: 400 | model.load_weights(weights_path, by_name=True) 401 | 402 | # Train or evaluate 403 | if args.command == "train": 404 | train(model) 405 | elif args.command == "splash": 406 | detect_and_color_splash(model, image_path=args.image, 407 | video_path=args.video) 408 | else: 409 | print("'{}' is not recognized. " 410 | "Use 'train' or 'splash'".format(args.command)) 411 | -------------------------------------------------------------------------------- /slums/test_images/bhandup_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/slums/test_images/bhandup_1.jpg -------------------------------------------------------------------------------- /slums/test_outputs/pred_0.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/slums/test_outputs/pred_0.jpg -------------------------------------------------------------------------------- /slums/testing.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys,glob 3 | import random 4 | import math 5 | import re 6 | import time 7 | import numpy as np 8 | import tensorflow as tf 9 | import matplotlib 10 | import matplotlib.pyplot as plt 11 | import matplotlib.patches as patches 12 | import cv2 13 | # Root directory of the project 14 | ROOT_DIR = os.path.abspath("../") 15 | 16 | # Import Mask RCNN 17 | sys.path.append(ROOT_DIR) # To find local version of the library 18 | from mrcnn import utils 19 | from mrcnn import visualize 20 | from mrcnn.visualize import display_images 21 | import mrcnn.model as modellib 22 | from mrcnn.model import log 23 | 24 | from slums import slum 25 | 26 | import skimage.draw 27 | from skimage import measure 28 | from shapely.geometry.polygon import Polygon 29 | from skimage.measure import label 30 | from sklearn.metrics import jaccard_similarity_score 31 | 32 | import argparse 33 | 34 | 35 | def get_ax(rows=1, cols=1, size=16): 36 | """Return a Matplotlib Axes array to be used in 37 | all visualizations 
in the notebook. Provide a 38 | central point to control graph sizes. 39 | 40 | Adjust the size attribute to control how big to render images 41 | """ 42 | _, ax = plt.subplots(rows, cols, figsize=(size*cols, size*rows)) 43 | return ax 44 | 45 | def getLargestCC(segmentation): 46 | labels = label(segmentation) # assigns a different integer label to each connected region 47 | # Label 0 is the background; ignore it and take the largest remaining component. 48 | if len(np.bincount(labels.flat))==1: # only background present 49 | return labels 50 | 51 | largest_val = np.argmax(np.bincount(labels.flat)[1:]) + 1 # +1 as we ignore bg 52 | #print(np.bincount(labels.flat)[1:],' Largest ',largest_val) 53 | return labels==largest_val 54 | 55 | def load_model(): 56 | with tf.device(DEVICE): 57 | model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config) 58 | weights_path = SLUM_WEIGHTS_PATH 59 | 60 | # Load weights 61 | print("Loading weights ", weights_path) 62 | model.load_weights(weights_path, by_name=True) 63 | return model 64 | 65 | 66 | def compute_batch_ap(dataset, image_ids, verbose=1): 67 | """Compute AP@0.5 and merged-mask IoU for the given image ids. 68 | # Load a validation dataset first if you need to use this function: 69 | dataset = slum.slumDataset() 70 | dataset.load_slum(dataset_dir, "val") 71 | dataset.prepare() 72 | """ 73 | 74 | APs = [] 75 | IOUs = [] 76 | 77 | for image_id in image_ids: 78 | # Load image 79 | image, image_meta, gt_class_id, gt_bbox, gt_mask =\ 80 | modellib.load_image_gt(dataset, config, 81 | image_id, use_mini_mask=False) 82 | 83 | # Run object detection 84 | results = model.detect_molded(image[np.newaxis], image_meta[np.newaxis], verbose=0) 85 | # Take the first (and only) result; AP is computed further below at IoU 0.5 86 | r = results[0] 87 | 88 | # Merge the instance masks into single binary masks for the IoU computation. 89 | gt_merge_mask = np.zeros((gt_mask.shape[:2])) 90 | for i in range(gt_mask.shape[2]): 91 | gt_merge_mask = np.logical_or(gt_merge_mask, gt_mask[:,:,i]) 92 | 93 | pred_merge_mask = np.zeros((r['masks'].shape[:2])) 94 | for i in range(r['masks'].shape[2]): 95 | pred_merge_mask = np.logical_or(pred_merge_mask, r['masks'][:,:,i]) 96 | 97 | 98 | pred_merge_mask = np.expand_dims(pred_merge_mask, 2) 99 | #print(pred_merge_mask.shape) 100 | pred_merge_mask, window, scale, pad, crop = utils.resize_image(pred_merge_mask, 1024, 1024) 101 | #print(pred_merge_mask.shape,gt_merge_mask.shape) 102 | 103 | iou = jaccard_similarity_score(np.squeeze(pred_merge_mask), gt_merge_mask) 104 | 105 | # mAP at IoU 0.5 106 | print("mAP at 50") 107 | ap = utils.compute_ap_range( 108 | gt_bbox, gt_class_id, gt_mask, 109 | r['rois'], r['class_ids'], r['scores'], r['masks'], np.arange(0.5, 1.0), verbose=0) 110 | 111 | # Make sure AP doesn't go above 1!
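# (utils.compute_ap_range should already return a value in [0, 1]; the clip below is purely defensive.)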
112 | if ap > 1.0: 113 | ap = 1.0 114 | 115 | APs.append(ap) 116 | IOUs.append(iou) 117 | 118 | if verbose: 119 | info = dataset.image_info[image_id] 120 | meta = modellib.parse_image_meta(image_meta[np.newaxis,...]) 121 | print("{:3} {} AP: {:.2f} Image_id: {}, IOU: {}".format( 122 | meta["image_id"][0], meta["original_image_shape"][0], ap, image_id, iou)) 123 | return APs, IOUs 124 | 125 | 126 | def test_on_folder(model, folder_path, save_path='test_outputs/'): 127 | 128 | if not os.path.exists(save_path): 129 | os.mkdir(save_path) 130 | 131 | files = glob.glob(folder_path+'/*.jpg') 132 | 133 | for i in range(len(files)): 134 | image_id = i 135 | image = skimage.io.imread(files[image_id]) 136 | results = model.detect(image[np.newaxis], verbose=0) 137 | results = results[0] 138 | class_names = ['slum']*(len(results['class_ids'])+1) 139 | mask = results['masks'] 140 | 141 | file_to_save = save_path + '/pred_'+str(image_id) + '.jpg' 142 | 143 | visualize.save_instances(image, results['rois'], results['masks'], results['class_ids'], 144 | class_names, file_to_save, results['scores'], ax=None, 145 | show_bbox=False, show_mask=True, 146 | title="Predictions "+str(image_id)) 147 | 148 | # Uncomment to visualize using matplotlib. 149 | """ 150 | visualize.display_instances(image, results['rois'], results['masks'], results['class_ids'], 151 | class_names, results['scores'], ax=get_ax(), 152 | show_bbox=False, show_mask=True, 153 | title="Predictions "+str(image_id)) 154 | """ 155 | 156 | 157 | if __name__ == '__main__': 158 | 159 | parser = argparse.ArgumentParser() 160 | parser.add_argument("--weights_path", type=str, required=True) 161 | args = parser.parse_args() 162 | 163 | SLUM_WEIGHTS_PATH = args.weights_path 164 | 165 | 166 | # Directory to save logs and trained model 167 | MODEL_DIR = os.path.join(ROOT_DIR, "logs") 168 | config = slum.slumConfig() 169 | 170 | 171 | class InferenceConfig(config.__class__): 172 | # Run detection on one image at a time 173 | GPU_COUNT = 1 174 | IMAGES_PER_GPU = 1 175 | 176 | config = InferenceConfig() 177 | config.display() 178 | 179 | DEVICE = "/gpu:0" # /cpu:0 or /gpu:0 180 | 181 | # Inspect the model in training or inference modes 182 | # values: 'inference' or 'training' 183 | # TODO: code for 'training' test mode not ready yet 184 | TEST_MODE = "inference" 185 | 186 | # Run inference over the bundled test images 187 | model = load_model() 188 | test_on_folder(model, 'test_images/') --------------------------------------------------------------------------------
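Example usage (a sketch; all paths are placeholders). slum.py expects a dataset directory with train/ and val/ subfolders, each containing a VIA via_region_data.json, and testing.py should be run from the slums/ directory so that test_images/ and the mrcnn package resolve:

python slum.py train --dataset=/path/to/slum/dataset --weights=coco
python slum.py splash --weights=last --image=/path/to/satellite_tile.jpg
python testing.py --weights_path=/path/to/trained_weights.h5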