├── .gitignore
├── README.md
├── _config.yml
├── assets
│   └── images
│       ├── Untitled presentation.jpg
│       ├── Webp.net-resizeimage-50.png
│       ├── Webp.net-resizeimage-75.png
│       ├── Webp.net-resizeimage.png
│       ├── change.png
│       ├── combined-intro.png
│       ├── del.jpg
│       ├── dh-govandi.png
│       ├── dharavi.png
│       ├── intro-min.jpg
│       ├── intro.jpg
│       ├── intro.png
│       ├── intro_2.jpg
│       ├── kurla-result.png
│       ├── kurla-result_2.png
│       ├── kurla.jpg
│       ├── results_github_2.jpg
│       ├── slum.png
│       └── slum_480.gif
├── index.md
├── intro.jpg
├── mrcnn
│   ├── __init__.py
│   ├── config.py
│   ├── model.py
│   ├── parallel_model.py
│   ├── utils.py
│   └── visualize.py
├── requirements.txt
├── setup.py
└── slums
    ├── README.md
    ├── change_det
    │   ├── 1_raw.jpg
    │   ├── 2_raw.jpg
    │   ├── change.png
    │   ├── mask_1.png
    │   └── mask_2.png
    ├── change_detection.py
    ├── slum.py
    ├── test_images
    │   └── bhandup_1.jpg
    ├── test_outputs
    │   └── pred_0.jpg
    └── testing.py
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | share/python-wheels/
24 | *.egg-info/
25 | .installed.cfg
26 | *.egg
27 | MANIFEST
28 |
29 | # PyInstaller
30 | # Usually these files are written by a python script from a template
31 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
32 | *.manifest
33 | *.spec
34 |
35 | # Installer logs
36 | pip-log.txt
37 | pip-delete-this-directory.txt
38 |
39 | # Unit test / coverage reports
40 | htmlcov/
41 | .tox/
42 | .nox/
43 | .coverage
44 | .coverage.*
45 | .cache
46 | nosetests.xml
47 | coverage.xml
48 | *.cover
49 | .hypothesis/
50 | .pytest_cache/
51 |
52 | # Translations
53 | *.mo
54 | *.pot
55 |
56 | # Django stuff:
57 | *.log
58 | local_settings.py
59 | db.sqlite3
60 |
61 | # Flask stuff:
62 | instance/
63 | .webassets-cache
64 |
65 | # Scrapy stuff:
66 | .scrapy
67 |
68 | # Sphinx documentation
69 | docs/_build/
70 |
71 | # PyBuilder
72 | target/
73 |
74 | # Jupyter Notebook
75 | .ipynb_checkpoints
76 |
77 | # IPython
78 | profile_default/
79 | ipython_config.py
80 |
81 | # pyenv
82 | .python-version
83 |
84 | # celery beat schedule file
85 | celerybeat-schedule
86 |
87 | # SageMath parsed files
88 | *.sage.py
89 |
90 | # Environments
91 | .env
92 | .venv
93 | env/
94 | venv/
95 | ENV/
96 | env.bak/
97 | venv.bak/
98 |
99 | # Spyder project settings
100 | .spyderproject
101 | .spyproject
102 |
103 | # Rope project settings
104 | .ropeproject
105 |
106 | # mkdocs documentation
107 | /site
108 |
109 | # mypy
110 | .mypy_cache/
111 | .dmypy.json
112 | dmypy.json
113 |
114 | # Pyre type checker
115 | .pyre/
116 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Mumbai slum segmentation
2 |
3 | More than one billion people live in slums around the world. In some developing
4 | countries, slum residents make up more than half of the population and lack
5 | reliable sanitation services, clean water, electricity, and other basic services. We wanted to help. We built a deep learning model to map and monitor slum growth over time. Check out our [project site](https://cbsudux.github.io/Mumbai-slum-segmentation/) for more information on the project and how you can contribute.
6 |
7 |
8 |
9 |
10 |
11 |
12 | ## Mumbai Slums
13 |
14 | Mumbai is one of the most populous and wealthiest cities in India. However, it is also home to some of the world’s biggest slums -- **Dharavi, the Mankhurd-Govandi belt, the Kurla-Ghatkopar belt, Dindoshi and the Bhandup-Mulund slums**. The number of slum-dwellers in Mumbai is estimated to be around 9 million, up from 6 million in 2001 -- that is, 62% of Mumbai's population lives in informal slums.
15 |
16 | 
17 |
18 | 
19 |
20 | When we spoke to the local slum dwellers, we realised that the situation was worse than we expected. Most of them lack access to clean water, basic sanitation and any form of reliable healthcare.
21 |
22 | We wanted to help.
23 |
24 | ## What did we do?
25 |
26 | Any initiative on slum rehabilitation and improvement relies heavily on **slum mapping** and **monitoring**. When we spoke to the relevant authorities, we found out that they map slums manually (with human annotators), which takes a substantial amount of time. We realised we could automate this and used a deep learning approach to **segment and map individual slums from satellite imagery**. In addition, we also wrote code to **perform change detection and monitor slum change over time**. Slum change detection is an important task, and analysing the growth or shrinkage of a slum can provide valuable insights.
27 |
28 | ## How did we go about it?
29 |
30 | We curated a **dataset** of 3-band (RGB) satellite imagery with 65 cm per pixel resolution,
31 | collected from Google Earth. Each image is 1280x720 pixels. The satellite imagery covers most of
32 | Mumbai, and we include images from 2002 to 2018 to analyse slum change. We used 513 images for training and 97 images for testing. (Unfortunately, we cannot redistribute the dataset, due to Google policy.)
33 |
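The repository bundles the Matterport Mask R-CNN code under `mrcnn/`. As a rough, illustrative sketch (not the exact configuration used in `slums/slum.py`), a dataset like the one above could be described by sub-classing `mrcnn.config.Config`:

```python
from mrcnn.config import Config

class SlumConfig(Config):
    """Illustrative settings for 1280x720 RGB satellite tiles with a
    single foreground class (slum). Values are assumptions, not the
    settings used in this repo."""
    NAME = "slum"
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2            # effective batch size = 2
    NUM_CLASSES = 1 + 1           # background + slum
    IMAGE_MIN_DIM = 768
    IMAGE_MAX_DIM = 1280          # tiles are padded to a 1280x1280 square
    STEPS_PER_EPOCH = 500
    DETECTION_MIN_CONFIDENCE = 0.7

config = SlumConfig()
config.display()                  # prints all configuration values
```
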
34 | For **slum segmentation and mapping**, we trained a Mask R-CNN on our custom dataset. Check our [github readme](https://github.com/cbsudux/Mumbai-slum-segmentation/tree/master/slums) for our training and testing approaches, and our [paper](https://arxiv.org/abs/1811.07896) for more details.
35 |
36 | 
37 |
38 | For **slum change detection**, we took a pair of satellite images representing the same location at different points in time. We predicted masks for both images and then subtracted the masks to obtain a percentage increase/decrease. The images below show a change of +35.25% for the same slum between 2005 (bottom row) and 2018 (top row).
39 |
40 | 
41 |
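The change computation itself is simple once the two masks are predicted. A minimal NumPy sketch of the idea (the actual implementation lives in `slums/change_detection.py`; the function below is illustrative and assumes binary masks of equal size):

```python
import numpy as np

def slum_change_percent(mask_old, mask_new):
    """Percentage growth (+) or shrinkage (-) of slum area between two
    binary masks (HxW arrays of 0/1) of the same location."""
    area_old = np.count_nonzero(mask_old)
    area_new = np.count_nonzero(mask_new)
    if area_old == 0:
        raise ValueError("old mask contains no slum pixels")
    return 100.0 * (area_new - area_old) / area_old
```
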
42 | ## Training and Testing
43 |
44 | Read [this](https://github.com/cbsudux/Mumbai-slum-segmentation/tree/master/slums) for training and testing, and how to prepare your own satellite dataset.
45 |
46 |
47 | ## Contributors
48 |
49 | - [Sudharshan Chandra Babu](http://github.com/cbsudux)
50 | - [Shishira R Maiya](https://github.com/abhyantrika)
51 |
52 |
53 | ## Acknowledgements
54 |
55 | We would like to thank the Slum Rehabilitation Authority of Mumbai for their data.
56 |
57 | ## Citing
58 |
59 | We published our work at the NeurIPS (NIPS) 2018 ML4D workshop. If you'd like to use our research, please cite:
60 | ```
61 | @article{maiya2018slum,
62 | title={Slum Segmentation and Change Detection: A Deep Learning Approach},
63 | author={Maiya, Shishira R and Babu, Sudharshan Chandra},
64 | journal={arXiv preprint arXiv:1811.07896},
65 | year={2018}
66 | }
67 | ```
68 |
69 | ## License
70 |
71 | This work is licensed under a Creative Commons Attribution 4.0 International License.
72 |
73 |
74 |
75 |
--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
1 | theme: jekyll-theme-cayman
--------------------------------------------------------------------------------
/assets/images/Untitled presentation.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/Untitled presentation.jpg
--------------------------------------------------------------------------------
/assets/images/Webp.net-resizeimage-50.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/Webp.net-resizeimage-50.png
--------------------------------------------------------------------------------
/assets/images/Webp.net-resizeimage-75.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/Webp.net-resizeimage-75.png
--------------------------------------------------------------------------------
/assets/images/Webp.net-resizeimage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/Webp.net-resizeimage.png
--------------------------------------------------------------------------------
/assets/images/change.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/change.png
--------------------------------------------------------------------------------
/assets/images/combined-intro.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/combined-intro.png
--------------------------------------------------------------------------------
/assets/images/del.jpg:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/assets/images/dh-govandi.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/dh-govandi.png
--------------------------------------------------------------------------------
/assets/images/dharavi.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/dharavi.png
--------------------------------------------------------------------------------
/assets/images/intro-min.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/intro-min.jpg
--------------------------------------------------------------------------------
/assets/images/intro.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/intro.jpg
--------------------------------------------------------------------------------
/assets/images/intro.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/intro.png
--------------------------------------------------------------------------------
/assets/images/intro_2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/intro_2.jpg
--------------------------------------------------------------------------------
/assets/images/kurla-result.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/kurla-result.png
--------------------------------------------------------------------------------
/assets/images/kurla-result_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/kurla-result_2.png
--------------------------------------------------------------------------------
/assets/images/kurla.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/kurla.jpg
--------------------------------------------------------------------------------
/assets/images/results_github_2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/results_github_2.jpg
--------------------------------------------------------------------------------
/assets/images/slum.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/slum.png
--------------------------------------------------------------------------------
/assets/images/slum_480.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/assets/images/slum_480.gif
--------------------------------------------------------------------------------
/index.md:
--------------------------------------------------------------------------------
1 | # Mumbai slum segmentation
2 |
3 | More than one billion people live in slums around the world. In some developing countries, slum residents make up more than half of the population and lack reliable sanitation services, clean water, electricity, and other basic services. We wanted to help.
4 |
5 | 
6 |
7 |
8 | ## Mumbai Slums
9 |
10 | Mumbai is one of the most populous and wealthiest cities in India. However, it is also home to some of the world’s biggest slums -- **Dharavi, the Mankhurd-Govandi belt, the Kurla-Ghatkopar belt, Dindoshi and the Bhandup-Mulund slums**. The number of slum-dwellers in Mumbai is estimated to be around 9 million, up from 6 million in 2001 -- that is, 62% of Mumbai's population lives in informal slums.
11 |
12 | 
13 |
14 | 
15 |
16 | When we spoke to the local slum dwellers, we realised that the situation was worse than we expected. Most of them lack access to clean water, basic sanitation and any form of reliable healthcare.
17 |
18 | We wanted to help.
19 |
20 |
21 | ## What did we do?
22 |
23 | Any initiative on slum rehabilitation and improvement relies heavily on **slum mapping** and **monitoring**. When we spoke to the relevant authorities, we found out that they map slums manually (with human annotators), which takes a substantial amount of time. We realised we could automate this and used a deep learning approach to **segment and map individual slums from satellite imagery**. In addition, we also wrote code to **perform change detection and monitor slum change over time**. Slum change detection is an important task, and analysing the growth or shrinkage of a slum can provide valuable insights.
24 |
25 | ## How did we go about it?
26 |
27 | We curated a **dataset** of 3-band (RGB) satellite imagery with 65 cm per pixel resolution,
28 | collected from Google Earth. Each image is 1280x720 pixels. The satellite imagery covers most of
29 | Mumbai, and we include images from 2002 to 2018 to analyse slum change. We used 513 images for training and 97 images for testing. (Unfortunately, we cannot redistribute the dataset, due to Google policy.)
30 |
31 | For **slum segmentation and mapping**, we trained a Mask R-CNN on our custom dataset. Check our [github readme](https://github.com/cbsudux/Mumbai-slum-segmentation/tree/master/slums) for our training and testing approaches, and our [paper](https://arxiv.org/abs/1811.07896) for more details.
32 |
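For running the trained model on a new satellite tile, the bundled Mask R-CNN API is typically used as sketched below. This is not the exact code in `slums/testing.py`: the configuration values and the weights file `mask_rcnn_slum.h5` are placeholders.

```python
import skimage.io
import mrcnn.model as modellib
from mrcnn.config import Config
from mrcnn import visualize

class SlumInferenceConfig(Config):
    """Illustrative inference settings (not the repo's actual config)."""
    NAME = "slum"
    NUM_CLASSES = 1 + 1          # background + slum
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1           # detect one image at a time
    IMAGE_MIN_DIM = 768
    IMAGE_MAX_DIM = 1280

model = modellib.MaskRCNN(mode="inference", config=SlumInferenceConfig(),
                          model_dir="logs/")
model.load_weights("mask_rcnn_slum.h5", by_name=True)   # placeholder weights file

image = skimage.io.imread("slums/test_images/bhandup_1.jpg")
r = model.detect([image], verbose=0)[0]   # keys: 'rois', 'masks', 'class_ids', 'scores'
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
                            ["BG", "slum"], r['scores'])
```
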
33 | 
34 |
35 | The Kurla-Ghatkopar slums (above) are one of the first things you see when you land in Mumbai, given their proximity to the Chhatrapati Shivaji Maharaj International Airport.
36 |
37 |
38 |
39 |
40 |
41 |
42 |
43 |
44 | Here's a short video (above) of our model mapping the Govandi slums.
45 |
46 | For **slum change detection**, we took a pair of satellite images representing the same location at different points in time. We predicted masks for both images and then subtracted the masks to obtain a percentage increase/decrease. The images below show a change of +35.25% for the same slum between 2005 (bottom row) and 2018 (top row).
47 |
48 | 
49 |
50 |
51 | ## Contributors
52 |
53 | - [Sudharshan Chandra Babu](http://github.com/cbsudux)
54 | - [Shishira R Maiya](https://github.com/abhyantrika)
55 |
56 | ## How can you help?
57 |
58 | Quite a lot of NGOs work towards slum rehabilitation in Mumbai. You can volunteer or donate.
59 |
60 | ### NGOs
61 |
62 | - [Slum Aid](http://slumaid.org/)
63 | - [Red Boys Foundation](http://www.redboysfoundation.com/)
64 | - [SAKHI](http://sakhiforgirlseducation.org/)
65 | - [Society for Nutrition, Education & Health Action (SNEHA)](http://snehamumbai.org/)
66 |
67 | ## Acknowledgements
68 |
69 | We would like to thank the Slum Rehabilitation Authority of Mumbai for their data.
70 |
71 | ## Citing
72 |
73 | We published our work at the NeurIPS (NIPS) 2018 ML4D workshop. If you'd like to use our research, please cite:
74 | ```
75 | @article{maiya2018slum,
76 | title={Slum Segmentation and Change Detection: A Deep Learning Approach},
77 | author={Maiya, Shishira R and Babu, Sudharshan Chandra},
78 | journal={arXiv preprint arXiv:1811.07896},
79 | year={2018}
80 | }
81 | ```
82 |
83 |
84 |
85 |
86 |
--------------------------------------------------------------------------------
/intro.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/intro.jpg
--------------------------------------------------------------------------------
/mrcnn/__init__.py:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/mrcnn/config.py:
--------------------------------------------------------------------------------
1 | """
2 | Mask R-CNN
3 | Base Configurations class.
4 |
5 | Copyright (c) 2017 Matterport, Inc.
6 | Licensed under the MIT License (see LICENSE for details)
7 | Written by Waleed Abdulla
8 | """
9 |
10 | import numpy as np
11 |
12 |
13 | # Base Configuration Class
14 | # Don't use this class directly. Instead, sub-class it and override
15 | # the configurations you need to change.
16 |
17 | class Config(object):
18 | """Base configuration class. For custom configurations, create a
19 | sub-class that inherits from this one and override properties
20 | that need to be changed.
21 | """
22 | # Name the configurations. For example, 'COCO', 'Experiment 3', ...etc.
23 | # Useful if your code needs to do things differently depending on which
24 | # experiment is running.
25 | NAME = None # Override in sub-classes
26 |
27 | # NUMBER OF GPUs to use. When using only a CPU, this needs to be set to 1.
28 | GPU_COUNT = 1
29 |
30 | # Number of images to train with on each GPU. A 12GB GPU can typically
31 | # handle 2 images of 1024x1024px.
32 | # Adjust based on your GPU memory and image sizes. Use the highest
33 | # number that your GPU can handle for best performance.
34 | IMAGES_PER_GPU = 2
35 |
36 | # Number of training steps per epoch
37 | # This doesn't need to match the size of the training set. Tensorboard
38 | # updates are saved at the end of each epoch, so setting this to a
39 | # smaller number means getting more frequent TensorBoard updates.
40 | # Validation stats are also calculated at each epoch end and they
41 | # might take a while, so don't set this too small to avoid spending
42 | # a lot of time on validation stats.
43 | STEPS_PER_EPOCH = 1000
44 |
45 | # Number of validation steps to run at the end of every training epoch.
46 | # A bigger number improves accuracy of validation stats, but slows
47 | # down the training.
48 | VALIDATION_STEPS = 50
49 |
50 | # Backbone network architecture
51 | # Supported values are: resnet50, resnet101.
52 | # You can also provide a callable that should have the signature
53 | # of model.resnet_graph. If you do so, you need to supply a callable
54 | # to COMPUTE_BACKBONE_SHAPE as well
55 | BACKBONE = "resnet101"
56 |
57 | # Only useful if you supply a callable to BACKBONE. Should compute
58 | # the shape of each layer of the FPN Pyramid.
59 | # See model.compute_backbone_shapes
60 | COMPUTE_BACKBONE_SHAPE = None
61 |
62 | # The strides of each layer of the FPN Pyramid. These values
63 | # are based on a Resnet101 backbone.
64 | BACKBONE_STRIDES = [4, 8, 16, 32, 64]
65 |
66 | # Size of the fully-connected layers in the classification graph
67 | FPN_CLASSIF_FC_LAYERS_SIZE = 1024
68 |
69 | # Size of the top-down layers used to build the feature pyramid
70 | TOP_DOWN_PYRAMID_SIZE = 256
71 |
72 | # Number of classification classes (including background)
73 | NUM_CLASSES = 1 # Override in sub-classes
74 |
75 | # Length of square anchor side in pixels
76 | RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)
77 |
78 | # Ratios of anchors at each cell (width/height)
79 | # A value of 1 represents a square anchor, and 0.5 is a wide anchor
80 | RPN_ANCHOR_RATIOS = [0.5, 1, 2]
81 |
82 | # Anchor stride
83 | # If 1 then anchors are created for each cell in the backbone feature map.
84 | # If 2, then anchors are created for every other cell, and so on.
85 | RPN_ANCHOR_STRIDE = 1
86 |
87 | # Non-max suppression threshold to filter RPN proposals.
88 | # You can increase this during training to generate more proposals.
89 | RPN_NMS_THRESHOLD = 0.7
90 |
91 | # How many anchors per image to use for RPN training
92 | RPN_TRAIN_ANCHORS_PER_IMAGE = 256
93 |
94 | # ROIs kept after tf.nn.top_k and before non-maximum suppression
95 | PRE_NMS_LIMIT = 6000
96 |
97 | # ROIs kept after non-maximum suppression (training and inference)
98 | POST_NMS_ROIS_TRAINING = 2000
99 | POST_NMS_ROIS_INFERENCE = 1000
100 |
101 | # If enabled, resizes instance masks to a smaller size to reduce
102 | # memory load. Recommended when using high-resolution images.
103 | USE_MINI_MASK = True
104 | MINI_MASK_SHAPE = (56, 56) # (height, width) of the mini-mask
105 |
106 | # Input image resizing
107 | # Generally, use the "square" resizing mode for training and predicting
108 | # and it should work well in most cases. In this mode, images are scaled
109 | # up such that the small side is = IMAGE_MIN_DIM, but ensuring that the
110 | # scaling doesn't make the long side > IMAGE_MAX_DIM. Then the image is
111 | # padded with zeros to make it a square so multiple images can be put
112 | # in one batch.
113 | # Available resizing modes:
114 | # none: No resizing or padding. Return the image unchanged.
115 | # square: Resize and pad with zeros to get a square image
116 | # of size [max_dim, max_dim].
117 | # pad64: Pads width and height with zeros to make them multiples of 64.
118 | # If IMAGE_MIN_DIM or IMAGE_MIN_SCALE are not None, then it scales
119 | # up before padding. IMAGE_MAX_DIM is ignored in this mode.
120 | # The multiple of 64 is needed to ensure smooth scaling of feature
121 | # maps up and down the 6 levels of the FPN pyramid (2**6=64).
122 | # crop: Picks random crops from the image. First, scales the image based
123 | # on IMAGE_MIN_DIM and IMAGE_MIN_SCALE, then picks a random crop of
124 | # size IMAGE_MIN_DIM x IMAGE_MIN_DIM. Can be used in training only.
125 | # IMAGE_MAX_DIM is not used in this mode.
126 | IMAGE_RESIZE_MODE = "square"
127 | IMAGE_MIN_DIM = 800
128 | IMAGE_MAX_DIM = 1024
129 | # Minimum scaling ratio. Checked after IMAGE_MIN_DIM and can force further
130 | # up scaling. For example, if set to 2 then images are scaled up to double
131 | # the width and height, or more, even if IMAGE_MIN_DIM doesn't require it.
132 | # However, in 'square' mode, it can be overruled by IMAGE_MAX_DIM.
133 | IMAGE_MIN_SCALE = 0
134 | # Number of color channels per image. RGB = 3, grayscale = 1, RGB-D = 4
135 | # Changing this requires other changes in the code. See the WIKI for more
136 | # details: https://github.com/matterport/Mask_RCNN/wiki
137 | IMAGE_CHANNEL_COUNT = 3
138 |
139 | # Image mean (RGB)
140 | MEAN_PIXEL = np.array([123.7, 116.8, 103.9])
141 |
142 | # Number of ROIs per image to feed to classifier/mask heads
143 | # The Mask RCNN paper uses 512 but often the RPN doesn't generate
144 | # enough positive proposals to fill this and keep a positive:negative
145 | # ratio of 1:3. You can increase the number of proposals by adjusting
146 | # the RPN NMS threshold.
147 | TRAIN_ROIS_PER_IMAGE = 200
148 |
149 | # Percent of positive ROIs used to train classifier/mask heads
150 | ROI_POSITIVE_RATIO = 0.33
151 |
152 | # Pooled ROIs
153 | POOL_SIZE = 7
154 | MASK_POOL_SIZE = 14
155 |
156 | # Shape of output mask
157 | # To change this you also need to change the neural network mask branch
158 | MASK_SHAPE = [28, 28]
159 |
160 | # Maximum number of ground truth instances to use in one image
161 | MAX_GT_INSTANCES = 100
162 |
163 | # Bounding box refinement standard deviation for RPN and final detections.
164 | RPN_BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2])
165 | BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2])
166 |
167 | # Max number of final detections
168 | DETECTION_MAX_INSTANCES = 100
169 |
170 | # Minimum probability value to accept a detected instance
171 | # ROIs below this threshold are skipped
172 | DETECTION_MIN_CONFIDENCE = 0.7
173 |
174 | # Non-maximum suppression threshold for detection
175 | DETECTION_NMS_THRESHOLD = 0.3
176 |
177 | # Learning rate and momentum
178 | # The Mask RCNN paper uses lr=0.02, but on TensorFlow it causes
179 | # weights to explode. Likely due to differences in optimizer
180 | # implementation.
181 |
182 | #For SGD
183 | #LEARNING_RATE = 0.001
184 |
185 | #For ADAM
186 | LEARNING_RATE = 0.0001
187 |
188 | LEARNING_MOMENTUM = 0.9
189 |
190 | # Weight decay regularization
191 | WEIGHT_DECAY = 0.0001
192 |
193 | # Loss weights for more precise optimization.
194 | # Can be used for R-CNN training setup.
195 | LOSS_WEIGHTS = {
196 | "rpn_class_loss": 1.,
197 | "rpn_bbox_loss": 1.,
198 | "mrcnn_class_loss": 1.,
199 | "mrcnn_bbox_loss": 1.,
200 | "mrcnn_mask_loss": 1.
201 | }
202 |
203 | # Use RPN ROIs or externally generated ROIs for training
204 | # Keep this True for most situations. Set to False if you want to train
205 | # the head branches on ROI generated by code rather than the ROIs from
206 | # the RPN. For example, to debug the classifier head without having to
207 | # train the RPN.
208 | USE_RPN_ROIS = True
209 |
210 | # Train or freeze batch normalization layers
211 | # None: Train BN layers. This is the normal mode
212 | # False: Freeze BN layers. Good when using a small batch size
213 | # True: (don't use). Set layer in training mode even when predicting
214 | TRAIN_BN = False # Defaulting to False since batch size is often small
215 |
216 | # Gradient norm clipping
217 | GRADIENT_CLIP_NORM = 5.0
218 |
219 | def __init__(self):
220 | """Set values of computed attributes."""
221 | # Effective batch size
222 | self.BATCH_SIZE = self.IMAGES_PER_GPU * self.GPU_COUNT
223 |
224 | # Input image size
225 | if self.IMAGE_RESIZE_MODE == "crop":
226 | self.IMAGE_SHAPE = np.array([self.IMAGE_MIN_DIM, self.IMAGE_MIN_DIM,
227 | self.IMAGE_CHANNEL_COUNT])
228 | else:
229 | self.IMAGE_SHAPE = np.array([self.IMAGE_MAX_DIM, self.IMAGE_MAX_DIM,
230 | self.IMAGE_CHANNEL_COUNT])
231 |
232 | # Image meta data length
233 | # See compose_image_meta() for details
234 | self.IMAGE_META_SIZE = 1 + 3 + 3 + 4 + 1 + self.NUM_CLASSES
235 |
236 | def display(self):
237 | """Display Configuration values."""
238 | print("\nConfigurations:")
239 | for a in dir(self):
240 | if not a.startswith("__") and not callable(getattr(self, a)):
241 | print("{:30} {}".format(a, getattr(self, a)))
242 | print("\n")
243 |
--------------------------------------------------------------------------------
/mrcnn/parallel_model.py:
--------------------------------------------------------------------------------
1 | """
2 | Mask R-CNN
3 | Multi-GPU Support for Keras.
4 |
5 | Copyright (c) 2017 Matterport, Inc.
6 | Licensed under the MIT License (see LICENSE for details)
7 | Written by Waleed Abdulla
8 |
9 | Ideas and small code snippets from these sources:
10 | https://github.com/fchollet/keras/issues/2436
11 | https://medium.com/@kuza55/transparent-multi-gpu-training-on-tensorflow-with-keras-8b0016fd9012
12 | https://github.com/avolkov1/keras_experiments/blob/master/keras_exp/multigpu/
13 | https://github.com/fchollet/keras/blob/master/keras/utils/training_utils.py
14 | """
15 |
16 | import tensorflow as tf
17 | import keras.backend as K
18 | import keras.layers as KL
19 | import keras.models as KM
20 |
21 |
22 | class ParallelModel(KM.Model):
23 | """Subclasses the standard Keras Model and adds multi-GPU support.
24 | It works by creating a copy of the model on each GPU. Then it slices
25 | the inputs and sends a slice to each copy of the model, and then
26 | merges the outputs together and applies the loss on the combined
27 | outputs.
28 | """
29 |
30 | def __init__(self, keras_model, gpu_count):
31 | """Class constructor.
32 | keras_model: The Keras model to parallelize
33 | gpu_count: Number of GPUs. Must be > 1
34 | """
35 | self.inner_model = keras_model
36 | self.gpu_count = gpu_count
37 | merged_outputs = self.make_parallel()
38 | super(ParallelModel, self).__init__(inputs=self.inner_model.inputs,
39 | outputs=merged_outputs)
40 |
41 | def __getattribute__(self, attrname):
42 | """Redirect loading and saving methods to the inner model. That's where
43 | the weights are stored."""
44 | if 'load' in attrname or 'save' in attrname:
45 | return getattr(self.inner_model, attrname)
46 | return super(ParallelModel, self).__getattribute__(attrname)
47 |
48 | def summary(self, *args, **kwargs):
49 | """Override summary() to display summaries of both, the wrapper
50 | and inner models."""
51 | super(ParallelModel, self).summary(*args, **kwargs)
52 | self.inner_model.summary(*args, **kwargs)
53 |
54 | def make_parallel(self):
55 | """Creates a new wrapper model that consists of multiple replicas of
56 | the original model placed on different GPUs.
57 | """
58 | # Slice inputs. Slice inputs on the CPU to avoid sending a copy
59 | # of the full inputs to all GPUs. Saves on bandwidth and memory.
60 | input_slices = {name: tf.split(x, self.gpu_count)
61 | for name, x in zip(self.inner_model.input_names,
62 | self.inner_model.inputs)}
63 |
64 | output_names = self.inner_model.output_names
65 | outputs_all = []
66 | for i in range(len(self.inner_model.outputs)):
67 | outputs_all.append([])
68 |
69 | # Run the model call() on each GPU to place the ops there
70 | for i in range(self.gpu_count):
71 | with tf.device('/gpu:%d' % i):
72 | with tf.name_scope('tower_%d' % i):
73 | # Run a slice of inputs through this replica
74 | zipped_inputs = zip(self.inner_model.input_names,
75 | self.inner_model.inputs)
76 | inputs = [
77 | KL.Lambda(lambda s: input_slices[name][i],
78 | output_shape=lambda s: (None,) + s[1:])(tensor)
79 | for name, tensor in zipped_inputs]
80 | # Create the model replica and get the outputs
81 | outputs = self.inner_model(inputs)
82 | if not isinstance(outputs, list):
83 | outputs = [outputs]
84 | # Save the outputs for merging back together later
85 | for l, o in enumerate(outputs):
86 | outputs_all[l].append(o)
87 |
88 | # Merge outputs on CPU
89 | with tf.device('/cpu:0'):
90 | merged = []
91 | for outputs, name in zip(outputs_all, output_names):
92 | # Concatenate or average outputs?
93 | # Outputs usually have a batch dimension and we concatenate
94 | # across it. If they don't, then the output is likely a loss
95 | # or a metric value that gets averaged across the batch.
96 | # Keras expects losses and metrics to be scalars.
97 | if K.int_shape(outputs[0]) == ():
98 | # Average
99 | m = KL.Lambda(lambda o: tf.add_n(o) / len(outputs), name=name)(outputs)
100 | else:
101 | # Concatenate
102 | m = KL.Concatenate(axis=0, name=name)(outputs)
103 | merged.append(m)
104 | return merged
105 |
106 |
107 | if __name__ == "__main__":
108 | # Testing code below. It creates a simple model to train on MNIST and
109 | # tries to run it on 2 GPUs. It saves the graph so it can be viewed
110 | # in TensorBoard. Run it as:
111 | #
112 | # python3 parallel_model.py
113 |
114 | import os
115 | import numpy as np
116 | import keras.optimizers
117 | from keras.datasets import mnist
118 | from keras.preprocessing.image import ImageDataGenerator
119 |
120 | GPU_COUNT = 2
121 |
122 | # Root directory of the project
123 | ROOT_DIR = os.path.abspath("../")
124 |
125 | # Directory to save logs and trained model
126 | MODEL_DIR = os.path.join(ROOT_DIR, "logs")
127 |
128 | def build_model(x_train, num_classes):
129 | # Reset default graph. Keras leaves old ops in the graph,
130 | # which are ignored for execution but clutter graph
131 | # visualization in TensorBoard.
132 | tf.reset_default_graph()
133 |
134 | inputs = KL.Input(shape=x_train.shape[1:], name="input_image")
135 | x = KL.Conv2D(32, (3, 3), activation='relu', padding="same",
136 | name="conv1")(inputs)
137 | x = KL.Conv2D(64, (3, 3), activation='relu', padding="same",
138 | name="conv2")(x)
139 | x = KL.MaxPooling2D(pool_size=(2, 2), name="pool1")(x)
140 | x = KL.Flatten(name="flat1")(x)
141 | x = KL.Dense(128, activation='relu', name="dense1")(x)
142 | x = KL.Dense(num_classes, activation='softmax', name="dense2")(x)
143 |
144 | return KM.Model(inputs, x, "digit_classifier_model")
145 |
146 | # Load MNIST Data
147 | (x_train, y_train), (x_test, y_test) = mnist.load_data()
148 | x_train = np.expand_dims(x_train, -1).astype('float32') / 255
149 | x_test = np.expand_dims(x_test, -1).astype('float32') / 255
150 |
151 | print('x_train shape:', x_train.shape)
152 | print('x_test shape:', x_test.shape)
153 |
154 | # Build data generator and model
155 | datagen = ImageDataGenerator()
156 | model = build_model(x_train, 10)
157 |
158 | # Add multi-GPU support.
159 | model = ParallelModel(model, GPU_COUNT)
160 |
161 | optimizer = keras.optimizers.SGD(lr=0.01, momentum=0.9, clipnorm=5.0)
162 |
163 | model.compile(loss='sparse_categorical_crossentropy',
164 | optimizer=optimizer, metrics=['accuracy'])
165 |
166 | model.summary()
167 |
168 | # Train
169 | model.fit_generator(
170 | datagen.flow(x_train, y_train, batch_size=64),
171 | steps_per_epoch=50, epochs=10, verbose=1,
172 | validation_data=(x_test, y_test),
173 | callbacks=[keras.callbacks.TensorBoard(log_dir=MODEL_DIR,
174 | write_graph=True)]
175 | )
176 |
--------------------------------------------------------------------------------
/mrcnn/utils.py:
--------------------------------------------------------------------------------
1 | """
2 | Mask R-CNN
3 | Common utility functions and classes.
4 |
5 | Copyright (c) 2017 Matterport, Inc.
6 | Licensed under the MIT License (see LICENSE for details)
7 | Written by Waleed Abdulla
8 | """
9 |
10 | import sys
11 | import os
12 | import math
13 | import random
14 | import numpy as np
15 | import tensorflow as tf
16 | import scipy
17 | import skimage.color
18 | import skimage.io
19 | import skimage.transform
20 | import urllib.request
21 | import shutil
22 | import warnings
23 | from distutils.version import LooseVersion
24 |
25 | # URL from which to download the latest COCO trained weights
26 | COCO_MODEL_URL = "https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5"
27 |
28 |
29 | ############################################################
30 | # Bounding Boxes
31 | ############################################################
32 |
33 | def extract_bboxes(mask):
34 | """Compute bounding boxes from masks.
35 | mask: [height, width, num_instances]. Mask pixels are either 1 or 0.
36 |
37 | Returns: bbox array [num_instances, (y1, x1, y2, x2)].
38 | """
39 | boxes = np.zeros([mask.shape[-1], 4], dtype=np.int32)
40 | for i in range(mask.shape[-1]):
41 | m = mask[:, :, i]
42 | # Bounding box.
43 | horizontal_indicies = np.where(np.any(m, axis=0))[0]
44 | vertical_indicies = np.where(np.any(m, axis=1))[0]
45 | if horizontal_indicies.shape[0]:
46 | x1, x2 = horizontal_indicies[[0, -1]]
47 | y1, y2 = vertical_indicies[[0, -1]]
48 | # x2 and y2 should not be part of the box. Increment by 1.
49 | x2 += 1
50 | y2 += 1
51 | else:
52 | # No mask for this instance. Might happen due to
53 | # resizing or cropping. Set bbox to zeros
54 | x1, x2, y1, y2 = 0, 0, 0, 0
55 | boxes[i] = np.array([y1, x1, y2, x2])
56 | return boxes.astype(np.int32)
57 |
58 |
59 | def compute_iou(box, boxes, box_area, boxes_area):
60 | """Calculates IoU of the given box with the array of the given boxes.
61 | box: 1D vector [y1, x1, y2, x2]
62 | boxes: [boxes_count, (y1, x1, y2, x2)]
63 | box_area: float. the area of 'box'
64 | boxes_area: array of length boxes_count.
65 |
66 | Note: the areas are passed in rather than calculated here for
67 | efficiency. Calculate once in the caller to avoid duplicate work.
68 | """
69 | # Calculate intersection areas
70 | y1 = np.maximum(box[0], boxes[:, 0])
71 | y2 = np.minimum(box[2], boxes[:, 2])
72 | x1 = np.maximum(box[1], boxes[:, 1])
73 | x2 = np.minimum(box[3], boxes[:, 3])
74 | intersection = np.maximum(x2 - x1, 0) * np.maximum(y2 - y1, 0)
75 | union = box_area + boxes_area[:] - intersection[:]
76 | iou = intersection / union
77 | return iou
78 |
79 |
80 | def compute_overlaps(boxes1, boxes2):
81 | """Computes IoU overlaps between two sets of boxes.
82 | boxes1, boxes2: [N, (y1, x1, y2, x2)].
83 |
84 | For better performance, pass the largest set first and the smaller second.
85 | """
86 | # Areas of anchors and GT boxes
87 | area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
88 | area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
89 |
90 | # Compute overlaps to generate matrix [boxes1 count, boxes2 count]
91 | # Each cell contains the IoU value.
92 | overlaps = np.zeros((boxes1.shape[0], boxes2.shape[0]))
93 | for i in range(overlaps.shape[1]):
94 | box2 = boxes2[i]
95 | overlaps[:, i] = compute_iou(box2, boxes1, area2[i], area1)
96 | return overlaps
97 |
98 |
99 | def compute_overlaps_masks(masks1, masks2):
100 | """Computes IoU overlaps between two sets of masks.
101 | masks1, masks2: [Height, Width, instances]
102 | """
103 |
104 | # If either set of masks is empty return empty result
105 | if masks1.shape[-1] == 0 or masks2.shape[-1] == 0:
106 | return np.zeros((masks1.shape[-1], masks2.shape[-1]))
107 | # flatten masks and compute their areas
108 | masks1 = np.reshape(masks1 > .5, (-1, masks1.shape[-1])).astype(np.float32)
109 | masks2 = np.reshape(masks2 > .5, (-1, masks2.shape[-1])).astype(np.float32)
110 | area1 = np.sum(masks1, axis=0)
111 | area2 = np.sum(masks2, axis=0)
112 |
113 | # intersections and union
114 | intersections = np.dot(masks1.T, masks2)
115 | union = area1[:, None] + area2[None, :] - intersections
116 | overlaps = intersections / union
117 |
118 | return overlaps
119 |
120 |
121 | def non_max_suppression(boxes, scores, threshold):
122 | """Performs non-maximum suppression and returns indices of kept boxes.
123 | boxes: [N, (y1, x1, y2, x2)]. Notice that (y2, x2) lays outside the box.
124 | scores: 1-D array of box scores.
125 | threshold: Float. IoU threshold to use for filtering.
126 | """
127 | assert boxes.shape[0] > 0
128 | if boxes.dtype.kind != "f":
129 | boxes = boxes.astype(np.float32)
130 |
131 | # Compute box areas
132 | y1 = boxes[:, 0]
133 | x1 = boxes[:, 1]
134 | y2 = boxes[:, 2]
135 | x2 = boxes[:, 3]
136 | area = (y2 - y1) * (x2 - x1)
137 |
138 | # Get indices of boxes sorted by scores (highest first)
139 | ixs = scores.argsort()[::-1]
140 |
141 | pick = []
142 | while len(ixs) > 0:
143 | # Pick top box and add its index to the list
144 | i = ixs[0]
145 | pick.append(i)
146 | # Compute IoU of the picked box with the rest
147 | iou = compute_iou(boxes[i], boxes[ixs[1:]], area[i], area[ixs[1:]])
148 | # Identify boxes with IoU over the threshold. This
149 | # returns indices into ixs[1:], so add 1 to get
150 | # indices into ixs.
151 | remove_ixs = np.where(iou > threshold)[0] + 1
152 | # Remove indices of the picked and overlapped boxes.
153 | ixs = np.delete(ixs, remove_ixs)
154 | ixs = np.delete(ixs, 0)
155 | return np.array(pick, dtype=np.int32)
156 |
157 |
158 | def apply_box_deltas(boxes, deltas):
159 | """Applies the given deltas to the given boxes.
160 | boxes: [N, (y1, x1, y2, x2)]. Note that (y2, x2) is outside the box.
161 | deltas: [N, (dy, dx, log(dh), log(dw))]
162 | """
163 | boxes = boxes.astype(np.float32)
164 | # Convert to y, x, h, w
165 | height = boxes[:, 2] - boxes[:, 0]
166 | width = boxes[:, 3] - boxes[:, 1]
167 | center_y = boxes[:, 0] + 0.5 * height
168 | center_x = boxes[:, 1] + 0.5 * width
169 | # Apply deltas
170 | center_y += deltas[:, 0] * height
171 | center_x += deltas[:, 1] * width
172 | height *= np.exp(deltas[:, 2])
173 | width *= np.exp(deltas[:, 3])
174 | # Convert back to y1, x1, y2, x2
175 | y1 = center_y - 0.5 * height
176 | x1 = center_x - 0.5 * width
177 | y2 = y1 + height
178 | x2 = x1 + width
179 | return np.stack([y1, x1, y2, x2], axis=1)
180 |
181 |
182 | def box_refinement_graph(box, gt_box):
183 | """Compute refinement needed to transform box to gt_box.
184 | box and gt_box are [N, (y1, x1, y2, x2)]
185 | """
186 | box = tf.cast(box, tf.float32)
187 | gt_box = tf.cast(gt_box, tf.float32)
188 |
189 | height = box[:, 2] - box[:, 0]
190 | width = box[:, 3] - box[:, 1]
191 | center_y = box[:, 0] + 0.5 * height
192 | center_x = box[:, 1] + 0.5 * width
193 |
194 | gt_height = gt_box[:, 2] - gt_box[:, 0]
195 | gt_width = gt_box[:, 3] - gt_box[:, 1]
196 | gt_center_y = gt_box[:, 0] + 0.5 * gt_height
197 | gt_center_x = gt_box[:, 1] + 0.5 * gt_width
198 |
199 | dy = (gt_center_y - center_y) / height
200 | dx = (gt_center_x - center_x) / width
201 | dh = tf.log(gt_height / height)
202 | dw = tf.log(gt_width / width)
203 |
204 | result = tf.stack([dy, dx, dh, dw], axis=1)
205 | return result
206 |
207 |
208 | def box_refinement(box, gt_box):
209 | """Compute refinement needed to transform box to gt_box.
210 | box and gt_box are [N, (y1, x1, y2, x2)]. (y2, x2) is
211 | assumed to be outside the box.
212 | """
213 | box = box.astype(np.float32)
214 | gt_box = gt_box.astype(np.float32)
215 |
216 | height = box[:, 2] - box[:, 0]
217 | width = box[:, 3] - box[:, 1]
218 | center_y = box[:, 0] + 0.5 * height
219 | center_x = box[:, 1] + 0.5 * width
220 |
221 | gt_height = gt_box[:, 2] - gt_box[:, 0]
222 | gt_width = gt_box[:, 3] - gt_box[:, 1]
223 | gt_center_y = gt_box[:, 0] + 0.5 * gt_height
224 | gt_center_x = gt_box[:, 1] + 0.5 * gt_width
225 |
226 | dy = (gt_center_y - center_y) / height
227 | dx = (gt_center_x - center_x) / width
228 | dh = np.log(gt_height / height)
229 | dw = np.log(gt_width / width)
230 |
231 | return np.stack([dy, dx, dh, dw], axis=1)
232 |
233 |
234 | ############################################################
235 | # Dataset
236 | ############################################################
237 |
238 | class Dataset(object):
239 | """The base class for dataset classes.
240 | To use it, create a new class that adds functions specific to the dataset
241 | you want to use. For example:
242 |
243 | class CatsAndDogsDataset(Dataset):
244 | def load_cats_and_dogs(self):
245 | ...
246 | def load_mask(self, image_id):
247 | ...
248 | def image_reference(self, image_id):
249 | ...
250 |
251 | See COCODataset and ShapesDataset as examples.
252 | """
253 |
254 | def __init__(self, class_map=None):
255 | self._image_ids = []
256 | self.image_info = []
257 | # Background is always the first class
258 | self.class_info = [{"source": "", "id": 0, "name": "BG"}]
259 | self.source_class_ids = {}
260 |
261 | def add_class(self, source, class_id, class_name):
262 | assert "." not in source, "Source name cannot contain a dot"
263 | # Does the class exist already?
264 | for info in self.class_info:
265 | if info['source'] == source and info["id"] == class_id:
266 | # source.class_id combination already available, skip
267 | return
268 | # Add the class
269 | self.class_info.append({
270 | "source": source,
271 | "id": class_id,
272 | "name": class_name,
273 | })
274 |
275 | def add_image(self, source, image_id, path, **kwargs):
276 | image_info = {
277 | "id": image_id,
278 | "source": source,
279 | "path": path,
280 | }
281 | image_info.update(kwargs)
282 | self.image_info.append(image_info)
283 |
284 | def image_reference(self, image_id):
285 | """Return a link to the image in its source Website or details about
286 | the image that help looking it up or debugging it.
287 |
288 | Override for your dataset, but pass to this function
289 | if you encounter images not in your dataset.
290 | """
291 | return ""
292 |
293 | def prepare(self, class_map=None):
294 | """Prepares the Dataset class for use.
295 |
296 | TODO: class map is not supported yet. When done, it should handle mapping
297 | classes from different datasets to the same class ID.
298 | """
299 |
300 | def clean_name(name):
301 | """Returns a shorter version of object names for cleaner display."""
302 | return ",".join(name.split(",")[:1])
303 |
304 | # Build (or rebuild) everything else from the info dicts.
305 | self.num_classes = len(self.class_info)
306 | self.class_ids = np.arange(self.num_classes)
307 | self.class_names = [clean_name(c["name"]) for c in self.class_info]
308 | self.num_images = len(self.image_info)
309 | self._image_ids = np.arange(self.num_images)
310 |
311 | # Mapping from source class and image IDs to internal IDs
312 | self.class_from_source_map = {"{}.{}".format(info['source'], info['id']): id
313 | for info, id in zip(self.class_info, self.class_ids)}
314 | self.image_from_source_map = {"{}.{}".format(info['source'], info['id']): id
315 | for info, id in zip(self.image_info, self.image_ids)}
316 |
317 | # Map sources to class_ids they support
318 | self.sources = list(set([i['source'] for i in self.class_info]))
319 | self.source_class_ids = {}
320 | # Loop over datasets
321 | for source in self.sources:
322 | self.source_class_ids[source] = []
323 | # Find classes that belong to this dataset
324 | for i, info in enumerate(self.class_info):
325 | # Include BG class in all datasets
326 | if i == 0 or source == info['source']:
327 | self.source_class_ids[source].append(i)
328 |
329 | def map_source_class_id(self, source_class_id):
330 | """Takes a source class ID and returns the int class ID assigned to it.
331 |
332 | For example:
333 | dataset.map_source_class_id("coco.12") -> 23
334 | """
335 | return self.class_from_source_map[source_class_id]
336 |
337 | def get_source_class_id(self, class_id, source):
338 | """Map an internal class ID to the corresponding class ID in the source dataset."""
339 | info = self.class_info[class_id]
340 | assert info['source'] == source
341 | return info['id']
342 |
343 | @property
344 | def image_ids(self):
345 | return self._image_ids
346 |
347 | def source_image_link(self, image_id):
348 | """Returns the path or URL to the image.
349 | Override this to return a URL to the image if it's available online for easy
350 | debugging.
351 | """
352 | return self.image_info[image_id]["path"]
353 |
354 | def load_image(self, image_id):
355 | """Load the specified image and return a [H,W,3] Numpy array.
356 | """
357 | # Load image
358 | image = skimage.io.imread(self.image_info[image_id]['path'])
359 | # If grayscale. Convert to RGB for consistency.
360 | if image.ndim != 3:
361 | image = skimage.color.gray2rgb(image)
362 | # If has an alpha channel, remove it for consistency
363 | if image.shape[-1] == 4:
364 | image = image[..., :3]
365 | return image
366 |
367 | def load_mask(self, image_id):
368 | """Load instance masks for the given image.
369 |
370 | Different datasets use different ways to store masks. Override this
371 | method to load instance masks and return them in the form of an
372 | array of binary masks of shape [height, width, instances].
373 |
374 | Returns:
375 | masks: A bool array of shape [height, width, instance count] with
376 | a binary mask per instance.
377 | class_ids: a 1D array of class IDs of the instance masks.
378 | """
379 | # Override this function to load a mask from your dataset.
380 | # Otherwise, it returns an empty mask.
381 | mask = np.empty([0, 0, 0])
382 | class_ids = np.empty([0], np.int32)
383 | return mask, class_ids
384 |
385 |
386 | def resize_image(image, min_dim=None, max_dim=None, min_scale=None, mode="square"):
387 | """Resizes an image keeping the aspect ratio unchanged.
388 |
389 | min_dim: if provided, resizes the image such that it's smaller
390 | dimension == min_dim
391 | max_dim: if provided, ensures that the image longest side doesn't
392 | exceed this value.
393 | min_scale: if provided, ensure that the image is scaled up by at least
394 | this percent even if min_dim doesn't require it.
395 | mode: Resizing mode.
396 | none: No resizing. Return the image unchanged.
397 | square: Resize and pad with zeros to get a square image
398 | of size [max_dim, max_dim].
399 | pad64: Pads width and height with zeros to make them multiples of 64.
400 | If min_dim or min_scale are provided, it scales the image up
401 | before padding. max_dim is ignored in this mode.
402 | The multiple of 64 is needed to ensure smooth scaling of feature
403 | maps up and down the 6 levels of the FPN pyramid (2**6=64).
404 | crop: Picks random crops from the image. First, scales the image based
405 | on min_dim and min_scale, then picks a random crop of
406 | size min_dim x min_dim. Can be used in training only.
407 | max_dim is not used in this mode.
408 |
409 | Returns:
410 | image: the resized image
411 | window: (y1, x1, y2, x2). If max_dim is provided, padding might
412 | be inserted in the returned image. If so, this window is the
413 | coordinates of the image part of the full image (excluding
414 | the padding). The x2, y2 pixels are not included.
415 | scale: The scale factor used to resize the image
416 | padding: Padding added to the image [(top, bottom), (left, right), (0, 0)]
417 | """
418 | # Keep track of image dtype and return results in the same dtype
419 | image_dtype = image.dtype
420 | # Default window (y1, x1, y2, x2) and default scale == 1.
421 | h, w = image.shape[:2]
422 | window = (0, 0, h, w)
423 | scale = 1
424 | padding = [(0, 0), (0, 0), (0, 0)]
425 | crop = None
426 |
427 | if mode == "none":
428 | return image, window, scale, padding, crop
429 |
430 | # Scale?
431 | if min_dim:
432 | # Scale up but not down
433 | scale = max(1, min_dim / min(h, w))
434 | if min_scale and scale < min_scale:
435 | scale = min_scale
436 |
437 | # Does it exceed max dim?
438 | if max_dim and mode == "square":
439 | image_max = max(h, w)
440 | if round(image_max * scale) > max_dim:
441 | scale = max_dim / image_max
442 |
443 | # Resize image using bilinear interpolation
444 | if scale != 1:
445 | image = resize(image, (round(h * scale), round(w * scale)),
446 | preserve_range=True)
447 |
448 | # Need padding or cropping?
449 | if mode == "square":
450 | # Get new height and width
451 | h, w = image.shape[:2]
452 | top_pad = (max_dim - h) // 2
453 | bottom_pad = max_dim - h - top_pad
454 | left_pad = (max_dim - w) // 2
455 | right_pad = max_dim - w - left_pad
456 | padding = [(top_pad, bottom_pad), (left_pad, right_pad), (0, 0)]
457 | image = np.pad(image, padding, mode='constant', constant_values=0)
458 | window = (top_pad, left_pad, h + top_pad, w + left_pad)
459 | elif mode == "pad64":
460 | h, w = image.shape[:2]
461 | # Both sides must be divisible by 64
462 | assert min_dim % 64 == 0, "Minimum dimension must be a multiple of 64"
463 | # Height
464 | if h % 64 > 0:
465 | max_h = h - (h % 64) + 64
466 | top_pad = (max_h - h) // 2
467 | bottom_pad = max_h - h - top_pad
468 | else:
469 | top_pad = bottom_pad = 0
470 | # Width
471 | if w % 64 > 0:
472 | max_w = w - (w % 64) + 64
473 | left_pad = (max_w - w) // 2
474 | right_pad = max_w - w - left_pad
475 | else:
476 | left_pad = right_pad = 0
477 | padding = [(top_pad, bottom_pad), (left_pad, right_pad), (0, 0)]
478 | image = np.pad(image, padding, mode='constant', constant_values=0)
479 | window = (top_pad, left_pad, h + top_pad, w + left_pad)
480 | elif mode == "crop":
481 | # Pick a random crop
482 | h, w = image.shape[:2]
483 | y = random.randint(0, (h - min_dim))
484 | x = random.randint(0, (w - min_dim))
485 | crop = (y, x, min_dim, min_dim)
486 | image = image[y:y + min_dim, x:x + min_dim]
487 | window = (0, 0, min_dim, min_dim)
488 | else:
489 | raise Exception("Mode {} not supported".format(mode))
490 | return image.astype(image_dtype), window, scale, padding, crop
491 |
492 |
493 | def resize_mask(mask, scale, padding, crop=None):
494 | """Resizes a mask using the given scale and padding.
495 | Typically, you get the scale and padding from resize_image() to
496 | ensure both, the image and the mask, are resized consistently.
497 |
498 | scale: mask scaling factor
499 | padding: Padding to add to the mask in the form
500 | [(top, bottom), (left, right), (0, 0)]
501 | """
502 | # Suppress warning from scipy 0.13.0, the output shape of zoom() is
503 | # calculated with round() instead of int()
504 | with warnings.catch_warnings():
505 | warnings.simplefilter("ignore")
506 | mask = scipy.ndimage.zoom(mask, zoom=[scale, scale, 1], order=0)
507 | if crop is not None:
508 | y, x, h, w = crop
509 | mask = mask[y:y + h, x:x + w]
510 | else:
511 | mask = np.pad(mask, padding, mode='constant', constant_values=0)
512 | return mask
513 |
514 |
515 | def minimize_mask(bbox, mask, mini_shape):
516 | """Resize masks to a smaller version to reduce memory load.
517 | Mini-masks can be resized back to image scale using expand_masks()
518 |
519 | See inspect_data.ipynb notebook for more details.
520 | """
521 | mini_mask = np.zeros(mini_shape + (mask.shape[-1],), dtype=bool)
522 | for i in range(mask.shape[-1]):
523 | # Pick slice and cast to bool in case load_mask() returned wrong dtype
524 | m = mask[:, :, i].astype(bool)
525 | y1, x1, y2, x2 = bbox[i][:4]
526 | m = m[y1:y2, x1:x2]
527 | if m.size == 0:
528 | raise Exception("Invalid bounding box with area of zero")
529 | # Resize with bilinear interpolation
530 | m = resize(m, mini_shape)
531 | mini_mask[:, :, i] = np.around(m).astype(np.bool)
532 | return mini_mask
533 |
534 |
535 | def expand_mask(bbox, mini_mask, image_shape):
536 | """Resizes mini masks back to image size. Reverses the change
537 | of minimize_mask().
538 |
539 | See inspect_data.ipynb notebook for more details.
540 | """
541 | mask = np.zeros(image_shape[:2] + (mini_mask.shape[-1],), dtype=bool)
542 | for i in range(mask.shape[-1]):
543 | m = mini_mask[:, :, i]
544 | y1, x1, y2, x2 = bbox[i][:4]
545 | h = y2 - y1
546 | w = x2 - x1
547 | # Resize with bilinear interpolation
548 | m = resize(m, (h, w))
549 | mask[y1:y2, x1:x2, i] = np.around(m).astype(np.bool)
550 | return mask
551 |
552 |
553 | # TODO: Build and use this function to reduce code duplication
554 | def mold_mask(mask, config):
555 | pass
556 |
557 |
558 | def unmold_mask(mask, bbox, image_shape):
559 | """Converts a mask generated by the neural network to a format similar
560 | to its original shape.
561 | mask: [height, width] of type float. A small, typically 28x28 mask.
562 | bbox: [y1, x1, y2, x2]. The box to fit the mask in.
563 |
564 | Returns a binary mask with the same size as the original image.
565 | """
566 | threshold = 0.5
567 | y1, x1, y2, x2 = bbox
568 | mask = resize(mask, (y2 - y1, x2 - x1))
569 | mask = np.where(mask >= threshold, 1, 0).astype(np.bool)
570 |
571 | # Put the mask in the right location.
572 | full_mask = np.zeros(image_shape[:2], dtype=np.bool)
573 | full_mask[y1:y2, x1:x2] = mask
574 | return full_mask
575 |
576 |
577 | ############################################################
578 | # Anchors
579 | ############################################################
580 |
581 | def generate_anchors(scales, ratios, shape, feature_stride, anchor_stride):
582 | """
583 | scales: 1D array of anchor sizes in pixels. Example: [32, 64, 128]
584 | ratios: 1D array of anchor ratios of width/height. Example: [0.5, 1, 2]
585 | shape: [height, width] spatial shape of the feature map over which
586 | to generate anchors.
587 | feature_stride: Stride of the feature map relative to the image in pixels.
588 | anchor_stride: Stride of anchors on the feature map. For example, if the
589 | value is 2 then generate anchors for every other feature map pixel.
590 | """
591 | # Get all combinations of scales and ratios
592 | scales, ratios = np.meshgrid(np.array(scales), np.array(ratios))
593 | scales = scales.flatten()
594 | ratios = ratios.flatten()
595 |
596 | # Enumerate heights and widths from scales and ratios
597 | heights = scales / np.sqrt(ratios)
598 | widths = scales * np.sqrt(ratios)
599 |
600 | # Enumerate shifts in feature space
601 | shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride
602 | shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride
603 | shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)
604 |
605 | # Enumerate combinations of shifts, widths, and heights
606 | box_widths, box_centers_x = np.meshgrid(widths, shifts_x)
607 | box_heights, box_centers_y = np.meshgrid(heights, shifts_y)
608 |
609 | # Reshape to get a list of (y, x) and a list of (h, w)
610 | box_centers = np.stack(
611 | [box_centers_y, box_centers_x], axis=2).reshape([-1, 2])
612 | box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2])
613 |
614 | # Convert to corner coordinates (y1, x1, y2, x2)
615 | boxes = np.concatenate([box_centers - 0.5 * box_sizes,
616 | box_centers + 0.5 * box_sizes], axis=1)
617 | return boxes
618 |
619 |
620 | def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides,
621 | anchor_stride):
622 | """Generate anchors at different levels of a feature pyramid. Each scale
623 | is associated with a level of the pyramid, but each ratio is used in
624 | all levels of the pyramid.
625 |
626 | Returns:
627 | anchors: [N, (y1, x1, y2, x2)]. All generated anchors in one array. Sorted
628 | with the same order of the given scales. So, anchors of scale[0] come
629 | first, then anchors of scale[1], and so on.
630 | """
631 | # Anchors
632 | # [anchor_count, (y1, x1, y2, x2)]
633 | anchors = []
634 | for i in range(len(scales)):
635 | anchors.append(generate_anchors(scales[i], ratios, feature_shapes[i],
636 | feature_strides[i], anchor_stride))
637 | return np.concatenate(anchors, axis=0)
638 |
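# Illustrative example (values chosen arbitrarily, not part of the original file):
# for a single pyramid level,
#   anchors = generate_anchors(scales=32, ratios=[0.5, 1, 2], shape=[4, 4],
#                              feature_stride=16, anchor_stride=1)
#   anchors.shape  # -> (48, 4): 4 * 4 feature positions * 3 ratios
# generate_pyramid_anchors() simply concatenates one such array per pyramid level,
# in the order the scales are given.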
639 |
640 | ############################################################
641 | # Miscellaneous
642 | ############################################################
643 |
644 | def trim_zeros(x):
645 | """It's common to have tensors larger than the available data and
646 | pad with zeros. This function removes rows that are all zeros.
647 |
648 | x: [rows, columns].
649 | """
650 | assert len(x.shape) == 2
651 | return x[~np.all(x == 0, axis=1)]
652 |
653 |
654 | def compute_matches(gt_boxes, gt_class_ids, gt_masks,
655 | pred_boxes, pred_class_ids, pred_scores, pred_masks,
656 | iou_threshold=0.5, score_threshold=0.0):
657 | """Finds matches between prediction and ground truth instances.
658 |
659 | Returns:
660 | gt_match: 1-D array. For each GT box it has the index of the matched
661 | predicted box.
662 | pred_match: 1-D array. For each predicted box, it has the index of
663 | the matched ground truth box.
664 | overlaps: [pred_boxes, gt_boxes] IoU overlaps.
665 | """
666 | # Trim zero padding
667 | # TODO: cleaner to do zero unpadding upstream
668 | gt_boxes = trim_zeros(gt_boxes)
669 | gt_masks = gt_masks[..., :gt_boxes.shape[0]]
670 | pred_boxes = trim_zeros(pred_boxes)
671 | pred_scores = pred_scores[:pred_boxes.shape[0]]
672 | # Sort predictions by score from high to low
673 | indices = np.argsort(pred_scores)[::-1]
674 | pred_boxes = pred_boxes[indices]
675 | pred_class_ids = pred_class_ids[indices]
676 | pred_scores = pred_scores[indices]
677 | pred_masks = pred_masks[..., indices]
678 |
679 | # Compute IoU overlaps [pred_masks, gt_masks]
680 | overlaps = compute_overlaps_masks(pred_masks, gt_masks)
681 |
682 | # Loop through predictions and find matching ground truth boxes
683 | match_count = 0
684 | pred_match = -1 * np.ones([pred_boxes.shape[0]])
685 | gt_match = -1 * np.ones([gt_boxes.shape[0]])
686 | for i in range(len(pred_boxes)):
687 | # Find best matching ground truth box
688 | # 1. Sort matches by score
689 | sorted_ixs = np.argsort(overlaps[i])[::-1]
690 | # 2. Remove low scores
691 | low_score_idx = np.where(overlaps[i, sorted_ixs] < score_threshold)[0]
692 | if low_score_idx.size > 0:
693 | sorted_ixs = sorted_ixs[:low_score_idx[0]]
694 | # 3. Find the match
695 | for j in sorted_ixs:
696 | # If ground truth box is already matched, go to next one
697 |             if gt_match[j] > -1:
698 | continue
699 | # If we reach IoU smaller than the threshold, end the loop
700 | iou = overlaps[i, j]
701 | if iou < iou_threshold:
702 | break
703 | # Do we have a match?
704 | if pred_class_ids[i] == gt_class_ids[j]:
705 | match_count += 1
706 | gt_match[j] = i
707 | pred_match[i] = j
708 | break
709 |
710 | return gt_match, pred_match, overlaps
711 |
712 |
713 | def compute_ap(gt_boxes, gt_class_ids, gt_masks,
714 | pred_boxes, pred_class_ids, pred_scores, pred_masks,
715 | iou_threshold=0.5):
716 | """Compute Average Precision at a set IoU threshold (default 0.5).
717 |
718 | Returns:
719 | mAP: Mean Average Precision
720 | precisions: List of precisions at different class score thresholds.
721 | recalls: List of recall values at different class score thresholds.
722 | overlaps: [pred_boxes, gt_boxes] IoU overlaps.
723 | """
724 | # Get matches and overlaps
725 | gt_match, pred_match, overlaps = compute_matches(
726 | gt_boxes, gt_class_ids, gt_masks,
727 | pred_boxes, pred_class_ids, pred_scores, pred_masks,
728 | iou_threshold)
729 |
730 | # Compute precision and recall at each prediction box step
731 | precisions = np.cumsum(pred_match > -1) / (np.arange(len(pred_match)) + 1)
732 | recalls = np.cumsum(pred_match > -1).astype(np.float32) / len(gt_match)
733 |
734 | # Pad with start and end values to simplify the math
735 | precisions = np.concatenate([[0], precisions, [0]])
736 | recalls = np.concatenate([[0], recalls, [1]])
737 |
738 | # Ensure precision values decrease but don't increase. This way, the
739 | # precision value at each recall threshold is the maximum it can be
740 | # for all following recall thresholds, as specified by the VOC paper.
741 | for i in range(len(precisions) - 2, -1, -1):
742 | precisions[i] = np.maximum(precisions[i], precisions[i + 1])
743 |
744 | # Compute mean AP over recall range
745 | indices = np.where(recalls[:-1] != recalls[1:])[0] + 1
746 | mAP = np.sum((recalls[indices] - recalls[indices - 1]) *
747 | precisions[indices])
748 |
749 | return mAP, precisions, recalls, overlaps
750 |
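# Worked example (illustrative): with 2 GT boxes and pred_match = [0, -1, 1]
# (predictions sorted by score, the second one a false positive), the code above gives
#   precisions = [1.0, 0.5, 0.667] and recalls = [0.5, 0.5, 1.0];
# after padding and enforcing monotonically decreasing precision,
#   mAP = 0.5 * 1.0 + 0.5 * 0.667 ~ 0.83.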
751 |
752 | def compute_ap_range(gt_box, gt_class_id, gt_mask,
753 | pred_box, pred_class_id, pred_score, pred_mask,
754 | iou_thresholds=None, verbose=1):
755 |     """Compute AP over a range of IoU thresholds. Default range is 0.5-0.95."""
756 | # Default is 0.5 to 0.95 with increments of 0.05
757 | if iou_thresholds is None:
758 | iou_thresholds = np.arange(0.5, 1.0, 0.05)
759 |
760 | #iou_thresholds = iou_thresholds or np.arange(0.5, 1.0, 0.05)
761 |
762 | # Compute AP over range of IoU thresholds
763 | AP = []
764 | for iou_threshold in iou_thresholds:
765 | ap, precisions, recalls, overlaps =\
766 | compute_ap(gt_box, gt_class_id, gt_mask,
767 | pred_box, pred_class_id, pred_score, pred_mask,
768 | iou_threshold=iou_threshold)
769 | if verbose:
770 | print("AP @{:.2f}:\t {:.3f}".format(iou_threshold, ap))
771 | AP.append(ap)
772 | AP = np.array(AP).mean()
773 | if verbose:
774 | print("AP @{:.2f}-{:.2f}:\t {:.3f}".format(
775 | iou_thresholds[0], iou_thresholds[-1], AP))
776 | return AP
777 |
778 |
779 | def compute_recall(pred_boxes, gt_boxes, iou):
780 | """Compute the recall at the given IoU threshold. It's an indication
781 | of how many GT boxes were found by the given prediction boxes.
782 |
783 | pred_boxes: [N, (y1, x1, y2, x2)] in image coordinates
784 | gt_boxes: [N, (y1, x1, y2, x2)] in image coordinates
785 | """
786 | # Measure overlaps
787 | overlaps = compute_overlaps(pred_boxes, gt_boxes)
788 | iou_max = np.max(overlaps, axis=1)
789 | iou_argmax = np.argmax(overlaps, axis=1)
790 | positive_ids = np.where(iou_max >= iou)[0]
791 | matched_gt_boxes = iou_argmax[positive_ids]
792 |
793 | recall = len(set(matched_gt_boxes)) / gt_boxes.shape[0]
794 | return recall, positive_ids
795 |
796 |
797 | # ## Batch Slicing
798 | # Some custom layers support a batch size of 1 only, and require a lot of work
799 | # to support batches greater than 1. This function slices an input tensor
800 | # across the batch dimension and feeds batches of size 1. Effectively,
801 | # an easy way to support batches > 1 quickly with little code modification.
802 | # In the long run, it's more efficient to modify the code to support large
803 | # batches and get rid of this function. Consider this a temporary solution.
804 | def batch_slice(inputs, graph_fn, batch_size, names=None):
805 | """Splits inputs into slices and feeds each slice to a copy of the given
806 | computation graph and then combines the results. It allows you to run a
807 | graph on a batch of inputs even if the graph is written to support one
808 | instance only.
809 |
810 | inputs: list of tensors. All must have the same first dimension length
811 | graph_fn: A function that returns a TF tensor that's part of a graph.
812 | batch_size: number of slices to divide the data into.
813 | names: If provided, assigns names to the resulting tensors.
814 | """
815 | if not isinstance(inputs, list):
816 | inputs = [inputs]
817 |
818 | outputs = []
819 | for i in range(batch_size):
820 | inputs_slice = [x[i] for x in inputs]
821 | output_slice = graph_fn(*inputs_slice)
822 | if not isinstance(output_slice, (tuple, list)):
823 | output_slice = [output_slice]
824 | outputs.append(output_slice)
825 | # Change outputs from a list of slices where each is
826 | # a list of outputs to a list of outputs and each has
827 | # a list of slices
828 | outputs = list(zip(*outputs))
829 |
830 | if names is None:
831 | names = [None] * len(outputs)
832 |
833 | result = [tf.stack(o, axis=0, name=n)
834 | for o, n in zip(outputs, names)]
835 | if len(result) == 1:
836 | result = result[0]
837 |
838 | return result
839 |
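# Illustrative usage (assumed, not taken from this repo): given a `boxes` tensor of
# shape [batch, num_boxes, 4], apply a per-image graph function to every slice:
#   areas = batch_slice(boxes,
#                       lambda b: (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1]),
#                       batch_size=config.IMAGES_PER_GPU)
# The per-image results are stacked back into a [batch, num_boxes] tensor.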
840 |
841 | def download_trained_weights(coco_model_path, verbose=1):
842 | """Download COCO trained weights from Releases.
843 |
844 | coco_model_path: local path of COCO trained weights
845 | """
846 | if verbose > 0:
847 | print("Downloading pretrained model to " + coco_model_path + " ...")
848 | with urllib.request.urlopen(COCO_MODEL_URL) as resp, open(coco_model_path, 'wb') as out:
849 | shutil.copyfileobj(resp, out)
850 | if verbose > 0:
851 | print("... done downloading pretrained model!")
852 |
853 |
854 | def norm_boxes(boxes, shape):
855 | """Converts boxes from pixel coordinates to normalized coordinates.
856 | boxes: [N, (y1, x1, y2, x2)] in pixel coordinates
857 | shape: [..., (height, width)] in pixels
858 |
859 | Note: In pixel coordinates (y2, x2) is outside the box. But in normalized
860 | coordinates it's inside the box.
861 |
862 | Returns:
863 | [N, (y1, x1, y2, x2)] in normalized coordinates
864 | """
865 | h, w = shape
866 | scale = np.array([h - 1, w - 1, h - 1, w - 1])
867 | shift = np.array([0, 0, 1, 1])
868 | return np.divide((boxes - shift), scale).astype(np.float32)
869 |
870 |
871 | def denorm_boxes(boxes, shape):
872 | """Converts boxes from normalized coordinates to pixel coordinates.
873 | boxes: [N, (y1, x1, y2, x2)] in normalized coordinates
874 | shape: [..., (height, width)] in pixels
875 |
876 | Note: In pixel coordinates (y2, x2) is outside the box. But in normalized
877 | coordinates it's inside the box.
878 |
879 | Returns:
880 | [N, (y1, x1, y2, x2)] in pixel coordinates
881 | """
882 | h, w = shape
883 | scale = np.array([h - 1, w - 1, h - 1, w - 1])
884 | shift = np.array([0, 0, 1, 1])
885 | return np.around(np.multiply(boxes, scale) + shift).astype(np.int32)
886 |
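# Illustrative round trip (recall that (y2, x2) is exclusive in pixel coordinates):
#   norm_boxes(np.array([[0, 0, 100, 100]]), shape=(100, 100))    # -> [[0., 0., 1., 1.]]
#   denorm_boxes(np.array([[0., 0., 1., 1.]]), shape=(100, 100))  # -> [[0, 0, 100, 100]]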
887 |
888 | def resize(image, output_shape, order=1, mode='constant', cval=0, clip=True,
889 | preserve_range=False, anti_aliasing=False, anti_aliasing_sigma=None):
890 | """A wrapper for Scikit-Image resize().
891 |
892 | Scikit-Image generates warnings on every call to resize() if it doesn't
893 | receive the right parameters. The right parameters depend on the version
894 | of skimage. This solves the problem by using different parameters per
895 | version. And it provides a central place to control resizing defaults.
896 | """
897 | if LooseVersion(skimage.__version__) >= LooseVersion("0.14"):
898 | # New in 0.14: anti_aliasing. Default it to False for backward
899 | # compatibility with skimage 0.13.
900 | return skimage.transform.resize(
901 | image, output_shape,
902 | order=order, mode=mode, cval=cval, clip=clip,
903 | preserve_range=preserve_range, anti_aliasing=anti_aliasing,
904 | anti_aliasing_sigma=anti_aliasing_sigma)
905 | else:
906 | return skimage.transform.resize(
907 | image, output_shape,
908 | order=order, mode=mode, cval=cval, clip=clip,
909 | preserve_range=preserve_range)
910 |
--------------------------------------------------------------------------------
/mrcnn/visualize.py:
--------------------------------------------------------------------------------
1 | """
2 | Mask R-CNN
3 | Display and Visualization Functions.
4 |
5 | Copyright (c) 2017 Matterport, Inc.
6 | Licensed under the MIT License (see LICENSE for details)
7 | Written by Waleed Abdulla
8 | """
9 |
10 | import os
11 | import sys
12 | import random
13 | import itertools
14 | import colorsys
15 |
16 | import numpy as np
17 | from skimage.measure import find_contours
18 | import matplotlib.pyplot as plt
19 | from matplotlib import patches, lines
20 | from matplotlib.patches import Polygon
21 | import IPython.display
22 |
23 | # Root directory of the project
24 | ROOT_DIR = os.path.abspath("../")
25 |
26 | # Import Mask RCNN
27 | sys.path.append(ROOT_DIR) # To find local version of the library
28 | from mrcnn import utils
29 |
30 |
31 | ############################################################
32 | # Visualization
33 | ############################################################
34 |
35 | def display_images(images, titles=None, cols=4, cmap=None, norm=None,
36 | interpolation=None):
37 | """Display the given set of images, optionally with titles.
38 | images: list or array of image tensors in HWC format.
39 | titles: optional. A list of titles to display with each image.
40 | cols: number of images per row
41 | cmap: Optional. Color map to use. For example, "Blues".
42 | norm: Optional. A Normalize instance to map values to colors.
43 | interpolation: Optional. Image interpolation to use for display.
44 | """
45 | titles = titles if titles is not None else [""] * len(images)
46 | rows = len(images) // cols + 1
47 | plt.figure(figsize=(14, 14 * rows // cols))
48 | i = 1
49 | for image, title in zip(images, titles):
50 | plt.subplot(rows, cols, i)
51 | plt.title(title, fontsize=9)
52 | plt.axis('off')
53 | plt.imshow(image.astype(np.uint8), cmap=cmap,
54 | norm=norm, interpolation=interpolation)
55 | i += 1
56 | plt.show()
57 |
58 |
59 | def random_colors(N, bright=True):
60 | """
61 | Generate random colors.
62 | To get visually distinct colors, generate them in HSV space then
63 | convert to RGB.
64 | """
65 | brightness = 1.0 if bright else 0.7
66 | hsv = [(i / N, 1, brightness) for i in range(N)]
67 | colors = list(map(lambda c: colorsys.hsv_to_rgb(*c), hsv))
68 | random.shuffle(colors)
69 | return colors
70 |
71 |
72 | def apply_mask(image, mask, color, alpha=0.5):
73 | """Apply the given mask to the image.
74 | """
75 | for c in range(3):
76 | image[:, :, c] = np.where(mask == 1,
77 | image[:, :, c] *
78 | (1 - alpha) + alpha * color[c] * 255,
79 | image[:, :, c])
80 | return image
81 |
82 | def save_instances(image, boxes, masks, class_ids, class_names,target_filename=None,
83 | scores=None, title="",
84 | figsize=(16, 16), ax=None,
85 | show_mask=True, show_bbox=True,
86 | colors=None, captions=None,return_instance=None):
87 | """
88 | boxes: [num_instance, (y1, x1, y2, x2, class_id)] in image coordinates.
89 | masks: [height, width, num_instances]
90 | class_ids: [num_instances]
91 | class_names: list of class names of the dataset
92 |     target_filename: path to save the figure to; a temporary name is generated if None.
93 | scores: (optional) confidence scores for each box
94 | title: (optional) Figure title
95 | show_mask, show_bbox: To show masks and bounding boxes or not
96 | figsize: (optional) the size of the image
97 |     colors: (optional) An array of colors to use with each object
98 | captions: (optional) A list of strings to use as captions for each object
99 | """
100 | # Number of instances
101 | N = boxes.shape[0]
102 | if not N:
103 | print("\n*** No instances to display *** \n")
104 | else:
105 | assert boxes.shape[0] == masks.shape[-1] == class_ids.shape[0]
106 |
107 | # If no axis is passed, create one and automatically call show()
108 | auto_show = False
109 | if not ax:
110 | _, ax = plt.subplots(1, figsize=figsize)
111 | auto_show = True
112 |
113 | # Generate random colors
114 | colors = colors or random_colors(N)
115 |
116 | # Show area outside image boundaries.
117 | height, width = image.shape[:2]
118 | ax.set_ylim(height + 10, -10)
119 | ax.set_xlim(-10, width + 10)
120 | ax.axis('off')
121 | ax.set_title(title)
122 |
123 | masked_image = image.astype(np.uint32).copy()
124 | for i in range(N):
125 | color = colors[i]
126 |
127 | # Bounding box
128 | if not np.any(boxes[i]):
129 | # Skip this instance. Has no bbox. Likely lost in image cropping.
130 | continue
131 | y1, x1, y2, x2 = boxes[i]
132 | if show_bbox:
133 | p = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2,
134 | alpha=0.7, linestyle="dashed",
135 | edgecolor=color, facecolor='none')
136 | ax.add_patch(p)
137 |
138 | # Label
139 | if not captions:
140 | class_id = class_ids[i]
141 | score = scores[i] if scores is not None else None
142 | label = class_names[class_id]
143 | x = random.randint(x1, (x1 + x2) // 2)
144 | caption = "{} {:.3f}".format(label, score) if score else label
145 | else:
146 | caption = captions[i]
147 | ax.text(x1, y1 + 8, caption,
148 | color='w', size=11, backgroundcolor="none")
149 |
150 | # Mask
151 | mask = masks[:, :, i]
152 | if show_mask:
153 | masked_image = apply_mask(masked_image, mask, color)
154 |
155 | # Mask Polygon
156 | # Pad to ensure proper polygons for masks that touch image edges.
157 | padded_mask = np.zeros(
158 | (mask.shape[0] + 2, mask.shape[1] + 2), dtype=np.uint8)
159 | padded_mask[1:-1, 1:-1] = mask
160 | contours = find_contours(padded_mask, 0.5)
161 | for verts in contours:
162 | # Subtract the padding and flip (y, x) to (x, y)
163 | verts = np.fliplr(verts) - 1
164 | p = Polygon(verts, facecolor="none", edgecolor=color)
165 | ax.add_patch(p)
166 |
167 | if target_filename is None:
168 |         target_filename = 'temp_'+str(label)+'_'+str(score)+'.jpg'  # uses the last instance's label and score
169 |
170 | if return_instance is None:
171 | ax.imshow(masked_image.astype(np.uint8))
172 | masked_image = masked_image.astype(np.uint8)
173 | plt.savefig(target_filename)
174 | else:
175 | #return ax.imshow(masked_image.astype(np.uint8)),masked_image
176 | return ax,masked_image
177 |
178 |
179 | #if auto_show:
180 | # plt.show()
181 |
182 |
183 |
184 | def display_instances(image, boxes, masks, class_ids, class_names,
185 | scores=None, title="",
186 | figsize=(16, 16), ax=None,
187 | show_mask=True, show_bbox=True,
188 | colors=None, captions=None):
189 | """
190 | boxes: [num_instance, (y1, x1, y2, x2, class_id)] in image coordinates.
191 | masks: [height, width, num_instances]
192 | class_ids: [num_instances]
193 | class_names: list of class names of the dataset
194 | scores: (optional) confidence scores for each box
195 | title: (optional) Figure title
196 | show_mask, show_bbox: To show masks and bounding boxes or not
197 | figsize: (optional) the size of the image
198 |     colors: (optional) An array of colors to use with each object
199 | captions: (optional) A list of strings to use as captions for each object
200 | """
201 | # Number of instances
202 | N = boxes.shape[0]
203 | if not N:
204 | print("\n*** No instances to display *** \n")
205 | else:
206 | assert boxes.shape[0] == masks.shape[-1] == class_ids.shape[0]
207 |
208 | # If no axis is passed, create one and automatically call show()
209 | auto_show = False
210 | if not ax:
211 | _, ax = plt.subplots(1, figsize=figsize)
212 | auto_show = True
213 |
214 | # Generate random colors
215 | colors = colors or random_colors(N)
216 |
217 | # Show area outside image boundaries.
218 | height, width = image.shape[:2]
219 | ax.set_ylim(height + 10, -10)
220 | ax.set_xlim(-10, width + 10)
221 | ax.axis('off')
222 | ax.set_title(title)
223 |
224 | masked_image = image.astype(np.uint32).copy()
225 | for i in range(N):
226 | color = colors[i]
227 |
228 | # Bounding box
229 | if not np.any(boxes[i]):
230 | # Skip this instance. Has no bbox. Likely lost in image cropping.
231 | continue
232 | y1, x1, y2, x2 = boxes[i]
233 | if show_bbox:
234 | p = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2,
235 | alpha=0.7, linestyle="dashed",
236 | edgecolor=color, facecolor='none')
237 | ax.add_patch(p)
238 |
239 | # Label
240 | if not captions:
241 | class_id = class_ids[i]
242 | score = scores[i] if scores is not None else None
243 | label = class_names[class_id]
244 | x = random.randint(x1, (x1 + x2) // 2)
245 | caption = "{} {:.3f}".format(label, score) if score else label
246 | else:
247 | caption = captions[i]
248 | ax.text(x1, y1 + 8, caption,
249 | color='w', size=11, backgroundcolor="none")
250 |
251 | # Mask
252 | mask = masks[:, :, i]
253 | if show_mask:
254 | masked_image = apply_mask(masked_image, mask, color)
255 |
256 | # Mask Polygon
257 | # Pad to ensure proper polygons for masks that touch image edges.
258 | padded_mask = np.zeros(
259 | (mask.shape[0] + 2, mask.shape[1] + 2), dtype=np.uint8)
260 | padded_mask[1:-1, 1:-1] = mask
261 | contours = find_contours(padded_mask, 0.5)
262 | for verts in contours:
263 | # Subtract the padding and flip (y, x) to (x, y)
264 | verts = np.fliplr(verts) - 1
265 | p = Polygon(verts, facecolor="none", edgecolor=color)
266 | ax.add_patch(p)
267 | ax.imshow(masked_image.astype(np.uint8))
268 | if auto_show:
269 | plt.show()
270 |
271 |
272 | def display_differences(image,
273 | gt_box, gt_class_id, gt_mask,
274 | pred_box, pred_class_id, pred_score, pred_mask,
275 | class_names, title="", ax=None,
276 | show_mask=True, show_box=True,
277 | iou_threshold=0.5, score_threshold=0.5):
278 | """Display ground truth and prediction instances on the same image."""
279 | # Match predictions to ground truth
280 | gt_match, pred_match, overlaps = utils.compute_matches(
281 | gt_box, gt_class_id, gt_mask,
282 | pred_box, pred_class_id, pred_score, pred_mask,
283 | iou_threshold=iou_threshold, score_threshold=score_threshold)
284 | # Ground truth = green. Predictions = red
285 | colors = [(0, 1, 0, .8)] * len(gt_match)\
286 | + [(1, 0, 0, 1)] * len(pred_match)
287 | # Concatenate GT and predictions
288 | class_ids = np.concatenate([gt_class_id, pred_class_id])
289 | scores = np.concatenate([np.zeros([len(gt_match)]), pred_score])
290 | boxes = np.concatenate([gt_box, pred_box])
291 | masks = np.concatenate([gt_mask, pred_mask], axis=-1)
292 | # Captions per instance show score/IoU
293 | captions = ["" for m in gt_match] + ["{:.2f} / {:.2f}".format(
294 | pred_score[i],
295 | (overlaps[i, int(pred_match[i])]
296 | if pred_match[i] > -1 else overlaps[i].max()))
297 | for i in range(len(pred_match))]
298 | # Set title if not provided
299 | title = title or "Ground Truth and Detections\n GT=green, pred=red, captions: score/IoU"
300 | # Display
301 | display_instances(
302 | image,
303 | boxes, masks, class_ids,
304 | class_names, scores, ax=ax,
305 | show_bbox=show_box, show_mask=show_mask,
306 | colors=colors, captions=captions,
307 | title=title)
308 |
309 |
310 | def draw_rois(image, rois, refined_rois, mask, class_ids, class_names, limit=10):
311 | """
312 |     rois: [n, (y1, x1, y2, x2)] list of anchor boxes in image coordinates.
313 |     refined_rois: [n, 4] the same boxes but refined to fit objects better.
314 | """
315 | masked_image = image.copy()
316 |
317 | # Pick random anchors in case there are too many.
318 | ids = np.arange(rois.shape[0], dtype=np.int32)
319 | ids = np.random.choice(
320 | ids, limit, replace=False) if ids.shape[0] > limit else ids
321 |
322 | fig, ax = plt.subplots(1, figsize=(12, 12))
323 | if rois.shape[0] > limit:
324 | plt.title("Showing {} random ROIs out of {}".format(
325 | len(ids), rois.shape[0]))
326 | else:
327 | plt.title("{} ROIs".format(len(ids)))
328 |
329 | # Show area outside image boundaries.
330 | ax.set_ylim(image.shape[0] + 20, -20)
331 | ax.set_xlim(-50, image.shape[1] + 20)
332 | ax.axis('off')
333 |
334 | for i, id in enumerate(ids):
335 | color = np.random.rand(3)
336 | class_id = class_ids[id]
337 | # ROI
338 | y1, x1, y2, x2 = rois[id]
339 | p = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2,
340 | edgecolor=color if class_id else "gray",
341 | facecolor='none', linestyle="dashed")
342 | ax.add_patch(p)
343 | # Refined ROI
344 | if class_id:
345 | ry1, rx1, ry2, rx2 = refined_rois[id]
346 | p = patches.Rectangle((rx1, ry1), rx2 - rx1, ry2 - ry1, linewidth=2,
347 | edgecolor=color, facecolor='none')
348 | ax.add_patch(p)
349 | # Connect the top-left corners of the anchor and proposal for easy visualization
350 | ax.add_line(lines.Line2D([x1, rx1], [y1, ry1], color=color))
351 |
352 | # Label
353 | label = class_names[class_id]
354 | ax.text(rx1, ry1 + 8, "{}".format(label),
355 | color='w', size=11, backgroundcolor="none")
356 |
357 | # Mask
358 | m = utils.unmold_mask(mask[id], rois[id]
359 | [:4].astype(np.int32), image.shape)
360 | masked_image = apply_mask(masked_image, m, color)
361 |
362 | ax.imshow(masked_image)
363 |
364 | # Print stats
365 | print("Positive ROIs: ", class_ids[class_ids > 0].shape[0])
366 | print("Negative ROIs: ", class_ids[class_ids == 0].shape[0])
367 | print("Positive Ratio: {:.2f}".format(
368 | class_ids[class_ids > 0].shape[0] / class_ids.shape[0]))
369 |
370 |
371 | # TODO: Replace with matplotlib equivalent?
372 | def draw_box(image, box, color):
373 | """Draw 3-pixel width bounding boxes on the given image array.
374 | color: list of 3 int values for RGB.
375 | """
376 | y1, x1, y2, x2 = box
377 | image[y1:y1 + 2, x1:x2] = color
378 | image[y2:y2 + 2, x1:x2] = color
379 | image[y1:y2, x1:x1 + 2] = color
380 | image[y1:y2, x2:x2 + 2] = color
381 | return image
382 |
383 |
384 | def display_top_masks(image, mask, class_ids, class_names, limit=4):
385 | """Display the given image and the top few class masks."""
386 | to_display = []
387 | titles = []
388 | to_display.append(image)
389 | titles.append("H x W={}x{}".format(image.shape[0], image.shape[1]))
390 | # Pick top prominent classes in this image
391 | unique_class_ids = np.unique(class_ids)
392 | mask_area = [np.sum(mask[:, :, np.where(class_ids == i)[0]])
393 | for i in unique_class_ids]
394 | top_ids = [v[0] for v in sorted(zip(unique_class_ids, mask_area),
395 | key=lambda r: r[1], reverse=True) if v[1] > 0]
396 | # Generate images and titles
397 | for i in range(limit):
398 | class_id = top_ids[i] if i < len(top_ids) else -1
399 | # Pull masks of instances belonging to the same class.
400 | m = mask[:, :, np.where(class_ids == class_id)[0]]
401 | m = np.sum(m * np.arange(1, m.shape[-1] + 1), -1)
402 | to_display.append(m)
403 | titles.append(class_names[class_id] if class_id != -1 else "-")
404 | display_images(to_display, titles=titles, cols=limit + 1, cmap="Blues_r")
405 |
406 |
407 | def plot_precision_recall(AP, precisions, recalls):
408 | """Draw the precision-recall curve.
409 |
410 | AP: Average precision at IoU >= 0.5
411 | precisions: list of precision values
412 | recalls: list of recall values
413 | """
414 | # Plot the Precision-Recall curve
415 | _, ax = plt.subplots(1)
416 | ax.set_title("Precision-Recall Curve. AP@50 = {:.3f}".format(AP))
417 | ax.set_ylim(0, 1.1)
418 | ax.set_xlim(0, 1.1)
419 | _ = ax.plot(recalls, precisions)
420 |
421 |
422 | def plot_overlaps(gt_class_ids, pred_class_ids, pred_scores,
423 | overlaps, class_names, threshold=0.5):
424 | """Draw a grid showing how ground truth objects are classified.
425 | gt_class_ids: [N] int. Ground truth class IDs
426 | pred_class_id: [N] int. Predicted class IDs
427 | pred_scores: [N] float. The probability scores of predicted classes
428 | overlaps: [pred_boxes, gt_boxes] IoU overlaps of predictions and GT boxes.
429 | class_names: list of all class names in the dataset
430 | threshold: Float. The prediction probability required to predict a class
431 | """
432 | gt_class_ids = gt_class_ids[gt_class_ids != 0]
433 | pred_class_ids = pred_class_ids[pred_class_ids != 0]
434 |
435 | plt.figure(figsize=(12, 10))
436 | plt.imshow(overlaps, interpolation='nearest', cmap=plt.cm.Blues)
437 | plt.yticks(np.arange(len(pred_class_ids)),
438 | ["{} ({:.2f})".format(class_names[int(id)], pred_scores[i])
439 | for i, id in enumerate(pred_class_ids)])
440 | plt.xticks(np.arange(len(gt_class_ids)),
441 | [class_names[int(id)] for id in gt_class_ids], rotation=90)
442 |
443 | thresh = overlaps.max() / 2.
444 | for i, j in itertools.product(range(overlaps.shape[0]),
445 | range(overlaps.shape[1])):
446 | text = ""
447 | if overlaps[i, j] > threshold:
448 | text = "match" if gt_class_ids[j] == pred_class_ids[i] else "wrong"
449 | color = ("white" if overlaps[i, j] > thresh
450 | else "black" if overlaps[i, j] > 0
451 | else "grey")
452 | plt.text(j, i, "{:.3f}\n{}".format(overlaps[i, j], text),
453 | horizontalalignment="center", verticalalignment="center",
454 | fontsize=9, color=color)
455 |
456 | plt.tight_layout()
457 | plt.xlabel("Ground Truth")
458 | plt.ylabel("Predictions")
459 |
460 |
461 | def draw_boxes(image, boxes=None, refined_boxes=None,
462 | masks=None, captions=None, visibilities=None,
463 | title="", ax=None):
464 | """Draw bounding boxes and segmentation masks with different
465 | customizations.
466 |
467 | boxes: [N, (y1, x1, y2, x2, class_id)] in image coordinates.
468 | refined_boxes: Like boxes, but draw with solid lines to show
469 | that they're the result of refining 'boxes'.
470 | masks: [N, height, width]
471 | captions: List of N titles to display on each box
472 | visibilities: (optional) List of values of 0, 1, or 2. Determine how
473 | prominent each bounding box should be.
474 | title: An optional title to show over the image
475 | ax: (optional) Matplotlib axis to draw on.
476 | """
477 | # Number of boxes
478 | assert boxes is not None or refined_boxes is not None
479 | N = boxes.shape[0] if boxes is not None else refined_boxes.shape[0]
480 |
481 | # Matplotlib Axis
482 | if not ax:
483 | _, ax = plt.subplots(1, figsize=(12, 12))
484 |
485 | # Generate random colors
486 | colors = random_colors(N)
487 |
488 | # Show area outside image boundaries.
489 | margin = image.shape[0] // 10
490 | ax.set_ylim(image.shape[0] + margin, -margin)
491 | ax.set_xlim(-margin, image.shape[1] + margin)
492 | ax.axis('off')
493 |
494 | ax.set_title(title)
495 |
496 | masked_image = image.astype(np.uint32).copy()
497 | for i in range(N):
498 | # Box visibility
499 | visibility = visibilities[i] if visibilities is not None else 1
500 | if visibility == 0:
501 | color = "gray"
502 | style = "dotted"
503 | alpha = 0.5
504 | elif visibility == 1:
505 | color = colors[i]
506 | style = "dotted"
507 | alpha = 1
508 | elif visibility == 2:
509 | color = colors[i]
510 | style = "solid"
511 | alpha = 1
512 |
513 | # Boxes
514 | if boxes is not None:
515 | if not np.any(boxes[i]):
516 | # Skip this instance. Has no bbox. Likely lost in cropping.
517 | continue
518 | y1, x1, y2, x2 = boxes[i]
519 | p = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2,
520 | alpha=alpha, linestyle=style,
521 | edgecolor=color, facecolor='none')
522 | ax.add_patch(p)
523 |
524 | # Refined boxes
525 | if refined_boxes is not None and visibility > 0:
526 | ry1, rx1, ry2, rx2 = refined_boxes[i].astype(np.int32)
527 | p = patches.Rectangle((rx1, ry1), rx2 - rx1, ry2 - ry1, linewidth=2,
528 | edgecolor=color, facecolor='none')
529 | ax.add_patch(p)
530 | # Connect the top-left corners of the anchor and proposal
531 | if boxes is not None:
532 | ax.add_line(lines.Line2D([x1, rx1], [y1, ry1], color=color))
533 |
534 | # Captions
535 | if captions is not None:
536 | caption = captions[i]
537 | # If there are refined boxes, display captions on them
538 | if refined_boxes is not None:
539 | y1, x1, y2, x2 = ry1, rx1, ry2, rx2
540 | x = random.randint(x1, (x1 + x2) // 2)
541 | ax.text(x1, y1, caption, size=11, verticalalignment='top',
542 | color='w', backgroundcolor="none",
543 | bbox={'facecolor': color, 'alpha': 0.5,
544 | 'pad': 2, 'edgecolor': 'none'})
545 |
546 | # Masks
547 | if masks is not None:
548 | mask = masks[:, :, i]
549 | masked_image = apply_mask(masked_image, mask, color)
550 | # Mask Polygon
551 | # Pad to ensure proper polygons for masks that touch image edges.
552 | padded_mask = np.zeros(
553 | (mask.shape[0] + 2, mask.shape[1] + 2), dtype=np.uint8)
554 | padded_mask[1:-1, 1:-1] = mask
555 | contours = find_contours(padded_mask, 0.5)
556 | for verts in contours:
557 | # Subtract the padding and flip (y, x) to (x, y)
558 | verts = np.fliplr(verts) - 1
559 | p = Polygon(verts, facecolor="none", edgecolor=color)
560 | ax.add_patch(p)
561 | ax.imshow(masked_image.astype(np.uint8))
562 |
563 |
564 | def display_table(table):
565 | """Display values in a table format.
566 | table: an iterable of rows, and each row is an iterable of values.
567 | """
568 | html = ""
569 | for row in table:
570 | row_html = ""
571 | for col in row:
572 |             row_html += "<td>{:40}</td>".format(str(col))
573 |         html += "<tr>" + row_html + "</tr>"
574 |     html = "<table>" + html + "</table>"
575 | IPython.display.display(IPython.display.HTML(html))
576 |
577 |
578 | def display_weight_stats(model):
579 | """Scans all the weights in the model and returns a list of tuples
580 | that contain stats about each weight.
581 | """
582 | layers = model.get_trainable_layers()
583 | table = [["WEIGHT NAME", "SHAPE", "MIN", "MAX", "STD"]]
584 | for l in layers:
585 | weight_values = l.get_weights() # list of Numpy arrays
586 | weight_tensors = l.weights # list of TF tensors
587 | for i, w in enumerate(weight_values):
588 | weight_name = weight_tensors[i].name
589 | # Detect problematic layers. Exclude biases of conv layers.
590 | alert = ""
591 | if w.min() == w.max() and not (l.__class__.__name__ == "Conv2D" and i == 1):
592 | alert += "*** dead?"
593 | if np.abs(w.min()) > 1000 or np.abs(w.max()) > 1000:
594 | alert += "*** Overflow?"
595 | # Add row
596 | table.append([
597 | weight_name + alert,
598 | str(w.shape),
599 | "{:+9.4f}".format(w.min()),
600 | "{:+10.4f}".format(w.max()),
601 | "{:+9.4f}".format(w.std()),
602 | ])
603 | display_table(table)
604 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy
2 | scipy
3 | Pillow
4 | cython
5 | matplotlib
6 | scikit-image
7 | tensorflow>=1.3.0
8 | keras>=2.0.8
9 | opencv-python
10 | h5py
11 | imgaug
12 | IPython[all]
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | """
2 | The build/compilations setup
3 |
4 | >> pip install -r requirements.txt
5 | >> python setup.py install
6 | """
7 | import pip
8 | import logging
9 | import pkg_resources
10 | try:
11 | from setuptools import setup
12 | except ImportError:
13 | from distutils.core import setup
14 |
15 |
16 | def _parse_requirements(file_path):
17 | pip_ver = pkg_resources.get_distribution('pip').version
18 | pip_version = list(map(int, pip_ver.split('.')[:2]))
19 | if pip_version >= [6, 0]:
20 | raw = pip.req.parse_requirements(file_path,
21 | session=pip.download.PipSession())
22 | else:
23 | raw = pip.req.parse_requirements(file_path)
24 | return [str(i.req) for i in raw]
25 |
26 |
27 | # parse_requirements() returns generator of pip.req.InstallRequirement objects
28 | try:
29 | install_reqs = _parse_requirements("requirements.txt")
30 | except Exception:  # pip >= 10 removed pip.req, so parsing may fail
31 |     logging.warning('Failed to load requirements file, so using default ones.')
32 | install_reqs = []
33 |
34 | setup(
35 | name='mask-rcnn',
36 | version='2.1',
37 | url='https://github.com/matterport/Mask_RCNN',
38 | author='Matterport',
39 | author_email='waleed.abdulla@gmail.com',
40 | license='MIT',
41 | description='Mask R-CNN for object detection and instance segmentation',
42 | packages=["mrcnn"],
43 | install_requires=install_reqs,
44 | include_package_data=True,
45 | python_requires='>=3.4',
46 | long_description="""This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow.
47 | The model generates bounding boxes and segmentation masks for each instance of an object in the image.
48 | It's based on Feature Pyramid Network (FPN) and a ResNet101 backbone.""",
49 | classifiers=[
50 | "Development Status :: 5 - Production/Stable",
51 | "Environment :: Console",
52 | "Intended Audience :: Developers",
53 | "Intended Audience :: Information Technology",
54 | "Intended Audience :: Education",
55 | "Intended Audience :: Science/Research",
56 | "License :: OSI Approved :: MIT License",
57 | "Natural Language :: English",
58 | "Operating System :: OS Independent",
59 | "Topic :: Scientific/Engineering :: Artificial Intelligence",
60 | "Topic :: Scientific/Engineering :: Image Recognition",
61 | "Topic :: Scientific/Engineering :: Visualization",
62 | "Topic :: Scientific/Engineering :: Image Segmentation",
63 | 'Programming Language :: Python :: 3.4',
64 | 'Programming Language :: Python :: 3.5',
65 | 'Programming Language :: Python :: 3.6',
66 | ],
67 | keywords="image instance segmentation object detection mask rcnn r-cnn tensorflow keras",
68 | )
69 |
--------------------------------------------------------------------------------
/slums/README.md:
--------------------------------------------------------------------------------
1 | # Training Details
2 |
3 | This file contains details about training and inference for slum segmentation using Mask R-CNN.
4 |
5 |
6 |
7 | ## Installation
8 | Use this Google Drive link to download the weights:
9 | * Download `mask_rcnn_slum_600_00128.h5` and save it in the root directory.
10 | * Link : https://drive.google.com/file/d/1IIMZLrdCZXY_dA540Ve9lSJplYHLnTY4/view?usp=sharing
11 |
12 | ## Dataset
13 | A dataset of satellite images can be created using Google Earth's desktop application. For our project, we used 720x1280 images at 1000m and 100m views of various Mumbai slums. Google's policy states that we cannot redistribute the dataset.
14 |
15 | We also recommend using the VGG Image Annotator (VIA) tool for annotating the segmentation masks, as the code is written for that format. The tool exports the annotations as a JSON file, which should be placed inside the dataset folder as follows (a short loading sketch appears after the links below):
16 | ```
17 | dataset/
18 | train/
19 | all training images
20 | train.json
21 | val/
22 | all val images
23 | val.json
24 | ```
25 |
26 | Here are a few links to help you curate your own dataset:
27 | https://productforums.google.com/forum/#!msg/maps/8KjNgwbBzwc/4kNMfXB6CAAJ
28 | https://support.google.com/earth/answer/148146?hl=en
29 | http://www.robots.ox.ac.uk/~vgg/software/via/
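
For reference, here is a minimal sketch (mirroring what `load_slum()` in `slum.py` does) of how the VIA export is read. Note that the loader in `slum.py` as shipped reads `via_region_data.json` from each subset folder, so export (or rename) your JSON to that name if you use `slum.py` unmodified; the `dataset/train` path below is illustrative.

```python
import json
import os

dataset_dir = "dataset/train"  # illustrative path
annotations = json.load(open(os.path.join(dataset_dir, "via_region_data.json")))

for a in annotations.values():
    if not a['regions']:
        continue  # VIA also lists unannotated images; skip them
    # Support both VIA 1.x (dict) and 2.x (list) region formats
    regions = a['regions'].values() if isinstance(a['regions'], dict) else a['regions']
    polygons = [r['shape_attributes'] for r in regions]
    # Each polygon carries 'all_points_x' and 'all_points_y' describing one slum outline
    print(a['filename'], len(polygons), "annotated regions")
```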
30 | ## Training the model
31 |
32 | Train a new model starting from pre-trained COCO weights
33 | ```
34 | python3 slum.py train --dataset=/path/to/slum/dataset --weights=coco
35 | ```
36 |
37 | Resume training a model that you had trained earlier
38 | ```
39 | python3 slum.py train --dataset=/path/to/slum/dataset --weights=last
40 | ```
41 |
42 | Train a new model starting from ImageNet weights
43 | ```
44 | python3 slum.py train --dataset=/path/to/slum/dataset --weights=imagenet
45 | ```
46 |
47 | * The training details are specified inside the slum.py code.
48 | * The model will save every checkpoint in the root/logs folder.
49 | * The log folders are timestamped by start time and also contain TensorBoard visualizations.
50 |
51 |
52 | ## Inference
53 | Testing mode applies a segmentation mask to the detected instances. Make sure to place the images inside the `test_images` folder.
54 |
55 | ```bash
56 | python3 testing.py --weights=/path/to/mask_rcnn/mask_rcnn_slum.h5
57 | ```
58 | This will run detection on all the images in `test_images` and save the outputs (if any) in `test_outputs`.
59 |
60 | Apply a splash effect on a video (requires OpenCV 3.2+):
61 | this segments out instances and overlays masks on each frame.
62 | ```bash
63 | python3 slum.py splash --weights=/path/to/mask_rcnn/mask_rcnn_slum.h5 --video=
64 | ```
65 | ## Change Detection
66 | For detecting the percentage change between masks, place the two images in the `change_det/` folder and run:
67 |
68 | ```bash
69 | python3 change_detection.py --weights_path=/path/to/mask_rcnn/mask_rcnn_slum.h5
70 | ```
71 |
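The percentage reported by `change_detection.py` is, roughly, the area covered by exactly one of the two combined masks divided by the larger of the two mask areas (see `cal_diff()`). A minimal sketch of that metric, assuming two boolean masks of the same shape:

```python
import numpy as np

def percent_change(m1, m2):
    """m1, m2: boolean masks of the same shape (union of all detections per image)."""
    area_1, area_2 = m1.sum(), m2.sum()
    diff_area = np.logical_xor(m1, m2).sum()  # pixels covered by exactly one mask
    return 100.0 * diff_area / max(area_1, area_2)
```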
--------------------------------------------------------------------------------
/slums/change_det/1_raw.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/slums/change_det/1_raw.jpg
--------------------------------------------------------------------------------
/slums/change_det/2_raw.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/slums/change_det/2_raw.jpg
--------------------------------------------------------------------------------
/slums/change_det/change.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/slums/change_det/change.png
--------------------------------------------------------------------------------
/slums/change_det/mask_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/slums/change_det/mask_1.png
--------------------------------------------------------------------------------
/slums/change_det/mask_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/slums/change_det/mask_2.png
--------------------------------------------------------------------------------
/slums/change_detection.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys,glob
3 | import random
4 | import math
5 | import re
6 | import time
7 | import numpy as np
8 | import tensorflow as tf
9 | import matplotlib
10 | import matplotlib.pyplot as plt
11 | import matplotlib.patches as patches
12 | import cv2
13 | # Root directory of the project
14 | ROOT_DIR = os.path.abspath("../")
15 |
16 | # Import Mask RCNN
17 | sys.path.append(ROOT_DIR) # To find local version of the library
18 | from mrcnn import utils
19 | from mrcnn import visualize
20 | from mrcnn.visualize import display_images
21 | import mrcnn.model as modellib
22 | from mrcnn.model import log
23 |
24 | from slums import slum
25 |
26 | import skimage.draw, skimage.io  # skimage.io is needed for imread below
27 | from skimage import measure
28 | from shapely.geometry.polygon import Polygon
29 | from skimage.measure import label
30 |
31 |
32 | import argparse
33 |
34 | #get largest conncected component in each mask
35 | def getLargestCC(segmentation):
36 | labels = label(segmentation) #Gives different integer value for each connected region
37 |     # Label 0 is the background; ignore it and take the largest labelled component.
38 | largest_val = np.argmax(np.bincount(labels.flat)[1:]) + 1 #+1 as we ignore bg
39 | #print(np.bincount(labels.flat)[1:],' Largest ',largest_val)
40 | return labels==largest_val
41 |
42 | def merge_masks(masks):
43 | print('No of masks: ',masks.shape[2])
44 | if masks.shape[2] <=1:
45 | return masks
46 |
47 | merged_mask_list = []
48 | not_required = [] #list of indices not required as masks are merged
49 | for i in range(masks.shape[2]):
50 | m = masks[:,:,i]
51 | m = getLargestCC(m)
52 | m = np.expand_dims(m,axis=2)
53 |
54 | max_iou = -1
55 | max_mask = -1
56 | max_iou_index = -1
57 |
58 | #Calculate max_iou with other masks.
59 | for j in range(masks.shape[2]):
60 | #Same mask gives 1.0 !
61 | if j!=i:
62 | n = masks[:,:,j]
63 | n = np.expand_dims(n,axis=2)
64 | intersection = np.logical_and(m,n)
65 | union = np.logical_or(m,n)
66 | iou_score = np.sum(intersection) / np.sum(union)
67 | #print(np.sum(intersection),np.sum(union))
68 | #print(iou_score)
69 | if iou_score > max_iou:
70 | max_iou = iou_score
71 | max_mask = n
72 | max_iou_index = j
73 |
74 |         #Need to merge if IoU is greater than 0.15
75 | if max_iou > 0.15:
76 | area_m = measure.regionprops(m[:,:,0].astype(np.uint8))
77 | area_m = [prop.area for prop in area_m][0]
78 | #print(area_m,i)
79 | area_max_mask = measure.regionprops(max_mask[:,:,0].astype(np.uint8))
80 | area_max_mask = [prop.area for prop in area_max_mask][0]
81 | #print(area_max_mask,max_iou_index)
82 |
83 | #print(area_m/(area_m + area_max_mask))
84 | #print(area_max_mask/(area_m + area_max_mask))
85 |
86 | if area_m >= area_max_mask:
87 | merged_mask_list.append(m)
88 | not_required.append(max_iou_index)
89 | else:
90 | merged_mask_list.append(max_mask)
91 | not_required.append(i)
92 |
93 | elif i not in not_required:
94 | merged_mask_list.append(m)
95 |
96 | #print('Matches: ',max_iou,i,max_iou_index)
97 | #print(not_required,len(merged_mask_list))
98 |
99 | merged_mask_list = np.array(merged_mask_list)
100 | merged_mask_list = np.squeeze(merged_mask_list)
101 | merged_mask_list = np.transpose(merged_mask_list,(1,2,0))
102 |
103 | return merged_mask_list
104 |
105 |
106 | def load_model():
107 | with tf.device('/gpu:0'):
108 | model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR,config=config)
109 | weights_path = SLUM_WEIGHTS_PATH
110 |
111 | # Load weights
112 | print("Loading weights ", weights_path)
113 | model.load_weights(weights_path, by_name=True)
114 | return model
115 |
116 |
117 | def get_area(mask):
118 | area = measure.regionprops(mask.astype(np.uint8))
119 | area = [prop.area for prop in area][0]
120 | return area
121 |
122 | def cal_diff(mask_1,mask_2,files,image_1,image_2,results_1,results_2):
123 | len_1 = mask_1.shape[2]
124 | len_2 = mask_2.shape[2]
125 |
126 | #Number of detections might be unequal
127 | #combine mask channels.
128 | m1 = np.zeros((mask_1.shape[:2]))
129 | for i in range(len_1):
130 | m1 = np.logical_or(m1,mask_1[:,:,i])
131 |
132 | m2 = np.zeros((mask_2.shape[:2]))
133 | for i in range(len_2):
134 | m2 = np.logical_or(m2,mask_2[:,:,i])
135 |
136 |
137 | #Calculate total area covered by mask_1
138 | mask_1_area = get_area(m1)
139 | mask_2_area = get_area(m2)
140 |
141 | m1 = m1.astype(np.uint8)
142 | m2 = m2.astype(np.uint8)
143 |
144 | print(m1.shape)
145 | print(m2.shape)
146 |
147 | diff = cv2.absdiff(m1,m2)
148 | diff_area = get_area(diff)
149 |
150 | print("M1 area :",mask_1_area)
151 | print("M2 area :",mask_2_area)
152 | print("Diff in area :",diff_area)
153 |
154 | max_area = max(mask_1_area,mask_2_area)
155 |
156 | d = diff_area/max_area
157 | if mask_1_area > mask_2_area:
158 | print(files[0],' greater area')
159 | else:
160 | print(files[1],' greater area')
161 |
162 | print('Change ',d*100,'%')
163 |
164 | return m1,m2,diff
165 |
166 | if __name__ == '__main__':
167 |
168 | parser = argparse.ArgumentParser()
169 | parser.add_argument("--weights_path",type=str,required=True)
170 | args = parser.parse_args()
171 |
172 | SLUM_WEIGHTS_PATH = args.weights_path
173 |
174 |
175 | config = slum.slumConfig()
176 | class InferenceConfig(config.__class__):
177 | # Run detection on one image at a time
178 | GPU_COUNT = 1
179 | IMAGES_PER_GPU = 1
180 | config = InferenceConfig()
181 | config.display()
182 |
183 |
184 | MODEL_DIR = os.path.join(ROOT_DIR, "logs")
185 | model = load_model()
186 |
187 |
188 | files = glob.glob('change_det/*.jpg')
189 |
190 | image_1 = skimage.io.imread(files[0])
191 | image_2 = skimage.io.imread(files[1])
192 |
193 | results_1 = model.detect(image_1[np.newaxis],verbose=0)
194 | results_2 = model.detect(image_2[np.newaxis],verbose=0)
195 |
196 | mask_1 = results_1[0]['masks']
197 | mask_2 = results_2[0]['masks']
198 |
199 | mask_1,mask_2,diff =cal_diff(mask_1,mask_2,files,image_1,image_2,results_1,results_2)
200 |
201 |
202 | r = results_2[0]
203 | r['masks'] = merge_masks(r['masks'])
204 | class_names = ['slum']*(len(r['class_ids'])+1)
205 |
206 | visualize.display_instances(image_2, r['rois'], r['masks'], r['class_ids'],
207 | class_names, r['scores'], ax=None,show_bbox=False,show_mask=True,
208 |                     title=files[1])
209 |
210 |
211 | r = results_1[0]
212 | r['masks'] = merge_masks(r['masks'])
213 | class_names = ['slum']*(len(r['class_ids'])+1)
214 |
215 | visualize.display_instances(image_1, r['rois'], r['masks'], r['class_ids'],
216 | class_names, r['scores'], ax=None,show_bbox=False,show_mask=True,
217 |                     title=files[0])
218 |
219 |
220 |
221 | print(files,' FILES')
222 |
223 | plt.imshow(mask_1)
224 | plt.axis('off')
225 | plt.savefig('change_det/mask_1.png',bbox_inches='tight')
226 | #plt.show()
227 |
228 | plt.imshow(mask_2)
229 | plt.axis('off')
230 | plt.savefig('change_det/mask_2.png',bbox_inches='tight')
231 | #plt.show()
232 |
233 | plt.imshow(diff)
234 | plt.axis('off')
235 | plt.savefig('change_det/change.png',bbox_inches='tight')
236 | #plt.show()
--------------------------------------------------------------------------------
/slums/slum.py:
--------------------------------------------------------------------------------
1 | """
2 | Mask R-CNN
3 | Train on the toy slum dataset and implement color splash effect.
4 |
5 | Copyright (c) 2018 Matterport, Inc.
6 | Licensed under the MIT License (see LICENSE for details)
7 | Written by Waleed Abdulla
8 |
9 | ------------------------------------------------------------
10 |
11 | Usage: import the module (see Jupyter notebooks for examples), or run from
12 | the command line as such:
13 |
14 | # Train a new model starting from pre-trained COCO weights
15 | python3 slum.py train --dataset=/path/to/slum/dataset --weights=coco
16 |
17 | # Resume training a model that you had trained earlier
18 | python3 slum.py train --dataset=/path/to/slum/dataset --weights=last
19 |
20 | # Train a new model starting from ImageNet weights
21 | python3 slum.py train --dataset=/path/to/slum/dataset --weights=imagenet
22 |
23 | # Apply color splash to an image
24 | python3 slum.py splash --weights=/path/to/weights/file.h5 --image=
25 |
26 | # Apply color splash to video using the last weights you trained
27 | python3 slum.py splash --weights=last --video=
28 | """
29 |
30 | import os
31 | import sys
32 | import json
33 | import datetime
34 | import numpy as np
35 | import skimage.draw
36 |
37 | """
38 | Imgaug is an image augmentation library.
39 | """
40 | from imgaug import augmenters as iaa
41 | from imgaug import parameters as iap
42 | import imgaug as ia
43 |
44 |
45 | # Root directory of the project
46 | ROOT_DIR = os.path.abspath("../")
47 |
48 | # Import Mask RCNN
49 | sys.path.append(ROOT_DIR) # To find local version of the library
50 |
51 | from mrcnn.config import Config
52 | from mrcnn import model as modellib, utils
53 | from mrcnn.visualize import random_colors,apply_mask
54 | # Path to trained weights file
55 | COCO_WEIGHTS_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
56 |
57 | # Directory to save logs and model checkpoints, if not provided
58 | # through the command line argument --logs
59 | DEFAULT_LOGS_DIR = os.path.join(ROOT_DIR, "logs")
60 |
61 | ############################################################
62 | # Configurations
63 | ############################################################
64 |
65 |
66 | class slumConfig(Config):
67 | """Configuration for training on the toy dataset.
68 | Derives from the base Config class and overrides some values.
69 | """
70 | # Give the configuration a recognizable name
71 | NAME = "slum_100"
72 |
73 | # We use a GPU with 12GB memory, which can fit two images.
74 | # Adjust down if you use a smaller GPU.
75 | IMAGES_PER_GPU = 2
76 |
77 | # Number of classes (including background)
78 | NUM_CLASSES = 1 + 1 # Background + slum
79 |
80 | # Number of training steps per epoch
81 | STEPS_PER_EPOCH = 100
82 |
83 | # Skip detections with < 90% confidence
84 | DETECTION_MIN_CONFIDENCE = 0.9
85 |
86 |     # Disabling mini masks exhausts memory ("kills the machine"), so keep the default.
87 |     #USE_MINI_MASK = False
88 |
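# Note (illustrative): with the base Config default of GPU_COUNT = 1 and
# IMAGES_PER_GPU = 2 above, the effective batch size is GPU_COUNT * IMAGES_PER_GPU = 2,
# so STEPS_PER_EPOCH = 100 corresponds to roughly 200 training images per epoch.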
89 |
90 |
91 | ############################################################
92 | # Dataset
93 | ############################################################
94 |
95 | class slumDataset(utils.Dataset):
96 |
97 | def load_slum(self, dataset_dir, subset):
98 | """Load a subset of the slum dataset.
99 | dataset_dir: Root directory of the dataset.
100 | subset: Subset to load: train or val
101 | """
102 | # Add classes. We have only one class to add.
103 | self.add_class("slum_100", 1, "slum_100")
104 |
105 | dataset_dir = os.path.join(dataset_dir, subset)
106 | print(dataset_dir)
107 |
108 | annotations = json.load(open(os.path.join(dataset_dir, "via_region_data.json")))
109 | annotations = list(annotations.values()) # don't need the dict keys
110 |
111 | # The VIA tool saves images in the JSON even if they don't have any
112 | # annotations. Skip unannotated images.
113 | annotations = [annotations[a] for a in range(len(annotations)) if annotations[a]['regions']]
114 |
115 | # Add images
116 | for a in annotations:
117 |             # Get the x, y coordinates of points of the polygons that make up
118 |             # the outline of each object instance. These are stored in the
119 | # shape_attributes (see json format above)
120 | # The if condition is needed to support VIA versions 1.x and 2.x.
121 | if type(a['regions']) is dict:
122 | polygons = [r['shape_attributes'] for r in a['regions'].values()]
123 | else:
124 | polygons = [r['shape_attributes'] for r in a['regions']]
125 |
126 | # load_mask() needs the image size to convert polygons to masks.
127 | # Unfortunately, VIA doesn't include it in JSON, so we must read
128 |             # the image. This is only manageable since the dataset is tiny.
129 | image_path = os.path.join(dataset_dir, a['filename'])
130 | image = skimage.io.imread(image_path)
131 | height, width = image.shape[:2]
132 |
133 | self.add_image(
134 | "slum_100",
135 | image_id=a['filename'], # use file name as a unique image id
136 | path=image_path,
137 | width=width, height=height,
138 | polygons=polygons)
139 |
140 | def load_mask(self, image_id):
141 | """Generate instance masks for an image.
142 | Returns:
143 | masks: A bool array of shape [height, width, instance count] with
144 | one mask per instance.
145 | class_ids: a 1D array of class IDs of the instance masks.
146 | """
147 | # If not a slum dataset image, delegate to parent class.
148 | image_info = self.image_info[image_id]
149 | if image_info["source"] != "slum_100":
150 | return super(self.__class__, self).load_mask(image_id)
151 |
152 | # Convert polygons to a bitmap mask of shape
153 | # [height, width, instance_count]
154 | info = self.image_info[image_id]
155 | mask = np.zeros([info["height"], info["width"], len(info["polygons"])],
156 | dtype=np.uint8)
157 | for i, p in enumerate(info["polygons"]):
158 | # Get indexes of pixels inside the polygon and set them to 1
159 | rr, cc = skimage.draw.polygon(p['all_points_y'], p['all_points_x'])
160 | mask[rr, cc, i] = 1
161 |
162 | # Return mask, and array of class IDs of each instance. Since we have
163 | # one class ID only, we return an array of 1s
164 |         return mask.astype(bool), np.ones([mask.shape[-1]], dtype=np.int32)
165 |
166 | def image_reference(self, image_id):
167 | """Return the path of the image."""
168 | info = self.image_info[image_id]
169 | if info["source"] == "slum_100":
170 | return info["path"]
171 | else:
172 | super(self.__class__, self).image_reference(image_id)
173 |
174 |
175 | def train(model):
176 | """Train the model."""
177 | # Training dataset.
178 | sometimes = lambda aug: iaa.Sometimes(0.5, aug)
179 | seq = iaa.Sequential(
180 | [
181 | # apply the following augmenters to most images
182 | iaa.Fliplr(0.5), # horizontally flip 50% of all images
183 | iaa.Flipud(0.2), # vertically flip 20% of all images
184 | # crop images by -5% to 10% of their height/width
185 | sometimes(iaa.CropAndPad(
186 | percent=(-0.05, 0.1),
187 | pad_mode=ia.ALL,
188 | pad_cval=(0, 255)
189 | )),
190 | sometimes(iaa.Affine(
191 | scale={"x": (0.8, 1.2), "y": (0.8, 1.2)}, # scale images to 80-120% of their size, individually per axis
192 | translate_percent={"x": (-0.2, 0.2), "y": (-0.2, 0.2)}, # translate by -20 to +20 percent (per axis)
193 | rotate=(-45, 45), # rotate by -45 to +45 degrees
194 | shear=(-16, 16), # shear by -16 to +16 degrees
195 | order=[0, 1], # use nearest neighbour or bilinear interpolation (fast)
196 | cval=(0, 255), # if mode is constant, use a cval between 0 and 255
197 |                 mode=ia.ALL # use any of scikit-image's warping modes
198 | ))
199 |         ], random_order=True)
200 |
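    # (Optional sanity check, not part of the original pipeline: imgaug can apply
    # the sequence above to a single HxWx3 uint8 image, which is handy for
    # eyeballing how aggressive the settings are, e.g.
    #     augmented = seq.augment_image(image)
    # )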
201 |
202 | dataset_train = slumDataset()
203 | dataset_train.load_slum(args.dataset, "train")
204 | dataset_train.prepare()
205 |
206 | # Validation dataset
207 | dataset_val = slumDataset()
208 | dataset_val.load_slum(args.dataset, "val")
209 | dataset_val.prepare()
210 |
211 |     """
212 |     We use the Adam optimizer. To change it, go to mrcnn/model.py, line 2159.
213 |     For Adam we use a lower learning rate; if you switch to SGD, increase it
214 |     in config.py.
215 |     """
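    # A hedged sketch of the kind of override meant above (the value is an
    # assumption for Adam, not taken from this repository; the upstream
    # Mask R-CNN default of 0.001 is tuned for SGD):
    #
    #     class slumConfig(Config):
    #         LEARNING_RATE = 1e-4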
216 |
217 | # Training - Stage 1
218 |
219 | print("Training network heads")
220 | model.train(dataset_train, dataset_val,
221 | learning_rate=config.LEARNING_RATE,
222 | epochs=50,
223 | layers='heads',
224 | augmentation = seq
225 | )
226 |
227 |
228 | # Training - Stage 2
229 | # Finetune layers from ResNet stage 4 and up
230 | print("Fine tune Resnet stage 4 and up")
231 | model.train(dataset_train, dataset_val,
232 | learning_rate=config.LEARNING_RATE/10,
233 | epochs=120,
234 | layers='4+',
235 | augmentation = seq
236 | )
237 |
238 | # Training - Stage 3
239 |     # Fine tune all layers. We stopped this stage after epoch 128, having seen no significant improvement.
240 | print("Fine tune all layers")
241 | model.train(dataset_train, dataset_val,
242 | learning_rate=config.LEARNING_RATE / 100,
243 | epochs=140,
244 | layers='all',
245 | augmentation = seq
246 | )
247 |
248 | def color_splash(image, mask, color):
249 |     """Apply color splash effect.
250 |     image: RGB image [height, width, 3]
251 |     mask: instance segmentation mask [height, width, instance count]
252 |     color: list of RGB colors; color[0] is applied to the merged mask
253 |     Returns result image.
254 |     """
255 | mask = (np.sum(mask, -1, keepdims=True) >= 1)
256 | mask = np.squeeze(mask)
257 |     splash = apply_mask(image, mask, color[0])
258 | print(splash.shape)
259 | return splash
260 |
261 |
262 | def detect_and_color_splash(model, image_path=None, video_path=None):
263 | assert image_path or video_path
264 |
265 | # Image or video?
266 | if image_path:
267 | # Run model detection and generate the color splash effect
268 | print("Running on {}".format(args.image))
269 | # Read image
270 | image = skimage.io.imread(args.image)
271 | # Detect objects
272 | r = model.detect([image], verbose=1)[0]
273 | # Color splash
274 |         splash = color_splash(image, r['masks'], random_colors(1))
275 | # Save output
276 | file_name = "splash_{:%Y%m%dT%H%M%S}.png".format(datetime.datetime.now())
277 | skimage.io.imsave(file_name, splash)
278 | elif video_path:
279 | import cv2
280 | # Video capture
281 | vcapture = cv2.VideoCapture(video_path)
282 | width = int(vcapture.get(cv2.CAP_PROP_FRAME_WIDTH))
283 | height = int(vcapture.get(cv2.CAP_PROP_FRAME_HEIGHT))
284 | fps = vcapture.get(cv2.CAP_PROP_FPS)
285 |
286 | # Define codec and create video writer
287 | #file_name = "splash_{:%Y%m%dT%H%M%S}.m4v".format(datetime.datetime.now())
288 | file_name = "splash_{:%Y%m%dT%H%M%S}.avi".format(datetime.datetime.now())
289 | vwriter = cv2.VideoWriter(file_name,cv2.VideoWriter_fourcc(*'MJPG'),fps, (width, height))
290 |
291 | count = 0
292 | success = True
293 | color = random_colors(5)
294 | while success:
295 | print("frame: ", count)
296 | # Read next image
297 | success, image = vcapture.read()
298 | if success:
299 | # OpenCV returns images as BGR, convert to RGB
300 | image = image[..., ::-1]
301 | # Detect objects
302 | r = model.detect([image], verbose=0)[0]
303 | # Color splash
304 | splash = color_splash(image, r['masks'],color)
305 | # RGB -> BGR to save image to video
306 | splash = splash[..., ::-1]
307 | # Add image to video writer
308 | vwriter.write(splash)
309 | count += 1
310 | vwriter.release()
311 | print("Saved to ", file_name)
312 |
313 |
314 | ############################################################
315 | # Training
316 | ############################################################
317 |
318 | if __name__ == '__main__':
319 | import argparse
320 |
321 | # Parse command line arguments
322 | parser = argparse.ArgumentParser(
323 | description='Train Mask R-CNN to detect slums.')
324 | parser.add_argument("command",
325 |                         metavar="<command>",
326 | help="'train' or 'splash'")
327 | parser.add_argument('--dataset', required=False,
328 | metavar="/path/to/slum/dataset/",
329 | help='Directory of the slum dataset')
330 | parser.add_argument('--weights', required=True,
331 | metavar="/path/to/weights.h5",
332 | help="Path to weights .h5 file or 'coco'")
333 | parser.add_argument('--logs', required=False,
334 | default=DEFAULT_LOGS_DIR,
335 | metavar="/path/to/logs/",
336 | help='Logs and checkpoints directory (default=logs/)')
337 | parser.add_argument('--image', required=False,
338 | metavar="path or URL to image",
339 | help='Image to apply the color splash effect on')
340 | parser.add_argument('--video', required=False,
341 | metavar="path or URL to video",
342 | help='Video to apply the color splash effect on')
343 | args = parser.parse_args()
344 |
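    # Example invocations (the paths are placeholders, not files shipped with
    # this repository):
    #   python3 slum.py train --dataset=/path/to/slum/dataset --weights=coco
    #   python3 slum.py splash --weights=/path/to/weights.h5 --image=/path/to/image.jpg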
345 | # Validate arguments
346 | if args.command == "train":
347 | assert args.dataset, "Argument --dataset is required for training"
348 | elif args.command == "splash":
349 | assert args.image or args.video,\
350 | "Provide --image or --video to apply color splash"
351 |
352 | print("Weights: ", args.weights)
353 | print("Dataset: ", args.dataset)
354 | print("Logs: ", args.logs)
355 |
356 | # Configurations
357 | if args.command == "train":
358 | config = slumConfig()
359 | else:
360 | class InferenceConfig(slumConfig):
361 | # Set batch size to 1 since we'll be running inference on
362 | # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
363 | GPU_COUNT = 1
364 | IMAGES_PER_GPU = 1
365 | config = InferenceConfig()
366 | config.display()
367 |
368 | # Create model
369 | if args.command == "train":
370 | model = modellib.MaskRCNN(mode="training", config=config,
371 | model_dir=args.logs)
372 | else:
373 | model = modellib.MaskRCNN(mode="inference", config=config,
374 | model_dir=args.logs)
375 |
376 | # Select weights file to load
377 | if args.weights.lower() == "coco":
378 | weights_path = COCO_WEIGHTS_PATH
379 | # Download weights file
380 | if not os.path.exists(weights_path):
381 | utils.download_trained_weights(weights_path)
382 | elif args.weights.lower() == "last":
383 | # Find last trained weights
384 | weights_path = model.find_last()
385 | elif args.weights.lower() == "imagenet":
386 | # Start from ImageNet trained weights
387 | weights_path = model.get_imagenet_weights()
388 | else:
389 | weights_path = args.weights
390 |
391 | # Load weights
392 | print("Loading weights ", weights_path)
393 | if args.weights.lower() == "coco":
394 | # Exclude the last layers because they require a matching
395 | # number of classes
396 | model.load_weights(weights_path, by_name=True, exclude=[
397 | "mrcnn_class_logits", "mrcnn_bbox_fc",
398 | "mrcnn_bbox", "mrcnn_mask"])
399 | else:
400 | model.load_weights(weights_path, by_name=True)
401 |
402 | # Train or evaluate
403 | if args.command == "train":
404 | train(model)
405 | elif args.command == "splash":
406 | detect_and_color_splash(model, image_path=args.image,
407 | video_path=args.video)
408 | else:
409 | print("'{}' is not recognized. "
410 | "Use 'train' or 'splash'".format(args.command))
411 |
--------------------------------------------------------------------------------
/slums/test_images/bhandup_1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/slums/test_images/bhandup_1.jpg
--------------------------------------------------------------------------------
/slums/test_outputs/pred_0.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cbsudux/Mumbai-slum-segmentation/b42c473af9dbd422cfa290d056125dc0174b01cb/slums/test_outputs/pred_0.jpg
--------------------------------------------------------------------------------
/slums/testing.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys,glob
3 | import random
4 | import math
5 | import re
6 | import time
7 | import numpy as np
8 | import tensorflow as tf
9 | import matplotlib
10 | import matplotlib.pyplot as plt
11 | import matplotlib.patches as patches
12 | import cv2
13 | # Root directory of the project
14 | ROOT_DIR = os.path.abspath("../")
15 |
16 | # Import Mask RCNN
17 | sys.path.append(ROOT_DIR) # To find local version of the library
18 | from mrcnn import utils
19 | from mrcnn import visualize
20 | from mrcnn.visualize import display_images
21 | import mrcnn.model as modellib
22 | from mrcnn.model import log
23 |
24 | from slums import slum
25 |
26 | import skimage.draw, skimage.io
27 | from skimage import measure
28 | from shapely.geometry.polygon import Polygon
29 | from skimage.measure import label
30 | from sklearn.metrics import jaccard_similarity_score
31 |
32 | import argparse
33 |
34 |
35 | def get_ax(rows=1, cols=1, size=16):
36 | """Return a Matplotlib Axes array to be used in
37 | all visualizations in the notebook. Provide a
38 | central point to control graph sizes.
39 |
40 | Adjust the size attribute to control how big to render images
41 | """
42 | _, ax = plt.subplots(rows, cols, figsize=(size*cols, size*rows))
43 | return ax
44 |
45 | def getLargestCC(segmentation):
46 |     labels = label(segmentation) # gives a different integer label to each connected region (0 = background)
47 |     # Skip the background bin and keep the largest remaining connected component.
48 |     if len(np.bincount(labels.flat))==1:
49 |         return labels
50 |
51 |     largest_val = np.argmax(np.bincount(labels.flat)[1:]) + 1 #+1 as we ignore the background bin
52 | #print(np.bincount(labels.flat)[1:],' Largest ',largest_val)
53 | return labels==largest_val
54 |
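# Illustrative example (not part of the original script): for a boolean mask with
# two blobs, getLargestCC keeps only the pixels of the larger one.
#
#   m = np.zeros((5, 5), dtype=bool)
#   m[0:2, 0:2] = True   # 4-pixel blob
#   m[4, 4] = True       # 1-pixel blob
#   getLargestCC(m)      # -> True only on the 2x2 blob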
55 | def load_model():
56 | with tf.device(DEVICE):
57 | model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR,config=config)
58 | weights_path = SLUM_WEIGHTS_PATH
59 |
60 | # Load weights
61 | print("Loading weights ", weights_path)
62 | model.load_weights(weights_path, by_name=True)
63 | return model
64 |
65 |
66 | def compute_batch_ap(dataset, image_ids, verbose=1):
67 | """
68 |     # Load a validation dataset before calling this function, e.g.
69 |     dataset = slum.slumDataset()
70 |     dataset.load_slum(folder_path, fol)
71 |     dataset.prepare()
72 | """
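    # A hedged usage sketch (assumes a dataset prepared as in the docstring above
    # and the module-level `model` and `config` created under __main__):
    #
    #   APs, IOUs = compute_batch_ap(dataset, dataset.image_ids)
    #   print("mean AP: {:.3f}, mean IoU: {:.3f}".format(np.mean(APs), np.mean(IOUs)))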
73 |
74 | APs = []
75 | IOUs = []
76 |
77 | for image_id in image_ids:
78 | # Load image
79 | image, image_meta, gt_class_id, gt_bbox, gt_mask =\
80 | modellib.load_image_gt(dataset, config,
81 | image_id, use_mini_mask=False)
82 |
83 | # Run object detection
84 | results = model.detect_molded(image[np.newaxis], image_meta[np.newaxis], verbose=0)
85 |         # Compute AP (note: np.arange(0.5, 1.0) below uses the default step of 1, so only the 0.5 IoU threshold is evaluated)
86 | r = results[0]
87 |
88 | #merge_masks.
89 | gt_merge_mask = np.zeros((gt_mask.shape[:2]))
90 | for i in range(gt_mask.shape[2]):
91 | gt_merge_mask = np.logical_or(gt_merge_mask,gt_mask[:,:,i])
92 |
93 | pred_merge_mask = np.zeros((r['masks'].shape[:2]))
94 | for i in range(r['masks'].shape[2]):
95 | pred_merge_mask = np.logical_or(pred_merge_mask,r['masks'][:,:,i])
96 |
97 |
98 | pred_merge_mask = np.expand_dims(pred_merge_mask,2)
99 | #print(pred_merge_mask.shape)
100 | pred_merge_mask,wind,scale,pad,crop = utils.resize_image(pred_merge_mask,1024,1024)
101 | #print(pred_merge_mask.shape,gt_merge_mask.shape)
102 |
103 | iou = jaccard_similarity_score(np.squeeze(pred_merge_mask),gt_merge_mask)
104 |
105 | #mAP at 50
106 | print("mAP at 50")
107 | ap = utils.compute_ap_range(
108 | gt_bbox, gt_class_id, gt_mask,
109 | r['rois'], r['class_ids'], r['scores'], r['masks'],np.arange(0.5,1.0),verbose=0)
110 |
111 |         # Make sure AP doesn't go above 1
112 | if ap>1.0:
113 | ap = 1.0
114 |
115 | APs.append(ap)
116 | IOUs.append(iou)
117 |
118 | if verbose:
119 | info = dataset.image_info[image_id]
120 | meta = modellib.parse_image_meta(image_meta[np.newaxis,...])
121 | print("{:3} {} AP: {:.2f} Image_id: {}, IOU: {}".format(
122 | meta["image_id"][0], meta["original_image_shape"][0], ap,image_id,iou))
123 | return APs,IOUs
124 |
125 |
126 | def test_on_folder(model,folder_path,save_path='test_outputs/'):
127 |
128 | if not os.path.exists(save_path):
129 | os.mkdir(save_path)
130 |
131 | files = glob.glob(folder_path+'/*.jpg')
132 |
133 | for i in range(len(files)):
134 | image_id = i
135 | image = skimage.io.imread(files[image_id])
136 | results = model.detect(image[np.newaxis],verbose=0)
137 | results = results[0]
138 | class_names = ['slum']*(len(results['class_ids'])+1)
139 | mask = results['masks']
140 |
141 | file_to_save = save_path + '/pred_'+str(image_id) + '.jpg'
142 |
143 | visualize.save_instances(image, results['rois'], results['masks'], results['class_ids'],
144 | class_names,file_to_save,results['scores'], ax=None,
145 | show_bbox=False, show_mask=True,
146 | title="Predictions "+str(image_id))
147 |
148 |         # Uncomment to visualize using matplotlib.
149 |         """
150 |         visualize.display_instances(image, results['rois'], results['masks'], results['class_ids'],
151 |                             class_names, results['scores'], ax=get_ax(),
152 | show_bbox=False, show_mask=True,
153 | title="Predictions "+str(image_id))
154 | """
155 |
156 |
157 | if __name__ == '__main__':
158 |
159 | parser = argparse.ArgumentParser()
160 | parser.add_argument("--weights_path",type=str,required=True)
161 | args = parser.parse_args()
162 |
163 | SLUM_WEIGHTS_PATH = args.weights_path
164 |
165 |
166 | # Directory to save logs and trained model
167 | MODEL_DIR = os.path.join(ROOT_DIR, "logs")
168 | config = slum.slumConfig()
169 |
170 |
171 | class InferenceConfig(config.__class__):
172 | # Run detection on one image at a time
173 | GPU_COUNT = 1
174 | IMAGES_PER_GPU = 1
175 |
176 | config = InferenceConfig()
177 | config.display()
178 |
179 | DEVICE = "/gpu:0" # /cpu:0 or /gpu:0
180 |
181 | # Inspect the model in training or inference modes
182 | # values: 'inference' or 'training'
183 | # TODO: code for 'training' test mode not ready yet
184 | TEST_MODE = "inference"
185 |
186 |     # Run the model over the images in test_images/
187 | model = load_model()
188 | test_on_folder(model,'test_images/')
--------------------------------------------------------------------------------