├── Dockerfile
├── LICENSE
├── README.md
├── Robosat Labeling.ipynb
├── entrypoint.sh
├── images
│   ├── mask.png
│   └── satellite.png
├── osm
│   └── map.osm.pbf
└── reload_docker.sh

/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM mapbox/robosat:latest-gpu
2 | 
3 | RUN apt-get update && \
4 | apt-get upgrade -y && \
5 | apt-get install -y git vim
6 | 
7 | RUN pip3 install jupyter
8 | 
9 | # Install E84's fork of robosat repo
10 | RUN cd /tmp && \
11 | git clone https://github.com/Element84/robosat && \
12 | rsync -av /tmp/robosat/ /app
13 | 
14 | # Create the directory used to store checkpoints during learning
15 | RUN mkdir -p /app/container_mount/checkpoints/
16 | RUN mkdir /app/robosat_container_files/
17 | 
18 | # Copy our notebook and area of interest into docker
19 | COPY *.ipynb /app
20 | COPY osm/*.pbf /app/container_mount
21 | COPY images/* /app/images
22 | 
23 | # Substitute required ENV variables 'DESIRED_ZOOM_LEVEL' and 'PUBLIC_IP'
24 | COPY entrypoint.sh /
25 | RUN chmod +x /entrypoint.sh
26 | ENTRYPOINT ["/entrypoint.sh"]
27 | 
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright 2018 Element 84
2 | 
3 | Licensed under the Apache License, Version 2.0 (the "License");
4 | you may not use this file except in compliance with the License.
5 | You may obtain a copy of the License at
6 | 
7 | http://www.apache.org/licenses/LICENSE-2.0
8 | 
9 | Unless required by applicable law or agreed to in writing, software
10 | distributed under the License is distributed on an "AS IS" BASIS,
11 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | See the License for the specific language governing permissions and
13 | limitations under the License.
14 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | Introduction
2 | =======
3 | This is a fully-functioning Jupyter Notebook that describes and walks through all of the steps in an excellent blog post on the Robosat feature extraction and machine learning pipeline. The original post:
4 | 
5 | https://www.openstreetmap.org/user/daniel-j-h/diary/44321
6 | 
7 | We'll be using the Robosat feature extraction pipeline:
8 | 
9 | https://github.com/mapbox/robosat
10 | 
11 | Prerequisites
12 | =======
13 | ### Docker and EC2
14 | This notebook requires NVIDIA graphics drivers (e.g. an EC2 P2 instance) to run every step end-to-end, although some commands may be run locally. Mapbox recommends AWS P2/P3 instances and GTX 1080 Ti GPUs.
15 | 
16 | Instructions for provisioning an EC2 instance:
17 | 
18 | https://docs.aws.amazon.com/efs/latest/ug/gs-step-one-create-ec2-resources.html
19 | 
20 | Once set up, you will need to build and run the Docker image from the EC2 instance. Instructions for building this Docker image are below, in "Building and Running the Notebook". General Docker build instructions:
21 | 
22 | https://docs.docker.com/engine/reference/builder/#usage
23 | 
24 | ### Environment Variables
25 | You will run this Jupyter notebook from inside a Docker container, passing in two `docker run` environment variables: `DESIRED_ZOOM_LEVEL` and `PUBLIC_IP`.
26 | 
27 | `DESIRED_ZOOM_LEVEL` is the zoom level for your imagery.
28 | `PUBLIC_IP` is the public IP address of your EC2 instance, if you're running from one.
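
For a rough intuition of what `DESIRED_ZOOM_LEVEL` controls: RoboSat addresses imagery as OSM "Slippy Map" tiles (`zoom/x/y`), so the zoom level determines how finely the area of interest is tiled and therefore how many tiles get downloaded and trained on. The snippet below is only an illustrative sketch of the standard lat/lon-to-tile conversion documented on the OSM wiki, not part of the pipeline itself; the coordinates are an arbitrary point inside the bundled Tanzania extract.

```python
import math

def deg2tile(lat_deg, lon_deg, zoom):
    """Standard OSM Slippy Map conversion from a lat/lon to tile x/y indices."""
    lat_rad = math.radians(lat_deg)
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

# An arbitrary point near Dar es Salaam, inside the bundled area of interest
print(deg2tile(-6.8, 39.28, 19))
```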
29 | 
30 | ### Prior to running the notebook
31 | 
32 | ## These steps have already been completed and the result is included. You do not need to do this unless you want to use a different area of interest.
33 | 
34 | 1. Install Osmium locally using brew
35 | 
36 | `brew install osmium-tool`
37 | 
38 | 2. Get the Geofabrik extract
39 | 
40 | `http://download.geofabrik.de/africa/tanzania-latest.osm.pbf`
41 | 
42 | 3. Extract the area of interest
43 | 
44 | `osmium extract --bbox '38.9410400390625,-7.0545565715284955,39.70458984374999,-5.711646879515092' tanzania-latest.osm.pbf --output map.osm.pbf`
45 | 
46 | Also, it may be necessary to adjust dataset-building.toml and model-unet.toml. Specifically, `classes = ['background', 'building']` in dataset-building.toml and the `image_size` in model-unet.toml should match your requirements.
47 | 
48 | Building and Running the Notebook
49 | =======
50 | 
51 | These steps need to be completed inside of your EC2 instance, where this repo should live.
52 | 
53 | Build the Docker image:
54 | =======
55 | Note: the Osmium extraction steps above have already been performed and the resulting .pbf file is bundled with this image.
56 | 
57 | `docker build -t robosat-jupyter .`
58 | 
59 | Run the Docker image detached:
60 | 
61 | Note: this command assumes you're running on a CUDA-enabled EC2 instance.
62 | 
63 | IMPORTANT: You _must_ update the `PUBLIC_IP` and `DESIRED_ZOOM_LEVEL`. See the Environment Variables section above.
64 | 
65 | `docker run -d --runtime=nvidia -v /home/ubuntu/robosat/container_mount:/app/container_mount -v /home/ubuntu/robosat_container_files:/app/robosat_container_files -e DESIRED_ZOOM_LEVEL=19 -e PUBLIC_IP=34.56.78.90 -p 8888:8888 -p 5000:5000 -t robosat-jupyter jupyter notebook --ip=0.0.0.0 --allow-root`
66 | 
67 | If you're running this Docker image locally, use this instead:
68 | 
69 | `docker run -d -p 8888:8888 -e DESIRED_ZOOM_LEVEL=19 -e PUBLIC_IP=34.56.78.90 -p 5000:5000 -t robosat-jupyter jupyter notebook --ip=0.0.0.0 --allow-root`
70 | 
71 | `docker ps` to find the currently running container
72 | 
73 | `docker logs {Container ID}` to view the container logs and get the Jupyter notebook URL with its token
74 | 
75 | `docker exec -it {Container ID} bash` to get into the container
76 | 
77 | Enter the URL in your browser, replacing the local IP with the public IP of your remote instance if appropriate, supply the token, and behold the notebook!
78 | 
79 | 
80 | License
81 | =======
82 | 
83 | Copyright 2018 Element 84
84 | 
85 | Licensed under the Apache License, Version 2.0 (the "License");
86 | you may not use this file except in compliance with the License.
87 | You may obtain a copy of the License at
88 | 
89 | http://www.apache.org/licenses/LICENSE-2.0
90 | 
91 | Unless required by applicable law or agreed to in writing, software
92 | distributed under the License is distributed on an "AS IS" BASIS,
93 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
94 | See the License for the specific language governing permissions and
95 | limitations under the License.
96 | -------------------------------------------------------------------------------- /Robosat Labeling.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# E84 Robosat Guide" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "### This notebook is adapted from https://www.openstreetmap.org/user/daniel-j-h/diary/44321" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "### Prelude" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "Sanity check your ENV variables passed into the docker below (see README):" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": null, 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "!echo $DESIRED_ZOOM_LEVEL\n", 38 | "!echo $PUBLIC_IP" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "This notebook starts with a tanzania-latest.osm.pbf file. It is an OpenStreetMap file that represents all of Tanzania. The file itself is a \"Protocolbuffer Binary Format\" file. More info here:\n", 46 | "\n", 47 | "https://wiki.openstreetmap.org/wiki/PBF_Format\n", 48 | "\n", 49 | "Osmium on our P2 Ubuntu AMI is not the latest version, so we performed the following steps locally (Macbook), then uploaded the files to S3 and pulled down on our EC2 instance.\n", 50 | "\n", 51 | "First, install Osmium. On a Mac this is easiest with Homebrew:\n", 52 | "```\n", 53 | "brew install osmium-tool\n", 54 | "```" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": {}, 60 | "source": [ 61 | "Then, we grabbed this file from http://download.geofabrik.de/ using the command:\n", 62 | "```\n", 63 | "wget --limit-rate=1M http://download.geofabrik.de/africa/tanzania-latest.osm.pbf\n", 64 | "```" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "Once we have the entire Tanzania OSM.PBF, we need to carve out our smaller area of interest. To do this we use the Osmium library to extract a bounding box and produce a smaller OSM.PBF that represents our AOI.\n", 72 | "\n", 73 | "```\n", 74 | "osmium extract --bbox '38.9410400390625,-7.0545565715284955,39.70458984374999,-5.711646879515092' tanzania-latest.osm.pbf --output map.osm.pbf\n", 75 | "```" 76 | ] 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "metadata": {}, 81 | "source": [ 82 | "We copied `map.osm.pbf` from our laptop to a private S3 bucket using the AWS CLI:\n", 83 | "```\n", 84 | "aws s3 cp /map.osm.pbf s3://\n", 85 | "```\n", 86 | "And then onto our P2 instance (make sure IAM permissions allow your laptop to PUT and EC2 instance to GET)\n", 87 | "```\n", 88 | "aws s3 cp s3:///map.osm.pbf /home/ubuntu/robosat/container_mount/\n", 89 | "```" 90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "metadata": {}, 95 | "source": [ 96 | "### Data Preparation" 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": {}, 102 | "source": [ 103 | "RoboSat comes with a tool `rs extract` to extract geometries from an OpenStreetMap base map. We will need these geometries in a minute to create labels for each building. 
\n", 104 | "\n", 105 | "This command will generate buildings.geojson with each building geometry from map.osm.pbf (our area of interest) in this format:\n", 106 | "\n", 107 | "```\n", 108 | "{\n", 109 | " \"type\": \"Feature\",\n", 110 | " \"geometry\": {\n", 111 | " \"type\": \"Point\",\n", 112 | " \"coordinates\": [125.6, 10.1]\n", 113 | " },\n", 114 | " \"properties\": {\n", 115 | " \"name\": \"Dinagat Islands\"\n", 116 | " }\n", 117 | "}\n", 118 | "```" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": null, 124 | "metadata": { 125 | "scrolled": true 126 | }, 127 | "outputs": [], 128 | "source": [ 129 | "!./rs extract --type building container_mount/map.osm.pbf container_mount/buildings.geojson" 130 | ] 131 | }, 132 | { 133 | "cell_type": "markdown", 134 | "metadata": {}, 135 | "source": [ 136 | "From that GeoJSON we need a list of all tiles with buildings in the Slippy Map filename and directory format:\n", 137 | "\n", 138 | "`/zoom/x/y.png` (or .webp)\n", 139 | "\n", 140 | "The resulting buildings.tiles is a CSV that looks like this (x, y, z):\n", 141 | "\n", 142 | "```\n", 143 | "639431,544670,19\n", 144 | "639429,544952,19\n", 145 | "639429,544845,19 \n", 146 | "639429,544823,19 \n", 147 | "...\n", 148 | "```\n", 149 | "\n", 150 | "More info on the SlippyMap file format here: https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames\n", 151 | "\n", 152 | "We're using a zoom of 19 in this example but that can be changed with `$DESIRED_ZOOM_LEVEL`." 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": null, 158 | "metadata": {}, 159 | "outputs": [], 160 | "source": [ 161 | "!./rs cover --zoom $DESIRED_ZOOM_LEVEL container_mount/buildings.geojson container_mount/buildings.tiles" 162 | ] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "metadata": {}, 167 | "source": [ 168 | "Once we have our list of SlippyMap tiles, we need to download satellite imagery for each one. `rs download` takes the buildings.tiles we created earlier and downloads a 256x256 satellite image for each one if it's available. You may see some `failed, skipping` output in the next command, that's okay as we will have plenty in our dataset. These images arrive as .webp image files in the directory specified.\n", 169 | "\n", 170 | "**NOTE: Mapbox access token required.** (Sign-up for free at https://www.mapbox.com to get an access token)" 171 | ] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": null, 176 | "metadata": {}, 177 | "outputs": [], 178 | "source": [ 179 | "!./rs download https://api.mapbox.com/v4/mapbox.satellite/{z}/{x}/{y}.png?access_token=xxx container_mount/buildings.tiles container_mount/tiles/mapbox_satellite_tiles" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": {}, 185 | "source": [ 186 | "For each building in buildings.tiles we also need to generate a mask. The tile size and zoom parameters must match the zoom specified earlier in the `rs cover` step (19 in our case) and the `rs download` step (a tile size of 256 is the default).\n", 187 | "\n", 188 | "The mask is a binary representation of a feature over a background. In our case we're showing orange buildings over a denim background. 
This is configured in `dataset-building.toml`:\n",
189 |     "\n",
190 |     "```\n",
191 |     " # Human representation for classes.\n",
192 |     " classes = ['background', 'building']\n",
193 |     "\n",
194 |     " # Color map for visualization and representing classes in masks.\n",
195 |     " # Note: available colors can be found in `robosat/colors.py`\n",
196 |     " colors = ['denim', 'orange']\n",
197 |     "```\n",
198 |     "\n",
199 |     "Satellite image and mask equivalent (from https://www.openstreetmap.org/user/daniel-j-h/diary/44321)\n",
200 |     "\n",
201 |     "![satellite image](images/satellite.png)\n",
202 |     "![mask](images/mask.png)"
203 |    ]
204 |   },
205 |   {
206 |    "cell_type": "code",
207 |    "execution_count": null,
208 |    "metadata": {
209 |     "scrolled": true
210 |    },
211 |    "outputs": [],
212 |    "source": [
213 |     "!./rs rasterize --dataset container_mount/dataset-building.toml --zoom $DESIRED_ZOOM_LEVEL --size 256 container_mount/buildings.geojson container_mount/buildings.tiles container_mount/masks"
214 |    ]
215 |   },
216 |   {
217 |    "cell_type": "markdown",
218 |    "metadata": {},
219 |    "source": [
220 |     "We generate our masks without regard to the aerial imagery we downloaded. It's therefore possible to have masks with no associated image because we weren't able to get an image for that tile. The number of masks and images must match exactly. Each image needs a corresponding mask to train the model.\n",
221 |     "\n",
222 |     "This script removes any masks that don't have images so we're left with the same number of files in /tiles/mapbox_satellite_tiles and /masks. It also updates buildings.tiles by removing any tiles for which we do not have an aerial image and mask."
223 |    ]
224 |   },
225 |   {
226 |    "cell_type": "code",
227 |    "execution_count": null,
228 |    "metadata": {},
229 |    "outputs": [],
230 |    "source": [
231 |     "import os\n",
232 |     "import shutil\n",
233 |     "\n",
234 |     "def remove_masks(dry_run):\n",
235 |     "    csv_lines_to_remove = []\n",
236 |     "    desired_zoom_level = os.environ['DESIRED_ZOOM_LEVEL']\n",
237 |     "    buildings_tiles_path = \"container_mount/buildings.tiles\"\n",
238 |     "    masks_path = \"container_mount/masks/\" + desired_zoom_level\n",
239 |     "    satellite_images_path = \"container_mount/tiles/mapbox_satellite_tiles/\" + desired_zoom_level\n",
240 |     "\n",
241 |     "    # Open the buildings.tiles file and get all of the lines so that we can re-write them later\n",
242 |     "    f = open(buildings_tiles_path, \"r\")\n",
243 |     "    lines = f.readlines()\n",
244 |     "    f.close()\n",
245 |     "    \n",
246 |     "    # for each mask directory, if we don't have an image dir remove the mask directory\n",
247 |     "    # else if we have the image directory, remove any masks that don't have an image in the image dir\n",
248 |     "    for dir in os.listdir(masks_path):\n",
249 |     "        if dir not in os.listdir(satellite_images_path):\n",
250 |     "            if dry_run:\n",
251 |     "                print(\"Removing mask directory: \" + dir)\n",
252 |     "            else:\n",
253 |     "                shutil.rmtree(masks_path + '/' + dir)\n",
254 |     "            \n",
255 |     "            csv_lines_to_remove.append(dir)\n",
256 |     "        else:\n",
257 |     "            # for each mask in the masks dir, check if we have an image.\n",
258 |     "            # if we don't, remove the mask\n",
259 |     "            for mask_file in os.listdir(masks_path + \"/\" + dir):\n",
260 |     "                file_name_only = mask_file.split(\".\")[0]\n",
261 |     "                image_file = file_name_only + \".webp\"\n",
262 |     "                \n",
263 |     "                if image_file not in os.listdir(satellite_images_path + \"/\" + dir):\n",
264 |     "                    csv_line = dir + \",\" + file_name_only\n",
265 |     "                    csv_lines_to_remove.append(csv_line)\n",
266 |     "\n",
267 |     "                    if dry_run:\n",
268 |     "                        print(\"Removing mask file: \" + mask_file)\n",
269 |     "                    else:\n",
270 |     "                        os.remove(masks_path + \"/\" + dir + \"/\" + mask_file)\n",
271 |     "    \n",
272 |     "    \n",
273 |     "    # We also need to remove the line from buildings.tiles if we don't have an image (skip the re-write during a dry run so the file is not truncated)\n",
274 |     "    f = None if dry_run else open(buildings_tiles_path, \"w\")\n",
275 |     "    for line in lines:\n",
276 |     "        skip = False\n",
277 |     "        \n",
278 |     "        for path in csv_lines_to_remove:\n",
279 |     "            if path in line:\n",
280 |     "                skip = True\n",
281 |     "        \n",
282 |     "        if not skip:\n",
283 |     "            if dry_run:\n",
284 |     "                print(\"Writing CSV line: \" + line)\n",
285 |     "            else:\n",
286 |     "                f.write(line)\n",
287 |     "    \n",
288 |     "    if f is not None:\n",
289 |     "        f.close()\n",
290 |     "\n",
291 |     "# Switch param to True to perform a dry-run and print the files that would be removed\n",
292 |     "remove_masks(False)"
293 |    ]
294 |   },
295 |   {
296 |    "cell_type": "markdown",
297 |    "metadata": {},
298 |    "source": [
299 |     "### Training"
300 |    ]
301 |   },
302 |   {
303 |    "cell_type": "markdown",
304 |    "metadata": {},
305 |    "source": [
306 |     "The following commands split our buildings.tiles set into 3 subsets:\n",
307 |     "\n",
308 |     "- a training dataset on which we train the model\n",
309 |     "- a validation dataset on which we calculate metrics after training\n",
310 |     "- a hold-out evaluation dataset if you want to do hyper-parameter tuning\n",
311 |     "\n",
312 |     "The split is 80/10/10. The resulting files are:\n",
313 |     "\n",
314 |     "- training.tiles\n",
315 |     "- validation.tiles\n",
316 |     "- evaluation.tiles"
317 |    ]
318 |   },
319 |   {
320 |    "cell_type": "code",
321 |    "execution_count": null,
322 |    "metadata": {},
323 |    "outputs": [],
324 |    "source": [
325 |     "!split -l $(expr $(cat container_mount/buildings.tiles | wc -l) \\* 80 / 100) container_mount/buildings.tiles training_\n",
326 |     "!split -l $(expr $(cat training_ab | wc -l) \\* 50 / 100) training_ab holdout_validation_\n",
327 |     "!mv training_aa container_mount/training.tiles\n",
328 |     "!mv holdout_validation_aa container_mount/validation.tiles\n",
329 |     "!mv holdout_validation_ab container_mount/evaluation.tiles"
330 |    ]
331 |   },
332 |   {
333 |    "cell_type": "markdown",
334 |    "metadata": {},
335 |    "source": [
336 |     "We then use `rs subset` to split the images and masks according to each .tiles csv we created in the previous step. 
The result of these commands should be a set of folders that looks like:\n", 337 | "\n", 338 | "```\n", 339 | "├── dataset\n", 340 | " └── training\n", 341 | " └── images\n", 342 | " └── labels\n", 343 | " └── validation\n", 344 | " └── images\n", 345 | " └── labels\n", 346 | " └── evaluation\n", 347 | " └── images\n", 348 | " └── labels\n", 349 | "```\n", 350 | "\n", 351 | "Each images and labels directory should contain .webps and .pngs for each tile in the corresponding .tiles file.\n", 352 | "\n" 353 | ] 354 | }, 355 | { 356 | "cell_type": "code", 357 | "execution_count": null, 358 | "metadata": {}, 359 | "outputs": [], 360 | "source": [ 361 | "!./rs subset container_mount/tiles/mapbox_satellite_tiles container_mount/validation.tiles container_mount/dataset/validation/images\n", 362 | "!./rs subset container_mount/masks container_mount/validation.tiles container_mount/dataset/validation/labels\n", 363 | "\n", 364 | "!./rs subset container_mount/tiles/mapbox_satellite_tiles container_mount/training.tiles container_mount/dataset/training/images\n", 365 | "!./rs subset container_mount/masks container_mount/training.tiles container_mount/dataset/training/labels\n", 366 | "\n", 367 | "!./rs subset container_mount/tiles/mapbox_satellite_tiles container_mount/evaluation.tiles container_mount/dataset/evaluation/images\n", 368 | "!./rs subset container_mount/masks container_mount/evaluation.tiles container_mount/dataset/evaluation/labels" 369 | ] 370 | }, 371 | { 372 | "cell_type": "markdown", 373 | "metadata": {}, 374 | "source": [ 375 | "We're almost ready to start training. \n", 376 | "\n", 377 | "Before training the model we need to calculate the class distribution since background and building pixels are not evenly distributed in our images. `rs weights` will use the classes and dataset that we set up in dataset-buildint.toml and assign weights to each class." 378 | ] 379 | }, 380 | { 381 | "cell_type": "code", 382 | "execution_count": null, 383 | "metadata": {}, 384 | "outputs": [], 385 | "source": [ 386 | "!./rs weights --dataset /app/container_mount/dataset-building.toml" 387 | ] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": {}, 392 | "source": [ 393 | "**Important:** With the output of the `rs weights` command, update the dataset-building.toml. You will need to replace the existing values with the new ones. For example:\n", 394 | "\n", 395 | "```\n", 396 | "[weights]\n", 397 | " values = [1.615929, 5.943651]\n", 398 | "```" 399 | ] 400 | }, 401 | { 402 | "cell_type": "markdown", 403 | "metadata": {}, 404 | "source": [ 405 | "Once the weights are updated in the toml file, train the model!\n", 406 | "\n", 407 | "For the first pass you can use the parameters already set in `model-unet.toml`. One thing to double-check is the checkpoint output directory. Be sure this is pointing to a location that you can access. E.g.\n", 408 | "\n", 409 | "`checkpoint = '/app/container_mount/checkpoints/'`" 410 | ] 411 | }, 412 | { 413 | "cell_type": "code", 414 | "execution_count": null, 415 | "metadata": {}, 416 | "outputs": [], 417 | "source": [ 418 | "!./rs train --model /app/container_mount/model-unet.toml --dataset /app/container_mount/dataset-building.toml" 419 | ] 420 | }, 421 | { 422 | "cell_type": "markdown", 423 | "metadata": {}, 424 | "source": [ 425 | "Training will take quite a lot of time depending on your EC2 instance. You will see the progress in the output of the previous step and, at the end of each epoch, you will see output in your checkpoint directory. 
In our case `/app/container_mount/checkpoints/`."
426 |    ]
427 |   },
428 |   {
429 |    "cell_type": "markdown",
430 |    "metadata": {},
431 |    "source": [
432 |     "### Prediction"
433 |    ]
434 |   },
435 |   {
436 |    "cell_type": "markdown",
437 |    "metadata": {},
438 |    "source": [
439 |     "Now that we have our model we can use it to visualize predictions using the serve tool. Note that you need to both export your token as MAPBOX_ACCESS_TOKEN **and include it in the mapbox url** (at the end of the `./rs serve` string)!\n",
440 |     "\n",
441 |     "The checkpoint we're using here is `best-chkpt.pth`. Change this to the checkpoint you would like to use for your predictions (for example, an epoch checkpoint such as `checkpoint-00002-of-00010.pth`).\n",
442 |     "\n",
443 |     "It is not recommended to run this if `rs train` is running unless you have an EC2 instance that can support it."
444 |    ]
445 |   },
446 |   {
447 |    "cell_type": "code",
448 |    "execution_count": null,
449 |    "metadata": {},
450 |    "outputs": [],
451 |    "source": [
452 |     "!export MAPBOX_ACCESS_TOKEN=xxx && \\\n",
453 |     "./rs serve --model /app/container_mount/model-unet.toml --dataset /app/container_mount/dataset-building.toml --checkpoint /app/container_mount/checkpoints/best-chkpt.pth --tile_size 256 --host 0.0.0.0 --url https://api.mapbox.com/v4/mapbox.satellite/{z}/{x}/{y}@2x.webp?access_token=xxx"
454 |    ]
455 |   },
456 |   {
457 |    "cell_type": "markdown",
458 |    "metadata": {},
459 |    "source": [
460 |     "Make sure you have port 5000 open in security groups if running on EC2, and access via `http://<PUBLIC_IP>:5000`. You should see predictions being rendered!"
461 |    ]
462 |   },
463 |   {
464 |    "cell_type": "markdown",
465 |    "metadata": {},
466 |    "source": [
467 |     "### Hard-negative mining"
468 |    ]
469 |   },
470 |   {
471 |    "cell_type": "markdown",
472 |    "metadata": {},
473 |    "source": [
474 |     "This section walks through the steps necessary to tune the model using \"negative\" images. In our case that means satellite images that 100% definitely do **not** have a building in them. Our objective here is to add a set of negative images and their associated negative masks to the dataset and retrain. The result should be a better performing model with fewer false positives.\n",
475 |     "\n",
476 |     "There are a couple different approaches here. From the original blog post (https://www.openstreetmap.org/user/daniel-j-h/diary/44321):\n",
477 |     "\n",
478 |     ">The false positives are due to how we created the dataset: we bootstrapped a dataset based on tiles with buildings in them. Even though these tiles have some background pixels they won't contain enough background (so called negative samples) to properly learn what is not a building. If we never showed the model a single image of water it has a hard time classifying it as background.\n",
479 |     "\n",
480 |     ">There are two ways for us to approach this problem:\n",
481 |     "1. add many randomly sampled background tiles to the training set, re-compute class distribution weights, then train again, or\n",
482 |     "2. use the model we trained on the bootstrapped dataset and predict on tiles where we know there are no buildings; if the model tells us there is a building put these tiles into the dataset with an all-background mask, then train again\n",
483 |     "\n",
484 |     "Although you may achieve better results with option 2, we're going to demonstrate option 1 for simplicity."
485 |    ]
486 |   },
487 |   {
488 |    "cell_type": "markdown",
489 |    "metadata": {},
490 |    "source": [
491 |     "We have created a new Robosat module called `NotBuildingHandler` which is the inverse of the building handler. 
Instead of extracting the geojson for buildings, it extracts the geojson for everything that is not a building. We can use this geojson file to download (what we believe will be) negative tiles. We will need to verify these manually."
492 |    ]
493 |   },
494 |   {
495 |    "cell_type": "code",
496 |    "execution_count": null,
497 |    "metadata": {},
498 |    "outputs": [],
499 |    "source": [
500 |     "!./rs extract --type not_building container_mount/map.osm.pbf container_mount/maybe_not_buildings.geojson"
501 |    ]
502 |   },
503 |   {
504 |    "cell_type": "markdown",
505 |    "metadata": {},
506 |    "source": [
507 |     "Once we have the GeoJSON, run `rs cover` to create the .tiles csv, which we then use with `rs download` to fetch the images from Mapbox."
508 |    ]
509 |   },
510 |   {
511 |    "cell_type": "code",
512 |    "execution_count": null,
513 |    "metadata": {},
514 |    "outputs": [],
515 |    "source": [
516 |     "!./rs cover --zoom $DESIRED_ZOOM_LEVEL container_mount/maybe_not_buildings.geojson container_mount/maybe_not_buildings.tiles"
517 |    ]
518 |   },
519 |   {
520 |    "cell_type": "markdown",
521 |    "metadata": {},
522 |    "source": [
523 |     "Download all of the (non-building) tiles listed in our new .tiles file from Mapbox."
524 |    ]
525 |   },
526 |   {
527 |    "cell_type": "code",
528 |    "execution_count": null,
529 |    "metadata": {},
530 |    "outputs": [],
531 |    "source": [
532 |     "!./rs download https://api.mapbox.com/v4/mapbox.satellite/{z}/{x}/{y}.png?access_token=xxx container_mount/maybe_not_buildings.tiles container_mount/negative_mining_images"
533 |    ]
534 |   },
535 |   {
536 |    "cell_type": "markdown",
537 |    "metadata": {},
538 |    "source": [
539 |     "After downloading all of the possible negatives from Mapbox, manually review and select a set of negative images. For each tile you select, add it to a new .tiles file (in our case definitely_not_buildings.tiles) and re-download only those tiles.\n",
540 |     "\n",
541 |     "This re-download is somewhat redundant; we could instead copy the selected tiles from the imagery we already downloaded and save the extra requests. We take this approach because it reads more cleanly in the notebook, but feel free to do either."
542 |    ]
543 |   },
544 |   {
545 |    "cell_type": "code",
546 |    "execution_count": null,
547 |    "metadata": {},
548 |    "outputs": [],
549 |    "source": [
550 |     "!./rs download https://api.mapbox.com/v4/mapbox.satellite/{z}/{x}/{y}.png?access_token=xxx container_mount/definitely_not_buildings.tiles container_mount/negative_mining_select_images"
551 |    ]
552 |   },
553 |   {
554 |    "cell_type": "markdown",
555 |    "metadata": {},
556 |    "source": [
557 |     "Create the masks for each negative tile in definitely_not_buildings.tiles. Note that we need to use a separate dataset-building-negative.toml to configure the colors correctly. If we use the original .toml configuration we'll see all orange masks because our features are not \"building\" but \"not_building\".\n",
558 |     "\n",
559 |     "We have also made 100% sure that the masks generated here are going to be all negative by adding only one color to the dataset-building-negative.toml file - `colors = ['denim', 'denim']`. As above, this could be accomplished in different ways.\n",
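    "\n",
    "If you want to double-check this after running the `rs rasterize` cell below (an optional sanity check, not part of the original walkthrough), you can scan the generated negative masks and confirm that every pixel is class 0, i.e. background. This is a minimal sketch; it assumes the masks are single-band PNGs whose pixel values are class indices, and that PIL and numpy are importable inside the container (RoboSat itself depends on both):\n",
    "\n",
    "```python\n",
    "import os\n",
    "import numpy as np\n",
    "from PIL import Image\n",
    "\n",
    "masks_root = 'container_mount/negative_mining_masks'  # written by the rasterize step below\n",
    "non_background = []\n",
    "for root, _, files in os.walk(masks_root):\n",
    "    for name in files:\n",
    "        if name.endswith('.png'):\n",
    "            mask = np.array(Image.open(os.path.join(root, name)))\n",
    "            if mask.max() > 0:  # any pixel that is not class 0 / background\n",
    "                non_background.append(os.path.join(root, name))\n",
    "print(str(len(non_background)) + ' masks contain non-background pixels')\n",
    "```"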
560 |    ]
561 |   },
562 |   {
563 |    "cell_type": "code",
564 |    "execution_count": null,
565 |    "metadata": {},
566 |    "outputs": [],
567 |    "source": [
568 |     "!./rs rasterize --dataset container_mount/dataset-building-negative.toml --zoom $DESIRED_ZOOM_LEVEL --size 256 container_mount/maybe_not_buildings.geojson container_mount/definitely_not_buildings.tiles container_mount/negative_mining_masks"
569 |    ]
570 |   },
571 |   {
572 |    "cell_type": "markdown",
573 |    "metadata": {},
574 |    "source": [
575 |     "Once we have our negative masks and images, we need to subset them into the existing directories. The following two blocks will create new .tiles files for only our negative samples and use those to place the images and masks in the existing /dataset/ subdirectories."
576 |    ]
577 |   },
578 |   {
579 |    "cell_type": "code",
580 |    "execution_count": null,
581 |    "metadata": {},
582 |    "outputs": [],
583 |    "source": [
584 |     "!split -l $(expr $(cat container_mount/definitely_not_buildings.tiles | wc -l) \\* 80 / 100) container_mount/definitely_not_buildings.tiles training_\n",
585 |     "!split -l $(expr $(cat training_ab | wc -l) \\* 50 / 100) training_ab holdout_validation_\n",
586 |     "!mv training_aa negative_training.tiles\n",
587 |     "!mv holdout_validation_aa negative_validation.tiles\n",
588 |     "!mv holdout_validation_ab negative_evaluation.tiles"
589 |    ]
590 |   },
591 |   {
592 |    "cell_type": "code",
593 |    "execution_count": null,
594 |    "metadata": {},
595 |    "outputs": [],
596 |    "source": [
597 |     "!./rs subset container_mount/negative_mining_select_images negative_validation.tiles container_mount/dataset/validation/images\n",
598 |     "!./rs subset container_mount/negative_mining_masks negative_validation.tiles container_mount/dataset/validation/labels\n",
599 |     "\n",
600 |     "!./rs subset container_mount/negative_mining_select_images negative_training.tiles container_mount/dataset/training/images\n",
601 |     "!./rs subset container_mount/negative_mining_masks negative_training.tiles container_mount/dataset/training/labels\n",
602 |     "\n",
603 |     "!./rs subset container_mount/negative_mining_select_images negative_evaluation.tiles container_mount/dataset/evaluation/images\n",
604 |     "!./rs subset container_mount/negative_mining_masks negative_evaluation.tiles container_mount/dataset/evaluation/labels"
605 |    ]
606 |   },
607 |   {
608 |    "cell_type": "markdown",
609 |    "metadata": {},
610 |    "source": [
611 |     "Re-run `rs weights` with the negative tiles and masks added to the dataset. This will output an array of two numbers; add these to dataset-building.toml under [weights] as before, replacing the previous values. While you're in the .toml file, edit the `dataset = ` line to point to your mounted volume so that you will not lose checkpoints once the instance is terminated."
612 |    ]
613 |   },
614 |   {
615 |    "cell_type": "code",
616 |    "execution_count": null,
617 |    "metadata": {},
618 |    "outputs": [],
619 |    "source": [
620 |     "!./rs weights --dataset /app/container_mount/dataset-building.toml"
621 |    ]
622 |   },
623 |   {
624 |    "cell_type": "markdown",
625 |    "metadata": {},
626 |    "source": [
627 |     "Re-run `rs train` and look for an improvement!\n",
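    "\n",
    "Before kicking off the re-training run below, a quick optional sanity check (not part of the original walkthrough) is to confirm that images and labels are still paired one-to-one in each split now that the negative samples have been merged in; as noted earlier, every image needs a matching mask:\n",
    "\n",
    "```python\n",
    "import os\n",
    "\n",
    "dataset_root = 'container_mount/dataset'\n",
    "for split in ('training', 'validation', 'evaluation'):\n",
    "    images = sum(len(files) for _, _, files in os.walk(os.path.join(dataset_root, split, 'images')))\n",
    "    labels = sum(len(files) for _, _, files in os.walk(os.path.join(dataset_root, split, 'labels')))\n",
    "    print(split, images, 'images /', labels, 'labels')\n",
    "```"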
628 | ] 629 | }, 630 | { 631 | "cell_type": "code", 632 | "execution_count": null, 633 | "metadata": {}, 634 | "outputs": [], 635 | "source": [ 636 | "!./rs train --model /app/container_mount/model-unet.toml --dataset /app/container_mount/dataset-building.toml" 637 | ] 638 | } 639 | ], 640 | "metadata": { 641 | "kernelspec": { 642 | "display_name": "Python 3", 643 | "language": "python", 644 | "name": "python3" 645 | }, 646 | "language_info": { 647 | "codemirror_mode": { 648 | "name": "ipython", 649 | "version": 3 650 | }, 651 | "file_extension": ".py", 652 | "mimetype": "text/x-python", 653 | "name": "python", 654 | "nbconvert_exporter": "python", 655 | "pygments_lexer": "ipython3", 656 | "version": "3.5.2" 657 | } 658 | }, 659 | "nbformat": 4, 660 | "nbformat_minor": 2 661 | } 662 | -------------------------------------------------------------------------------- /entrypoint.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | if [[ -z $DESIRED_ZOOM_LEVEL ]] || [[ -z $PUBLIC_IP ]]; then 4 | echo 'ERROR: Missing required docker environment variables DESIRED_ZOOM_LEVEL or PUBLIC_IP!' 5 | exit 1 6 | fi 7 | 8 | sed -i "s/DESIRED_ZOOM_LEVEL/$DESIRED_ZOOM_LEVEL/g" /app/robosat/tools/serve.py 9 | sed -i "s/DESIRED_ZOOM_LEVEL/$DESIRED_ZOOM_LEVEL/g" /app/robosat/tools/templates/map.html 10 | sed -i "s/PUBLIC_IP/$PUBLIC_IP/g" /app/robosat/tools/templates/map.html 11 | exec "$@" 12 | -------------------------------------------------------------------------------- /images/mask.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Element84/robosat-jupyter-notebook/24fe9c5eac59b6d36c2d2303314b2c8b75984c56/images/mask.png -------------------------------------------------------------------------------- /images/satellite.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Element84/robosat-jupyter-notebook/24fe9c5eac59b6d36c2d2303314b2c8b75984c56/images/satellite.png -------------------------------------------------------------------------------- /osm/map.osm.pbf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Element84/robosat-jupyter-notebook/24fe9c5eac59b6d36c2d2303314b2c8b75984c56/osm/map.osm.pbf -------------------------------------------------------------------------------- /reload_docker.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | OS=$1 3 | if [[ "$#" != 1 ]] || [[ "$OS" != 'mac' && "$OS" != 'ubuntu' ]]; then 4 | echo "Must supply OS where this is running ['mac','ubuntu']. E.g., './reload_docker.sh mac'" 5 | exit 1 6 | fi 7 | 8 | echo "git pulling for the latest" 9 | git pull 10 | 11 | echo 'Killing the currently running docker, if there is one' 12 | RUNNING_DOCKER=`docker ps --filter ancestor=robosat-jupyter --format={{.ID}}` 13 | if [[ -z "$RUNNING_DOCKER" ]]; then 14 | echo "No robosat-jupyter docker image running, continuing" 15 | else 16 | echo "Killing running robosat-jupyter docker with ID: $RUNNING_DOCKER" 17 | docker kill "$RUNNING_DOCKER" 18 | fi 19 | 20 | echo "Rebuilding image from source" 21 | docker build -t robosat-jupyter . 
22 | 
23 | if [[ "$OS" == 'mac' ]]; then
24 | echo "Running locally on a mac (no nvidia drivers)"
25 | docker run -d -p 8888:8888 -p 5000:5000 -e DESIRED_ZOOM_LEVEL=19 -e PUBLIC_IP=127.0.0.1 -t robosat-jupyter jupyter notebook --ip=0.0.0.0 --allow-root
26 | elif [[ "$OS" == 'ubuntu' ]]; then
27 | echo "Running on an EC2 ubuntu image with nvidia drivers"
28 | docker run -d --runtime=nvidia -e DESIRED_ZOOM_LEVEL=19 -e PUBLIC_IP=127.0.0.1 -v /home/ubuntu/robosat/container_mount:/app/container_mount -v /home/ubuntu/robosat_container_files:/app/robosat_container_files -p 8888:8888 -p 5000:5000 -t robosat-jupyter jupyter notebook --ip=0.0.0.0 --allow-root
29 | else
30 | echo "OS $OS is not supported"
31 | exit 1
32 | fi
33 | 
34 | NEW_DOCKER=`docker ps --filter ancestor=robosat-jupyter --format={{.ID}}`
35 | sleep 5
36 | echo "New Jupyter Token: "
37 | docker logs "$NEW_DOCKER" | grep token | tail -1 | awk -F= '{print $2}'
38 | 
--------------------------------------------------------------------------------