"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": null,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "%matplotlib inline\n",
17 | "\n",
18 | "import pandas as pd\n",
19 | "import geopandas"
20 | ]
21 | },
22 | {
23 | "cell_type": "code",
24 | "execution_count": null,
25 | "metadata": {},
26 | "outputs": [],
27 | "source": [
28 | "countries = geopandas.read_file(\"data/ne_110m_admin_0_countries.zip\")\n",
29 | "cities = geopandas.read_file(\"data/ne_110m_populated_places.zip\")\n",
30 | "rivers = geopandas.read_file(\"data/ne_50m_rivers_lake_centerlines.zip\")"
31 | ]
32 | },
33 | {
34 | "cell_type": "markdown",
35 | "metadata": {},
36 | "source": [
37 | "## Coordinate reference systems\n",
38 | "\n",
39 | "Up to now, we have used the geometry data with certain coordinates without further wondering what those coordinates mean or how they are expressed.\n",
40 | "\n",
41 | "> The **Coordinate Reference System (CRS)** relates the coordinates to a specific location on earth.\n",
42 | "\n",
43 | "For an in-depth explanation, see https://docs.qgis.org/2.8/en/docs/gentle_gis_introduction/coordinate_reference_systems.html"
44 | ]
45 | },
46 | {
47 | "cell_type": "markdown",
48 | "metadata": {},
49 | "source": [
50 | "### Geographic coordinates\n",
51 | "\n",
52 | "> Degrees of latitude and longitude.\n",
53 | ">\n",
54 | "> E.g. 48°51′N, 2°17′E\n",
55 | "\n",
56 | "The best-known type of coordinates are geographic coordinates: we define a position on the globe in degrees of latitude and longitude, relative to the equator and the prime meridian. \n",
57 | "With this system, we can easily specify any location on earth. It is used widely, for example in GPS. If you inspect the coordinates of a location in Google Maps, you will also see latitude and longitude.\n",
58 | "\n",
59 | "**Attention!**\n",
60 | "\n",
61 | "In Python we use (lon, lat) and not (lat, lon), so that x is the longitude and y is the latitude.\n",
62 | "\n",
63 | "- Longitude: [-180, 180]\n",
64 | "- Latitude: [-90, 90]"
65 | ]
66 | },
67 | {
68 | "cell_type": "markdown",
69 | "metadata": {},
70 | "source": [
71 | "### Projected coordinates\n",
72 | "\n",
73 | "> `(x, y)` coordinates are usually in meters or feet\n",
74 | "\n",
75 | "Although the earth is a globe, in practice we usually represent it on a flat surface: think about a physical map, or the figures we have made with Python on our computer screen.\n",
76 | "Going from the globe to a flat map is what we call a *projection*.\n",
77 | "\n",
78 | "\n",
79 | "\n",
80 | "We project the surface of the earth onto a 2D plane so we can express locations in cartesian x and y coordinates, on a flat surface. In this plane, we then typically work with a length unit such as meters instead of degrees, which makes the analysis more convenient and effective.\n",
81 | "\n",
82 | "However, there is an important remark: the 3 dimensional earth can never be represented perfectly on a 2 dimensional map, so projections inevitably introduce distortions. To minimize such errors, there are different approaches to project, each with specific advantages and disadvantages.\n",
83 | "\n",
84 | "Some projection systems will try to preserve the area size of geometries, such as the Albers Equal Area projection. Other projection systems try to preserve angles, such as the Mercator projection, but will see big distortions in the area. Every projection system will always have some distortion of area, angle or distance.\n",
85 | "\n",
86 | ""
93 | ]
94 | },
95 | {
96 | "cell_type": "markdown",
97 | "metadata": {},
98 | "source": [
99 | "**Projected size vs actual size (Mercator projection)**:\n",
100 | "\n",
101 | ""
102 | ]
103 | },
104 | {
105 | "cell_type": "markdown",
106 | "metadata": {},
107 | "source": [
108 | "## Coordinate Reference Systems in Python / GeoPandas"
109 | ]
110 | },
111 | {
112 | "cell_type": "markdown",
113 | "metadata": {},
114 | "source": [
115 | "A GeoDataFrame or GeoSeries has a `.crs` attribute which holds (optionally) a description of the coordinate reference system of the geometries:"
116 | ]
117 | },
118 | {
119 | "cell_type": "code",
120 | "execution_count": null,
121 | "metadata": {
122 | "jupyter": {
123 | "outputs_hidden": false
124 | }
125 | },
126 | "outputs": [],
127 | "source": [
128 | "countries.crs"
129 | ]
130 | },
131 | {
132 | "cell_type": "markdown",
133 | "metadata": {},
134 | "source": [
135 | "For the `countries` dataframe, it indicates that it uses the EPSG 4326 / WGS84 lon/lat reference system, which is one of the most widely used for geographic coordinates.\n",
136 | "\n",
137 | "\n",
138 | "It expresses coordinates as longitude and latitude in degrees, as can be seen from the x/y labels on the plot:"
139 | ]
140 | },
141 | {
142 | "cell_type": "code",
143 | "execution_count": null,
144 | "metadata": {
145 | "jupyter": {
146 | "outputs_hidden": false
147 | }
148 | },
149 | "outputs": [],
150 | "source": [
151 | "countries.plot()"
152 | ]
153 | },
154 | {
155 | "cell_type": "markdown",
156 | "metadata": {},
157 | "source": [
158 | "The `.crs` attribute returns a `pyproj.CRS` object. To specify a CRS, we typically use some string representation:\n",
159 | "\n",
160 | "\n",
161 | "- **EPSG code**\n",
162 | " \n",
163 | " Example: `EPSG:4326` = WGS84 geographic CRS (longitude, latitude)\n",
164 | " \n",
165 | "- **Well-Known Text (WKT)** representation\n",
166 | "\n",
167 | "- In older software and datasets, you might also encounter a \"`proj4` string\" representation:\n",
168 | " \n",
169 | " Example: `+proj=longlat +datum=WGS84 +no_defs`\n",
170 | "\n",
171 | " This is however no longer recommended.\n",
172 | "\n",
173 | "\n",
174 | "See e.g. https://epsg.io/4326\n",
175 | "\n",
176 | "Under the hood, GeoPandas uses the `pyproj` / `PROJ` libraries to deal with the re-projections.\n",
177 | "\n",
178 | "For more information, see also http://geopandas.readthedocs.io/en/latest/projections.html."
179 | ]
180 | },
181 | {
182 | "cell_type": "markdown",
183 | "metadata": {},
184 | "source": [
185 | "### Transforming to another CRS\n",
186 | "\n",
187 | "We can convert a GeoDataFrame to another reference system using the `to_crs` method. \n",
188 | "\n",
189 | "For example, let's convert the countries to the World Mercator projection (http://epsg.io/3395):"
190 | ]
191 | },
192 | {
193 | "cell_type": "code",
194 | "execution_count": null,
195 | "metadata": {},
196 | "outputs": [],
197 | "source": [
198 | "# remove Antarctica, as the Mercator projection cannot deal with the poles\n",
199 | "countries = countries[(countries['name'] != \"Antarctica\")]"
200 | ]
201 | },
202 | {
203 | "cell_type": "code",
204 | "execution_count": null,
205 | "metadata": {},
206 | "outputs": [],
207 | "source": [
208 | "countries_mercator = countries.to_crs(epsg=3395) # or .to_crs(\"EPSG:3395\")"
209 | ]
210 | },
211 | {
212 | "cell_type": "code",
213 | "execution_count": null,
214 | "metadata": {
215 | "jupyter": {
216 | "outputs_hidden": false
217 | }
218 | },
219 | "outputs": [],
220 | "source": [
221 | "countries_mercator.plot()"
222 | ]
223 | },
224 | {
225 | "cell_type": "markdown",
226 | "metadata": {},
227 | "source": [
228 | "Note the different scale of x and y: the coordinates are now expressed in meters instead of degrees."
229 | ]
230 | },
231 | {
232 | "cell_type": "markdown",
233 | "metadata": {},
234 | "source": [
235 | "### Why use a different CRS?\n",
236 | "\n",
237 | "There are sometimes good reasons to change the coordinate reference system of your dataset, for example:\n",
238 | "\n",
239 | "- Different sources with different CRS -> need to convert to the same CRS\n",
240 | "\n",
241 | " ```python\n",
242 | " df1 = geopandas.read_file(...)\n",
243 | " df2 = geopandas.read_file(...)\n",
244 | "\n",
245 | " df2 = df2.to_crs(df1.crs)\n",
246 | " ```\n",
247 | "\n",
248 | "- Mapping (distortion of shape and distances)\n",
249 | "\n",
250 | "- Distance / area based calculations -> ensure you use an appropriate projected coordinate system expressed in a meaningful unit such as meters or feet (not degrees).\n",
251 | "\n",
252 | "\n",
253 | "\n",
254 | "**ATTENTION:**\n",
255 | "\n",
256 | "All the calculations that happen in GeoPandas and Shapely assume that your data is in a 2D cartesian plane, and thus the result of those calculations will only be correct if your data is properly projected.\n",
257 | "\n",
258 | ""
259 | ]
260 | },
261 | {
262 | "cell_type": "markdown",
263 | "metadata": {},
264 | "source": [
265 | "## Let's practice!\n",
266 | "\n",
267 | "Again, we will go back to the Paris datasets. Up to now, we provided the datasets in an appropriate projected CRS for the exercises. But the original data were actually using geographic coordinates. In the following exercises, we will start from there.\n",
268 | "\n",
269 | "---"
270 | ]
271 | },
272 | {
273 | "cell_type": "markdown",
274 | "metadata": {},
275 | "source": [
276 | "Going back to the Paris districts dataset, this is now provided as a GeoJSON file (`\"data/paris_districts.geojson\"`) in geographic coordinates.\n",
277 | "\n",
278 | "For converting to projected coordinates, we will use the standard projected CRS for France, the RGF93 / Lambert-93 reference system, referenced by the `EPSG:2154` number (in Belgium this would be Lambert 72, EPSG:31370).\n",
279 | "\n",
280 | "\n",
281 | "\n",
282 | "**EXERCISE 1: Projecting a GeoDataFrame**\n",
283 | "\n",
284 | "* Read the districts dataset (`\"data/paris_districts.geojson\"`) into a GeoDataFrame called `districts`.\n",
285 | "* Look at the CRS attribute of the GeoDataFrame. Do you recognize the EPSG number?\n",
286 | "* Make a plot of the `districts` dataset.\n",
287 | "* Calculate the area of all districts.\n",
288 | "* Convert the `districts` to a projected CRS (using the `EPSG:2154` for France). Call the new dataset `districts_RGF93`.\n",
289 | "* Make a similar plot of `districts_RGF93`.\n",
290 | "* Calculate the area of all districts again with `districts_RGF93` (the result will now be expressed in m²).\n",
291 | " \n",
292 | " \n",
293 | "Hints\n",
294 | "\n",
295 | "* The CRS information is stored in the `.crs` attribute of a GeoDataFrame.\n",
296 | "* Making a simple plot of a GeoDataFrame can be done with the `.plot()` method.\n",
297 | "* Converting to a different CRS can be done with the `.to_crs()` method, and the CRS can be specified as an EPSG number using the `epsg` keyword.\n",
298 | "\n",
299 | "\n",
300 | "\n",
301 | "\n",
431 | "\n",
432 | "**EXERCISE 2:**\n",
433 | "\n",
434 | "In the previous notebook, we did an exercise on plotting the bike station locations in Paris and adding a background map to it using the `contextily` package.\n",
435 | "\n",
436 | "Currently, `contextily` assumes that your data is in the Web Mercator projection, the system used by most web tile services. And in that first exercise, we provided the data in the appropriate CRS so you didn't need to care about this aspect.\n",
437 | "\n",
438 | "However, typically, your data will not come in Web Mercator (`EPSG:3857`) and you will have to align them with web tiles on your own.\n",
439 | " \n",
440 | "* Read the bike stations dataset (`\"data/paris_bike_stations.geojson\"`) into a GeoDataFrame called `stations`.\n",
441 | "* Convert the `stations` dataset to the Web Mercator projection (`EPSG:3857`). Call the result `stations_webmercator`, and inspect the result.\n",
442 | "* Make a plot of this projected dataset (specify the marker size to be 5) and add a background map using `contextily`.\n",
443 | "\n",
444 | " \n",
445 | "Hints\n",
446 | "\n",
447 | "* Making a simple plot of a GeoDataFrame can be done with the `.plot()` method. This returns a matplotlib axes object.\n",
448 | "* The marker size can be specified with the `markersize` keyword of the `.plot()` method.\n",
449 | "* To add a background map, use the `contextily.add_basemap()` function. It takes the matplotlib `ax` to which to add a map as the first argument.\n",
450 | "\n",
451 | "\n",
452 | "\n",
453 | "\n",
485 | "\n",
486 | "**NOTE**: \n",
487 | "\n",
488 | "Making a quick plot using Folium is now also available as the `.explore()` method on a GeoDataFrame or GeoSeries.\n",
489 | "\n",
490 | "See https://geopandas.org/en/stable/docs/user_guide/interactive_mapping.html for more examples.\n",
491 | ""
492 | ]
493 | }
494 | ],
495 | "metadata": {
496 | "kernelspec": {
497 | "display_name": "Python 3 (ipykernel)",
498 | "language": "python",
499 | "name": "python3"
500 | },
501 | "language_info": {
502 | "codemirror_mode": {
503 | "name": "ipython",
504 | "version": 3
505 | },
506 | "file_extension": ".py",
507 | "mimetype": "text/x-python",
508 | "name": "python",
509 | "nbconvert_exporter": "python",
510 | "pygments_lexer": "ipython3",
511 | "version": "3.10.6"
512 | },
513 | "widgets": {
514 | "application/vnd.jupyter.widget-state+json": {
515 | "state": {},
516 | "version_major": 2,
517 | "version_minor": 0
518 | }
519 | }
520 | },
521 | "nbformat": 4,
522 | "nbformat_minor": 4
523 | }
524 |
--------------------------------------------------------------------------------
/06-scaling-geopandas-dask.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Scaling geospatial analysis with GeoPandas and Dask"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "For now, see https://dask-geopandas.readthedocs.io/en/stable/guide/basic-intro.html"
15 | ]
16 | },
17 | {
18 | "cell_type": "code",
19 | "execution_count": null,
20 | "metadata": {},
21 | "outputs": [],
22 | "source": []
23 | }
24 | ],
25 | "metadata": {
26 | "kernelspec": {
27 | "display_name": "Python 3 (ipykernel)",
28 | "language": "python",
29 | "name": "python3"
30 | },
31 | "language_info": {
32 | "codemirror_mode": {
33 | "name": "ipython",
34 | "version": 3
35 | },
36 | "file_extension": ".py",
37 | "mimetype": "text/x-python",
38 | "name": "python",
39 | "nbconvert_exporter": "python",
40 | "pygments_lexer": "ipython3",
41 | "version": "3.10.6"
42 | },
43 | "widgets": {
44 | "application/vnd.jupyter.widget-state+json": {
45 | "state": {},
46 | "version_major": 2,
47 | "version_minor": 0
48 | }
49 | }
50 | },
51 | "nbformat": 4,
52 | "nbformat_minor": 4
53 | }
54 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | BSD 3-Clause License
2 |
3 | Copyright (c) 2018, Joris Van den Bossche
4 | All rights reserved.
5 |
6 | Redistribution and use in source and binary forms, with or without
7 | modification, are permitted provided that the following conditions are met:
8 |
9 | * Redistributions of source code must retain the above copyright notice, this
10 | list of conditions and the following disclaimer.
11 |
12 | * Redistributions in binary form must reproduce the above copyright notice,
13 | this list of conditions and the following disclaimer in the documentation
14 | and/or other materials provided with the distribution.
15 |
16 | * Neither the name of the copyright holder nor the names of its
17 | contributors may be used to endorse or promote products derived from
18 | this software without specific prior written permission.
19 |
20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
23 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
30 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Introduction to geospatial data analysis with GeoPandas and the PyData stack
2 |
3 | [launch binder](https://mybinder.org/v2/gh/jorisvandenbossche/geopandas-tutorial/main)
4 |
5 | ## Tutorial on geospatial data manipulation with Python
6 |
7 | This tutorial is an introduction to geospatial data analysis in Python, with a focus on tabular vector data using GeoPandas.
8 | It will introduce the different libraries to work with geospatial data and will cover munging geo-data and exploring relations over space. This includes importing data in different formats (e.g. shapefile, GeoJSON), visualizing, combining and tidying them up for analysis, exploring spatial relationships, ... and will use libraries such as pandas, geopandas, shapely, pyproj, matplotlib, ...
9 |
10 | The tutorial will cover the following topics, each of them using Jupyter notebooks and hands-on exercises with real-world data:
11 |
12 | 1. Introduction to vector data and GeoPandas
13 | 2. Visualizing geospatial data
14 | 3. Spatial relationships and joins
15 | 4. Spatial operations and overlays
16 | 5. Short showcase of parallel/distributed geospatial analysis with Dask
17 |
18 | This repository initially contained the teaching material for the geospatial data analysis tutorial
19 | at [GeoPython 2018](http://2018.geopython.net), May 7-9 2018, Basel, Switzerland, and was later updated and also
20 | used at [Scipy 2018](https://scipy2018.scipy.org/), [EuroScipy 2018](https://www.euroscipy.org/2018/), [GeoPython 2019](http://2019.geopython.net), [EuroScipy 2019](https://www.euroscipy.org/2019/), [EuroScipy 2022](https://www.euroscipy.org/2022/).
21 |
22 |
23 | ## Installation notes
24 |
25 | Following this tutorial will require recent installations of:
26 |
27 | - Python >= 3.9
28 | - pandas
29 | - geopandas >= 0.10.0
30 | - matplotlib
31 | - mapclassify
32 | - contextily
33 | - folium
34 | - [Jupyter Notebook or Lab](http://jupyter.org)
35 | - *(optional for mining sites case study)* rasterio, rasterstats
36 | - *(optional for visualisation showcase)* cartopy, geoplot, ipyleaflet
37 |
38 | If you do not yet have these packages installed, we recommend using the [conda](http://conda.pydata.org/docs/intro.html) package manager to install all the requirements
39 | (you can install [miniconda](http://conda.pydata.org/miniconda.html) or install the (larger) Anaconda
40 | distribution, found at https://www.anaconda.com/download/).
41 |
42 | Using conda, we recommend creating a new environment with all packages using the
43 | following commands:
44 |
45 | ```bash
46 | # setting the configuration so all packages come from the conda-forge channel
47 | conda config --add channels conda-forge
48 | conda config --set channel_priority strict
49 | # navigate to the downloaded (or git cloned) material
50 | cd .../geopandas-tutorial/
51 | # creating the environment
52 | conda env create --name geo-tutorial --file environment.yml
53 | # activating the environment
54 | conda activate geo-tutorial
55 | ```
56 |
57 | For this, you need to download the materials first (see below), as it
58 | makes use of the `environment.yml` file included in this repo.
59 |
60 | Alternatively, you can install the packages using conda manually, or you can
61 | use ``pip``, as long as you have the above packages installed. In that case,
62 | we refer to the installation instructions of the individual packages (note:
63 | this won't work on Windows out of the box).
64 |
65 | **Want to try out without installing anything?** You can use the "launch binder" link above at the top of this README, which will launch a notebook instance on Binder with all required libraries installed.
66 |
67 |
68 | ## Downloading the tutorial materials
69 |
70 | **Note**: *I am still updating the materials, so I recommend downloading the materials only on the morning before the tutorial starts, or updating your local copy then. To update a local copy, you can download the latest version again, or do a `git pull` if you are using git.*
71 |
72 | If you have git installed, you can get the tutorial materials by cloning this repo:
73 |
74 | git clone https://github.com/jorisvandenbossche/geopandas-tutorial.git
75 |
76 | Otherwise, you can download the repository as a .zip file by heading over
77 | to the GitHub repository (https://github.com/jorisvandenbossche/geopandas-tutorial) in
78 | your browser and clicking the green "Download" button in the upper right:
79 |
80 | 
81 |
82 |
83 | ## Test the tutorial environment
84 |
85 | To make sure everything was installed correctly, open a terminal, and change its directory (`cd`) so that your working directory is the tutorial materials you downloaded in the step above. Then enter the following:
86 |
87 | ```sh
88 | python check_environment.py
89 | ```
90 |
91 | Make sure that this script prints "All good. Enjoy the tutorial!"
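
If you want to see what such a check involves, here is a minimal sketch. It is not the repository's `check_environment.py`, whose contents are not shown here. The function names `missing_packages` and `python_is_recent` are hypothetical, and the package list is copied from the installation notes above, assuming each package's import name matches its listed name.

```python
import importlib.util
import sys

# Import names taken from the requirements list in the installation notes
# (assumption: each package is importable under the name it is listed with).
REQUIRED = ["pandas", "geopandas", "matplotlib", "mapclassify", "contextily", "folium"]


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_recent(minimum=(3, 9)):
    """Check the running interpreter against the minimum Python version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    if not python_is_recent():
        print("Python >= 3.9 is required")
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
```

This sketch only tests whether the packages can be located. It does not import them or compare versions such as `geopandas >= 0.10.0`.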
92 |
93 |
--------------------------------------------------------------------------------
/_solved/solutions/01-introduction-geospatial-data1.py:
--------------------------------------------------------------------------------
1 | stations = geopandas.read_file("data/paris_bike_stations_mercator.gpkg")
--------------------------------------------------------------------------------
/_solved/solutions/01-introduction-geospatial-data10.py:
--------------------------------------------------------------------------------
1 | districts = geopandas.read_file("data/paris_districts_utm.geojson")
--------------------------------------------------------------------------------
/_solved/solutions/01-introduction-geospatial-data11.py:
--------------------------------------------------------------------------------
1 | districts.head()
--------------------------------------------------------------------------------
/_solved/solutions/01-introduction-geospatial-data12.py:
--------------------------------------------------------------------------------
1 | districts.shape
--------------------------------------------------------------------------------
/_solved/solutions/01-introduction-geospatial-data13.py:
--------------------------------------------------------------------------------
1 | districts.plot(figsize=(12, 6))
--------------------------------------------------------------------------------
/_solved/solutions/01-introduction-geospatial-data14.py:
--------------------------------------------------------------------------------
1 | districts.geometry.area
--------------------------------------------------------------------------------
/_solved/solutions/01-introduction-geospatial-data15.py:
--------------------------------------------------------------------------------
1 | # dividing by 10^6 for showing km²
2 | districts['area'] = districts.geometry.area / 1e6
--------------------------------------------------------------------------------
/_solved/solutions/01-introduction-geospatial-data16.py:
--------------------------------------------------------------------------------
1 | districts.sort_values(by='area', ascending=False)
--------------------------------------------------------------------------------
/_solved/solutions/01-introduction-geospatial-data17.py:
--------------------------------------------------------------------------------
1 | # Add a population density column
2 | districts['population_density'] = districts['population'] / districts.geometry.area * 10**6
--------------------------------------------------------------------------------
/_solved/solutions/01-introduction-geospatial-data18.py:
--------------------------------------------------------------------------------
1 | # Make a plot of the districts colored by the population density
2 | districts.plot(column='population_density', figsize=(12, 6), legend=True)
--------------------------------------------------------------------------------
/_solved/solutions/01-introduction-geospatial-data19.py:
--------------------------------------------------------------------------------
1 | # As comparison, the misleading plot when not turning the population number into a density
2 | districts.plot(column='population', figsize=(12, 6), legend=True)
--------------------------------------------------------------------------------
/_solved/solutions/01-introduction-geospatial-data2.py:
--------------------------------------------------------------------------------
1 | type(stations)
--------------------------------------------------------------------------------
/_solved/solutions/01-introduction-geospatial-data3.py:
--------------------------------------------------------------------------------
1 | stations.head()
--------------------------------------------------------------------------------
/_solved/solutions/01-introduction-geospatial-data4.py:
--------------------------------------------------------------------------------
1 | stations.shape
--------------------------------------------------------------------------------
/_solved/solutions/01-introduction-geospatial-data5.py:
--------------------------------------------------------------------------------
1 | stations.plot(figsize=(12,6)) # or .explore()
--------------------------------------------------------------------------------
/_solved/solutions/01-introduction-geospatial-data6.py:
--------------------------------------------------------------------------------
1 | import contextily
--------------------------------------------------------------------------------
/_solved/solutions/01-introduction-geospatial-data7.py:
--------------------------------------------------------------------------------
1 | ax = stations.plot(figsize=(12,6), markersize=5)
2 | contextily.add_basemap(ax)
--------------------------------------------------------------------------------
/_solved/solutions/01-introduction-geospatial-data8.py:
--------------------------------------------------------------------------------
1 | stations['bike_stands'].hist()
--------------------------------------------------------------------------------
/_solved/solutions/01-introduction-geospatial-data9.py:
--------------------------------------------------------------------------------
1 | stations.plot(figsize=(12, 6), column='available_bikes', legend=True)
--------------------------------------------------------------------------------
/_solved/solutions/02-coordinate-reference-systems1.py:
--------------------------------------------------------------------------------
1 | # Import the districts dataset
2 | districts = geopandas.read_file("data/paris_districts.geojson")
--------------------------------------------------------------------------------
/_solved/solutions/02-coordinate-reference-systems10.py:
--------------------------------------------------------------------------------
1 | # Convert to the Web Mercator projection
2 | stations_webmercator = stations.to_crs("EPSG:3857")
3 | stations_webmercator.head()
--------------------------------------------------------------------------------
/_solved/solutions/02-coordinate-reference-systems11.py:
--------------------------------------------------------------------------------
1 | # Plot the stations with a background map
2 | import contextily
3 | ax = stations_webmercator.plot(markersize=5)
4 | contextily.add_basemap(ax)
--------------------------------------------------------------------------------
/_solved/solutions/02-coordinate-reference-systems2.py:
--------------------------------------------------------------------------------
1 | # Check the CRS information
2 | districts.crs
--------------------------------------------------------------------------------
/_solved/solutions/02-coordinate-reference-systems3.py:
--------------------------------------------------------------------------------
1 | # Show the first rows of the GeoDataFrame
2 | districts.head()
--------------------------------------------------------------------------------
/_solved/solutions/02-coordinate-reference-systems4.py:
--------------------------------------------------------------------------------
1 | # Plot the districts dataset
2 | districts.plot()
--------------------------------------------------------------------------------
/_solved/solutions/02-coordinate-reference-systems5.py:
--------------------------------------------------------------------------------
1 | # Calculate the area of all districts
2 | districts.geometry.area
--------------------------------------------------------------------------------
/_solved/solutions/02-coordinate-reference-systems6.py:
--------------------------------------------------------------------------------
1 | # Convert the districts to the RGF93 reference system
2 | districts_RGF93 = districts.to_crs(epsg=2154) # or to_crs("EPSG:2154")
--------------------------------------------------------------------------------
/_solved/solutions/02-coordinate-reference-systems7.py:
--------------------------------------------------------------------------------
1 | # Plot the districts dataset again
2 | districts_RGF93.plot()
--------------------------------------------------------------------------------
/_solved/solutions/02-coordinate-reference-systems8.py:
--------------------------------------------------------------------------------
1 | # Calculate the area of all districts (the result is now expressed in m²)
2 | districts_RGF93.geometry.area
--------------------------------------------------------------------------------
/_solved/solutions/02-coordinate-reference-systems9.py:
--------------------------------------------------------------------------------
1 | stations = geopandas.read_file("data/paris_bike_stations.geojson")
2 | stations.head()
--------------------------------------------------------------------------------
/_solved/solutions/03-spatial-relationships-joins1.py:
--------------------------------------------------------------------------------
1 | # Construct a point object for the Eiffel Tower
2 | eiffel_tower = Point(648237.3, 6862271.9)
--------------------------------------------------------------------------------
/_solved/solutions/03-spatial-relationships-joins10.py:
--------------------------------------------------------------------------------
1 | # Filter the bike stations closer than 1 km
2 | stations_eiffel = stations[dist_eiffel < 1000]
--------------------------------------------------------------------------------
/_solved/solutions/03-spatial-relationships-joins11.py:
--------------------------------------------------------------------------------
1 | joined = geopandas.sjoin(stations, districts[['district_name', 'geometry']], predicate='within')
--------------------------------------------------------------------------------
/_solved/solutions/03-spatial-relationships-joins12.py:
--------------------------------------------------------------------------------
1 | joined.head()
--------------------------------------------------------------------------------
/_solved/solutions/03-spatial-relationships-joins13.py:
--------------------------------------------------------------------------------
1 | # Read the trees and districts data
2 | trees = geopandas.read_file("data/paris_trees.gpkg")
3 | districts = geopandas.read_file("data/paris_districts.geojson").to_crs(trees.crs)
--------------------------------------------------------------------------------
/_solved/solutions/03-spatial-relationships-joins14.py:
--------------------------------------------------------------------------------
1 | # The trees dataset with point locations of trees
2 | trees.head()
--------------------------------------------------------------------------------
/_solved/solutions/03-spatial-relationships-joins15.py:
--------------------------------------------------------------------------------
1 | # Spatial join of the trees and districts datasets
2 | joined = geopandas.sjoin(trees, districts, predicate='within')
3 | joined.head()
--------------------------------------------------------------------------------
/_solved/solutions/03-spatial-relationships-joins16.py:
--------------------------------------------------------------------------------
1 | # Calculate the number of trees in each district
2 | trees_by_district = joined.groupby('district_name').size().to_frame(name='n_trees')
--------------------------------------------------------------------------------
/_solved/solutions/03-spatial-relationships-joins17.py:
--------------------------------------------------------------------------------
1 | # Merge the 'districts' and 'trees_by_district' dataframes
2 | districts_trees = pd.merge(districts, trees_by_district, on='district_name')
3 | districts_trees.head()
--------------------------------------------------------------------------------
/_solved/solutions/03-spatial-relationships-joins18.py:
--------------------------------------------------------------------------------
1 | # Add a column with the tree density
2 | districts_trees['n_trees_per_area'] = districts_trees['n_trees'] / districts_trees.geometry.area
--------------------------------------------------------------------------------
/_solved/solutions/03-spatial-relationships-joins19.py:
--------------------------------------------------------------------------------
1 | # Make a map of the districts colored by 'n_trees_per_area'
2 | ax = districts_trees.plot(column='n_trees_per_area', figsize=(12, 6))
3 | ax.set_axis_off()
--------------------------------------------------------------------------------
/_solved/solutions/03-spatial-relationships-joins2.py:
--------------------------------------------------------------------------------
1 | # Print the result
2 | print(eiffel_tower)
--------------------------------------------------------------------------------
/_solved/solutions/03-spatial-relationships-joins3.py:
--------------------------------------------------------------------------------
1 | # Is the Eiffel Tower located within the Montparnasse district?
2 | print(eiffel_tower.within(district_montparnasse))
--------------------------------------------------------------------------------
/_solved/solutions/03-spatial-relationships-joins4.py:
--------------------------------------------------------------------------------
1 | # Does the Montparnasse district contain the bike station?
2 | print(district_montparnasse.contains(bike_station))
--------------------------------------------------------------------------------
/_solved/solutions/03-spatial-relationships-joins5.py:
--------------------------------------------------------------------------------
1 | # The distance between the Eiffel Tower and the bike station?
2 | print(eiffel_tower.distance(bike_station))
--------------------------------------------------------------------------------
/_solved/solutions/03-spatial-relationships-joins6.py:
--------------------------------------------------------------------------------
1 | # Create a boolean Series
2 | mask = districts.contains(eiffel_tower)
3 | mask
--------------------------------------------------------------------------------
/_solved/solutions/03-spatial-relationships-joins7.py:
--------------------------------------------------------------------------------
1 | # Filter the districts with the boolean mask
2 | districts[mask]
--------------------------------------------------------------------------------
/_solved/solutions/03-spatial-relationships-joins8.py:
--------------------------------------------------------------------------------
1 | # The distance from each station to the Eiffel Tower
2 | dist_eiffel = stations.distance(eiffel_tower)
--------------------------------------------------------------------------------
/_solved/solutions/03-spatial-relationships-joins9.py:
--------------------------------------------------------------------------------
1 | # The distance to the closest station
2 | dist_eiffel.min()
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays1.py:
--------------------------------------------------------------------------------
1 | # Take a buffer
2 | seine_buffer = seine.buffer(150)
3 | seine_buffer
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays10.py:
--------------------------------------------------------------------------------
1 | # Print the proportion of the district area occupied by the park
2 | print(intersection.area / muette.area)
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays11.py:
--------------------------------------------------------------------------------
1 | # Calculate the intersection of the land use polygons with Muette
2 | land_use_muette = land_use.geometry.intersection(muette)
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays12.py:
--------------------------------------------------------------------------------
1 | # Print the first five rows of the intersection
2 | land_use_muette.head()
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays13.py:
--------------------------------------------------------------------------------
1 | # Remove the empty geometries
2 | land_use_muette = land_use_muette[~land_use_muette.is_empty]
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays14.py:
--------------------------------------------------------------------------------
1 | # Print the first five rows of the intersection
2 | land_use_muette.head()
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays15.py:
--------------------------------------------------------------------------------
1 | # Plot the intersection
2 | land_use_muette.plot(edgecolor='black')
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays16.py:
--------------------------------------------------------------------------------
1 | land_use_muette = land_use.copy()
2 | land_use_muette['geometry'] = land_use.geometry.intersection(muette)
3 | land_use_muette = land_use_muette[~land_use_muette.is_empty]
4 | land_use_muette.head()
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays17.py:
--------------------------------------------------------------------------------
1 | land_use_muette.plot(column="class")  # optionally pass edgecolor="black"
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays18.py:
--------------------------------------------------------------------------------
1 | land_use_muette.dissolve(by='class')
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays19.py:
--------------------------------------------------------------------------------
1 | land_use_muette['area'] = land_use_muette.geometry.area
2 | # Total land use per class
3 | land_use_muette.groupby("class")["area"].sum()
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays2.py:
--------------------------------------------------------------------------------
1 | # Use the intersection
2 | districts_seine = districts[districts.intersects(seine_buffer)]
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays20.py:
--------------------------------------------------------------------------------
1 | # Relative percentage of land use classes
2 | land_use_muette.groupby("class")["area"].sum() / land_use_muette.geometry.area.sum() * 100
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays21.py:
--------------------------------------------------------------------------------
1 | # Overlay both datasets based on the intersection
2 | combined = geopandas.overlay(land_use, districts, how='intersection')
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays22.py:
--------------------------------------------------------------------------------
1 | # Print the first five rows of the result
2 | combined.head()
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays23.py:
--------------------------------------------------------------------------------
1 | # Add the area as a column
2 | combined['area'] = combined.geometry.area
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays24.py:
--------------------------------------------------------------------------------
1 | # Take a subset for the Muette district
2 | land_use_muette = combined[combined['district_name'] == 'Muette']
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays25.py:
--------------------------------------------------------------------------------
1 | # Visualize the land use of the Muette district
2 | land_use_muette.plot(column='class')
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays26.py:
--------------------------------------------------------------------------------
1 | # Calculate the total area for each land use class
2 | print(land_use_muette.groupby('class')['area'].sum() / 1000**2)
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays27.py:
--------------------------------------------------------------------------------
1 | districts_area = combined.groupby("district_name")["area"].sum()
2 | districts_area.head()
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays28.py:
--------------------------------------------------------------------------------
1 | urban_green = combined[combined["class"] == "Green urban areas"]
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays29.py:
--------------------------------------------------------------------------------
1 | urban_green_area = urban_green.groupby("district_name")["area"].sum()
2 | urban_green_area.head()
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays3.py:
--------------------------------------------------------------------------------
1 | # Make a plot
2 | fig, ax = plt.subplots(figsize=(20, 10))
3 | districts.plot(ax=ax, color='grey', alpha=0.4, edgecolor='k')
4 | districts_seine.plot(ax=ax, color='blue', alpha=0.4, edgecolor='k')
5 | s_seine_utm.plot(ax=ax)
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays30.py:
--------------------------------------------------------------------------------
1 | urban_green_fraction = urban_green_area / districts_area * 100
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays31.py:
--------------------------------------------------------------------------------
1 | urban_green_fraction.nlargest()
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays32.py:
--------------------------------------------------------------------------------
1 | urban_green_fraction.nsmallest()
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays4.py:
--------------------------------------------------------------------------------
1 | # Import the land use dataset
2 | land_use = geopandas.read_file("data/paris_land_use.zip")
3 | land_use.head()
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays5.py:
--------------------------------------------------------------------------------
1 | # Make a plot of the land use with 'class' as the color
2 | land_use.plot(column='class', legend=True, figsize=(15, 10))
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays6.py:
--------------------------------------------------------------------------------
1 | # Add the area as a new column
2 | land_use['area'] = land_use.geometry.area
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays7.py:
--------------------------------------------------------------------------------
1 | # Calculate the total area for each land use class
2 | total_area = land_use.groupby('class')['area'].sum() / 1000**2
3 | total_area
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays8.py:
--------------------------------------------------------------------------------
1 | # Calculate the intersection of both polygons
2 | intersection = park_boulogne.intersection(muette)
--------------------------------------------------------------------------------
/_solved/solutions/04-spatial-operations-overlays9.py:
--------------------------------------------------------------------------------
1 | # Plot the intersection
2 | geopandas.GeoSeries([intersection]).plot()
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping10.py:
--------------------------------------------------------------------------------
1 | protected_areas = geopandas.read_file("data/Conservation/RDC_aire_protegee_2013.shp")
2 | # or to read it directly from the zip file:
3 | # protected_areas = geopandas.read_file("/Conservation", vfs="zip://./data/cod_conservation.zip")
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping11.py:
--------------------------------------------------------------------------------
1 | protected_areas.plot()
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping12.py:
--------------------------------------------------------------------------------
1 | from shapely.geometry import Point
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping13.py:
--------------------------------------------------------------------------------
1 | goma = Point(29.22, -1.66)
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping14.py:
--------------------------------------------------------------------------------
1 | dist_goma = data.distance(goma)
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping15.py:
--------------------------------------------------------------------------------
1 | dist_goma.nsmallest(5)
--------------------------------------------------------------------------------
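Because `data` is still in geographic coordinates at this point, the distances computed above are expressed in degrees, not in metres. A minimal sketch of the difference, using a spherical Earth approximation and a hypothetical site one degree west of Goma (both are assumptions for illustration, not part of the tutorial data):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


goma = (29.22, -1.66)  # (lon, lat), as in the exercise
site = (28.22, -1.66)  # hypothetical site one degree west of Goma

# What a planar distance on raw lon/lat values returns: a number in degrees
planar_degrees = math.hypot(site[0] - goma[0], site[1] - goma[1])

# The corresponding distance on the ground, in kilometres
great_circle_km = haversine_km(*goma, *site)

print(planar_degrees, great_circle_km)
```

The planar value is about 1 (degree), while the distance on the ground is on the order of a hundred kilometres, which is one reason to reproject to a metric CRS before measuring distances.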
/_solved/solutions/case-conflict-mapping16.py:
--------------------------------------------------------------------------------
1 | ax = protected_areas.plot()
2 | data.plot(ax=ax, color='C1')
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping19.py:
--------------------------------------------------------------------------------
1 | data_utm = data.to_crs(epsg=32735)
2 | protected_areas_utm = protected_areas.to_crs(epsg=32735)
--------------------------------------------------------------------------------
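EPSG:32735 is the code for WGS 84 / UTM zone 35S. As a rough sketch of where such a code comes from, the helper below (`utm_epsg` is a hypothetical name, not a GeoPandas function) derives the standard UTM zone from a longitude and latitude. It ignores the special zone exceptions around Norway and Svalbard.

```python
import math


def utm_epsg(lon, lat):
    """EPSG code of the standard WGS 84 / UTM zone containing (lon, lat) in degrees."""
    # UTM zones are 6 degrees wide and numbered 1 to 60, starting at 180 degrees west
    zone = int(math.floor((lon + 180) / 6)) + 1
    zone = min(zone, 60)  # a longitude of exactly 180 falls in zone 60
    # 326xx codes are the northern hemisphere zones, 327xx the southern ones
    base = 32600 if lat >= 0 else 32700
    return base + zone


# Goma, as (lon, lat), from the exercise above
print(utm_epsg(29.22, -1.66))
```

For Goma this gives the same code that is passed to `to_crs(epsg=...)` in the solution.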
/_solved/solutions/case-conflict-mapping20.py:
--------------------------------------------------------------------------------
1 | ax = protected_areas_utm.plot()
2 | data_utm.plot(ax=ax, color='C1')
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping21.py:
--------------------------------------------------------------------------------
1 | ax = protected_areas_utm.plot(figsize=(10, 10), color='green')
2 | data_utm.plot(ax=ax, markersize=5, alpha=0.5)
3 | ax.set_axis_off()
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping22.py:
--------------------------------------------------------------------------------
1 | # alternative with constructing the matplotlib figure first
2 | fig, ax = plt.subplots(figsize=(10, 10), subplot_kw=dict(aspect='equal'))
3 | protected_areas_utm.plot(ax=ax, color='green')
4 | data_utm.plot(ax=ax, markersize=5, alpha=0.5)
5 | ax.set_axis_off()
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping23.py:
--------------------------------------------------------------------------------
1 | ax = protected_areas_utm.plot(figsize=(10, 10), color='green')
2 | data_utm.plot(ax=ax, markersize=5, alpha=0.5, column='interference')
3 | ax.set_axis_off()
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping24.py:
--------------------------------------------------------------------------------
1 | ax = protected_areas_utm.plot(figsize=(10, 10), color='green')
2 | data_utm.plot(ax=ax, markersize=5, alpha=0.5, column='mineral1', legend=True)
3 | ax.set_axis_off()
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping25.py:
--------------------------------------------------------------------------------
1 | kahuzi = protected_areas_utm[protected_areas_utm['NAME_AP'] == "Kahuzi-Biega National park"].geometry.squeeze()
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping26.py:
--------------------------------------------------------------------------------
1 | mines_kahuzi = data_utm[data_utm.within(kahuzi)]
2 | mines_kahuzi
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping27.py:
--------------------------------------------------------------------------------
1 | len(mines_kahuzi)
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping28.py:
--------------------------------------------------------------------------------
1 | single_mine = data_utm.geometry[0]
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping29.py:
--------------------------------------------------------------------------------
1 | dist = protected_areas_utm.distance(single_mine)
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping3.py:
--------------------------------------------------------------------------------
1 | data_visits = geopandas.read_file("data/cod_mines_curated_all_opendata_p_ipis.geojson")
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping30.py:
--------------------------------------------------------------------------------
1 | idx = dist.idxmin()
2 | closest_area = protected_areas_utm.loc[idx, 'NAME_AP']
3 | closest_area
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping31.py:
--------------------------------------------------------------------------------
1 | def closest_protected_area(mine, protected_areas):
2 | dist = protected_areas.distance(mine)
3 | idx = dist.idxmin()
4 | closest_area = protected_areas.loc[idx, 'NAME_AP']
5 | return closest_area
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping32.py:
--------------------------------------------------------------------------------
1 | result = data_utm.geometry.apply(lambda site: closest_protected_area(site, protected_areas_utm))
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping33.py:
--------------------------------------------------------------------------------
1 | data_within_protected = geopandas.sjoin(data_utm, protected_areas_utm[['NAME_AP', 'geometry']],
2 | predicate='within', how='inner')
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping34.py:
--------------------------------------------------------------------------------
1 | len(data_within_protected)
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping35.py:
--------------------------------------------------------------------------------
1 | data_within_protected['NAME_AP'].value_counts()
2 | # or data_within_protected.groupby('NAME_AP').size()
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping36.py:
--------------------------------------------------------------------------------
1 | data_within_protected.groupby('NAME_AP')['workers_numb'].sum()
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping37.py:
--------------------------------------------------------------------------------
1 | protected_areas_border = protected_areas_utm[['NAME_AP', 'geometry']].copy()
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping38.py:
--------------------------------------------------------------------------------
1 | protected_areas_border['geometry'] = protected_areas_border.buffer(10000).difference(protected_areas_utm.unary_union)
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping39.py:
--------------------------------------------------------------------------------
1 | protected_areas_border.plot()
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping4.py:
--------------------------------------------------------------------------------
1 | data_visits.head()
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping40.py:
--------------------------------------------------------------------------------
1 | data_within_border = geopandas.sjoin(data_utm, protected_areas_border,
2 | predicate='within', how='inner')
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping41.py:
--------------------------------------------------------------------------------
1 | data_within_border['NAME_AP'].value_counts()
--------------------------------------------------------------------------------
/_solved/solutions/case-conflict-mapping5.py:
--------------------------------------------------------------------------------
1 | len(data_visits)
--------------------------------------------------------------------------------
/case-conflict-mapping.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": null,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "%matplotlib inline\n",
10 | "\n",
11 | "import pandas as pd\n",
12 | "import geopandas\n",
13 | "import matplotlib.pyplot as plt"
14 | ]
15 | },
16 | {
17 | "cell_type": "markdown",
18 | "metadata": {},
19 | "source": [
20 | "# Case study - Conflict mapping: mining sites in eastern DR Congo\n",
21 | "\n",
22 | "In this case study, we will explore a dataset on artisanal mining sites located in eastern DR Congo.\n",
23 | "\n",
24 | "**Note**: this tutorial is meant as a hands-on session, and most code examples are provided as exercises to be filled in. I highly recommend actually trying to do this yourself, but if you want to follow the solved tutorial, you can find this in the `_solved` directory.\n",
25 | "\n",
26 | "---\n",
27 | "\n",
28 | "#### Background\n",
29 | "\n",
30 | "[IPIS](http://ipisresearch.be/), the International Peace Information Service, manages a database on mining site visits in eastern DR Congo: http://ipisresearch.be/home/conflict-mapping/maps/open-data/\n",
31 | "\n",
32 | "Since 2009, IPIS has visited artisanal mining sites in the region during various data collection campaigns. As part of these campaigns, surveyor teams visit mining sites in the field, meet with miners and complete predefined questionnaires. These contain questions about the mining site, the minerals mined at the site and the armed groups possibly present at the site.\n",
33 | "\n",
34 | "Some additional links:\n",
35 | "\n",
36 | "* Tutorial on the same data using R from IPIS (but without geospatial aspect): http://ipisresearch.be/home/conflict-mapping/maps/open-data/open-data-tutorial/\n",
37 | "* Interactive web app using the same data: http://www.ipisresearch.be/mapping/webmapping/drcongo/v5/"
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "metadata": {},
43 | "source": [
44 | "## 1. Importing and exploring the data"
45 | ]
46 | },
47 | {
48 | "cell_type": "markdown",
49 | "metadata": {},
50 | "source": [
51 | "### The mining site visit data\n",
52 | "\n",
53 | "IPIS provides a WFS server to access the data. We can send a query to this server to download the data, and load the result into a geopandas GeoDataFrame:"
54 | ]
55 | },
56 | {
57 | "cell_type": "code",
58 | "execution_count": null,
59 | "metadata": {},
60 | "outputs": [],
61 | "source": [
62 | "import requests\n",
63 | "import json\n",
64 | "\n",
65 | "wfs_url = \"http://geo.ipisresearch.be/geoserver/public/ows\"\n",
66 | "params = dict(service='WFS', version='1.0.0', request='GetFeature',\n",
67 | " typeName='public:cod_mines_curated_all_opendata_p_ipis', outputFormat='json')\n",
68 | "\n",
69 | "r = requests.get(wfs_url, params=params)\n",
70 | "data_features = json.loads(r.content.decode('UTF-8'))\n",
71 | "data_visits = geopandas.GeoDataFrame.from_features(data_features, crs='EPSG:4326')"
72 | ]
73 | },
74 | {
75 | "cell_type": "markdown",
76 | "metadata": {},
77 | "source": [
78 | "However, the data is also provided in the tutorial materials as a GeoJSON file, so it is guaranteed to be available during the tutorial."
79 | ]
80 | },
81 | {
82 | "cell_type": "markdown",
83 | "metadata": {},
84 | "source": [
85 | "**EXERCISE**:\n",
86 | "\n",
87 | "* Read the GeoJSON file `data/cod_mines_curated_all_opendata_p_ipis.geojson` using geopandas, and call the result `data_visits`.\n",
88 | "* Inspect the first 5 rows, and check the number of observations."
93 | ]
94 | },
95 | {
96 | "cell_type": "code",
97 | "execution_count": null,
98 | "metadata": {
99 | "clear_cell": true
100 | },
101 | "outputs": [],
102 | "source": [
103 | "# %load _solved/solutions/case-conflict-mapping3.py"
104 | ]
105 | },
106 | {
107 | "cell_type": "code",
108 | "execution_count": null,
109 | "metadata": {
110 | "clear_cell": true
111 | },
112 | "outputs": [],
113 | "source": [
114 | "# %load _solved/solutions/case-conflict-mapping4.py"
115 | ]
116 | },
117 | {
118 | "cell_type": "code",
119 | "execution_count": null,
120 | "metadata": {
121 | "clear_cell": true
122 | },
123 | "outputs": [],
124 | "source": [
125 | "# %load _solved/solutions/case-conflict-mapping5.py"
126 | ]
127 | },
128 | {
129 | "cell_type": "markdown",
130 | "metadata": {},
131 | "source": [
132 | "The provided dataset contains a lot of information, much more than we are going to use in this tutorial. Therefore, we will select a subset of the columns:"
133 | ]
134 | },
135 | {
136 | "cell_type": "code",
137 | "execution_count": null,
138 | "metadata": {},
139 | "outputs": [],
140 | "source": [
141 | "data_visits = data_visits[['vid', 'project', 'visit_date', 'name', 'pcode', 'workers_numb', 'interference', 'armed_group1', 'mineral1', 'geometry']]"
142 | ]
143 | },
144 | {
145 | "cell_type": "code",
146 | "execution_count": null,
147 | "metadata": {},
148 | "outputs": [],
149 | "source": [
150 | "data_visits.head()"
151 | ]
152 | },
153 | {
154 | "cell_type": "markdown",
155 | "metadata": {},
156 | "source": [
157 | "Before starting the actual geospatial tutorial, we will use some more advanced pandas queries to construct a subset of the data that we will use further on: "
158 | ]
159 | },
160 | {
161 | "cell_type": "code",
162 | "execution_count": null,
163 | "metadata": {},
164 | "outputs": [],
165 | "source": [
166 | "# Take only the visits by IPIS where at least one worker was recorded\n",
167 | "data_ipis = data_visits[data_visits['project'].str.contains('IPIS') & (data_visits['workers_numb'] > 0)]"
168 | ]
169 | },
170 | {
171 | "cell_type": "code",
172 | "execution_count": null,
173 | "metadata": {},
174 | "outputs": [],
175 | "source": [
176 | "# For those mining sites that were visited multiple times, take only the last visit\n",
177 | "data_ipis_lastvisit = data_ipis.sort_values('visit_date').groupby('pcode', as_index=False).last()\n",
178 | "data = geopandas.GeoDataFrame(data_ipis_lastvisit, crs=data_visits.crs)"
179 | ]
180 | },
181 | {
182 | "cell_type": "markdown",
183 | "metadata": {},
184 | "source": [
185 | "### Data on protected areas in the same region\n",
186 | "\n",
187 | "In addition to the mining site data, we are also going to use a dataset on protected areas (national parks) in Congo. This dataset was downloaded from http://www.wri.org/our-work/project/congo-basin-forests/democratic-republic-congo#project-tabs and included in the tutorial repository: `data/cod_conservation.zip`."
188 | ]
189 | },
190 | {
191 | "cell_type": "markdown",
192 | "metadata": {},
193 | "source": [
194 | "**EXERCISE**:\n",
195 | "\n",
196 | "* Extract the `data/cod_conservation.zip` archive, and read the shapefile contained in it. Assign the resulting GeoDataFrame to a variable named `protected_areas`.\n",
197 | "* Quickly plot the GeoDataFrame."
201 | ]
202 | },
203 | {
204 | "cell_type": "code",
205 | "execution_count": null,
206 | "metadata": {
207 | "clear_cell": true
208 | },
209 | "outputs": [],
210 | "source": [
211 | "# %load _solved/solutions/case-conflict-mapping10.py"
212 | ]
213 | },
214 | {
215 | "cell_type": "code",
216 | "execution_count": null,
217 | "metadata": {
218 | "clear_cell": true
219 | },
220 | "outputs": [],
221 | "source": [
222 | "# %load _solved/solutions/case-conflict-mapping11.py"
223 | ]
224 | },
225 | {
226 | "cell_type": "markdown",
227 | "metadata": {},
228 | "source": [
229 | "### Conversion to a common Coordinate Reference System\n",
230 | "\n",
231 | "We will see that the two datasets use different Coordinate Reference Systems (CRS). For many operations, however, it is important that we use a consistent CRS, and therefore we will convert both to a common CRS.\n",
232 | "\n",
233 | "But first, we explore problems we can encounter related to CRSs.\n",
234 | "\n",
235 | "---"
236 | ]
237 | },
238 | {
239 | "cell_type": "markdown",
240 | "metadata": {},
241 | "source": [
242 | "[Goma](https://en.wikipedia.org/wiki/Goma) is the capital city of the North Kivu province of Congo, close to the border with Rwanda. Its coordinates are 1.66°S 29.22°E.\n",
243 | "\n",
244 | "
\n",
245 | " EXERCISE:\n",
246 | "
\n",
247 | "
Create a single Point object representing the location of Goma. Call this `goma`.
\n",
248 | "
Calculate the distances of all mines to Goma, and show the 5 smallest distances (mines closest to Goma).
Make a visualization of the national parks and the mining sites on a single plot.
\n",
312 | "
\n",
313 | " \n",
314 | "
Check the first section of the [04-more-on-visualization.ipynb](04-more-on-visualization.ipynb) notebook for tips and tricks to plot with GeoPandas.
\n",
315 | "
"
316 | ]
317 | },
318 | {
319 | "cell_type": "code",
320 | "execution_count": null,
321 | "metadata": {
322 | "clear_cell": true
323 | },
324 | "outputs": [],
325 | "source": [
326 | "# %load _solved/solutions/case-conflict-mapping16.py"
327 | ]
328 | },
329 | {
330 | "cell_type": "markdown",
331 | "metadata": {},
332 | "source": [
333 | "You will notice that the protected areas and mining sites do not map to the same area on the plot. This is because the two datasets use different Coordinate Reference Systems (CRS). Another reason we will need to convert the CRS!\n",
334 | "\n",
335 | "Let's check the Coordinate Reference System (CRS) for both datasets.\n",
336 | "\n",
337 | "The mining sites data uses the [WGS 84 lat/lon (EPSG 4326)](http://spatialreference.org/ref/epsg/4326/) CRS:"
338 | ]
339 | },
340 | {
341 | "cell_type": "code",
342 | "execution_count": null,
343 | "metadata": {},
344 | "outputs": [],
345 | "source": [
346 | "data.crs"
347 | ]
348 | },
349 | {
350 | "cell_type": "markdown",
351 | "metadata": {},
352 | "source": [
353 | "The protected areas dataset, on the other hand, uses a [WGS 84 / World Mercator (EPSG 3395)](http://spatialreference.org/ref/epsg/wgs-84-world-mercator/) projection (with meters as the unit):"
354 | ]
355 | },
356 | {
357 | "cell_type": "code",
358 | "execution_count": null,
359 | "metadata": {},
360 | "outputs": [],
361 | "source": [
362 | "protected_areas.crs"
363 | ]
364 | },
365 | {
366 | "cell_type": "markdown",
367 | "metadata": {},
368 | "source": [
369 | "We will convert both datasets to a local UTM zone, so that we can plot them together and distance-based calculations give sensible results.\n",
370 | "\n",
371 | "To find the appropriate UTM zone, you can check http://www.dmap.co.uk/utmworld.htm or https://www.latlong.net/lat-long-utm.html. In this case we will use UTM zone 35, which gives us EPSG 32735: https://epsg.io/32735\n",
372 | "\n",
373 | "
\n",
374 | " EXERCISE:\n",
375 | "
\n",
376 | "
Convert both datasets (`data` and `protected_areas`) to EPSG 32735. Name the results `data_utm` and `protected_areas_utm`.
\n",
377 | "
Try again to visualize both datasets on a single map.
For the following exercises, check the first section of the [04-more-on-visualization.ipynb](04-more-on-visualization.ipynb) notebook for tips and tricks to plot with GeoPandas.
\n",
565 | " EXERCISE: Determine for each mining site the \"closest\" protected area:\n",
566 | " \n",
567 | "
\n",
568 | "
PART 1 - do this for a single mining site:\n",
569 | "
\n",
570 | "
Get a single mining site, e.g. the first of the dataset.
\n",
571 | "
Calculate the distance (in km) to all protected areas for this mining site.
\n",
572 | "
Get the index of the minimum distance (tip: `idxmin()`) and get the name of the protected area corresponding to this index.
\n",
573 | "
\n",
574 | "
\n",
575 | "
PART 2 - apply this procedure on each geometry:\n",
576 | "
\n",
577 | "
Write the above procedure as a function that gets a single site and the protected areas dataframe as input and returns the name of the closest protected area as output.
\n",
578 | "
Apply this function to all sites using the `.apply()` method on `data_utm.geometry`.