├── .gitignore ├── 01-introduction-geospatial-data.ipynb ├── 02-spatial-relationships-operations.ipynb ├── 03-spatial-joins.ipynb ├── 04-more-on-visualization.ipynb ├── 05-mapclassification.ipynb ├── 06-exploratory-spatial-data-analysis.ipynb ├── 07-data-borrowing.ipynb ├── 08-clustering.ipynb ├── LICENSE ├── README.md ├── _solved ├── case-conflict-mapping.ipynb ├── case-gini-in-a-bottle-the-trump-vote.ipynb └── solutions │ ├── case-conflict-mapping10.py │ ├── case-conflict-mapping11.py │ ├── case-conflict-mapping12.py │ ├── case-conflict-mapping13.py │ ├── case-conflict-mapping14.py │ ├── case-conflict-mapping15.py │ ├── case-conflict-mapping16.py │ ├── case-conflict-mapping19.py │ ├── case-conflict-mapping20.py │ ├── case-conflict-mapping21.py │ ├── case-conflict-mapping22.py │ ├── case-conflict-mapping23.py │ ├── case-conflict-mapping24.py │ ├── case-conflict-mapping25.py │ ├── case-conflict-mapping26.py │ ├── case-conflict-mapping27.py │ ├── case-conflict-mapping28.py │ ├── case-conflict-mapping29.py │ ├── case-conflict-mapping3.py │ ├── case-conflict-mapping30.py │ ├── case-conflict-mapping31.py │ ├── case-conflict-mapping32.py │ ├── case-conflict-mapping33.py │ ├── case-conflict-mapping34.py │ ├── case-conflict-mapping35.py │ ├── case-conflict-mapping36.py │ ├── case-conflict-mapping37.py │ ├── case-conflict-mapping38.py │ ├── case-conflict-mapping39.py │ ├── case-conflict-mapping4.py │ ├── case-conflict-mapping40.py │ ├── case-conflict-mapping41.py │ ├── case-conflict-mapping5.py │ ├── case-trump-vote01.py │ ├── case-trump-vote02.py │ ├── case-trump-vote03.py │ ├── case-trump-vote04.py │ ├── case-trump-vote05.py │ ├── case-trump-vote06.py │ ├── case-trump-vote07.py │ ├── case-trump-vote08.py │ ├── case-trump-vote09.py │ ├── case-trump-vote10.py │ ├── case-trump-vote11.py │ ├── case-trump-vote12.py │ ├── case-trump-vote13.py │ ├── case-trump-vote14.py │ └── case-trump-vote15.py ├── case-conflict-mapping.ipynb ├── case-gini-in-a-bottle-the-trump-vote.ipynb ├── 
check_environment.py ├── data ├── berlin-districts.geojson ├── berlin-listings.csv.gz ├── cod_conservation.zip ├── cod_mines_curated_all_opendata_p_ipis.geojson ├── ne_110m_admin_0_countries.zip ├── ne_110m_populated_places.zip ├── ne_50m_rivers_lake_centerlines.zip └── uspres.zip ├── environment.yml └── img ├── TopologicSpatialRelations2.png └── download-button.png /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | env/ 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | .hypothesis/ 48 | 49 | # Translations 50 | *.mo 51 | *.pot 52 | 53 | # Django stuff: 54 | *.log 55 | local_settings.py 56 | 57 | # Flask stuff: 58 | instance/ 59 | .webassets-cache 60 | 61 | # Scrapy stuff: 62 | .scrapy 63 | 64 | # Sphinx documentation 65 | docs/_build/ 66 | 67 | # PyBuilder 68 | target/ 69 | 70 | # Jupyter Notebook 71 | .ipynb_checkpoints 72 | 73 | # pyenv 74 | .python-version 75 | 76 | # celery beat schedule file 77 | celerybeat-schedule 78 | 79 | # SageMath parsed files 80 | *.sage.py 81 | 82 | # dotenv 83 | .env 84 | 85 | # virtualenv 86 | .venv 87 | venv/ 88 | ENV/ 89 | 90 | # Spyder project settings 91 | .spyderproject 92 | .spyproject 93 | 94 | # Rope project settings 95 | .ropeproject 96 | 97 | # mkdocs documentation 98 | /site 99 | 100 | # mypy 101 | .mypy_cache/ 102 | -------------------------------------------------------------------------------- /01-introduction-geospatial-data.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Introduction to geospatial vector data in Python" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": null, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "%matplotlib inline\n", 17 | "\n", 18 | "import pandas as pd\n", 19 | "import geopandas\n", 20 | "\n", 21 | "pd.options.display.max_rows = 10" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "## Importing geospatial data" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "Geospatial data is often available from specific GIS file formats 
or data stores, like ESRI shapefiles, GeoJSON files, geopackage files, PostGIS (PostgreSQL) database, ...\n", 36 | "\n", 37 | "We can use the GeoPandas library to read many of those GIS file formats (relying on the `fiona` library under the hood, which is an interface to GDAL/OGR), using the `geopandas.read_file` function.\n", 38 | "\n", 39 | "For example, let's start by reading a shapefile with all the countries of the world (adapted from http://www.naturalearthdata.com/downloads/110m-cultural-vectors/110m-admin-0-countries/, zip file is available in the `/data` directory), and inspect the data:" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": null, 45 | "metadata": {}, 46 | "outputs": [], 47 | "source": [ 48 | "countries = geopandas.read_file(\"zip://./data/ne_110m_admin_0_countries.zip\")\n", 49 | "# or if the archive is unpacked:\n", 50 | "# countries = geopandas.read_file(\"data/ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp\")" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": null, 56 | "metadata": {}, 57 | "outputs": [], 58 | "source": [ 59 | "countries.head()" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": null, 65 | "metadata": {}, 66 | "outputs": [], 67 | "source": [ 68 | "countries.plot()" 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": {}, 74 | "source": [ 75 | "What can we observe:\n", 76 | "\n", 77 | "- Using `.head()` we can see the first rows of the dataset, just like we can do with Pandas.\n", 78 | "- There is a 'geometry' column and the different countries are represented as polygons\n", 79 | "- We can use the `.plot()` method to quickly get a *basic* visualization of the data" 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": {}, 85 | "source": [ 86 | "## What's a GeoDataFrame?\n", 87 | "\n", 88 | "We used the GeoPandas library to read in the geospatial data, and this returned us a `GeoDataFrame`:" 89 | ] 90 | }, 91 | { 92 | 
"cell_type": "code", 93 | "execution_count": null, 94 | "metadata": {}, 95 | "outputs": [], 96 | "source": [ 97 | "type(countries)" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "A GeoDataFrame contains a tabular, geospatial dataset:\n", 105 | "\n", 106 | "* It has a **'geometry' column** that holds the geometry information (or features in GeoJSON).\n", 107 | "* The other columns are the **attributes** (or properties in GeoJSON) that describe each of the geometries\n", 108 | "\n", 109 | "Such a `GeoDataFrame` is just like a pandas `DataFrame`, but with some additional functionality for working with geospatial data:\n", 110 | "\n", 111 | "* A `.geometry` attribute that always returns the column with the geometry information (returning a GeoSeries). The column name itself does not necessarily need to be 'geometry', but it will always be accessible as the `.geometry` attribute.\n", 112 | "* It has some extra methods for working with spatial data (area, distance, buffer, intersection, ...), which we will see in later notebooks" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": null, 118 | "metadata": {}, 119 | "outputs": [], 120 | "source": [ 121 | "countries.geometry" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": null, 127 | "metadata": {}, 128 | "outputs": [], 129 | "source": [ 130 | "type(countries.geometry)" 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": null, 136 | "metadata": {}, 137 | "outputs": [], 138 | "source": [ 139 | "countries.geometry.area" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "**It's still a DataFrame**, so we have all the pandas functionality available to use on the geospatial dataset, and to do data manipulations with the attributes and geometry information together.\n", 147 | "\n", 148 | "For example, we can calculate average population number 
over all countries (by accessing the 'pop_est' column, and calling the `mean` method on it):" 149 | ] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "execution_count": null, 154 | "metadata": {}, 155 | "outputs": [], 156 | "source": [ 157 | "countries['pop_est'].mean()" 158 | ] 159 | }, 160 | { 161 | "cell_type": "markdown", 162 | "metadata": {}, 163 | "source": [ 164 | "Or, we can use boolean filtering to select a subset of the dataframe based on a condition:" 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": null, 170 | "metadata": {}, 171 | "outputs": [], 172 | "source": [ 173 | "africa = countries[countries['continent'] == 'Africa']" 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": null, 179 | "metadata": {}, 180 | "outputs": [], 181 | "source": [ 182 | "africa.plot()" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "---\n", 190 | "\n", 191 | "The rest of the tutorial is going to assume you already know some pandas basics, but we will try to give hints for that part for those that are not familiar. \n", 192 | "A few resources in case you want to learn more about pandas:\n", 193 | "\n", 194 | "- Pandas docs: https://pandas.pydata.org/pandas-docs/stable/10min.html\n", 195 | "- Other tutorials: chapter from pandas in https://jakevdp.github.io/PythonDataScienceHandbook/, https://github.com/jorisvandenbossche/pandas-tutorial, https://github.com/TomAugspurger/pandas-head-to-tail, ..." 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 | "source": [ 202 | "
\n", 203 | "REMEMBER:
\n", 204 | "\n", 205 | "\n", 214 | "
" 215 | ] 216 | }, 217 | { 218 | "cell_type": "markdown", 219 | "metadata": {}, 220 | "source": [ 221 | "## Geometries: Points, Linestrings and Polygons\n", 222 | "\n", 223 | "Spatial **vector** data can consist of different types, and the 3 fundamental types are:\n", 224 | "\n", 225 | "* **Point** data: represents a single point in space.\n", 226 | "* **Line** data (\"LineString\"): represents a sequence of points that form a line.\n", 227 | "* **Polygon** data: represents a filled area.\n", 228 | "\n", 229 | "And each of them can also be combined in multi-part geometries (see https://shapely.readthedocs.io/en/stable/manual.html#geometric-objects for an extensive overview)." 230 | ] 231 | }, 232 | { 233 | "cell_type": "markdown", 234 | "metadata": {}, 235 | "source": [ 236 | "For the example we have seen up to now, the individual geometry objects are Polygons:" 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": null, 242 | "metadata": {}, 243 | "outputs": [], 244 | "source": [ 245 | "print(countries.geometry[2])" 246 | ] 247 | }, 248 | { 249 | "cell_type": "markdown", 250 | "metadata": {}, 251 | "source": [ 252 | "Let's import some other datasets with different types of geometry objects.\n", 253 | "\n", 254 | "A dataset about cities in the world (adapted from http://www.naturalearthdata.com/downloads/110m-cultural-vectors/110m-populated-places/, zip file is available in the `/data` directory), consisting of Point data:" 255 | ] 256 | }, 257 | { 258 | "cell_type": "code", 259 | "execution_count": null, 260 | "metadata": {}, 261 | "outputs": [], 262 | "source": [ 263 | "cities = geopandas.read_file(\"zip://./data/ne_110m_populated_places.zip\")" 264 | ] 265 | }, 266 | { 267 | "cell_type": "code", 268 | "execution_count": null, 269 | "metadata": {}, 270 | "outputs": [], 271 | "source": [ 272 | "print(cities.geometry[0])" 273 | ] 274 | }, 275 | { 276 | "cell_type": "markdown", 277 | "metadata": {}, 278 | "source": [ 279 | "And a dataset of 
rivers in the world (from http://www.naturalearthdata.com/downloads/50m-physical-vectors/50m-rivers-lake-centerlines/, zip file is available in the `/data` directory) where each river is a (multi-)line:" 280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": null, 285 | "metadata": {}, 286 | "outputs": [], 287 | "source": [ 288 | "rivers = geopandas.read_file(\"zip://./data/ne_50m_rivers_lake_centerlines.zip\")" 289 | ] 290 | }, 291 | { 292 | "cell_type": "code", 293 | "execution_count": null, 294 | "metadata": {}, 295 | "outputs": [], 296 | "source": [ 297 | "print(rivers.geometry[0])" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": {}, 303 | "source": [ 304 | "### The `shapely` library\n", 305 | "\n", 306 | "The individual geometry objects are provided by the [`shapely`](https://shapely.readthedocs.io/en/stable/) library" 307 | ] 308 | }, 309 | { 310 | "cell_type": "code", 311 | "execution_count": null, 312 | "metadata": {}, 313 | "outputs": [], 314 | "source": [ 315 | "type(countries.geometry[0])" 316 | ] 317 | }, 318 | { 319 | "cell_type": "markdown", 320 | "metadata": {}, 321 | "source": [ 322 | "To construct one ourselves:" 323 | ] 324 | }, 325 | { 326 | "cell_type": "code", 327 | "execution_count": null, 328 | "metadata": {}, 329 | "outputs": [], 330 | "source": [ 331 | "from shapely.geometry import Point, Polygon, LineString" 332 | ] 333 | }, 334 | { 335 | "cell_type": "code", 336 | "execution_count": null, 337 | "metadata": {}, 338 | "outputs": [], 339 | "source": [ 340 | "p = Point(1, 1)" 341 | ] 342 | }, 343 | { 344 | "cell_type": "code", 345 | "execution_count": null, 346 | "metadata": {}, 347 | "outputs": [], 348 | "source": [ 349 | "print(p)" 350 | ] 351 | }, 352 | { 353 | "cell_type": "code", 354 | "execution_count": null, 355 | "metadata": {}, 356 | "outputs": [], 357 | "source": [ 358 | "polygon = Polygon([(1, 1), (2,2), (2, 1)])" 359 | ] 360 | }, 361 | { 362 | "cell_type": "markdown", 363 | 
"metadata": {}, 364 | "source": [ 365 | "
\n", 366 | "REMEMBER:

\n", 367 | "\n", 368 | "Single geometries are represented by `shapely` objects:\n", 369 | "\n", 370 | "\n", 379 | "
" 380 | ] 381 | }, 382 | { 383 | "cell_type": "markdown", 384 | "metadata": {}, 385 | "source": [ 386 | "## Coordinate reference systems\n", 387 | "\n", 388 | "A **coordinate reference system (CRS)** determines how the two-dimensional (planar) coordinates of the geometry objects should be related to actual places on the (non-planar) earth.\n", 389 | "\n", 390 | "For a nice in-depth explanation, see https://docs.qgis.org/2.8/en/docs/gentle_gis_introduction/coordinate_reference_systems.html" 391 | ] 392 | }, 393 | { 394 | "cell_type": "markdown", 395 | "metadata": {}, 396 | "source": [ 397 | "A GeoDataFrame or GeoSeries has a `.crs` attribute which holds (optionally) a description of the coordinate reference system of the geometries:" 398 | ] 399 | }, 400 | { 401 | "cell_type": "code", 402 | "execution_count": null, 403 | "metadata": {}, 404 | "outputs": [], 405 | "source": [ 406 | "countries.crs" 407 | ] 408 | }, 409 | { 410 | "cell_type": "markdown", 411 | "metadata": {}, 412 | "source": [ 413 | "For the `countries` dataframe, it indicates that it uses the EPSG 4326 / WGS84 lon/lat reference system, which is one of the most widely used. \n", 414 | "Coordinates are expressed as latitude and longitude in degrees, as can be seen from the x/y labels on the plot:" 415 | ] 416 | }, 417 | { 418 | "cell_type": "code", 419 | "execution_count": null, 420 | "metadata": {}, 421 | "outputs": [], 422 | "source": [ 423 | "countries.plot()" 424 | ] 425 | }, 426 | { 427 | "cell_type": "markdown", 428 | "metadata": {}, 429 | "source": [ 430 | "The `.crs` attribute is given as a dictionary. In this case, it only indicates the EPSG code, but it can also contain the full \"proj4\" string (in dictionary form). \n", 431 | "\n", 432 | "Under the hood, GeoPandas uses the `pyproj` / `proj4` libraries to deal with the re-projections.\n", 433 | "\n", 434 | "For more information, see also http://geopandas.readthedocs.io/en/latest/projections.html." 
435 | ] 436 | }, 437 | { 438 | "cell_type": "markdown", 439 | "metadata": {}, 440 | "source": [ 441 | "---\n", 442 | "\n", 443 | "There are sometimes good reasons to change the coordinate reference system of your dataset, for example:\n", 444 | "\n", 445 | "- different sources with different crs -> need to convert to the same crs\n", 446 | "- distance-based operations -> you need a crs that has meter units (not degrees)\n", 447 | "- plotting in a certain crs (e.g. to preserve area)\n", 448 | "\n", 449 | "We can convert a GeoDataFrame to another reference system using the `to_crs` method. \n", 450 | "\n", 451 | "For example, let's convert the countries to the World Mercator projection (http://epsg.io/3395):" 452 | ] 453 | }, 454 | { 455 | "cell_type": "code", 456 | "execution_count": null, 457 | "metadata": {}, 458 | "outputs": [], 459 | "source": [ 460 | "# remove Antarctica, as the Mercator projection cannot deal with the poles\n", 461 | "countries = countries[(countries['name'] != \"Antarctica\")]" 462 | ] 463 | }, 464 | { 465 | "cell_type": "code", 466 | "execution_count": null, 467 | "metadata": {}, 468 | "outputs": [], 469 | "source": [ 470 | "countries_mercator = countries.to_crs(epsg=3395) # or .to_crs({'init': 'epsg:3395'})" 471 | ] 472 | }, 473 | { 474 | "cell_type": "code", 475 | "execution_count": null, 476 | "metadata": {}, 477 | "outputs": [], 478 | "source": [ 479 | "countries_mercator.plot()" 480 | ] 481 | }, 482 | { 483 | "cell_type": "markdown", 484 | "metadata": {}, 485 | "source": [ 486 | "Note the different scale of x and y." 
487 | ] 488 | }, 489 | { 490 | "cell_type": "markdown", 491 | "metadata": {}, 492 | "source": [ 493 | "## Plotting our different layers together" 494 | ] 495 | }, 496 | { 497 | "cell_type": "code", 498 | "execution_count": null, 499 | "metadata": {}, 500 | "outputs": [], 501 | "source": [ 502 | "ax = countries.plot(edgecolor='k', facecolor='none', figsize=(15, 10))\n", 503 | "rivers.plot(ax=ax)\n", 504 | "cities.plot(ax=ax, color='red')\n", 505 | "ax.set(xlim=(-20, 60), ylim=(-40, 40))" 506 | ] 507 | }, 508 | { 509 | "cell_type": "markdown", 510 | "metadata": {}, 511 | "source": [ 512 | "See the [04-more-on-visualization.ipynb](04-more-on-visualization.ipynb) notebook for more details on visualizing geospatial datasets." 513 | ] 514 | }, 515 | { 516 | "cell_type": "markdown", 517 | "metadata": {}, 518 | "source": [ 519 | "## A bit more on importing and creating GeoDataFrames" 520 | ] 521 | }, 522 | { 523 | "cell_type": "markdown", 524 | "metadata": {}, 525 | "source": [ 526 | "### Note on `fiona`\n", 527 | "\n", 528 | "Under the hood, GeoPandas uses the [Fiona library](http://toblerity.org/fiona/) (pythonic interface to GDAL/OGR) to read and write data. GeoPandas provides a more user-friendly wrapper, which is sufficient for most use cases. But sometimes you want more control, and in that case, to read a file with fiona you can do the following:\n" 529 | ] 530 | }, 531 | { 532 | "cell_type": "code", 533 | "execution_count": null, 534 | "metadata": {}, 535 | "outputs": [], 536 | "source": [ 537 | "import fiona\n", 538 | "from shapely.geometry import shape\n", 539 | "\n", 540 | "with fiona.drivers():\n", 541 | " with fiona.open(\"data/ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp\") as collection:\n", 542 | " for feature in collection:\n", 543 | " # ... do something with geometry\n", 544 | " geom = shape(feature['geometry'])\n", 545 | " # ... 
do something with properties\n", 546 | " print(feature['properties']['name'])" 547 | ] 548 | }, 549 | { 550 | "cell_type": "markdown", 551 | "metadata": {}, 552 | "source": [ 553 | "### Constructing a GeoDataFrame manually" 554 | ] 555 | }, 556 | { 557 | "cell_type": "code", 558 | "execution_count": null, 559 | "metadata": {}, 560 | "outputs": [], 561 | "source": [ 562 | "geopandas.GeoDataFrame({\n", 563 | " 'geometry': [Point(1, 1), Point(2, 2)],\n", 564 | " 'attribute1': [1, 2],\n", 565 | " 'attribute2': [0.1, 0.2]})" 566 | ] 567 | }, 568 | { 569 | "cell_type": "markdown", 570 | "metadata": {}, 571 | "source": [ 572 | "### Creating a GeoDataFrame from an existing dataframe\n", 573 | "\n", 574 | "For example, if you have lat/lon coordinates in two columns:" 575 | ] 576 | }, 577 | { 578 | "cell_type": "code", 579 | "execution_count": null, 580 | "metadata": {}, 581 | "outputs": [], 582 | "source": [ 583 | "df = pd.DataFrame(\n", 584 | " {'City': ['Buenos Aires', 'Brasilia', 'Santiago', 'Bogota', 'Caracas'],\n", 585 | " 'Country': ['Argentina', 'Brazil', 'Chile', 'Colombia', 'Venezuela'],\n", 586 | " 'Latitude': [-34.58, -15.78, -33.45, 4.60, 10.48],\n", 587 | " 'Longitude': [-58.66, -47.91, -70.66, -74.08, -66.86]})" 588 | ] 589 | }, 590 | { 591 | "cell_type": "code", 592 | "execution_count": null, 593 | "metadata": {}, 594 | "outputs": [], 595 | "source": [ 596 | "df['Coordinates'] = list(zip(df.Longitude, df.Latitude))" 597 | ] 598 | }, 599 | { 600 | "cell_type": "code", 601 | "execution_count": null, 602 | "metadata": {}, 603 | "outputs": [], 604 | "source": [ 605 | "df['Coordinates'] = df['Coordinates'].apply(Point)" 606 | ] 607 | }, 608 | { 609 | "cell_type": "code", 610 | "execution_count": null, 611 | "metadata": {}, 612 | "outputs": [], 613 | "source": [ 614 | "gdf = geopandas.GeoDataFrame(df, geometry='Coordinates')" 615 | ] 616 | }, 617 | { 618 | "cell_type": "code", 619 | "execution_count": null, 620 | "metadata": {}, 621 | "outputs": [], 622 | "source": 
[ 623 | "gdf" 624 | ] 625 | }, 626 | { 627 | "cell_type": "markdown", 628 | "metadata": {}, 629 | "source": [ 630 | "See http://geopandas.readthedocs.io/en/latest/gallery/create_geopandas_from_pandas.html#sphx-glr-gallery-create-geopandas-from-pandas-py for full example" 631 | ] 632 | } 633 | ], 634 | "metadata": { 635 | "kernelspec": { 636 | "display_name": "Python 3", 637 | "language": "python", 638 | "name": "python3" 639 | }, 640 | "language_info": { 641 | "codemirror_mode": { 642 | "name": "ipython", 643 | "version": 3 644 | }, 645 | "file_extension": ".py", 646 | "mimetype": "text/x-python", 647 | "name": "python", 648 | "nbconvert_exporter": "python", 649 | "pygments_lexer": "ipython3", 650 | "version": "3.5.5" 651 | } 652 | }, 653 | "nbformat": 4, 654 | "nbformat_minor": 2 655 | } 656 | -------------------------------------------------------------------------------- /02-spatial-relationships-operations.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Spatial relationships and operations" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": null, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "%matplotlib inline\n", 17 | "\n", 18 | "import pandas as pd\n", 19 | "import geopandas\n", 20 | "\n", 21 | "pd.options.display.max_rows = 10" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": null, 27 | "metadata": {}, 28 | "outputs": [], 29 | "source": [ 30 | "countries = geopandas.read_file(\"zip://./data/ne_110m_admin_0_countries.zip\")\n", 31 | "cities = geopandas.read_file(\"zip://./data/ne_110m_populated_places.zip\")\n", 32 | "rivers = geopandas.read_file(\"zip://./data/ne_50m_rivers_lake_centerlines.zip\")" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": {}, 38 | "source": [ 39 | "## Spatial relationships\n", 40 | "\n", 41 | "An important aspect of 
geospatial data is that we can look at *spatial relationships*: how two spatial objects relate to each other (whether they overlap, intersect, contain, ... one another).\n", 42 | "\n", 43 | "The topological, set-theoretic relationships in GIS are typically based on the DE-9IM model. See https://en.wikipedia.org/wiki/Spatial_relation for more information.\n", 44 | "\n", 45 | "![](img/TopologicSpatialRelations2.png)\n", 46 | "(Image by [Krauss, CC BY-SA 3.0](https://en.wikipedia.org/wiki/Spatial_relation#/media/File:TopologicSpatialRelarions2.png))" 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": {}, 52 | "source": [ 53 | "### Relationships between individual objects" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "Let's first create some small toy spatial objects:\n", 61 | "\n", 62 | "A polygon (note: we use `.squeeze()` here to extract the scalar geometry object from the GeoSeries of length 1):" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": null, 68 | "metadata": {}, 69 | "outputs": [], 70 | "source": [ 71 | "belgium = countries.loc[countries['name'] == 'Belgium', 'geometry'].squeeze()" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": {}, 77 | "source": [ 78 | "Two points:" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": null, 84 | "metadata": {}, 85 | "outputs": [], 86 | "source": [ 87 | "paris = cities.loc[cities['name'] == 'Paris', 'geometry'].squeeze()\n", 88 | "brussels = cities.loc[cities['name'] == 'Brussels', 'geometry'].squeeze()" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "And a linestring:" 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": null, 101 | "metadata": {}, 102 | "outputs": [], 103 | "source": [ 104 | "from shapely.geometry import LineString\n", 105 | "line = LineString([paris, brussels])" 106 | ] 107 | }, 108 | { 109 | "cell_type": 
"markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "Let's visualize those 4 geometry objects together (I only put them in a GeoSeries to easily display them together with the geopandas `.plot()` method):" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": null, 118 | "metadata": {}, 119 | "outputs": [], 120 | "source": [ 121 | "geopandas.GeoSeries([belgium, paris, brussels, line]).plot(cmap='tab10')" 122 | ] 123 | }, 124 | { 125 | "cell_type": "markdown", 126 | "metadata": {}, 127 | "source": [ 128 | "You can recognize the abstract shape of Belgium.\n", 129 | "\n", 130 | "Brussels, the capital of Belgium, is thus located within Belgium. This is a spatial relationship, and we can test this using the individual shapely geometry objects as follows:" 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": null, 136 | "metadata": {}, 137 | "outputs": [], 138 | "source": [ 139 | "brussels.within(belgium)" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "And using the reverse, Belgium contains Brussels:" 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": null, 152 | "metadata": {}, 153 | "outputs": [], 154 | "source": [ 155 | "belgium.contains(brussels)" 156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "metadata": {}, 161 | "source": [ 162 | "On the other hand, Paris is not located in Belgium:" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": null, 168 | "metadata": {}, 169 | "outputs": [], 170 | "source": [ 171 | "belgium.contains(paris)" 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": null, 177 | "metadata": {}, 178 | "outputs": [], 179 | "source": [ 180 | "paris.within(belgium)" 181 | ] 182 | }, 183 | { 184 | "cell_type": "markdown", 185 | "metadata": {}, 186 | "source": [ 187 | "The straight line we drew from Paris to Brussels is not fully located within Belgium, but 
it does intersect with it:" 188 | ] 189 | }, 190 | { 191 | "cell_type": "code", 192 | "execution_count": null, 193 | "metadata": {}, 194 | "outputs": [], 195 | "source": [ 196 | "belgium.contains(line)" 197 | ] 198 | }, 199 | { 200 | "cell_type": "code", 201 | "execution_count": null, 202 | "metadata": {}, 203 | "outputs": [], 204 | "source": [ 205 | "line.intersects(belgium)" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "### Spatial relationships with GeoDataFrames\n", 213 | "\n", 214 | "The same methods that are available on individual `shapely` geometries as we have seen above, are also available as methods on `GeoSeries` / `GeoDataFrame` objects.\n", 215 | "\n", 216 | "For example, if we call the `contains` method on the `countries` dataset with the `paris` point, it will do this spatial check for each country in the `countries` dataframe:" 217 | ] 218 | }, 219 | { 220 | "cell_type": "code", 221 | "execution_count": null, 222 | "metadata": {}, 223 | "outputs": [], 224 | "source": [ 225 | "countries.contains(paris)" 226 | ] 227 | }, 228 | { 229 | "cell_type": "markdown", 230 | "metadata": {}, 231 | "source": [ 232 | "Because the above gives us a boolean result, we can use that to filter the dataframe:" 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": null, 238 | "metadata": {}, 239 | "outputs": [], 240 | "source": [ 241 | "countries[countries.contains(paris)]" 242 | ] 243 | }, 244 | { 245 | "cell_type": "markdown", 246 | "metadata": {}, 247 | "source": [ 248 | "And indeed, France is the only country in the world in which Paris is located." 
249 | ] 250 | }, 251 | { 252 | "cell_type": "markdown", 253 | "metadata": {}, 254 | "source": [ 255 | "Another example, extracting the linestring of the Amazon river in South America, we can query through which countries the river flows:" 256 | ] 257 | }, 258 | { 259 | "cell_type": "code", 260 | "execution_count": null, 261 | "metadata": {}, 262 | "outputs": [], 263 | "source": [ 264 | "amazon = rivers[rivers['name'] == 'Amazonas'].geometry.squeeze()" 265 | ] 266 | }, 267 | { 268 | "cell_type": "code", 269 | "execution_count": null, 270 | "metadata": {}, 271 | "outputs": [], 272 | "source": [ 273 | "countries[countries.crosses(amazon)] # or .intersects" 274 | ] 275 | }, 276 | { 277 | "cell_type": "markdown", 278 | "metadata": {}, 279 | "source": [ 280 | "
\n", 281 | "REFERENCE:

\n", 282 | "\n", 283 | "Overview of the different functions to check spatial relationships (*spatial predicate functions*):\n", 284 | "\n", 285 | "\n", 296 | "\n", 297 | "

\n", 298 | "See https://shapely.readthedocs.io/en/stable/manual.html#predicates-and-relationships for an overview of those methods.\n", 299 | "

\n", 300 | "See https://en.wikipedia.org/wiki/DE-9IM for all details on the semantics of those operations.\n", 301 | "

\n", 302 | "
" 303 | ] 304 | }, 305 | { 306 | "cell_type": "markdown", 307 | "metadata": {}, 308 | "source": [ 309 | "## Spatial operations\n", 310 | "\n", 311 | "Next to the spatial predicates that return boolean values, Shapely and GeoPandas also provide analysis methods that return new geometric objects.\n", 312 | "\n", 313 | "See https://shapely.readthedocs.io/en/stable/manual.html#spatial-analysis-methods for more details." 314 | ] 315 | }, 316 | { 317 | "cell_type": "markdown", 318 | "metadata": {}, 319 | "source": [ 320 | "For example, using the toy data from above, let's construct a buffer around Brussels (which returns a Polygon):" 321 | ] 322 | }, 323 | { 324 | "cell_type": "code", 325 | "execution_count": null, 326 | "metadata": {}, 327 | "outputs": [], 328 | "source": [ 329 | "geopandas.GeoSeries([belgium, brussels.buffer(1)]).plot(alpha=0.5, cmap='tab10')" 330 | ] 331 | }, 332 | { 333 | "cell_type": "markdown", 334 | "metadata": {}, 335 | "source": [ 336 | "and now take the intersection, union or difference of those two polygons:" 337 | ] 338 | }, 339 | { 340 | "cell_type": "code", 341 | "execution_count": null, 342 | "metadata": {}, 343 | "outputs": [], 344 | "source": [ 345 | "brussels.buffer(1).intersection(belgium)" 346 | ] 347 | }, 348 | { 349 | "cell_type": "code", 350 | "execution_count": null, 351 | "metadata": {}, 352 | "outputs": [], 353 | "source": [ 354 | "brussels.buffer(1).union(belgium)" 355 | ] 356 | }, 357 | { 358 | "cell_type": "code", 359 | "execution_count": null, 360 | "metadata": {}, 361 | "outputs": [], 362 | "source": [ 363 | "brussels.buffer(1).difference(belgium)" 364 | ] 365 | }, 366 | { 367 | "cell_type": "markdown", 368 | "metadata": {}, 369 | "source": [ 370 | "Another useful tool is the `unary_union` attribute, which converts the set of geometry objects in a GeoDataFrame into a single geometry object by taking the union of all those geometries.\n", 371 | "\n", 372 | "For example, we can construct a single object for the African 
continent:" 373 | ] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "execution_count": null, 378 | "metadata": {}, 379 | "outputs": [], 380 | "source": [ 381 | "africa_countries = countries[countries['continent'] == 'Africa']" 382 | ] 383 | }, 384 | { 385 | "cell_type": "code", 386 | "execution_count": null, 387 | "metadata": {}, 388 | "outputs": [], 389 | "source": [ 390 | "africa = africa_countries.unary_union" 391 | ] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "execution_count": null, 396 | "metadata": {}, 397 | "outputs": [], 398 | "source": [ 399 | "africa" 400 | ] 401 | }, 402 | { 403 | "cell_type": "code", 404 | "execution_count": null, 405 | "metadata": { 406 | "scrolled": false 407 | }, 408 | "outputs": [], 409 | "source": [ 410 | "print(str(africa)[:1000])" 411 | ] 412 | }, 413 | { 414 | "cell_type": "markdown", 415 | "metadata": {}, 416 | "source": [ 417 | "
\n", 418 | "REMEMBER:

\n", 419 | "\n", 420 | "GeoPandas (and Shapely for the individual objects) provides a wide range of basic methods to analyse geospatial data (distance, length, centroid, boundary, convex_hull, simplify, transform, ...), many more than the few we can touch on in this tutorial.\n", 421 | "\n", 422 | "\n", 423 | "\n", 426 | "\n", 427 | "
\n", 428 | "\n" 429 | ] 430 | } 431 | ], 432 | "metadata": { 433 | "kernelspec": { 434 | "display_name": "Python 3", 435 | "language": "python", 436 | "name": "python3" 437 | }, 438 | "language_info": { 439 | "codemirror_mode": { 440 | "name": "ipython", 441 | "version": 3 442 | }, 443 | "file_extension": ".py", 444 | "mimetype": "text/x-python", 445 | "name": "python", 446 | "nbconvert_exporter": "python", 447 | "pygments_lexer": "ipython3", 448 | "version": "3.6.3" 449 | } 450 | }, 451 | "nbformat": 4, 452 | "nbformat_minor": 2 453 | } 454 | -------------------------------------------------------------------------------- /03-spatial-joins.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Spatial joins" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": { 13 | "slideshow": { 14 | "slide_type": "fragment" 15 | } 16 | }, 17 | "source": [ 18 | "Goals of this notebook:\n", 19 | "\n", 20 | "- Based on the `countries` and `cities` dataframes, determine for each city the country in which it is located.\n", 21 | "- To solve this problem, we will use the concept of a 'spatial join' operation: combining information from geospatial datasets based on their spatial relationship." 
22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": null, 27 | "metadata": {}, 28 | "outputs": [], 29 | "source": [ 30 | "%matplotlib inline\n", 31 | "\n", 32 | "import pandas as pd\n", 33 | "import geopandas\n", 34 | "\n", 35 | "pd.options.display.max_rows = 10" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": null, 41 | "metadata": {}, 42 | "outputs": [], 43 | "source": [ 44 | "countries = geopandas.read_file(\"zip://./data/ne_110m_admin_0_countries.zip\")\n", 45 | "cities = geopandas.read_file(\"zip://./data/ne_110m_populated_places.zip\")\n", 46 | "rivers = geopandas.read_file(\"zip://./data/ne_50m_rivers_lake_centerlines.zip\")" 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": {}, 52 | "source": [ 53 | "## Recap - joining dataframes\n", 54 | "\n", 55 | "Pandas provides functionality to join or merge dataframes in different ways, see https://chrisalbon.com/python/data_wrangling/pandas_join_merge_dataframe/ for an overview and https://pandas.pydata.org/pandas-docs/stable/merging.html for the full documentation." 
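As a quick refresher on what a key-based join does, here is a minimal sketch on two hypothetical toy tables (the `code` column and all values are made up for illustration and are not part of the tutorial data). It contrasts the default inner join with a left join:

```python
import pandas as pd

# Hypothetical toy tables sharing a key column named "code".
left = pd.DataFrame({"city": ["Bern", "Paris", "Oslo"],
                     "code": ["CHE", "FRA", "NOR"]})
right = pd.DataFrame({"code": ["CHE", "FRA", "BEL"],
                      "country": ["Switzerland", "France", "Belgium"]})

# Inner join (the default): only keys present in both tables survive.
inner = left.merge(right, on="code")

# Left join: keep every row of `left`; keys without a match get NaN
# in the columns coming from `right`.
left_joined = left.merge(right, on="code", how="left")

print(inner)
print(left_joined)
```

With an inner join, Oslo is dropped because `NOR` has no partner in `right`; with `how="left"` it is kept with a missing `country` value.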
56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "To illustrate the concept of joining the information of two dataframes with pandas, let's take a small subset of our `cities` and `countries` datasets: " 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": null, 68 | "metadata": {}, 69 | "outputs": [], 70 | "source": [ 71 | "cities2 = cities[cities['name'].isin(['Bern', 'Brussels', 'London', 'Paris'])].copy()\n", 72 | "cities2['iso_a3'] = ['CHE', 'BEL', 'GBR', 'FRA']" 73 | ] 74 | }, 75 | { 76 | "cell_type": "code", 77 | "execution_count": null, 78 | "metadata": {}, 79 | "outputs": [], 80 | "source": [ 81 | "cities2" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": null, 87 | "metadata": {}, 88 | "outputs": [], 89 | "source": [ 90 | "countries2 = countries[['iso_a3', 'name', 'continent']]\n", 91 | "countries2.head()" 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": {}, 97 | "source": [ 98 | "We added an 'iso_a3' column to the `cities` dataset, holding the code of the country each city belongs to. This country code is also present in the `countries` dataset, which allows us to merge those two dataframes based on the common column.\n", 99 | "\n", 100 | "Joining the `cities` dataframe with `countries` will transfer extra information about the countries (the full name, the continent) to the `cities` dataframe, based on a common key:" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": null, 106 | "metadata": {}, 107 | "outputs": [], 108 | "source": [ 109 | "cities2.merge(countries2, on='iso_a3')" 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": {}, 115 | "source": [ 116 | "**But**, for this illustrative example, we added the common column manually; it is not present in the original dataset. However, we can still work out how to join those two datasets based on their spatial coordinates." 
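To make the idea of joining on location rather than on a key column concrete, here is a minimal sketch using only Shapely and hypothetical toy geometries (the `regions`, `points` and `locate` names are made up for illustration). It assigns each point the name of the polygon it falls in, which is the lookup a key column would otherwise provide:

```python
from shapely.geometry import Point, Polygon

# Hypothetical toy "countries": two unit squares side by side.
regions = {
    "west": Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
    "east": Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
}

# Hypothetical toy "cities"; point "c" lies outside every region.
points = {"a": Point(0.5, 0.5), "b": Point(1.5, 0.2), "c": Point(3.0, 3.0)}


def locate(point, regions):
    """Return the name of the first region containing the point, else None."""
    for name, polygon in regions.items():
        if point.within(polygon):
            return name
    return None


assignment = {name: locate(pt, regions) for name, pt in points.items()}
print(assignment)  # {'a': 'west', 'b': 'east', 'c': None}
```

This loop only scales to a handful of geometries; it is meant to show the logic, not to replace a proper spatial join.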
117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "metadata": {}, 122 | "source": [ 123 | "## Recap - spatial relationships between objects\n", 124 | "\n", 125 | "In the previous notebook [02-spatial-relationships.ipynb](./02-spatial-relationships-operations.ipynb), we have seen the notion of spatial relationships between geometry objects: within, contains, intersects, ...\n", 126 | "\n", 127 | "In this case, we know that each of the cities is located *within* one of the countries, or the other way around that each country can *contain* multiple cities.\n", 128 | "\n", 129 | "We can test such relationships using the methods we have seen in the previous notebook:" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": null, 135 | "metadata": {}, 136 | "outputs": [], 137 | "source": [ 138 | "france = countries.loc[countries['name'] == 'France', 'geometry'].squeeze()" 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": null, 144 | "metadata": {}, 145 | "outputs": [], 146 | "source": [ 147 | "cities.within(france)" 148 | ] 149 | }, 150 | { 151 | "cell_type": "markdown", 152 | "metadata": {}, 153 | "source": [ 154 | "The above gives us a boolean series, indicating for each point in our `cities` dataframe whether it is located within the area of France or not. \n", 155 | "Because this is a boolean series as result, we can use it to filter the original dataframe to only show those cities that are actually within France:" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": null, 161 | "metadata": {}, 162 | "outputs": [], 163 | "source": [ 164 | "cities[cities.within(france)]" 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": {}, 170 | "source": [ 171 | "We could now repeat the above analysis for each of the countries, and add a column to the `cities` dataframe indicating this country. 
However, that would be tedious to do manually, and it is exactly what the spatial join operation does for us.\n", 172 | "\n", 173 | "*(note: the above result is incorrect, but this is just because of the coarseness of the countries dataset)*" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": { 179 | "slideshow": { 180 | "slide_type": "slide" 181 | } 182 | }, 183 | "source": [ 184 | "## Spatial join operation\n", 185 | "\n", 186 | "
\n", 187 | "SPATIAL JOIN = *transferring attributes from one layer to another based on their spatial relationship*

\n", 188 | "\n", 189 | "\n", 190 | "Different parts of this operation:\n", 191 | "\n", 192 | "\n", 198 | "\n", 199 | "
" 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": { 205 | "slideshow": { 206 | "slide_type": "-" 207 | } 208 | }, 209 | "source": [ 210 | "In this case, we want to join the `cities` dataframe with the information of the `countries` dataframe, based on the spatial relationship between both datasets.\n", 211 | "\n", 212 | "We use the [`geopandas.sjoin`](http://geopandas.readthedocs.io/en/latest/reference/geopandas.sjoin.html) function:" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": null, 218 | "metadata": {}, 219 | "outputs": [], 220 | "source": [ 221 | "joined = geopandas.sjoin(cities, countries, op='within', how='left')" 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": null, 227 | "metadata": {}, 228 | "outputs": [], 229 | "source": [ 230 | "joined" 231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "execution_count": null, 236 | "metadata": {}, 237 | "outputs": [], 238 | "source": [ 239 | "joined['continent'].value_counts()" 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "metadata": {}, 245 | "source": [ 246 | "## The overlay operation\n", 247 | "\n", 248 | "In the spatial join operation above, we are not changing the geometries itself. We are not joining geometries, but joining attributes based on a spatial relationship between the geometries. This also means that the geometries need to at least overlap partially.\n", 249 | "\n", 250 | "If you want to create new geometries based on joining (combining) geometries of different dataframes into one new dataframe (eg by taking the intersection of the geometries), you want an **overlay** operation." 
251 | ] 252 | }, 253 | { 254 | "cell_type": "code", 255 | "execution_count": null, 256 | "metadata": {}, 257 | "outputs": [], 258 | "source": [ 259 | "africa = countries[countries['continent'] == 'Africa']" 260 | ] 261 | }, 262 | { 263 | "cell_type": "code", 264 | "execution_count": null, 265 | "metadata": {}, 266 | "outputs": [], 267 | "source": [ 268 | "africa.plot()" 269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "execution_count": null, 274 | "metadata": {}, 275 | "outputs": [], 276 | "source": [ 277 | "cities['geometry'] = cities.buffer(2)" 278 | ] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "execution_count": null, 283 | "metadata": {}, 284 | "outputs": [], 285 | "source": [ 286 | "geopandas.overlay(africa, cities, how='difference').plot()" 287 | ] 288 | }, 289 | { 290 | "cell_type": "markdown", 291 | "metadata": {}, 292 | "source": [ 293 | "
\n", 294 | "REMEMBER
\n", 295 | "\n", 296 | "\n", 300 | "\n", 301 | "
" 302 | ] 303 | } 304 | ], 305 | "metadata": { 306 | "kernelspec": { 307 | "display_name": "Python 3", 308 | "language": "python", 309 | "name": "python3" 310 | }, 311 | "language_info": { 312 | "codemirror_mode": { 313 | "name": "ipython", 314 | "version": 3 315 | }, 316 | "file_extension": ".py", 317 | "mimetype": "text/x-python", 318 | "name": "python", 319 | "nbconvert_exporter": "python", 320 | "pygments_lexer": "ipython3", 321 | "version": "3.5.5" 322 | } 323 | }, 324 | "nbformat": 4, 325 | "nbformat_minor": 2 326 | } 327 | -------------------------------------------------------------------------------- /04-more-on-visualization.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Visualizing spatial data with Python" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": null, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "%matplotlib inline\n", 17 | "\n", 18 | "import pandas as pd\n", 19 | "import geopandas\n", 20 | "\n", 21 | "import matplotlib.pyplot as plt\n", 22 | "\n", 23 | "pd.options.display.max_rows = 10" 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": null, 29 | "metadata": {}, 30 | "outputs": [], 31 | "source": [ 32 | "countries = geopandas.read_file(\"zip://./data/ne_110m_admin_0_countries.zip\")\n", 33 | "cities = geopandas.read_file(\"zip://./data/ne_110m_populated_places.zip\")\n", 34 | "rivers = geopandas.read_file(\"zip://./data/ne_50m_rivers_lake_centerlines.zip\")" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "## GeoPandas visualization functionality" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "#### Basic plot" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": null, 54 | "metadata": {}, 55 | "outputs": [], 56 | "source": [ 57 | 
"countries.plot()" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "metadata": {}, 63 | "source": [ 64 | "#### Adjusting the figure size" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": null, 70 | "metadata": {}, 71 | "outputs": [], 72 | "source": [ 73 | "countries.plot(figsize=(15, 15))" 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": {}, 79 | "source": [ 80 | "#### Removing the box / x and y coordinate labels" 81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": null, 86 | "metadata": {}, 87 | "outputs": [], 88 | "source": [ 89 | "ax = countries.plot(figsize=(15, 15))\n", 90 | "ax.set_axis_off()" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "#### Coloring based on column values\n", 98 | "\n", 99 | "Let's first create a new column with the GDP per capita:" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": null, 105 | "metadata": {}, 106 | "outputs": [], 107 | "source": [ 108 | "countries = countries[(countries['pop_est'] >0 ) & (countries['name'] != \"Antarctica\")]" 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": null, 114 | "metadata": {}, 115 | "outputs": [], 116 | "source": [ 117 | "countries['gdp_per_cap'] = countries['gdp_md_est'] / countries['pop_est'] * 100" 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "and now we can use this column to color the polygons:" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": null, 130 | "metadata": {}, 131 | "outputs": [], 132 | "source": [ 133 | "ax = countries.plot(figsize=(15, 15), column='gdp_per_cap')\n", 134 | "ax.set_axis_off()" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": null, 140 | "metadata": {}, 141 | "outputs": [], 142 | "source": [ 143 | "ax = countries.plot(figsize=(15, 15), column='gdp_per_cap', scheme='quantiles', 
legend=True)\n", 144 | "ax.set_axis_off()" 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "metadata": {}, 150 | "source": [ 151 | "#### Combining different dataframes on a single plot\n", 152 | "\n", 153 | "The `.plot` method returns a matplotlib Axes object, which can then be re-used to add additional layers to that plot with the `ax=` keyword:" 154 | ] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": null, 159 | "metadata": {}, 160 | "outputs": [], 161 | "source": [ 162 | "ax = countries.plot(figsize=(15, 15))\n", 163 | "cities.plot(ax=ax, color='red', markersize=10)\n", 164 | "ax.set_axis_off()" 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": null, 170 | "metadata": {}, 171 | "outputs": [], 172 | "source": [ 173 | "ax = countries.plot(edgecolor='k', facecolor='none', figsize=(15, 10))\n", 174 | "rivers.plot(ax=ax)\n", 175 | "cities.plot(ax=ax, color='C1')\n", 176 | "ax.set(xlim=(-20, 60), ylim=(-40, 40))" 177 | ] 178 | }, 179 | { 180 | "cell_type": "markdown", 181 | "metadata": {}, 182 | "source": [ 183 | "## Using `geoplot`\n", 184 | "\n", 185 | "The `geoplot` package provides some additional functionality compared to the basic `.plot()` method on GeoDataFrames:\n", 186 | "\n", 187 | "- High-level plotting API (with more plot types than geopandas)\n", 188 | "- Native projection support\n", 189 | "\n", 190 | "https://residentmario.github.io/geoplot/index.html" 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": null, 196 | "metadata": {}, 197 | "outputs": [], 198 | "source": [ 199 | "import geoplot\n", 200 | "import geoplot.crs as gcrs" 201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "execution_count": null, 206 | "metadata": {}, 207 | "outputs": [], 208 | "source": [ 209 | "fig, ax = plt.subplots(figsize=(10, 10), subplot_kw={\n", 210 | " 'projection': gcrs.Orthographic(central_latitude=40.7128, central_longitude=-74.0059)\n", 211 | "})\n", 212 | 
"geoplot.choropleth(countries, hue='gdp_per_cap', projection=gcrs.Orthographic(), ax=ax,\n", 213 | " cmap='magma', linewidth=0.5, edgecolor='white', k=None)\n", 214 | "ax.set_global()\n", 215 | "ax.outline_patch.set_visible(True)\n", 216 | "#ax.coastlines()" 217 | ] 218 | }, 219 | { 220 | "cell_type": "markdown", 221 | "metadata": {}, 222 | "source": [ 223 | "## Using `cartopy`\n", 224 | "\n", 225 | "Cartopy is the base matplotlib cartographic library, and it is used by `geoplot` under the hood to provide projection-awareness.\n", 226 | "\n", 227 | "http://scitools.org.uk/cartopy/docs/latest/index.html\n", 228 | "\n", 229 | "The following example is taken from the docs: http://geopandas.readthedocs.io/en/latest/gallery/cartopy_convert.html#sphx-glr-gallery-cartopy-convert-py" 230 | ] 231 | }, 232 | { 233 | "cell_type": "code", 234 | "execution_count": null, 235 | "metadata": {}, 236 | "outputs": [], 237 | "source": [ 238 | "from cartopy import crs as ccrs" 239 | ] 240 | }, 241 | { 242 | "cell_type": "code", 243 | "execution_count": null, 244 | "metadata": {}, 245 | "outputs": [], 246 | "source": [ 247 | "# Define the CartoPy CRS object.\n", 248 | "crs = ccrs.AlbersEqualArea()\n", 249 | "\n", 250 | "# This can be converted into a `proj4` string/dict compatible with GeoPandas\n", 251 | "crs_proj4 = crs.proj4_init\n", 252 | "countries_ae = countries.to_crs(crs_proj4)\n", 253 | "\n", 254 | "# Here's what the plot looks like in GeoPandas\n", 255 | "countries_ae.plot()" 256 | ] 257 | }, 258 | { 259 | "cell_type": "markdown", 260 | "metadata": {}, 261 | "source": [ 262 | "## Interactive web-based visualizations\n", 263 | "\n", 264 | "There are nowadays many libraries that target interactive web-based visualizations and that can handle geospatial data. 
Some packages with an example for each:\n", 265 | "\n", 266 | "- Bokeh: https://bokeh.pydata.org/en/latest/docs/gallery/texas.html\n", 267 | "- GeoViews (other interface to Bokeh/matplotlib): http://geo.holoviews.org\n", 268 | "- Altair: https://altair-viz.github.io/gallery/choropleth.html\n", 269 | "- Plotly: https://plot.ly/python/#maps\n", 270 | "- ..." 271 | ] 272 | }, 273 | { 274 | "cell_type": "markdown", 275 | "metadata": {}, 276 | "source": [ 277 | "Another popular javascript library for online maps is [Leaflet.js](https://leafletjs.com/), and this has python bindings in the [folium](https://github.com/python-visualization/folium) and [ipyleaflet](https://github.com/jupyter-widgets/ipyleaflet) packages." 278 | ] 279 | }, 280 | { 281 | "cell_type": "markdown", 282 | "metadata": {}, 283 | "source": [ 284 | "An example with folium:" 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": null, 290 | "metadata": {}, 291 | "outputs": [], 292 | "source": [ 293 | "import folium" 294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": null, 299 | "metadata": {}, 300 | "outputs": [], 301 | "source": [ 302 | "m = folium.Map([48.8566, 2.3429], zoom_start=6, tiles=\"OpenStreetMap\")\n", 303 | "folium.GeoJson(countries.to_json()).add_to(m)\n", 304 | "m" 305 | ] 306 | }, 307 | { 308 | "cell_type": "code", 309 | "execution_count": null, 310 | "metadata": {}, 311 | "outputs": [], 312 | "source": [] 313 | }, 314 | { 315 | "cell_type": "code", 316 | "execution_count": null, 317 | "metadata": {}, 318 | "outputs": [], 319 | "source": [ 320 | "m = folium.Map([48.8566, 2.3429], zoom_start=6, tiles=\"OpenStreetMap\")\n", 321 | "folium.GeoJson(cities.to_json()).add_to(m)\n", 322 | "m" 323 | ] 324 | }, 325 | { 326 | "cell_type": "code", 327 | "execution_count": null, 328 | "metadata": {}, 329 | "outputs": [], 330 | "source": [] 331 | } 332 | ], 333 | "metadata": { 334 | "kernelspec": { 335 | "display_name": "Python 3", 336 | "language": 
"python", 337 | "name": "python3" 338 | }, 339 | "language_info": { 340 | "codemirror_mode": { 341 | "name": "ipython", 342 | "version": 3 343 | }, 344 | "file_extension": ".py", 345 | "mimetype": "text/x-python", 346 | "name": "python", 347 | "nbconvert_exporter": "python", 348 | "pygments_lexer": "ipython3", 349 | "version": "3.5.5" 350 | } 351 | }, 352 | "nbformat": 4, 353 | "nbformat_minor": 2 354 | } 355 | -------------------------------------------------------------------------------- /07-data-borrowing.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Concepts in Spatial Linear Modelling\n", 8 | "\n", 9 | "### Data Borrowing in Supervised Learning" 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": 14, 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import numpy as np\n", 19 | "import libpysal as lp\n", 20 | "import geopandas as gpd\n", 21 | "import pandas as pd\n", 22 | "import shapely.geometry as shp\n", 23 | "import matplotlib.pyplot as plt\n", 24 | "import seaborn as sns\n", 25 | "%matplotlib inline" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 15, 31 | "metadata": {}, 32 | "outputs": [], 33 | "source": [ 34 | "listings = pd.read_csv('./data/berlin-listings.csv.gz')\n", 35 | "listings['geometry'] = listings[['longitude','latitude']].apply(shp.Point, axis=1)\n", 36 | "listings = gpd.GeoDataFrame(listings)\n", 37 | "listings.crs = {'init':'epsg:4269'}\n", 38 | "listings = listings.to_crs(epsg=3857)" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 16, 44 | "metadata": {}, 45 | "outputs": [ 46 | { 47 | "data": { 48 | "text/plain": [ 49 | "" 50 | ] 51 | }, 52 | "execution_count": 16, 53 | "metadata": {}, 54 | "output_type": "execute_result" 55 | }, 56 | { 57 | "data": { 58 | "image/png": "\n", 59 | "text/plain": [ 60 | "
" 61 | ] 62 | }, 63 | "metadata": { 64 | "needs_background": "light" 65 | }, 66 | "output_type": "display_data" 67 | } 68 | ], 69 | "source": [ 70 | "listings.sort_values('price').plot('price', cmap='plasma')" 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "# Kernel Regressions" 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": {}, 83 | "source": [ 84 | "Kernel regressions are one exceptionally common way to allow observations to \"borrow strength\" from nearby observations. \n", 85 | "\n", 86 | "However, when working with spatial data, there are *two simultaneous senses of what is near:* \n", 87 | "- things that similar in attribute (classical kernel regression)\n", 88 | "- things that are similar in spatial position (spatial kernel regression)\n", 89 | "\n", 90 | "Below, we'll walk through how to use scikit to fit these two types of kernel regressions, show how it's not super simple to mix the two approaches together, and refer to an approach that does this correctly in another package. 
" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "First, though, let's try to predict the log of an Airbnb's nightly price based on a few factors:\n", 98 | "- accommodates: the number of people the airbnb can accommodate\n", 99 | "- review_scores_rating: the aggregate rating of the listing\n", 100 | "- bedrooms: the number of bedrooms the airbnb has\n", 101 | "- bathrooms: the number of bathrooms the airbnb has\n", 102 | "- beds: the number of beds the airbnb offers" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": 17, 108 | "metadata": {}, 109 | "outputs": [], 110 | "source": [ 111 | "model_data = listings[['accommodates', 'review_scores_rating', \n", 112 | " 'bedrooms', 'bathrooms', 'beds', \n", 113 | " 'price', 'geometry']].dropna()" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": 18, 119 | "metadata": {}, 120 | "outputs": [], 121 | "source": [ 122 | "Xnames = ['accommodates', 'review_scores_rating', \n", 123 | " 'bedrooms', 'bathrooms', 'beds' ]\n", 124 | "X = model_data[Xnames].values\n", 125 | "X = X.astype(float)\n", 126 | "y = np.log(model_data[['price']].values)" 127 | ] 128 | }, 129 | { 130 | "cell_type": "markdown", 131 | "metadata": {}, 132 | "source": [ 133 | "Further, since each listing has a location, I'll extract the set of spatial coordinates coordinates for each listing." 
134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": 19, 139 | "metadata": {}, 140 | "outputs": [], 141 | "source": [ 142 | "coordinates = np.vstack(model_data.geometry.apply(lambda p: np.hstack(p.xy)).values)" 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": {}, 148 | "source": [ 149 | "scikit-learn neighbor regressions are contained in the `sklearn.neighbors` module, and there are two main types:\n", 150 | "- `KNeighborsRegressor`, which uses a k-nearest neighborhood of observations around each focal site\n", 151 | "- `RadiusNeighborsRegressor`, which considers all observations within a fixed radius around each focal site.\n", 152 | "\n", 153 | "Further, these methods can use inverse distance weighting to rank the relative importance of sites around each focal site; in this way, near things are given more weight than far things, even when there are a lot of near things. " 154 | ] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": 20, 159 | "metadata": {}, 160 | "outputs": [], 161 | "source": [ 162 | "import sklearn.neighbors as skn\n", 163 | "import sklearn.metrics as skm" 164 | ] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "execution_count": 21, 169 | "metadata": {}, 170 | "outputs": [], 171 | "source": [ 172 | "shuffle = np.random.permutation(len(y))\n", 173 | "train,test = shuffle[:14000],shuffle[14000:]" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 179 | "source": [ 180 | "So, let's fit three models:\n", 181 | "- `spatial`: using inverse distance weighting on the nearest 500 neighbors in geographical space\n", 182 | "- `attribute`: using inverse distance weighting on the nearest 500 neighbors in attribute space\n", 183 | "- `both`: using inverse distance weighting in both geographical and attribute space. 
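The three-model setup can be sketched on synthetic data as follows. This is a hedged illustration, not the notebook's own data or settings: the arrays `coords`, `attrs` and `target`, the sample sizes, and `n_neighbors=10` are all assumptions chosen to keep the example small. The combined feature matrix is built once so that fitting and prediction see the same column order:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins: 200 sites with 2-D coordinates and 3 attributes.
coords = rng.uniform(0, 10, size=(200, 2))
attrs = rng.normal(size=(200, 3))
target = attrs[:, 0] + 0.1 * coords[:, 0] + rng.normal(scale=0.1, size=200)

train, test = np.arange(150), np.arange(150, 200)

# Build the combined matrix once, so fit and predict share one column order.
combined = np.hstack((coords, attrs))

feature_sets = {"spatial": coords, "attribute": attrs, "both": combined}

predictions = {}
for name, features in feature_sets.items():
    # weights="distance" gives closer neighbours a larger say (inverse distance).
    model = KNeighborsRegressor(n_neighbors=10, weights="distance")
    model.fit(features[train], target[train])
    predictions[name] = model.predict(features[test])

for name, pred in predictions.items():
    print(name, pred[:3])
```

Because nearest-neighbour methods rely on raw distances, columns on very different scales (for example projected coordinates in metres next to small counts) will be dominated by the largest one unless the features are rescaled first.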
" 184 | ] 185 | }, 186 | { 187 | "cell_type": "code", 188 | "execution_count": 22, 189 | "metadata": {}, 190 | "outputs": [], 191 | "source": [ 192 | "KNNR = skn.KNeighborsRegressor(weights='distance', n_neighbors=500)\n", 193 | "spatial = KNNR.fit(coordinates[train,:],\n", 194 | " y[train,:])\n", 195 | "KNNR = skn.KNeighborsRegressor(weights='distance', n_neighbors=500)\n", 196 | "attribute = KNNR.fit(X[train,:],\n", 197 | " y[train,])\n", 198 | "KNNR = skn.KNeighborsRegressor(weights='distance', n_neighbors=500)\n", 199 | "both = KNNR.fit(np.hstack((coordinates,X))[train,:],\n", 200 | " y[train,:])" 201 | ] 202 | }, 203 | { 204 | "cell_type": "markdown", 205 | "metadata": {}, 206 | "source": [ 207 | "To score them, I'm going to grab their out of sample prediction accuracy and get their % explained variance:" 208 | ] 209 | }, 210 | { 211 | "cell_type": "code", 212 | "execution_count": 23, 213 | "metadata": {}, 214 | "outputs": [], 215 | "source": [ 216 | "sp_ypred = spatial.predict(coordinates[test,:])\n", 217 | "att_ypred = attribute.predict(X[test,:])\n", 218 | "both_ypred = both.predict(np.hstack((X,coordinates))[test,:])" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": 24, 224 | "metadata": {}, 225 | "outputs": [ 226 | { 227 | "data": { 228 | "text/plain": [ 229 | "(0.1443088606590084, 0.3149860849884514, -5.684468673550214e-09)" 230 | ] 231 | }, 232 | "execution_count": 24, 233 | "metadata": {}, 234 | "output_type": "execute_result" 235 | } 236 | ], 237 | "source": [ 238 | "(skm.explained_variance_score(y[test,], sp_ypred),\n", 239 | " skm.explained_variance_score(y[test,], att_ypred),\n", 240 | " skm.explained_variance_score(y[test,], both_ypred))" 241 | ] 242 | }, 243 | { 244 | "cell_type": "markdown", 245 | "metadata": {}, 246 | "source": [ 247 | "If you don't know $X$, using $Wy$ would be better than nothing, but it works nowhere near as well... 
less than half of the variance that is explained by nearness in feature/attribute space is explained by nearness in geographical space. \n", 248 | "\n", 249 | "Making things even worse, simply glomming on the geographical information to the feature set makes the model perform horribly. \n", 250 | "\n", 251 | "*There must be another way!*\n", 252 | "\n", 253 | "One method that can exploit the fact that local data may be more informative in predicting $y$ at site $i$ than distant data is Geographically Weighted Regression, a type of Generalized Additive Spatial Model. Kind of like a Kernel Regression, GWR conducts a bunch of regressions at each training site, only considering data near that site. This means it works like the kernel regressions above, but uses *both* the coordinates *and* the data in $X$ to predict $y$ at each site. It optimizes its sense of \"local\" depending on some information criterion or fit score.\n", 254 | "\n", 255 | "You can find this in the `gwr` package, and significant development is ongoing on this at `https://github.com/pysal/gwr`." 256 | ] 257 | }, 258 | { 259 | "cell_type": "markdown", 260 | "metadata": {}, 261 | "source": [ 262 | "# Data Borrowing" 263 | ] 264 | }, 265 | { 266 | "cell_type": "markdown", 267 | "metadata": {}, 268 | "source": [ 269 | "Another common case where these weights are used is in \"feature engineering.\" Using the weights matrix, you can construct neighbourhood averages of the data matrix and use these as synthetic features in your model. These often have a strong relationship to the outcome as well, since spatial data is often smooth and attributes of nearby sites often have a spillover impact on each other. 
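A neighbourhood average of this kind is just a matrix product. The sketch below uses plain NumPy and a hypothetical 4-site binary neighbour matrix that is row-standardised, which is an assumption made for readability and differs from the kernel weights used in this notebook:

```python
import numpy as np

# Hypothetical 4-site neighbour matrix: W[i, j] = 1 if j is a neighbour of i.
# The diagonal is zero, so a site never counts as its own neighbour.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Row-standardise so that each row sums to 1; the product below then
# yields a plain average over each site's neighbours.
W_row = W / W.sum(axis=1, keepdims=True)

# Toy data matrix: 4 sites, 2 attributes.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])

# Spatial lag: row i holds the neighbourhood average of each attribute.
WX = W_row @ X
print(WX)
```

Site 0 has neighbours 1 and 2, so its lagged values are the means of rows 1 and 2, namely 2.5 and 25. Site 3 has a single neighbour (site 2), so it simply inherits that row.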
" 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "execution_count": 25, 275 | "metadata": {}, 276 | "outputs": [], 277 | "source": [ 278 | "from libpysal.weights.util import fill_diagonal" 279 | ] 280 | }, 281 | { 282 | "cell_type": "markdown", 283 | "metadata": {}, 284 | "source": [ 285 | "First, we'll construct a Kernel weight from the data that we have, make it an adaptive Kernel bandwidth, and make sure that our kernel weights don't have any self-neighbors. Since we've got the data at each site anyway, we probably shouldn't use that data *again* when we construct our neighborhood-smoothed syntetic features. " 286 | ] 287 | }, 288 | { 289 | "cell_type": "code", 290 | "execution_count": 26, 291 | "metadata": {}, 292 | "outputs": [], 293 | "source": [ 294 | "kW = lp.weights.Kernel.from_dataframe(model_data, fixed=False, function='gaussian', k=100)\n", 295 | "kW = fill_diagonal(kW, 0)" 296 | ] 297 | }, 298 | { 299 | "cell_type": "code", 300 | "execution_count": 27, 301 | "metadata": {}, 302 | "outputs": [], 303 | "source": [ 304 | "WX = lp.weights.lag_spatial(kW, X)" 305 | ] 306 | }, 307 | { 308 | "cell_type": "markdown", 309 | "metadata": {}, 310 | "source": [ 311 | "I like `statsmodels` regression summary tables, so I'll pop it up here. 
" 312 | ] 313 | }, 314 | { 315 | "cell_type": "code", 316 | "execution_count": 28, 317 | "metadata": {}, 318 | "outputs": [], 319 | "source": [ 320 | "import statsmodels.api as sm" 321 | ] 322 | }, 323 | { 324 | "cell_type": "code", 325 | "execution_count": 29, 326 | "metadata": {}, 327 | "outputs": [], 328 | "source": [ 329 | "Xtable = pd.DataFrame(X, columns=Xnames)" 330 | ] 331 | }, 332 | { 333 | "cell_type": "markdown", 334 | "metadata": {}, 335 | "source": [ 336 | "Below are the results for the model with only the covariates used above:\n", 337 | "- accommodates: the number of people the airbnb can accommodate\n", 338 | "- review_scores_rating: the aggregate rating of the listing\n", 339 | "- bedrooms: the number of bedrooms the airbnb has\n", 340 | "- bathrooms: the number of bathrooms the airbnb has\n", 341 | "- beds: the number of beds the airbnb offers\n", 342 | "\n", 343 | "We've not used any of our synthetic features in `WX`. " 344 | ] 345 | }, 346 | { 347 | "cell_type": "code", 348 | "execution_count": 30, 349 | "metadata": {}, 350 | "outputs": [ 351 | { 352 | "data": { 353 | "text/html": [ 354 | "\n", 355 | "\n", 356 | "\n", 357 | " \n", 358 | "\n", 359 | "\n", 360 | " \n", 361 | "\n", 362 | "\n", 363 | " \n", 364 | "\n", 365 | "\n", 366 | " \n", 367 | "\n", 368 | "\n", 369 | " \n", 370 | "\n", 371 | "\n", 372 | " \n", 373 | "\n", 374 | "\n", 375 | " \n", 376 | "\n", 377 | "\n", 378 | " \n", 379 | "\n", 380 | "\n", 381 | " \n", 382 | "\n", 383 | "
OLS Regression Results
Dep. Variable: y R-squared: 0.290
Model: OLS Adj. R-squared: 0.290
Method: Least Squares F-statistic: 1269.
Date: Fri, 21 Sep 2018 Prob (F-statistic): 0.00
Time: 15:41:48 Log-Likelihood: -9260.8
No. Observations: 15516 AIC: 1.853e+04
Df Residuals: 15510 BIC: 1.858e+04
Df Model: 5
Covariance Type: nonrobust
\n", 384 | "\n", 385 | "\n", 386 | " \n", 387 | "\n", 388 | "\n", 389 | " \n", 390 | "\n", 391 | "\n", 392 | " \n", 393 | "\n", 394 | "\n", 395 | " \n", 396 | "\n", 397 | "\n", 398 | " \n", 399 | "\n", 400 | "\n", 401 | " \n", 402 | "\n", 403 | "\n", 404 | " \n", 405 | "\n", 406 | "
coef std err t P>|t| [0.025 0.975]
const 2.9682 0.044 67.590 0.000 2.882 3.054
accommodates 0.1882 0.004 44.362 0.000 0.180 0.197
review_scores_rating 0.0033 0.000 7.378 0.000 0.002 0.004
bedrooms 0.1427 0.008 18.503 0.000 0.128 0.158
bathrooms 0.0062 0.012 0.497 0.619 -0.018 0.031
beds -0.0482 0.005 -9.221 0.000 -0.058 -0.038
\n", 407 | "\n", 408 | "\n", 409 | " \n", 410 | "\n", 411 | "\n", 412 | " \n", 413 | "\n", 414 | "\n", 415 | " \n", 416 | "\n", 417 | "\n", 418 | " \n", 419 | "\n", 420 | "
Omnibus: 156.822 Durbin-Watson: 1.716
Prob(Omnibus): 0.000 Jarque-Bera (JB): 265.898
Skew: -0.005 Prob(JB): 1.82e-58
Kurtosis: 3.641 Cond. No. 1.17e+03


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.17e+03. This might indicate that there are
strong multicollinearity or other numerical problems." 421 | ], 422 | "text/plain": [ 423 | "\n", 424 | "\"\"\"\n", 425 | " OLS Regression Results \n", 426 | "==============================================================================\n", 427 | "Dep. Variable: y R-squared: 0.290\n", 428 | "Model: OLS Adj. R-squared: 0.290\n", 429 | "Method: Least Squares F-statistic: 1269.\n", 430 | "Date: Fri, 21 Sep 2018 Prob (F-statistic): 0.00\n", 431 | "Time: 15:41:48 Log-Likelihood: -9260.8\n", 432 | "No. Observations: 15516 AIC: 1.853e+04\n", 433 | "Df Residuals: 15510 BIC: 1.858e+04\n", 434 | "Df Model: 5 \n", 435 | "Covariance Type: nonrobust \n", 436 | "========================================================================================\n", 437 | " coef std err t P>|t| [0.025 0.975]\n", 438 | "----------------------------------------------------------------------------------------\n", 439 | "const 2.9682 0.044 67.590 0.000 2.882 3.054\n", 440 | "accommodates 0.1882 0.004 44.362 0.000 0.180 0.197\n", 441 | "review_scores_rating 0.0033 0.000 7.378 0.000 0.002 0.004\n", 442 | "bedrooms 0.1427 0.008 18.503 0.000 0.128 0.158\n", 443 | "bathrooms 0.0062 0.012 0.497 0.619 -0.018 0.031\n", 444 | "beds -0.0482 0.005 -9.221 0.000 -0.058 -0.038\n", 445 | "==============================================================================\n", 446 | "Omnibus: 156.822 Durbin-Watson: 1.716\n", 447 | "Prob(Omnibus): 0.000 Jarque-Bera (JB): 265.898\n", 448 | "Skew: -0.005 Prob(JB): 1.82e-58\n", 449 | "Kurtosis: 3.641 Cond. No. 1.17e+03\n", 450 | "==============================================================================\n", 451 | "\n", 452 | "Warnings:\n", 453 | "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", 454 | "[2] The condition number is large, 1.17e+03. 
This might indicate that there are\n", 455 | "strong multicollinearity or other numerical problems.\n", 456 | "\"\"\"" 457 | ] 458 | }, 459 | "execution_count": 30, 460 | "metadata": {}, 461 | "output_type": "execute_result" 462 | } 463 | ], 464 | "source": [ 465 | "onlyX = sm.OLS(y,sm.add_constant(Xtable)).fit()\n", 466 | "onlyX.summary()" 467 | ] 468 | }, 469 | { 470 | "cell_type": "markdown", 471 | "metadata": {}, 472 | "source": [ 473 | "Then, we could fit a model using the neighbourhood average synthetic features as well:" 474 | ] 475 | }, 476 | { 477 | "cell_type": "code", 478 | "execution_count": 31, 479 | "metadata": {}, 480 | "outputs": [], 481 | "source": [ 482 | "WXtable = pd.DataFrame(WX, columns=['lag_{}'.format(name) for name in Xnames])" 483 | ] 484 | }, 485 | { 486 | "cell_type": "code", 487 | "execution_count": 32, 488 | "metadata": {}, 489 | "outputs": [], 490 | "source": [ 491 | "XWXtable = pd.concat((Xtable,WXtable),axis=1)" 492 | ] 493 | }, 494 | { 495 | "cell_type": "code", 496 | "execution_count": 33, 497 | "metadata": {}, 498 | "outputs": [ 499 | { 500 | "data": { 501 | "text/html": [ 502 | "\n", 503 | "\n", 504 | "\n", 505 | " \n", 506 | "\n", 507 | "\n", 508 | " \n", 509 | "\n", 510 | "\n", 511 | " \n", 512 | "\n", 513 | "\n", 514 | " \n", 515 | "\n", 516 | "\n", 517 | " \n", 518 | "\n", 519 | "\n", 520 | " \n", 521 | "\n", 522 | "\n", 523 | " \n", 524 | "\n", 525 | "\n", 526 | " \n", 527 | "\n", 528 | "\n", 529 | " \n", 530 | "\n", 531 | "
OLS Regression Results
Dep. Variable: y R-squared: 0.311
Model: OLS Adj. R-squared: 0.311
Method: Least Squares F-statistic: 701.4
Date: Fri, 21 Sep 2018 Prob (F-statistic): 0.00
Time: 15:41:50 Log-Likelihood: -9026.9
No. Observations: 15516 AIC: 1.808e+04
Df Residuals: 15505 BIC: 1.816e+04
Df Model: 10
Covariance Type: nonrobust
\n", 532 | "\n", 533 | "\n", 534 | " \n", 535 | "\n", 536 | "\n", 537 | " \n", 538 | "\n", 539 | "\n", 540 | " \n", 541 | "\n", 542 | "\n", 543 | " \n", 544 | "\n", 545 | "\n", 546 | " \n", 547 | "\n", 548 | "\n", 549 | " \n", 550 | "\n", 551 | "\n", 552 | " \n", 553 | "\n", 554 | "\n", 555 | " \n", 556 | "\n", 557 | "\n", 558 | " \n", 559 | "\n", 560 | "\n", 561 | " \n", 562 | "\n", 563 | "\n", 564 | " \n", 565 | "\n", 566 | "\n", 567 | " \n", 568 | "\n", 569 | "
coef std err t P>|t| [0.025 0.975]
const 2.7158 0.114 23.723 0.000 2.491 2.940
accommodates 0.1829 0.004 43.616 0.000 0.175 0.191
review_scores_rating 0.0035 0.000 8.036 0.000 0.003 0.004
bedrooms 0.1465 0.008 19.126 0.000 0.131 0.161
bathrooms -0.0135 0.012 -1.087 0.277 -0.038 0.011
beds -0.0448 0.005 -8.616 0.000 -0.055 -0.035
lag_accommodates 0.0162 0.001 16.632 0.000 0.014 0.018
lag_review_scores_rating -0.0004 4.29e-05 -9.279 0.000 -0.000 -0.000
lag_bedrooms 0.0001 0.002 0.086 0.931 -0.003 0.003
lag_bathrooms 0.0240 0.002 10.595 0.000 0.020 0.028
lag_beds -0.0150 0.001 -13.860 0.000 -0.017 -0.013
\n", 570 | "\n", 571 | "\n", 572 | " \n", 573 | "\n", 574 | "\n", 575 | " \n", 576 | "\n", 577 | "\n", 578 | " \n", 579 | "\n", 580 | "\n", 581 | " \n", 582 | "\n", 583 | "
Omnibus: 163.213 Durbin-Watson: 1.786
Prob(Omnibus): 0.000 Jarque-Bera (JB): 278.468
Skew: -0.023 Prob(JB): 3.40e-61
Kurtosis: 3.655 Cond. No. 9.71e+04


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 9.71e+04. This might indicate that there are
strong multicollinearity or other numerical problems." 584 | ], 585 | "text/plain": [ 586 | "\n", 587 | "\"\"\"\n", 588 | " OLS Regression Results \n", 589 | "==============================================================================\n", 590 | "Dep. Variable: y R-squared: 0.311\n", 591 | "Model: OLS Adj. R-squared: 0.311\n", 592 | "Method: Least Squares F-statistic: 701.4\n", 593 | "Date: Fri, 21 Sep 2018 Prob (F-statistic): 0.00\n", 594 | "Time: 15:41:50 Log-Likelihood: -9026.9\n", 595 | "No. Observations: 15516 AIC: 1.808e+04\n", 596 | "Df Residuals: 15505 BIC: 1.816e+04\n", 597 | "Df Model: 10 \n", 598 | "Covariance Type: nonrobust \n", 599 | "============================================================================================\n", 600 | " coef std err t P>|t| [0.025 0.975]\n", 601 | "--------------------------------------------------------------------------------------------\n", 602 | "const 2.7158 0.114 23.723 0.000 2.491 2.940\n", 603 | "accommodates 0.1829 0.004 43.616 0.000 0.175 0.191\n", 604 | "review_scores_rating 0.0035 0.000 8.036 0.000 0.003 0.004\n", 605 | "bedrooms 0.1465 0.008 19.126 0.000 0.131 0.161\n", 606 | "bathrooms -0.0135 0.012 -1.087 0.277 -0.038 0.011\n", 607 | "beds -0.0448 0.005 -8.616 0.000 -0.055 -0.035\n", 608 | "lag_accommodates 0.0162 0.001 16.632 0.000 0.014 0.018\n", 609 | "lag_review_scores_rating -0.0004 4.29e-05 -9.279 0.000 -0.000 -0.000\n", 610 | "lag_bedrooms 0.0001 0.002 0.086 0.931 -0.003 0.003\n", 611 | "lag_bathrooms 0.0240 0.002 10.595 0.000 0.020 0.028\n", 612 | "lag_beds -0.0150 0.001 -13.860 0.000 -0.017 -0.013\n", 613 | "==============================================================================\n", 614 | "Omnibus: 163.213 Durbin-Watson: 1.786\n", 615 | "Prob(Omnibus): 0.000 Jarque-Bera (JB): 278.468\n", 616 | "Skew: -0.023 Prob(JB): 3.40e-61\n", 617 | "Kurtosis: 3.655 Cond. No. 
9.71e+04\n", 618 | "==============================================================================\n", 619 | "\n", 620 | "Warnings:\n", 621 | "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", 622 | "[2] The condition number is large, 9.71e+04. This might indicate that there are\n", 623 | "strong multicollinearity or other numerical problems.\n", 624 | "\"\"\"" 625 | ] 626 | }, 627 | "execution_count": 33, 628 | "metadata": {}, 629 | "output_type": "execute_result" 630 | } 631 | ], 632 | "source": [ 633 | "withWX = sm.OLS(y,sm.add_constant(XWXtable)).fit()\n", 634 | "withWX.summary()" 635 | ] 636 | }, 637 | { 638 | "cell_type": "markdown", 639 | "metadata": {}, 640 | "source": [ 641 | "This gains a nice bump in the model fit (R-squared rises from 0.290 to 0.311, and AIC falls from 1.853e+04 to 1.808e+04) with only a modest hit to model complexity (five extra terms). " 642 | ] 643 | }, 644 | { 645 | "cell_type": "markdown", 646 | "metadata": {}, 647 | "source": [ 648 | "## Going Further" 649 | ] 650 | }, 651 | { 652 | "cell_type": "markdown", 653 | "metadata": {}, 654 | "source": [ 655 | "We could also use a spatial autoregressive model to further improve fit. This model cannot be estimated with `statsmodels` and instead requires `pysal.spreg`, the spatial regression submodule of PySAL. Generalized method of moments estimators are available in `pysal.spreg.GM_Lag`, and maximum likelihood methods in `pysal.spreg.ML_Lag`. \n", 656 | "\n", 657 | "These methods are often harder to fit, though, so like `gwr`, they may be less performant on big data. But you can usually achieve a gain in fit without a significant increase in the number of terms by using these models. 
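
For reference, the difference between the model fit above and a spatial lag model can be written out. This is the standard textbook formulation rather than something taken from the notebook, and the symbols are notation introduced here: $W$ is the spatial weights matrix, $\gamma$ holds the coefficients on the lagged covariates, and $\rho$ is the single autoregressive parameter.

$$
\begin{aligned}
\text{model fit above:}\quad y &= X\beta + WX\gamma + \varepsilon \\
\text{spatial lag model:}\quad y &= \rho W y + X\beta + \varepsilon \\
\text{reduced form:}\quad y &= (I - \rho W)^{-1}(X\beta + \varepsilon)
\end{aligned}
$$

The reduced form shows that $Wy$ depends on $\varepsilon$, so it is correlated with the error term and plain least squares on $Wy$ is generally inconsistent. That is why the lag model is estimated by instrumenting $Wy$ (the generalized method of moments route) or by maximum likelihood. It also shows where the economy in terms comes from: the lag model adds one parameter, $\rho$, rather than one coefficient per lagged covariate.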
" 658 | ] 659 | } 660 | ], 661 | "metadata": { 662 | "kernelspec": { 663 | "display_name": "Python [conda env:scipygeo18]", 664 | "language": "python", 665 | "name": "conda-env-scipygeo18-py" 666 | }, 667 | "language_info": { 668 | "codemirror_mode": { 669 | "name": "ipython", 670 | "version": 3 671 | }, 672 | "file_extension": ".py", 673 | "mimetype": "text/x-python", 674 | "name": "python", 675 | "nbconvert_exporter": "python", 676 | "pygments_lexer": "ipython3", 677 | "version": "3.6.6" 678 | } 679 | }, 680 | "nbformat": 4, 681 | "nbformat_minor": 2 682 | } 683 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | BSD 3-Clause License 2 | 3 | Copyright (c) 2018, Joris Van den Bossche, Levi John Wolf, Sergio Rey, Dani Arribas-Bel 4 | All rights reserved. 5 | 6 | Redistribution and use in source and binary forms, with or without 7 | modification, are permitted provided that the following conditions are met: 8 | 9 | * Redistributions of source code must retain the above copyright notice, this 10 | list of conditions and the following disclaimer. 11 | 12 | * Redistributions in binary form must reproduce the above copyright notice, 13 | this list of conditions and the following disclaimer in the documentation 14 | and/or other materials provided with the distribution. 15 | 16 | * Neither the name of the copyright holder nor the names of its 17 | contributors may be used to endorse or promote products derived from 18 | this software without specific prior written permission. 19 | 20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 23 | DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 30 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Scipy 2018 Tutorial - Introduction to Geospatial Data Analysis with Python 2 | 3 | [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/geopandas/scipy2018-geospatial-data/master) 4 | 5 | ### Instructors 6 | 7 | - [Levi John Wolf](https://ljwolf.org) - [University of Bristol](http://www.bristol.ac.uk/geography/levi-j-wolf/overview.html) 8 | - Sergio Rey - [Center for Geospatial Sciences, University of California, Riverside](http://spatial.ucr.edu/peopleRey.html) 9 | - [Dani Arribas-Bel](http://darribas.org/) - University of Liverpool 10 | - [Joris Van den Bossche](https://jorisvandenbossche.github.io/) - Université Paris-Saclay Center for Data Science 11 | 12 | This tutorial is an introduction to geospatial data analysis in Python, with a focus on tabular vector data. It first focuses on introducing the participants to the different libraries to work with geospatial data and will cover munging geo-data and exploring relations over space. This includes importing data in different formats (e.g. shapefile, GeoJSON), visualizing, combining and tidying them up for analysis, and will use libraries such as `pandas`, `geopandas`, `shapely`, `PySAL`, or `rasterio`. 
The second part will build upon this and focus on more advanced geographic data science and statistical methods to gain insight from the data. No previous experience with those geospatial python libraries is needed, but basic familiarity with geospatial data and concepts (shapefiles, vector vs raster data) and pandas will be helpful. 13 | 14 | ## Outline of the Workshop 15 | 16 | - 7:50 - 8:20: **Installation & setup** 17 | 18 | If you cannot complete the installation instructions ahead of time, please come slightly early so we can work on ensuring everyone can get the required packages installed and so that you can run the workshop material & exercises 19 | 20 | - 8:20 - 9:00: **Working with spatial data** 21 | 22 | 23 | 24 | - 9:00-10:00: **Spatial Relationships & Joins** 25 | 26 | 27 | 28 | - 10:00 - 10:10: **Break** 29 | 30 | - 10:10 - 11:00: **Exploratory spatial data analysis** 31 | 32 | 33 | 34 | - 11:00 - 12:00: **Leveraging space in modeling** 35 | 36 | 37 | 38 | ## Installation notes 39 | 40 | Following this tutorial will require recent installations of: 41 | 42 | - Python >= 3.5 (it will probably work on python 2.7 as well, but I didn't test it specifically) 43 | - pandas 44 | - geopandas >= 0.3.0 45 | - matplotlib 46 | - rtree 47 | - PySAL 48 | - scikit-learn 49 | - mgwr 50 | - cartopy 51 | - geoplot 52 | - [Jupyter Notebook](http://jupyter.org) 53 | 54 | If you do not yet have these packages installed, we recommend using the [conda](http://conda.pydata.org/docs/intro.html) package manager to install all the requirements 55 | (you can install [miniconda](http://conda.pydata.org/miniconda.html) or install the (larger) Anaconda 56 | distribution, found at https://www.anaconda.com/download/). 57 | 58 | Once this is installed, the following command will install all required packages in your Python environment: 59 | 60 | ``` 61 | conda env create -f environment.yml 62 | ``` 63 | 64 | But of course, using another distribution (e.g. 
Enthought Canopy) or ``pip`` is fine too (a requirements file is provided as well), as long as you have the above packages installed. 65 | 66 | 67 | ## Downloading the tutorial materials 68 | 69 | **NOTE:** *We may update the materials up until the workshop. So, please make sure that, if you download the materials, you refresh the downloaded material close to the workshop.* 70 | 71 | If you have git installed, you can get the tutorial materials by cloning this repo: 72 | 73 | git clone https://github.com/geopandas/scipy2018-geospatial-data 74 | 75 | Otherwise, you can download the repository as a .zip file by heading over 76 | to the GitHub repository (https://github.com/geopandas/scipy2018-geospatial-data) in 77 | your browser and clicking the green "Download" button in the upper right: 78 | 79 | ![](img/download-button.png) 80 | 81 | 82 | ## Test the tutorial environment 83 | 84 | To make sure everything was installed correctly, open a terminal, and change its directory (`cd`) so that your working directory is the tutorial materials you downloaded in the step above. Then enter the following: 85 | 86 | ```sh 87 | python check_environment.py 88 | ``` 89 | 90 | Make sure that this script prints "All good. Enjoy the tutorial!" 
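
The contents of `check_environment.py` are not reproduced here. As a rough idea of what a check of this kind can do, the sketch below looks up each required package without importing it. The `REQUIRED_MODULES` list and the `find_missing` helper are assumptions made for illustration (the list uses import names, so scikit-learn appears as `sklearn`); the real script may check versions or behave differently.

```python
import importlib.util

# Hypothetical list of import names for the packages in the installation notes.
# This is an illustrative assumption, not the contents of the real script.
REQUIRED_MODULES = [
    "pandas", "geopandas", "matplotlib", "rtree", "pysal",
    "sklearn", "mgwr", "cartopy", "geoplot",
]


def find_missing(modules):
    """Return the names in `modules` that cannot be located for import."""
    missing = []
    for name in modules:
        try:
            # find_spec locates a top-level module without importing it
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All good. Enjoy the tutorial!")
```

If a package shows up as missing, re-creating the conda environment from `environment.yml` is usually the quickest fix.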
91 | 92 | -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping10.py: -------------------------------------------------------------------------------- 1 | protected_areas = geopandas.read_file("data/Conservation/RDC_aire_protegee_2013.shp") 2 | # or to read it directly from the zip file: 3 | # protected_areas = geopandas.read_file("/Conservation", vfs="zip://./data/cod_conservation.zip") -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping11.py: -------------------------------------------------------------------------------- 1 | protected_areas.plot() -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping12.py: -------------------------------------------------------------------------------- 1 | from shapely.geometry import Point -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping13.py: -------------------------------------------------------------------------------- 1 | goma = Point(29.22, -1.66) -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping14.py: -------------------------------------------------------------------------------- 1 | dist_goma = data.distance(goma) -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping15.py: -------------------------------------------------------------------------------- 1 | dist_goma.nsmallest(5) -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping16.py: -------------------------------------------------------------------------------- 1 | ax = protected_areas.plot() 2 | data.plot(ax=ax, color='C1') 
-------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping19.py: -------------------------------------------------------------------------------- 1 | data_utm = data.to_crs(epsg=32735) 2 | protected_areas_utm = protected_areas.to_crs(epsg=32735) -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping20.py: -------------------------------------------------------------------------------- 1 | ax = protected_areas_utm.plot() 2 | data_utm.plot(ax=ax, color='C1') -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping21.py: -------------------------------------------------------------------------------- 1 | ax = protected_areas_utm.plot(figsize=(10, 10), color='green') 2 | data_utm.plot(ax=ax, markersize=5, alpha=0.5) 3 | ax.set_axis_off() -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping22.py: -------------------------------------------------------------------------------- 1 | # alternative with constructing the matplotlib figure first 2 | fig, ax = plt.subplots(figsize=(10, 10), subplot_kw=dict(aspect='equal')) 3 | protected_areas_utm.plot(ax=ax, color='green') 4 | data_utm.plot(ax=ax, markersize=5, alpha=0.5) 5 | ax.set_axis_off() -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping23.py: -------------------------------------------------------------------------------- 1 | ax = protected_areas_utm.plot(figsize=(10, 10), color='green') 2 | data_utm.plot(ax=ax, markersize=5, alpha=0.5, column='interference') 3 | ax.set_axis_off() -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping24.py: -------------------------------------------------------------------------------- 1 | ax 
= protected_areas_utm.plot(figsize=(10, 10), color='green') 2 | data_utm.plot(ax=ax, markersize=5, alpha=0.5, column='mineral1', legend=True) 3 | ax.set_axis_off() -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping25.py: -------------------------------------------------------------------------------- 1 | kahuzi = protected_areas_utm[protected_areas_utm['NAME_AP'] == "Kahuzi-Biega National park"].geometry.squeeze() -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping26.py: -------------------------------------------------------------------------------- 1 | mines_kahuzi = data_utm[data_utm.within(kahuzi)] 2 | mines_kahuzi -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping27.py: -------------------------------------------------------------------------------- 1 | len(mines_kahuzi) -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping28.py: -------------------------------------------------------------------------------- 1 | single_mine = data_utm.geometry[0] -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping29.py: -------------------------------------------------------------------------------- 1 | dist = protected_areas_utm.distance(single_mine) -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping3.py: -------------------------------------------------------------------------------- 1 | data_visits = geopandas.read_file("data/cod_mines_curated_all_opendata_p_ipis.geojson") -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping30.py: 
-------------------------------------------------------------------------------- 1 | idx = dist.idxmin() 2 | closest_area = protected_areas_utm.loc[idx, 'NAME_AP'] 3 | closest_area -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping31.py: -------------------------------------------------------------------------------- 1 | def closest_protected_area(mine, protected_areas): 2 | dist = protected_areas.distance(mine) 3 | idx = dist.idxmin() 4 | closest_area = protected_areas.loc[idx, 'NAME_AP'] 5 | return closest_area -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping32.py: -------------------------------------------------------------------------------- 1 | result = data_utm.geometry.apply(lambda site: closest_protected_area(site, protected_areas_utm)) -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping33.py: -------------------------------------------------------------------------------- 1 | data_within_protected = geopandas.sjoin(data_utm, protected_areas_utm[['NAME_AP', 'geometry']], 2 | op='within', how='inner') -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping34.py: -------------------------------------------------------------------------------- 1 | len(data_within_protected) -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping35.py: -------------------------------------------------------------------------------- 1 | data_within_protected['NAME_AP'].value_counts() 2 | # or data_within_protected.groupby('NAME_AP').size() -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping36.py: 
-------------------------------------------------------------------------------- 1 | data_within_protected.groupby('NAME_AP')['workers_numb'].sum() -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping37.py: -------------------------------------------------------------------------------- 1 | protected_areas_border = protected_areas_utm[['NAME_AP', 'geometry']].copy() -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping38.py: -------------------------------------------------------------------------------- 1 | protected_areas_border['geometry'] = protected_areas_border.buffer(10000).difference(protected_areas_utm.unary_union) -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping39.py: -------------------------------------------------------------------------------- 1 | protected_areas_border.plot() -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping4.py: -------------------------------------------------------------------------------- 1 | data_visits.head() -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping40.py: -------------------------------------------------------------------------------- 1 | data_within_border = geopandas.sjoin(data_utm, protected_areas_border, 2 | op='within', how='inner') -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping41.py: -------------------------------------------------------------------------------- 1 | data_within_border['NAME_AP'].value_counts() -------------------------------------------------------------------------------- /_solved/solutions/case-conflict-mapping5.py: 
-------------------------------------------------------------------------------- 1 | len(data_visits) -------------------------------------------------------------------------------- /_solved/solutions/case-trump-vote01.py: -------------------------------------------------------------------------------- 1 | pres = gpd.read_file("zip://./data/uspres.zip") 2 | pres.head(3) 3 | -------------------------------------------------------------------------------- /_solved/solutions/case-trump-vote02.py: -------------------------------------------------------------------------------- 1 | pres.crs = {'init':'epsg:4269'} 2 | pres = pres.to_crs(epsg=5070) -------------------------------------------------------------------------------- /_solved/solutions/case-trump-vote03.py: -------------------------------------------------------------------------------- 1 | import seaborn as sns 2 | facets = sns.pairplot(data=pres.filter(like='dem_')) 3 | facets.map_offdiag(lambda *arg, **kw: plt.plot((0,1),(0,1), color='k')) -------------------------------------------------------------------------------- /_solved/solutions/case-trump-vote04.py: -------------------------------------------------------------------------------- 1 | import seaborn as sns 2 | facets = sns.pairplot(x_vars=pres.filter(like='dem_').columns, 3 | y_vars=['gini_2015'], data=pres) -------------------------------------------------------------------------------- /_solved/solutions/case-trump-vote05.py: -------------------------------------------------------------------------------- 1 | pres['swing_2012'] = pres.eval("dem_2012 - dem_2008") 2 | pres['swing_2016'] = pres.eval("dem_2016 - dem_2012") 3 | pres['swing_full'] = pres.eval("dem_2016 - dem_2008") -------------------------------------------------------------------------------- /_solved/solutions/case-trump-vote06.py: -------------------------------------------------------------------------------- 1 | f,ax = plt.subplots(3,1, 2 | subplot_kw=dict(aspect='equal', 3 | 
frameon=False), 4 | figsize=(60,15)) 5 | pres.plot('dem_2008', ax=ax[0], cmap='RdYlBu') 6 | pres.plot('swing_full', ax=ax[1], cmap='bwr_r') 7 | pres.plot('dem_2016', ax=ax[2], cmap='RdYlBu') 8 | for i,ax_ in enumerate(ax): 9 | ax_.set_xticks([]) 10 | ax_.set_yticks([]) -------------------------------------------------------------------------------- /_solved/solutions/case-trump-vote07.py: -------------------------------------------------------------------------------- 1 | import libpysal as lp 2 | w = lp.weights.Rook.from_dataframe(pres) -------------------------------------------------------------------------------- /_solved/solutions/case-trump-vote08.py: -------------------------------------------------------------------------------- 1 | from pysal import esda as esda 2 | np.random.seed(1) 3 | moran = esda.moran.Moran(pres.swing_full, w) 4 | print(moran.I) 5 | -------------------------------------------------------------------------------- /_solved/solutions/case-trump-vote09.py: -------------------------------------------------------------------------------- 1 | f = plt.figure(figsize=(6,6)) 2 | plt.scatter(pres.swing_full, lp.weights.lag_spatial(w, pres.swing_full)) 3 | plt.plot((-.3,.1),(-.3,.1), color='k') 4 | plt.title(r'$I = {:.3f} \ \ (p < {:.3f})$'.format(moran.I,moran.p_sim)) -------------------------------------------------------------------------------- /_solved/solutions/case-trump-vote10.py: -------------------------------------------------------------------------------- 1 | np.random.seed(11) 2 | lmos = esda.moran.Moran_Local(pres.swing_full, w, 3 | permutations=70000) #min for a bonf. 
bound 4 | (lmos.p_sim <= (.05/len(pres))).sum() -------------------------------------------------------------------------------- /_solved/solutions/case-trump-vote11.py: -------------------------------------------------------------------------------- 1 | f = plt.figure(figsize=(10,4)) 2 | ax = plt.gca() 3 | ax.set_aspect('equal') 4 | is_weird = lmos.p_sim <= (.05/len(pres)) 5 | pres.plot(color='lightgrey', ax=ax) 6 | pres.assign(quads=lmos.q)[is_weird].plot('quads', 7 | legend=True, 8 | k=4, categorical=True, 9 | cmap='bwr_r', ax=ax) -------------------------------------------------------------------------------- /_solved/solutions/case-trump-vote12.py: -------------------------------------------------------------------------------- 1 | f = plt.figure(figsize=(10,4)) 2 | ax = plt.gca() 3 | ax.set_aspect('equal') 4 | is_weird = lmos.p_sim <= (.05/len(pres)) 5 | pres.assign(quads=lmos.q)[is_weird].plot('quads', 6 | legend=True, 7 | k=4, categorical=True, 8 | cmap='bwr_r', ax=ax) 9 | bounds = ax.axis() 10 | pres.plot(color='lightgrey', ax=ax, zorder=-1) 11 | ax.axis(bounds) -------------------------------------------------------------------------------- /_solved/solutions/case-trump-vote13.py: -------------------------------------------------------------------------------- 1 | pres.assign(local_score = lmos.Is, 2 | pval = lmos.p_sim, 3 | quad = lmos.q)\ 4 | .sort_values('local_score')\ 5 | .query('pval < 1e-3 & local_score < 0')[['name','state_name','dem_2008','dem_2016', 6 | 'local_score','pval', 'quad']] -------------------------------------------------------------------------------- /_solved/solutions/case-trump-vote14.py: -------------------------------------------------------------------------------- 1 | np.random.seed(21) 2 | lmos16 = esda.moran.Moran_Local(pres.swing_2016, w, 3 | permutations=70000) #min for a bonf. 
bound 4 | (lmos16.p_sim <= (.05/len(pres))).sum() 5 | pres.assign(local_score = lmos16.Is, 6 | pval = lmos16.p_sim, 7 | quad = lmos16.q)\ 8 | .sort_values('local_score')\ 9 | .query('pval < 1e-3 & local_score < 0')[['name','state_name','dem_2008','dem_2016', 10 | 'local_score','pval', 'quad']] -------------------------------------------------------------------------------- /_solved/solutions/case-trump-vote15.py: -------------------------------------------------------------------------------- 1 | #% load _solved/solutions/case-trump-vote14.py 2 | sns.regplot(pres.gini_2015, 3 | pres.swing_full) -------------------------------------------------------------------------------- /case-conflict-mapping.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "%matplotlib inline\n", 10 | "\n", 11 | "import pandas as pd\n", 12 | "import geopandas\n", 13 | "import matplotlib.pyplot as plt" 14 | ] 15 | }, 16 | { 17 | "cell_type": "markdown", 18 | "metadata": {}, 19 | "source": [ 20 | "# Case study - Conflict mapping: mining sites in eastern DR Congo\n", 21 | "\n", 22 | "In this case study, we will explore a dataset on artisanal mining sites located in eastern DR Congo.\n", 23 | "\n", 24 | "**Note**: this tutorial is meant as a hands-on session, and most code examples are provided as exercises to be filled in. 
I highly recommend actually trying to do this yourself, but if you want to follow the solved tutorial, you can find this in the `_solved` directory.\n", 25 | "\n", 26 | "---\n", 27 | "\n", 28 | "#### Background\n", 29 | "\n", 30 | "[IPIS](http://ipisresearch.be/), the International Peace Information Service, manages a database on mining site visits in eastern DR Congo: http://ipisresearch.be/home/conflict-mapping/maps/open-data/\n", 31 | "\n", 32 | "Since 2009, IPIS has visited artisanal mining sites in the region during various data collection campaigns. As part of these campaigns, surveyor teams visit mining sites in the field, meet with miners and complete predefined questionnaires. These contain questions about the mining site, the minerals mined at the site and the armed groups possibly present at the site.\n", 33 | "\n", 34 | "Some additional links:\n", 35 | "\n", 36 | "* Tutorial on the same data using R from IPIS (but without geospatial aspect): http://ipisresearch.be/home/conflict-mapping/maps/open-data/open-data-tutorial/\n", 37 | "* Interactive web app using the same data: http://www.ipisresearch.be/mapping/webmapping/drcongo/v5/" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "## 1. Importing and exploring the data" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": {}, 50 | "source": [ 51 | "### The mining site visit data\n", 52 | "\n", 53 | "IPIS provides a WFS server to access the data. 
We can send a query to this server to download the data, and load the result into a geopandas GeoDataFrame:" 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": null, 59 | "metadata": {}, 60 | "outputs": [], 61 | "source": [ 62 | "import requests\n", 63 | "import json\n", 64 | "\n", 65 | "wfs_url = \"http://geo.ipisresearch.be/geoserver/public/ows\"\n", 66 | "params = dict(service='WFS', version='1.0.0', request='GetFeature',\n", 67 | " typeName='public:cod_mines_curated_all_opendata_p_ipis', outputFormat='json')\n", 68 | "\n", 69 | "r = requests.get(wfs_url, params=params)\n", 70 | "data_features = json.loads(r.content.decode('UTF-8'))\n", 71 | "data_visits = geopandas.GeoDataFrame.from_features(data_features)" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": {}, 77 | "source": [ 78 | "However, the data is also provided in the tutorial materials as a GeoJSON file, so it is certainly available during the tutorial." 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "
\n", 86 | " EXERCISE:\n", 87 | "
    \n", 88 | "
  • Read the GeoJSON file `data/cod_mines_curated_all_opendata_p_ipis.geojson` using geopandas, and call the result `data_visits`.
  • \n", 89 | "
  • Inspect the first 5 rows, and check the number of observations
  • \n", 90 | "
\n", 91 | "\n", 92 | "
" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": null, 98 | "metadata": { 99 | "clear_cell": true 100 | }, 101 | "outputs": [], 102 | "source": [ 103 | "# %load _solved/solutions/case-conflict-mapping3.py" 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": null, 109 | "metadata": { 110 | "clear_cell": true 111 | }, 112 | "outputs": [], 113 | "source": [ 114 | "# %load _solved/solutions/case-conflict-mapping4.py" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": null, 120 | "metadata": { 121 | "clear_cell": true 122 | }, 123 | "outputs": [], 124 | "source": [ 125 | "# %load _solved/solutions/case-conflict-mapping5.py" 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "metadata": {}, 131 | "source": [ 132 | "The provided dataset contains a lot of information, much more than we are going to use in this tutorial. Therefore, we will select a subset of the columns:" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": null, 138 | "metadata": {}, 139 | "outputs": [], 140 | "source": [ 141 | "data_visits = data_visits[['vid', 'project', 'visit_date', 'name', 'pcode', 'workers_numb', 'interference', 'armed_group1', 'mineral1', 'geometry']]" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": null, 147 | "metadata": {}, 148 | "outputs": [], 149 | "source": [ 150 | "data_visits.head()" 151 | ] 152 | }, 153 | { 154 | "cell_type": "markdown", 155 | "metadata": {}, 156 | "source": [ 157 | "Before starting the actual geospatial tutorial, we will use some more advanced pandas queries to construct a subset of the data that we will use further on: " 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": null, 163 | "metadata": {}, 164 | "outputs": [], 165 | "source": [ 166 | "# Take only the visits by IPIS that recorded at least one worker\n", 167 | "data_ipis = data_visits[data_visits['project'].str.contains('IPIS') & (data_visits['workers_numb'] > 
0)]" 168 | ] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": null, 173 | "metadata": {}, 174 | "outputs": [], 175 | "source": [ 176 | "# For those mining sites that were visited multiple times, take only the last visit\n", 177 | "data_ipis_lastvisit = data_ipis.sort_values('visit_date').groupby('pcode', as_index=False).last()\n", 178 | "data = geopandas.GeoDataFrame(data_ipis_lastvisit, crs=data_visits.crs)" 179 | ] 180 | }, 181 | { 182 | "cell_type": "markdown", 183 | "metadata": {}, 184 | "source": [ 185 | "### Data on protected areas in the same region\n", 186 | "\n", 187 | "Next to the mining site data, we are also going to use a dataset on protected areas (national parks) in Congo. This dataset was downloaded from http://www.wri.org/our-work/project/congo-basin-forests/democratic-republic-congo#project-tabs and included in the tutorial repository: `data/cod_conservation.zip`." 188 | ] 189 | }, 190 | { 191 | "cell_type": "markdown", 192 | "metadata": {}, 193 | "source": [ 194 | "
\n", 195 | " EXERCISE:\n", 196 | "
    \n", 197 | "
  • Extract the `data/cod_conservation.zip` archive, and read the shapefile contained in it. Assign the resulting GeoDataFrame to a variable named `protected_areas`.
  • \n", 198 | "
  • Quickly plot the GeoDataFrame.
  • \n", 199 | "
\n", 200 | "
" 201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "execution_count": null, 206 | "metadata": { 207 | "clear_cell": true 208 | }, 209 | "outputs": [], 210 | "source": [ 211 | "# %load _solved/solutions/case-conflict-mapping10.py" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": null, 217 | "metadata": { 218 | "clear_cell": true 219 | }, 220 | "outputs": [], 221 | "source": [ 222 | "# %load _solved/solutions/case-conflict-mapping11.py" 223 | ] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": {}, 228 | "source": [ 229 | "### Conversion to a common Coordinate Reference System\n", 230 | "\n", 231 | "We will see that the two datasets use different Coordinate Reference Systems (CRS). For many operations, however, it is important that we use a consistent CRS, and therefore we will convert both to a common CRS.\n", 232 | "\n", 233 | "But first, we explore problems we can encounter related to CRSs.\n", 234 | "\n", 235 | "---" 236 | ] 237 | }, 238 | { 239 | "cell_type": "markdown", 240 | "metadata": {}, 241 | "source": [ 242 | "[Goma](https://en.wikipedia.org/wiki/Goma) is the capital city of North Kivu province of Congo, close to the border with Rwanda. Its coordinates are 1.66°S 29.22°E.\n", 243 | "\n", 244 | "
\n", 245 | " EXERCISE:\n", 246 | "
    \n", 247 | "
  • Create a single Point object representing the location of Goma. Call this `goma`.
  • \n", 248 | "
  • Calculate the distances of all mines to Goma, and show the 5 smallest distances (mines closest to Goma).
  • \n", 249 | "
\n", 250 | "
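As a rough illustration of why units matter for this exercise: a plain Euclidean distance computed on longitude/latitude values comes out in degrees, whereas a great-circle (haversine) distance comes out in kilometres. The sketch below uses Goma's coordinates from above. The function name `haversine_km`, the spherical Earth radius of 6371 km and the example site coordinates are assumptions made for illustration, not part of the tutorial's solutions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Goma: 1.66°S 29.22°E, written as (longitude, latitude)
goma_lon, goma_lat = 29.22, -1.66
# Hypothetical mining site, one degree of longitude to the west of Goma
site_lon, site_lat = 28.22, -1.66

# Euclidean distance on raw coordinates: the unit is degrees
dist_degrees = math.hypot(site_lon - goma_lon, site_lat - goma_lat)
# Great-circle distance: the unit is kilometres
dist_km = haversine_km(goma_lon, goma_lat, site_lon, site_lat)
print(f"{dist_degrees:.2f} degrees is roughly {dist_km:.0f} km here")
```

Close to the equator one degree is on the order of 111 km, but that factor changes with latitude and direction, which is why a projected CRS is the more convenient choice for distance calculations.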
" 251 | ] 252 | }, 253 | { 254 | "cell_type": "code", 255 | "execution_count": null, 256 | "metadata": { 257 | "clear_cell": true 258 | }, 259 | "outputs": [], 260 | "source": [ 261 | "# %load _solved/solutions/case-conflict-mapping12.py" 262 | ] 263 | }, 264 | { 265 | "cell_type": "code", 266 | "execution_count": null, 267 | "metadata": { 268 | "clear_cell": true 269 | }, 270 | "outputs": [], 271 | "source": [ 272 | "# %load _solved/solutions/case-conflict-mapping13.py" 273 | ] 274 | }, 275 | { 276 | "cell_type": "code", 277 | "execution_count": null, 278 | "metadata": { 279 | "clear_cell": true 280 | }, 281 | "outputs": [], 282 | "source": [ 283 | "# %load _solved/solutions/case-conflict-mapping14.py" 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": null, 289 | "metadata": { 290 | "clear_cell": true 291 | }, 292 | "outputs": [], 293 | "source": [ 294 | "# %load _solved/solutions/case-conflict-mapping15.py" 295 | ] 296 | }, 297 | { 298 | "cell_type": "markdown", 299 | "metadata": {}, 300 | "source": [ 301 | "The distances we see here are in degrees, which makes them hard to interpret. That is one reason we will convert the data to another coordinate reference system (CRS) for the remainder of this tutorial." 302 | ] 303 | }, 304 | { 305 | "cell_type": "markdown", 306 | "metadata": {}, 307 | "source": [ 308 | "
\n", 309 | " EXERCISE:\n", 310 | "
    \n", 311 | "
  • Make a visualization of the national parks and the mining sites on a single plot.
  • \n", 312 | "
\n", 313 | " \n", 314 | "

Check the first section of the [04-more-on-visualization.ipynb](04-more-on-visualization.ipynb) notebook for tips and tricks to plot with GeoPandas.

\n", 315 | "
" 316 | ] 317 | }, 318 | { 319 | "cell_type": "code", 320 | "execution_count": null, 321 | "metadata": { 322 | "clear_cell": true 323 | }, 324 | "outputs": [], 325 | "source": [ 326 | "# %load _solved/solutions/case-conflict-mapping16.py" 327 | ] 328 | }, 329 | { 330 | "cell_type": "markdown", 331 | "metadata": {}, 332 | "source": [ 333 | "You will notice that the protected areas and mining sites do not map to the same area on the plot. This is because the Coordinate Reference Systems (CRS) of the two datasets differ. Another reason we will need to convert the CRS!\n", 334 | "\n", 335 | "Let's check the Coordinate Reference System (CRS) for both datasets.\n", 336 | "\n", 337 | "The mining sites data uses the [WGS 84 lat/lon (EPSG 4326)](http://spatialreference.org/ref/epsg/4326/) CRS:" 338 | ] 339 | }, 340 | { 341 | "cell_type": "code", 342 | "execution_count": null, 343 | "metadata": {}, 344 | "outputs": [], 345 | "source": [ 346 | "data.crs" 347 | ] 348 | }, 349 | { 350 | "cell_type": "markdown", 351 | "metadata": {}, 352 | "source": [ 353 | "The protected areas dataset, on the other hand, uses a [WGS 84 / World Mercator (EPSG 3395)](http://spatialreference.org/ref/epsg/wgs-84-world-mercator/) projection (with meters as unit):" 354 | ] 355 | }, 356 | { 357 | "cell_type": "code", 358 | "execution_count": null, 359 | "metadata": {}, 360 | "outputs": [], 361 | "source": [ 362 | "protected_areas.crs" 363 | ] 364 | }, 365 | { 366 | "cell_type": "markdown", 367 | "metadata": {}, 368 | "source": [ 369 | "We will convert both datasets to a local UTM zone, so that we can plot them together and distance-based calculations give sensible results.\n", 370 | "\n", 371 | "To find the appropriate UTM zone, you can check http://www.dmap.co.uk/utmworld.htm or https://www.latlong.net/lat-long-utm.html. In this case we will use UTM zone 35S, which gives us EPSG 32735: https://epsg.io/32735\n", 372 | "\n", 373 | "
\n", 374 | " EXERCISE:\n", 375 | "
    \n", 376 | "
  • Convert both datasets (`data` and `protected_areas`) to EPSG 32735. Name the results `data_utm` and `protected_areas_utm`.
  • \n", 377 | "
  • Try again to visualize both datasets on a single map.
  • \n", 378 | "
\n", 379 | "\n", 380 | "
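The choice of UTM zone 35 and EPSG 32735 can be double-checked with a little arithmetic: UTM zones are 6° of longitude wide and numbered eastwards from 180°W, and the WGS 84 / UTM EPSG codes are 326xx for the northern and 327xx for the southern hemisphere. A minimal sketch, assuming a hypothetical helper named `utm_epsg` and ignoring the irregular zones around Norway and Svalbard:

```python
import math

def utm_epsg(lon, lat):
    """Return (zone, EPSG code) of the WGS 84 / UTM zone containing a lon/lat point.

    Ignores the irregular zones around Norway and Svalbard.
    """
    zone = int(math.floor((lon + 180) / 6)) + 1
    zone = min(zone, 60)  # lon == 180 would otherwise give zone 61
    base = 32600 if lat >= 0 else 32700  # 326xx: north, 327xx: south
    return zone, base + zone

# Goma (1.66°S 29.22°E) lies in zone 35, south of the equator
zone, epsg = utm_epsg(29.22, -1.66)
print(zone, epsg)  # → 35 32735
```

With geopandas, the resulting code is the kind of value that can be passed to `to_crs`, for example `data.to_crs(epsg=32735)`.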
" 381 | ] 382 | }, 383 | { 384 | "cell_type": "code", 385 | "execution_count": null, 386 | "metadata": { 387 | "clear_cell": true 388 | }, 389 | "outputs": [], 390 | "source": [ 391 | "# %load _solved/solutions/case-conflict-mapping19.py" 392 | ] 393 | }, 394 | { 395 | "cell_type": "code", 396 | "execution_count": null, 397 | "metadata": { 398 | "clear_cell": true 399 | }, 400 | "outputs": [], 401 | "source": [ 402 | "# %load _solved/solutions/case-conflict-mapping20.py" 403 | ] 404 | }, 405 | { 406 | "cell_type": "markdown", 407 | "metadata": {}, 408 | "source": [ 409 | "### More advanced visualizations\n", 410 | "\n", 411 | "

For the following exercises, check the first section of the [04-more-on-visualization.ipynb](04-more-on-visualization.ipynb) notebook for tips and tricks to plot with GeoPandas.

" 412 | ] 413 | }, 414 | { 415 | "cell_type": "markdown", 416 | "metadata": {}, 417 | "source": [ 418 | "
\n", 419 | " EXERCISE:\n", 420 | "
    \n", 421 | "
  • Make a visualization of the national parks and the mining sites on a single plot.
  • \n", 422 | "
  • Pay attention to the following details:\n", 423 | "
      \n", 424 | "
    • Make the figure a bit bigger.
    • \n", 425 | "
    • The protected areas should be plotted in green
    • \n", 426 | "
    • For plotting the mining sites, adjust the markersize and use an `alpha=0.5`.
    • \n", 427 | "
    • Remove the figure border and x and y labels (coordinates)
    • \n", 428 | "
    \n", 429 | "
  • \n", 430 | "
\n", 431 | "
" 432 | ] 433 | }, 434 | { 435 | "cell_type": "code", 436 | "execution_count": null, 437 | "metadata": { 438 | "clear_cell": true 439 | }, 440 | "outputs": [], 441 | "source": [ 442 | "# %load _solved/solutions/case-conflict-mapping21.py" 443 | ] 444 | }, 445 | { 446 | "cell_type": "code", 447 | "execution_count": null, 448 | "metadata": { 449 | "clear_cell": true 450 | }, 451 | "outputs": [], 452 | "source": [ 453 | "# %load _solved/solutions/case-conflict-mapping22.py" 454 | ] 455 | }, 456 | { 457 | "cell_type": "markdown", 458 | "metadata": {}, 459 | "source": [ 460 | "
\n", 461 | " EXERCISE:\n", 462 | " \n", 463 | " In addition to the previous figure:\n", 464 | "
    \n", 465 | "
  • Give the mining sites a distinct color based on the `'interference'` column, indicating whether an armed group is present at the mining site or not.
  • \n", 466 | "
\n", 467 | "
" 468 | ] 469 | }, 470 | { 471 | "cell_type": "code", 472 | "execution_count": null, 473 | "metadata": { 474 | "clear_cell": true 475 | }, 476 | "outputs": [], 477 | "source": [ 478 | "# %load _solved/solutions/case-conflict-mapping23.py" 479 | ] 480 | }, 481 | { 482 | "cell_type": "markdown", 483 | "metadata": {}, 484 | "source": [ 485 | "
\n", 486 | " EXERCISE:\n", 487 | " \n", 488 | " In addition to the previous figure:\n", 489 | "
    \n", 490 | "
  • Give the mining sites a distinct color based on the `'mineral1'` column, indicating which mineral is the primary mined mineral.
  • \n", 491 | "
\n", 492 | "
" 493 | ] 494 | }, 495 | { 496 | "cell_type": "code", 497 | "execution_count": null, 498 | "metadata": { 499 | "clear_cell": true 500 | }, 501 | "outputs": [], 502 | "source": [ 503 | "# %load _solved/solutions/case-conflict-mapping24.py" 504 | ] 505 | }, 506 | { 507 | "cell_type": "markdown", 508 | "metadata": {}, 509 | "source": [ 510 | "## 2. Spatial operations" 511 | ] 512 | }, 513 | { 514 | "cell_type": "markdown", 515 | "metadata": {}, 516 | "source": [ 517 | "
\n", 518 | " EXERCISE:\n", 519 | " \n", 520 | "
    \n", 521 | "
  • Access the geometry of the \"Kahuzi-Biega National park\".
  • \n", 522 | "
  • Filter the mining sites to select those that are located in this national park.
  • \n", 523 | "
\n", 524 | "
" 525 | ] 526 | }, 527 | { 528 | "cell_type": "code", 529 | "execution_count": null, 530 | "metadata": { 531 | "clear_cell": true 532 | }, 533 | "outputs": [], 534 | "source": [ 535 | "# %load _solved/solutions/case-conflict-mapping25.py" 536 | ] 537 | }, 538 | { 539 | "cell_type": "code", 540 | "execution_count": null, 541 | "metadata": { 542 | "clear_cell": true 543 | }, 544 | "outputs": [], 545 | "source": [ 546 | "# %load _solved/solutions/case-conflict-mapping26.py" 547 | ] 548 | }, 549 | { 550 | "cell_type": "code", 551 | "execution_count": null, 552 | "metadata": { 553 | "clear_cell": true 554 | }, 555 | "outputs": [], 556 | "source": [ 557 | "# %load _solved/solutions/case-conflict-mapping27.py" 558 | ] 559 | }, 560 | { 561 | "cell_type": "markdown", 562 | "metadata": {}, 563 | "source": [ 564 | "
\n", 565 | " EXERCISE: Determine for each mining site the \"closest\" protected area:\n", 566 | " \n", 567 | "
    \n", 568 | "
  • PART 1 - do this for a single mining site:\n", 569 | "
      \n", 570 | "
    • Get a single mining site, e.g. the first of the dataset.
    • \n", 571 | "
    • Calculate the distance (in km) to all protected areas for this mining site.
    • \n", 572 | "
    • Get the index of the minimum distance (tip: `idxmin()`) and get the name of the protected area corresponding to this index.
    • \n", 573 | "
    \n", 574 | "
  • \n", 575 | "
  • PART 2 - apply this procedure on each geometry:\n", 576 | "
      \n", 577 | "
    • Write the above procedure as a function that gets a single site and the protected areas dataframe as input and returns the name of the closest protected area as output.
    • \n", 578 | "
    • Apply this function to all sites using the `.apply()` method on `data_utm.geometry`.
    • \n", 579 | "
    \n", 580 | "
  • \n", 581 | "
\n", 582 | "
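The two-part structure of this exercise (work it out for one site, then wrap it in a function and apply it to all sites) can be sketched without any geospatial library. The example below is a deliberate simplification: each protected area is reduced to a single hypothetical reference point in projected coordinates (metres), whereas the exercise itself measures the distance to the area's polygon. Apart from the name Kahuzi-Biega, all names and coordinates are made up for illustration.

```python
import math

# Hypothetical reference points (x, y) in metres; NOT real park locations
areas = {
    "Kahuzi-Biega National park": (650_000.0, 9_750_000.0),
    "Area A (made up)": (700_000.0, 9_900_000.0),
    "Area B (made up)": (500_000.0, 9_600_000.0),
}

def closest_area(site, areas):
    """Return the name of the area whose reference point is nearest to `site`."""
    # Distance in km from this one site to every area
    dist_km = {
        name: math.hypot(site[0] - x, site[1] - y) / 1000
        for name, (x, y) in areas.items()
    }
    # Plain-Python counterpart of pandas' idxmin(): the key with the smallest value
    return min(dist_km, key=dist_km.get)

# PART 1: a single (hypothetical) mining site
first_site = (640_000.0, 9_760_000.0)
print(closest_area(first_site, areas))

# PART 2: the same function applied to every site
sites = [first_site, (710_000.0, 9_890_000.0), (510_000.0, 9_610_000.0)]
closest = [closest_area(site, areas) for site in sites]
print(closest)
```

In the exercise itself, the dictionary of points would be replaced by a distance calculation against the `protected_areas_utm` geometries, and the list comprehension by `.apply()` on `data_utm.geometry`.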
" 583 | ] 584 | }, 585 | { 586 | "cell_type": "code", 587 | "execution_count": null, 588 | "metadata": { 589 | "clear_cell": true 590 | }, 591 | "outputs": [], 592 | "source": [ 593 | "# %load _solved/solutions/case-conflict-mapping28.py" 594 | ] 595 | }, 596 | { 597 | "cell_type": "code", 598 | "execution_count": null, 599 | "metadata": { 600 | "clear_cell": true 601 | }, 602 | "outputs": [], 603 | "source": [ 604 | "# %load _solved/solutions/case-conflict-mapping29.py" 605 | ] 606 | }, 607 | { 608 | "cell_type": "code", 609 | "execution_count": null, 610 | "metadata": { 611 | "clear_cell": true 612 | }, 613 | "outputs": [], 614 | "source": [ 615 | "# %load _solved/solutions/case-conflict-mapping30.py" 616 | ] 617 | }, 618 | { 619 | "cell_type": "code", 620 | "execution_count": null, 621 | "metadata": { 622 | "clear_cell": true 623 | }, 624 | "outputs": [], 625 | "source": [ 626 | "# %load _solved/solutions/case-conflict-mapping31.py" 627 | ] 628 | }, 629 | { 630 | "cell_type": "code", 631 | "execution_count": null, 632 | "metadata": { 633 | "clear_cell": true 634 | }, 635 | "outputs": [], 636 | "source": [ 637 | "# %load _solved/solutions/case-conflict-mapping32.py" 638 | ] 639 | }, 640 | { 641 | "cell_type": "markdown", 642 | "metadata": {}, 643 | "source": [ 644 | "## 3. Using spatial join to determine mining sites in the protected areas\n", 645 | "\n", 646 | "Based on the analysis and visualizations above, we can already see that there are mining sites inside the protected areas. Let's now do an actual spatial join to determine which sites are within the protected areas." 647 | ] 648 | }, 649 | { 650 | "cell_type": "markdown", 651 | "metadata": {}, 652 | "source": [ 653 | "### Mining sites in protected areas\n", 654 | "\n", 655 | "
\n", 656 | " EXERCISE:\n", 657 | "
    \n", 658 | "
  • Add information about the protected areas to the mining sites dataset, using a spatial join:\n", 659 | "
      \n", 660 | "
    • Call the result `data_within_protected`
    • \n", 661 | "
    • If the result is empty, this is an indication that the coordinate reference system is not matching. Make sure to re-project the data (see above).
    • \n", 662 | " \n", 663 | "
    \n", 664 | "
  • \n", 665 | "
  • How many mining sites are located within a national park?
  • \n", 666 | "
  • Count the number of mining sites per national park (pandas tip: check `value_counts()`)
  • \n", 667 | "\n", 668 | "
\n", 669 | "\n", 670 | "
" 671 | ] 672 | }, 673 | { 674 | "cell_type": "code", 675 | "execution_count": null, 676 | "metadata": { 677 | "clear_cell": true 678 | }, 679 | "outputs": [], 680 | "source": [ 681 | "# %load _solved/solutions/case-conflict-mapping33.py" 682 | ] 683 | }, 684 | { 685 | "cell_type": "code", 686 | "execution_count": null, 687 | "metadata": { 688 | "clear_cell": true 689 | }, 690 | "outputs": [], 691 | "source": [ 692 | "# %load _solved/solutions/case-conflict-mapping34.py" 693 | ] 694 | }, 695 | { 696 | "cell_type": "code", 697 | "execution_count": null, 698 | "metadata": { 699 | "clear_cell": true 700 | }, 701 | "outputs": [], 702 | "source": [ 703 | "# %load _solved/solutions/case-conflict-mapping35.py" 704 | ] 705 | }, 706 | { 707 | "cell_type": "code", 708 | "execution_count": null, 709 | "metadata": { 710 | "clear_cell": true 711 | }, 712 | "outputs": [], 713 | "source": [ 714 | "# %load _solved/solutions/case-conflict-mapping36.py" 715 | ] 716 | }, 717 | { 718 | "cell_type": "markdown", 719 | "metadata": {}, 720 | "source": [ 721 | "### Mining sites in the borders of protected areas\n", 722 | "\n", 723 | "And what about the borders of the protected areas? (just outside the park)\n", 724 | "\n", 725 | "
\n", 726 | " EXERCISE:\n", 727 | "
    \n", 728 | "
  • Create a new dataset, `protected_areas_borders`, that contains the border area (10 km wide) of each protected area:\n", 729 | "
      \n", 730 | "
    • Tip: one way of doing this is with the `buffer` and `difference` methods.
    • \n", 731 | "
    • Plot the resulting borders as a visual check of correctness.
    • \n", 732 | "
    \n", 733 | "
  • \n", 734 | "
  • Count the number of mining sites per national park that are located within its borders
  • \n", 735 | "\n", 736 | "
\n", 737 | "\n", 738 | "
" 739 | ] 740 | }, 741 | { 742 | "cell_type": "code", 743 | "execution_count": null, 744 | "metadata": { 745 | "clear_cell": true 746 | }, 747 | "outputs": [], 748 | "source": [ 749 | "# %load _solved/solutions/case-conflict-mapping37.py" 750 | ] 751 | }, 752 | { 753 | "cell_type": "code", 754 | "execution_count": null, 755 | "metadata": { 756 | "clear_cell": true 757 | }, 758 | "outputs": [], 759 | "source": [ 760 | "# %load _solved/solutions/case-conflict-mapping38.py" 761 | ] 762 | }, 763 | { 764 | "cell_type": "code", 765 | "execution_count": null, 766 | "metadata": { 767 | "clear_cell": true 768 | }, 769 | "outputs": [], 770 | "source": [ 771 | "# %load _solved/solutions/case-conflict-mapping39.py" 772 | ] 773 | }, 774 | { 775 | "cell_type": "code", 776 | "execution_count": null, 777 | "metadata": { 778 | "clear_cell": true 779 | }, 780 | "outputs": [], 781 | "source": [ 782 | "# %load _solved/solutions/case-conflict-mapping40.py" 783 | ] 784 | }, 785 | { 786 | "cell_type": "code", 787 | "execution_count": null, 788 | "metadata": { 789 | "clear_cell": true 790 | }, 791 | "outputs": [], 792 | "source": [ 793 | "# %load _solved/solutions/case-conflict-mapping41.py" 794 | ] 795 | } 796 | ], 797 | "metadata": { 798 | "kernelspec": { 799 | "display_name": "Python 3", 800 | "language": "python", 801 | "name": "python3" 802 | }, 803 | "language_info": { 804 | "codemirror_mode": { 805 | "name": "ipython", 806 | "version": 3 807 | }, 808 | "file_extension": ".py", 809 | "mimetype": "text/x-python", 810 | "name": "python", 811 | "nbconvert_exporter": "python", 812 | "pygments_lexer": "ipython3", 813 | "version": "3.5.5" 814 | } 815 | }, 816 | "nbformat": 4, 817 | "nbformat_minor": 2 818 | } 819 | -------------------------------------------------------------------------------- /case-gini-in-a-bottle-the-trump-vote.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | 
"execution_count": null, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "%matplotlib inline\n", 10 | "\n", 11 | "import pandas as pd\n", 12 | "import geopandas as gpd\n", 13 | "import libpysal as lp\n", 14 | "import esda\n", 15 | "import numpy as np\n", 16 | "import matplotlib.pyplot as plt" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "# Case Study: *Gini in a bottle: Income Inequality and the Trump Vote*" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "#### Read in the table and show the first three rows" 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": null, 36 | "metadata": {}, 37 | "outputs": [], 38 | "source": [ 39 | "%load _solved/solutions/case-trump-vote01.py" 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "#### Set the CRS and reproject it into a suitable projection for mapping the contiguous US\n", 47 | "\n", 48 | "*hint: the epsg code useful here is 5070, for Albers equal area conic*" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": null, 54 | "metadata": {}, 55 | "outputs": [], 56 | "source": [ 57 | "%load _solved/solutions/case-trump-vote02.py" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "metadata": {}, 63 | "source": [ 64 | "#### Plot each year's vote against each other year's vote\n", 65 | "In this instance, it also helps to include the line ($y=x$) on each plot, so that it is clearer the directions the aggregate votes moved. " 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": null, 71 | "metadata": {}, 72 | "outputs": [], 73 | "source": [ 74 | "%load _solved/solutions/case-trump-vote03.py" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | "#### Show the relationship between the dem two-party vote and the Gini coefficient by county." 
82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": null, 87 | "metadata": {}, 88 | "outputs": [], 89 | "source": [ 90 | "%load _solved/solutions/case-trump-vote04.py" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "#### Compute the swings (change in vote from year to year)" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": null, 103 | "metadata": {}, 104 | "outputs": [], 105 | "source": [ 106 | "%load _solved/solutions/case-trump-vote05.py" 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": {}, 112 | "source": [ 113 | "Negative swing means the Democrat voteshare in 2016 (what Clinton won) is lower than the Democrat voteshare in 2008 (what Obama won).\n", 114 | "So, in counties where swing is negative, Obama \"outperformed\" Clinton. \n", 115 | "Equivalently, because these are two-party vote shares, these are counties where Trump (in 2016) \"beat\" McCain's electoral performance in 2008.\n", 116 | "\n", 117 | "Positive swing in a county means that Clinton (in 2016) outperformed Obama (in 2008), or equivalently that McCain (in 2008) did better than Trump (in 2016). \n", 118 | "\n", 119 | "The national average swing was around -9% from 2008 to 2016. 
Further, swing does not directly record who \"won\" the county, only which direction the county \"moved.\"" 120 | ] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "metadata": {}, 125 | "source": [ 126 | "#### map the swing from 2008 to 2016 alongside the votes in 2008 and 2016:" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": null, 132 | "metadata": {}, 133 | "outputs": [], 134 | "source": [ 135 | "%load _solved/solutions/case-trump-vote06.py" 136 | ] 137 | }, 138 | { 139 | "cell_type": "markdown", 140 | "metadata": {}, 141 | "source": [ 142 | "#### Build a spatial weights object to model the spatial relationships between US counties" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": null, 148 | "metadata": {}, 149 | "outputs": [], 150 | "source": [ 151 | "%load _solved/solutions/case-trump-vote07.py" 152 | ] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "metadata": {}, 157 | "source": [ 158 | "Note that this is just one of many valid solutions. But, all the remaining exercises are predicated on using this weight. If you choose a different weight structure, your results may differ." 159 | ] 160 | }, 161 | { 162 | "cell_type": "markdown", 163 | "metadata": {}, 164 | "source": [ 165 | "#### Is swing \"contagious?\" Do nearby counties tend to swing together? " 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": null, 171 | "metadata": {}, 172 | "outputs": [], 173 | "source": [ 174 | "%load _solved/solutions/case-trump-vote08.py" 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": {}, 180 | "source": [ 181 | "#### Visually show the relationship between places' swing and their surrounding swing, like in a scatterplot. 
" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": null, 187 | "metadata": {}, 188 | "outputs": [], 189 | "source": [ 190 | "%load _solved/solutions/case-trump-vote09.py" 191 | ] 192 | }, 193 | { 194 | "cell_type": "markdown", 195 | "metadata": {}, 196 | "source": [ 197 | "#### Are there any outliers or clusters in swing using a Local Moran's $I$?" 198 | ] 199 | }, 200 | { 201 | "cell_type": "code", 202 | "execution_count": null, 203 | "metadata": {}, 204 | "outputs": [], 205 | "source": [ 206 | "%load _solved/solutions/case-trump-vote10.py" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "#### Where are these outliers or clusters?" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": null, 219 | "metadata": {}, 220 | "outputs": [], 221 | "source": [ 222 | "%load _solved/solutions/case-trump-vote11.py" 223 | ] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": {}, 228 | "source": [ 229 | "#### Can you focus in on the regions which are outliers?" 230 | ] 231 | }, 232 | { 233 | "cell_type": "code", 234 | "execution_count": null, 235 | "metadata": {}, 236 | "outputs": [], 237 | "source": [ 238 | "%load _solved/solutions/case-trump-vote12.py" 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "metadata": {}, 244 | "source": [ 245 | "Group 3 moves surprisingly strongly from Obama to Trump relative to its surroundings, and group 1 moves strongly from Obama to Hillary relative to its surroundings.\n", 246 | "\n", 247 | "Group 4 moves surprisingly away from Trump while its area moves towards Trump. Group 2 moves surprisingly towards Trump while its area moves towards Hillary. " 248 | ] 249 | }, 250 | { 251 | "cell_type": "markdown", 252 | "metadata": {}, 253 | "source": [ 254 | "#### Relaxing the significance a bit, where do we see significant spatial outliers?" 
255 | ] 256 | }, 257 | { 258 | "cell_type": "code", 259 | "execution_count": null, 260 | "metadata": {}, 261 | "outputs": [], 262 | "source": [ 263 | "%load _solved/solutions/case-trump-vote13.py" 264 | ] 265 | }, 266 | { 267 | "cell_type": "markdown", 268 | "metadata": {}, 269 | "source": [ 270 | "Mainly in Ohio, Indiana, and West Virginia." 271 | ] 272 | }, 273 | { 274 | "cell_type": "markdown", 275 | "metadata": {}, 276 | "source": [ 277 | "#### What about when comparing the voting behavior from 2012 to 2016?" 278 | ] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "execution_count": null, 283 | "metadata": {}, 284 | "outputs": [], 285 | "source": [ 286 | "%load _solved/solutions/case-trump-vote14.py" 287 | ] 288 | }, 289 | { 290 | "cell_type": "markdown", 291 | "metadata": {}, 292 | "source": [ 293 | "#### What is the relationship between the Gini coefficient and partisan swing?" 294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": null, 299 | "metadata": {}, 300 | "outputs": [], 301 | "source": [ 302 | "%load _solved/solutions/case-trump-vote15.py" 303 | ] 304 | }, 305 | { 306 | "cell_type": "markdown", 307 | "metadata": {}, 308 | "source": [ 309 | "Hillary tended to do better than Obama in counties with higher income inequality.\n", 310 | "In contrast, Trump fared better in counties with lower income inequality. \n", 311 | "If you're further interested in the sometimes-counterintuitive relationship between income, voting, & geographic context, check out Gelman's [Red State, Blue State](https://www.amazon.com/Red-State-Blue-Rich-Poor/dp/0691143935). 
" 312 | ] 313 | } 314 | ], 315 | "metadata": { 316 | "kernelspec": { 317 | "display_name": "Python 3", 318 | "language": "python", 319 | "name": "python3" 320 | }, 321 | "language_info": { 322 | "codemirror_mode": { 323 | "name": "ipython", 324 | "version": 3 325 | }, 326 | "file_extension": ".py", 327 | "mimetype": "text/x-python", 328 | "name": "python", 329 | "nbconvert_exporter": "python", 330 | "pygments_lexer": "ipython3", 331 | "version": "3.6.5" 332 | } 333 | }, 334 | "nbformat": 4, 335 | "nbformat_minor": 2 336 | } 337 | -------------------------------------------------------------------------------- /check_environment.py: -------------------------------------------------------------------------------- 1 | import importlib 2 | 3 | packages = ['geopandas', 'sklearn', 'contextily', 'folium', 'mgwr', 'pysal'] 4 | 5 | bad = [] 6 | for package in packages: 7 | try: 8 | importlib.import_module(package) 9 | except ImportError: 10 | bad.append("Can't import %s" % package) 11 | else: 12 | if len(bad) > 0: 13 | print('Your tutorial environment is not yet fully set up:') 14 | print('\n'.join(bad)) 15 | else: 16 | try: 17 | import geopandas 18 | countries = geopandas.read_file("zip://./data/ne_110m_admin_0_countries.zip") 19 | print("All good. 
Enjoy the tutorial!") 20 | except Exception as e: 21 | print("Couldn't read countries shapefile.") 22 | print(e) 23 | 24 | -------------------------------------------------------------------------------- /data/berlin-listings.csv.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/geopandas/scipy2018-geospatial-data/57afd9a46a75b281adb0d36b0492142eb7ca29c9/data/berlin-listings.csv.gz -------------------------------------------------------------------------------- /data/cod_conservation.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/geopandas/scipy2018-geospatial-data/57afd9a46a75b281adb0d36b0492142eb7ca29c9/data/cod_conservation.zip -------------------------------------------------------------------------------- /data/ne_110m_admin_0_countries.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/geopandas/scipy2018-geospatial-data/57afd9a46a75b281adb0d36b0492142eb7ca29c9/data/ne_110m_admin_0_countries.zip -------------------------------------------------------------------------------- /data/ne_110m_populated_places.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/geopandas/scipy2018-geospatial-data/57afd9a46a75b281adb0d36b0492142eb7ca29c9/data/ne_110m_populated_places.zip -------------------------------------------------------------------------------- /data/ne_50m_rivers_lake_centerlines.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/geopandas/scipy2018-geospatial-data/57afd9a46a75b281adb0d36b0492142eb7ca29c9/data/ne_50m_rivers_lake_centerlines.zip -------------------------------------------------------------------------------- /data/uspres.zip: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/geopandas/scipy2018-geospatial-data/57afd9a46a75b281adb0d36b0492142eb7ca29c9/data/uspres.zip -------------------------------------------------------------------------------- /environment.yml: -------------------------------------------------------------------------------- 1 | name: scipygeo18 2 | channels: 3 | - defaults 4 | - conda-forge 5 | dependencies: 6 | - geopandas 7 | - geoplot 8 | - pysal 9 | - folium 10 | - scikit-learn 11 | - rasterio 12 | - pip 13 | - pip: 14 | - contextily 15 | - libpysal 16 | - mgwr 17 | - mapclassify 18 | - esda 19 | -------------------------------------------------------------------------------- /img/TopologicSpatialRelations2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/geopandas/scipy2018-geospatial-data/57afd9a46a75b281adb0d36b0492142eb7ca29c9/img/TopologicSpatialRelations2.png -------------------------------------------------------------------------------- /img/download-button.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/geopandas/scipy2018-geospatial-data/57afd9a46a75b281adb0d36b0492142eb7ca29c9/img/download-button.png --------------------------------------------------------------------------------