├── .gitignore ├── CNAME ├── LICENSE ├── Makefile ├── README.md ├── _config.yml ├── _toc.yml ├── images ├── fa-rocket.svg └── yt_logo.png ├── introduction.md ├── logo.png ├── notebooks ├── dask_creating_spatial_data.ipynb ├── dask_median_composite.ipynb ├── geopandas_bulk_geocoding.ipynb ├── geopandas_extract_from_excel.ipynb ├── geopandas_fuzzy_table_join.ipynb ├── geopandas_spatial_query.ipynb ├── openai_mapping_news_articles.ipynb ├── ors_distance_matrix.ipynb ├── samgeo_farm_boundary_extraction.ipynb ├── samgeo_mine_perimeter_detection.ipynb ├── xarray_aggregating_time_series.ipynb ├── xarray_climate_anomaly.ipynb ├── xarray_create_raster.ipynb ├── xarray_extracting_time_series.ipynb ├── xarray_mosaic_and_clip.ipynb ├── xarray_raster_sampling.ipynb ├── xarray_raster_styling_analysis.ipynb ├── xarray_wrap_longitude.ipynb ├── xarray_zonal_stats.ipynb ├── xee_downloading_images.ipynb └── xee_ic_to_netcdf.ipynb ├── references.bib └── requirements.txt /.gitignore: -------------------------------------------------------------------------------- 1 | _build/ 2 | notebooks/.ipynb_checkpoints 3 | .DS_Store 4 | notebooks/data 5 | notebooks/output 6 | .ipynb_checkpoints 7 | -------------------------------------------------------------------------------- /CNAME: -------------------------------------------------------------------------------- 1 | www.geopythontutorials.com -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2025 Ujaval Gandhi 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom 
the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | github: 2 | git add . 3 | git commit -a -m 'update'; git push origin main 4 | 5 | html: 6 | rm -rf _build/ 7 | jupyter-book build . 8 | cp CNAME _build/html 9 | 10 | gh-pages: html 11 | ghp-import -n -p -f _build/html -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Geospatial Python Tutorials 2 | 3 | This repository contains the notebooks and Jupyter Book configuration for the [Geospatial Python Tutorials](https://www.geopythontutorials.com/) website. 4 | 5 | The website is a static site generated using the following technologies: 6 | 7 | * All the content is based on Markdown files and Jupyter notebooks. 8 | * The HTML is generated using [Jupyter Book](https://jupyterbook.org/en/stable/intro.html). 9 | * The webpages are hosted on [GitHub Pages](https://pages.github.com/). 10 | * Comments are powered by [utterances](https://utteranc.es/). 
11 | 12 | ## Clone the Repository 13 | 14 | ``` 15 | git clone git@github.com:spatialthoughts/geopython-tutorials.git 16 | cd geopython-tutorials 17 | ``` 18 | 19 | 20 | ## Installation 21 | 22 | The following instructions have been tested on Linux/Mac systems. I prefer conda for environment management, so the instructions use conda, but you can use virtualenv instead if you prefer. 23 | 24 | Create a new environment named `geopython-tutorials` and install the dependencies. Optionally, install `make` to build the source files using the provided `Makefile`. 25 | 26 | ``` 27 | conda create --name geopython-tutorials 28 | conda activate geopython-tutorials 29 | conda install pip 30 | conda install make 31 | pip install -r requirements.txt 32 | ``` 33 | 34 | ## Updating the Contents 35 | 36 | The homepage content is in the file `introduction.md`. All other content is generated from the `.ipynb` files in the `notebooks` folder. The table of contents is in the `_toc.yml` file. 37 | 38 | ### Editing existing tutorials 39 | 40 | * Edit the corresponding notebook in the `notebooks/` folder using JupyterLab/Colab. 41 | 42 | ### Adding a new tutorial 43 | 44 | * Add the `.ipynb` file to the `notebooks/` folder. 45 | * Edit the `_toc.yml` file and add a section for the new tutorial. 46 | 47 | ## Build the Website and Push the Changes 48 | 49 | The `Makefile` contains several rules that execute the commands to build the website. 50 | 51 | After making changes, run the following to build the HTML pages and preview them. 
52 | 53 | ``` 54 | make html 55 | ``` 56 | 57 | To push the changes to GitHub pages, run the following 58 | 59 | 60 | ``` 61 | make gh-pages 62 | ``` 63 | 64 | License 65 | ------- 66 | 67 | All the tutorials are available under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/deed.en_US) 68 | -------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | # Book settings 2 | # Learn more at https://jupyterbook.org/customize/config.html 3 | 4 | title: Geospatial Python Tutorials 5 | author: Ujaval Gandhi 6 | logo: logo.png 7 | copyright: "2025" 8 | 9 | # Force re-execution of notebooks on each build. 10 | # See https://jupyterbook.org/content/execute.html 11 | execute: 12 | execute_notebooks: off 13 | 14 | # Define the name of the latex output file for PDF builds 15 | latex: 16 | latex_documents: 17 | targetname: book.tex 18 | 19 | # Add a bibtex file so that we can create citations 20 | bibtex_bibfiles: 21 | - references.bib 22 | 23 | # Information about where the book exists on the web 24 | repository: 25 | url: https://github.com/spatialthoughts/geopython-tutorials 26 | branch: main # Which branch of the repository should be used when creating links (optional) 27 | 28 | # Add GitHub buttons to your book 29 | # See https://jupyterbook.org/customize/config.html#add-a-link-to-your-repository 30 | html: 31 | use_issues_button: true 32 | use_repository_button: true 33 | analytics: 34 | google_analytics_id: G-CSFQ69LBRL 35 | extra_footer : | 36 |
37 | 38 | This work is licensed under a CC BY 4.0 license. 39 |
40 | launch_buttons: 41 | colab_url: "https://colab.research.google.com" 42 | 43 | parse: 44 | myst_enable_extensions: 45 | # don't forget to list any other extensions you want enabled, 46 | # including those that are enabled by default! See here: https://jupyterbook.org/en/stable/customize/config.html 47 | - html_image 48 | 49 | sphinx: 50 | recursive_update: true 51 | config: 52 | html_context: 53 | default_mode: light 54 | html_show_copyright: false 55 | -------------------------------------------------------------------------------- /_toc.yml: -------------------------------------------------------------------------------- 1 | # Table of contents 2 | # Learn more at https://jupyterbook.org/customize/toc.html 3 | 4 | format: jb-book 5 | root: introduction 6 | parts: 7 | - caption: GeoPandas 8 | chapters: 9 | - file: notebooks/geopandas_bulk_geocoding.ipynb 10 | - file: notebooks/geopandas_spatial_query.ipynb 11 | - file: notebooks/geopandas_extract_from_excel.ipynb 12 | - file: notebooks/geopandas_fuzzy_table_join.ipynb 13 | - caption: XArray 14 | chapters: 15 | - file: notebooks/xarray_raster_styling_analysis.ipynb 16 | - file: notebooks/xarray_mosaic_and_clip.ipynb 17 | - file: notebooks/xarray_raster_sampling.ipynb 18 | - file: notebooks/xarray_zonal_stats.ipynb 19 | - file: notebooks/xarray_extracting_time_series.ipynb 20 | - file: notebooks/xarray_aggregating_time_series.ipynb 21 | - file: notebooks/xarray_wrap_longitude.ipynb 22 | - file: notebooks/xarray_climate_anomaly.ipynb 23 | - file: notebooks/xarray_create_raster.ipynb 24 | - caption: Dask 25 | chapters: 26 | - file: notebooks/dask_creating_spatial_data.ipynb 27 | - file: notebooks/dask_median_composite.ipynb 28 | - caption: XEE (XArray + Google Earth Engine) 29 | chapters: 30 | - file: notebooks/xee_ic_to_netcdf.ipynb 31 | - file: notebooks/xee_downloading_images.ipynb 32 | - caption: Segment Geospatial 33 | chapters: 34 | - file: notebooks/samgeo_farm_boundary_extraction.ipynb 35 | - file: 
notebooks/samgeo_mine_perimeter_detection.ipynb 36 | - caption: Web APIs 37 | chapters: 38 | - file: notebooks/ors_distance_matrix.ipynb 39 | - file: notebooks/openai_mapping_news_articles.ipynb 40 | -------------------------------------------------------------------------------- /images/fa-rocket.svg: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /images/yt_logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/spatialthoughts/geopython-tutorials/f2c655066981b4d1c6708f56e8748a85defb38d2/images/yt_logo.png -------------------------------------------------------------------------------- /introduction.md: -------------------------------------------------------------------------------- 1 | # Geospatial Python Tutorials 2 | 3 | Welcome to Spatial Analysis and Remote Sensing Tutorials by Spatial Thoughts. These tutorials complement our Python courses and are suitable for learners who want to advance their skills. 4 | 5 | We highly recommend completing the following courses before diving into these tutorials. All our courses are open-access and freely available for self-study. 6 | 7 | 8 | * Python Foundation for Spatial Analysis ↗ 9 | * Mapping and Data Visualization with Python ↗ 10 | 11 | ## Before you begin 12 | 13 | Each tutorial is in the form of a self-contained notebook and comes with a step-by-step explanation and datasets. Many tutorials also have an accompanying video walkthrough. The preferred way to run each notebook is using [Google Colab](https://colab.research.google.com/). Click the icon located at the top of each tutorial to open it on Colab. 14 | 15 | > If you are new to Colab, see our Hello Colab ↗ video. 
16 | 17 | ## Tutorials 18 | 19 | ```{tableofcontents} 20 | ``` 21 | -------------------------------------------------------------------------------- /logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/spatialthoughts/geopython-tutorials/f2c655066981b4d1c6708f56e8748a85defb38d2/logo.png -------------------------------------------------------------------------------- /notebooks/geopandas_extract_from_excel.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "N6CsIukeW1-7" 7 | }, 8 | "source": [ 9 | "# Extract a Shapefile Subset\n", 10 | "\n", 11 | "## Introduction\n", 12 | "\n", 13 | "Many GIS processes involve extracting a subset from a database. A common pattern is to have data identifiers (like Parcel IDs) sent in a spreadsheet, which then need to be queried and extracted from a master file. This tutorial shows how you can automate such a process using Pandas and GeoPandas.\n", 14 | "\n", 15 | "## Overview of the task\n", 16 | "\n", 17 | "This tutorial shows you how to extract a subset from a shapefile using data contained in an Excel spreadsheet.\n", 18 | "\n", 19 | "We will be working with a parcels data layer for the city of San Francisco, California. 
Given a list of parcel ids in a spreadsheet, we will extract those parcels and save them to another data layer.\n", 20 | "\n", 21 | "**Input Layers**:\n", 22 | "* `sf_parcels.zip`: A shapefile of parcels in San Francisco\n", 23 | "* `parcels_to_export.xlsx`: A spreadsheet containing a list of parcels to export.\n", 24 | "\n", 25 | "**Output**:\n", 26 | "* `subset.zip`: A zipped shapefile containing a subset of parcels based on the spreadsheet.\n", 27 | "\n", 28 | "**Data Credit**:\n", 29 | "* Parcels downloaded from [DataSF Open Data Portal](https://datasf.org/opendata/)\n", 30 | "\n", 31 | "\n", 32 | "**Watch Video Walkthrough** \n" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": { 38 | "id": "JepwzAj2U5L5" 39 | }, 40 | "source": [ 41 | "## Setup and Data Download\n", 42 | "\n", 43 | "The following blocks of code will install the required packages and download the datasets to your Colab environment." 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": 1, 49 | "metadata": { 50 | "id": "uQovPAjjU5L6" 51 | }, 52 | "outputs": [], 53 | "source": [ 54 | "import os\n", 55 | "import pandas as pd\n", 56 | "import geopandas as gpd\n", 57 | "import zipfile" 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": 2, 63 | "metadata": { 64 | "id": "-Zndcd8KU5L6" 65 | }, 66 | "outputs": [], 67 | "source": [ 68 | "data_folder = 'data'\n", 69 | "output_folder = 'output'\n", 70 | "\n", 71 | "if not os.path.exists(data_folder):\n", 72 | "    os.mkdir(data_folder)\n", 73 | "if not os.path.exists(output_folder):\n", 74 | "    os.mkdir(output_folder)" 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": 3, 80 | "metadata": { 81 | "id": "N9cAjPXSU5L6" 82 | }, 83 | "outputs": [ 84 | { 85 | "name": "stdout", 86 | "output_type": "stream", 87 | "text": [ 88 | "Downloaded data/sf_parcels.zip\n", 89 | "Downloaded data/parcels_to_export.xlsx\n" 90 | ] 91 | } 92 | ], 93 | "source": [ 94 | "def download(url):\n", 95 | "    filename = 
os.path.join(data_folder, os.path.basename(url))\n", 96 | " if not os.path.exists(filename):\n", 97 | " from urllib.request import urlretrieve\n", 98 | " local, _ = urlretrieve(url, filename)\n", 99 | " print('Downloaded ' + local)\n", 100 | "\n", 101 | "data_url = 'https://github.com/spatialthoughts/geopython-tutorials/releases/download/data/'\n", 102 | "\n", 103 | "download(data_url + 'sf_parcels.zip')\n", 104 | "download(data_url + 'parcels_to_export.xlsx')" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": { 110 | "id": "-D-U34cbYkrC" 111 | }, 112 | "source": [ 113 | "## Procedure" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": { 119 | "id": "gE90KEg9Z9BU" 120 | }, 121 | "source": [ 122 | "We first unzip the `sf_parcels.zip` archive and extract the shapefile contained inside. Then we can read it using GeoPandas.\n", 123 | "\n", 124 | "> GeoPandas can read zipped files directly using the `zip://` prefix as described in [Reading and Writing Files](https://geopandas.org/en/stable/docs/user_guide/io.html) section of the documentation. `gpd.read_file('zip:///data/sf_parcels.zip')`. But it was much slower than unzipping and reading the shapefile." 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": 4, 130 | "metadata": { 131 | "id": "QWwnTyVyoFMr" 132 | }, 133 | "outputs": [], 134 | "source": [ 135 | "parcels_filepath = os.path.join(data_folder, 'sf_parcels.zip')" 136 | ] 137 | }, 138 | { 139 | "cell_type": "markdown", 140 | "metadata": { 141 | "id": "NWqvtTRPb27L" 142 | }, 143 | "source": [ 144 | "We use Python's built-in `zipfile` module to extract the files in the data directory." 
145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 5, 150 | "metadata": { 151 | "id": "zIxrmIW0Y9By" 152 | }, 153 | "outputs": [], 154 | "source": [ 155 | "with zipfile.ZipFile(parcels_filepath) as zf:\n", 156 | " zf.extractall(data_folder)" 157 | ] 158 | }, 159 | { 160 | "cell_type": "markdown", 161 | "metadata": { 162 | "id": "XoHkIBvzb-6z" 163 | }, 164 | "source": [ 165 | "Once unzipped, we can read the parcels shapefile using GeoPandas." 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": 6, 171 | "metadata": { 172 | "id": "0kv8x2JCoMFG" 173 | }, 174 | "outputs": [], 175 | "source": [ 176 | "parcels_shp = os.path.join(data_folder, 'sf_parcels.shp')\n", 177 | "parcels_gdf = gpd.read_file(parcels_shp)" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "metadata": { 183 | "id": "X-la_03PcLth" 184 | }, 185 | "source": [ 186 | "Preview the resulting GeoDataFrame. The parcel ids are contained in the `mapblklot` column." 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": 10, 192 | "metadata": { 193 | "id": "Kz-zLg_ucLAh" 194 | }, 195 | "outputs": [ 196 | { 197 | "data": { 198 | "text/html": [ 199 | "
\n", 200 | "\n", 213 | "\n", 214 | " \n", 215 | " \n", 216 | " \n", 217 | " \n", 218 | " \n", 219 | " \n", 220 | " \n", 221 | " \n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | "
mapblklotblklotblock_numlot_numfrom_addre
00001001000100100010010
10002001000200100020010
2000400200040020004002160
3000500100050010005001206
4000600100060010006001350
\n", 267 | "
" 268 | ], 269 | "text/plain": [ 270 | " mapblklot blklot block_num lot_num from_addre\n", 271 | "0 0001001 0001001 0001 001 0\n", 272 | "1 0002001 0002001 0002 001 0\n", 273 | "2 0004002 0004002 0004 002 160\n", 274 | "3 0005001 0005001 0005 001 206\n", 275 | "4 0006001 0006001 0006 001 350" 276 | ] 277 | }, 278 | "execution_count": 10, 279 | "metadata": {}, 280 | "output_type": "execute_result" 281 | } 282 | ], 283 | "source": [ 284 | "parcels_gdf.iloc[:5,:5]" 285 | ] 286 | }, 287 | { 288 | "cell_type": "markdown", 289 | "metadata": { 290 | "id": "IIszH40icQ_h" 291 | }, 292 | "source": [ 293 | "Next, we read the Excel file containing the parcel ids that we need to export." 294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": 11, 299 | "metadata": { 300 | "id": "OwYzuk3QoQxl" 301 | }, 302 | "outputs": [], 303 | "source": [ 304 | "export_file_path = os.path.join(data_folder, 'parcels_to_export.xlsx')" 305 | ] 306 | }, 307 | { 308 | "cell_type": "markdown", 309 | "metadata": { 310 | "id": "X4kJD_XOciij" 311 | }, 312 | "source": [ 313 | "Pandas can read Excel files directly using `read_excel()` function. If you get an error, make sure to install the package `openpyxl` which is used to read excel files." 314 | ] 315 | }, 316 | { 317 | "cell_type": "code", 318 | "execution_count": 13, 319 | "metadata": { 320 | "id": "Pc2UHj9VprO9" 321 | }, 322 | "outputs": [ 323 | { 324 | "data": { 325 | "text/html": [ 326 | "
\n", 327 | "\n", 340 | "\n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | " \n", 360 | " \n", 361 | " \n", 362 | " \n", 363 | " \n", 364 | " \n", 365 | " \n", 366 | " \n", 367 | " \n", 368 | " \n", 369 | " \n", 370 | " \n", 371 | " \n", 372 | " \n", 373 | " \n", 374 | " \n", 375 | " \n", 376 | " \n", 377 | " \n", 378 | " \n", 379 | " \n", 380 | " \n", 381 | " \n", 382 | " \n", 383 | " \n", 384 | " \n", 385 | " \n", 386 | " \n", 387 | " \n", 388 | " \n", 389 | " \n", 390 | " \n", 391 | " \n", 392 | " \n", 393 | " \n", 394 | " \n", 395 | " \n", 396 | " \n", 397 | " \n", 398 | " \n", 399 | " \n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | " \n", 423 | " \n", 424 | " \n", 425 | " \n", 426 | " \n", 427 | " \n", 428 | " \n", 429 | "
mapblklotblklotblock_numlot_num
004780130478013478013
104780010478001478001
20478001B0478001B478001B
30478001C0478001C478001C
40478002A0478002A478002A
...............
8404990360499037499037
8504990360499038499038
8604990360499039499039
8704990360499040499040
8804990360499041499041
\n", 430 | "

89 rows × 4 columns

\n", 431 | "
" 432 | ], 433 | "text/plain": [ 434 | " mapblklot blklot block_num lot_num\n", 435 | "0 0478013 0478013 478 013\n", 436 | "1 0478001 0478001 478 001\n", 437 | "2 0478001B 0478001B 478 001B\n", 438 | "3 0478001C 0478001C 478 001C\n", 439 | "4 0478002A 0478002A 478 002A\n", 440 | ".. ... ... ... ...\n", 441 | "84 0499036 0499037 499 037\n", 442 | "85 0499036 0499038 499 038\n", 443 | "86 0499036 0499039 499 039\n", 444 | "87 0499036 0499040 499 040\n", 445 | "88 0499036 0499041 499 041\n", 446 | "\n", 447 | "[89 rows x 4 columns]" 448 | ] 449 | }, 450 | "execution_count": 13, 451 | "metadata": {}, 452 | "output_type": "execute_result" 453 | } 454 | ], 455 | "source": [ 456 | "export_df = pd.read_excel(export_file_path)\n", 457 | "export_df" 458 | ] 459 | }, 460 | { 461 | "cell_type": "markdown", 462 | "metadata": { 463 | "id": "5dRTB-ONctAL" 464 | }, 465 | "source": [ 466 | "We need to export all parcels whose ids are given in the `mapblklot` column. We extract that column and create a list." 
467 | ] 468 | }, 469 | { 470 | "cell_type": "code", 471 | "execution_count": 14, 472 | "metadata": { 473 | "id": "7jPpLxJvpxw8" 474 | }, 475 | "outputs": [ 476 | { 477 | "data": { 478 | "text/plain": [ 479 | "array(['0478013', '0478001', '0478001B', '0478001C', '0478002A',\n", 480 | " '0478004', '0478005', '0478007', '0478008', '0478009', '0478010',\n", 481 | " '0478010B', '0478011', '0478011A', '0478011B', '0478011C',\n", 482 | " '0478011E', '0478014', '0478015', '0478015A', '0478016', '0478021',\n", 483 | " '0478022', '0478023', '0478024', '0478025', '0478026', '0478027',\n", 484 | " '0478028', '0478029', '0478030', '0478031', '0478032', '0478033',\n", 485 | " '0478034', '0478035', '0478036', '0478037', '0478038', '0478039',\n", 486 | " '0478040', '0478041', '0478042', '0478043', '0478044', '0478045',\n", 487 | " '0478046', '0478047', '0478061', '0478062', '0478063', '0478064',\n", 488 | " '0478065', '0478066', '0478067', '0499001', '0499001A', '0499001B',\n", 489 | " '0499001C', '0499001F', '0499001H', '0499002', '0499002A',\n", 490 | " '0499002B', '0499002D', '0499003', '0499004', '0499005', '0499006',\n", 491 | " '0499007', '0499009', '0499013', '0499014', '0499015', '0499016',\n", 492 | " '0499017', '0499018', '0499021', '0499022', '0499023', '0499024',\n", 493 | " '0499025', '0499026', '0499036', '0499037', '0499038', '0499039',\n", 494 | " '0499040', '0499041'], dtype=object)" 495 | ] 496 | }, 497 | "execution_count": 14, 498 | "metadata": {}, 499 | "output_type": "execute_result" 500 | } 501 | ], 502 | "source": [ 503 | "id_list = export_df['blklot'].values\n", 504 | "id_list" 505 | ] 506 | }, 507 | { 508 | "cell_type": "markdown", 509 | "metadata": { 510 | "id": "2sFfj8Y4dCGq" 511 | }, 512 | "source": [ 513 | "Now we can use Pandas `isin()` method to filter the GeoDataFrame where the `\n", 514 | "blklot` column matches any ids from the `id_list`." 
515 | ] 516 | }, 517 | { 518 | "cell_type": "code", 519 | "execution_count": 16, 520 | "metadata": { 521 | "id": "mjJ_I_AsqE4p" 522 | }, 523 | "outputs": [ 524 | { 525 | "data": { 526 | "text/html": [ 527 | "
\n", 528 | "\n", 541 | "\n", 542 | " \n", 543 | " \n", 544 | " \n", 545 | " \n", 546 | " \n", 547 | " \n", 548 | " \n", 549 | " \n", 550 | " \n", 551 | " \n", 552 | " \n", 553 | " \n", 554 | " \n", 555 | " \n", 556 | " \n", 557 | " \n", 558 | " \n", 559 | " \n", 560 | " \n", 561 | " \n", 562 | " \n", 563 | " \n", 564 | " \n", 565 | " \n", 566 | " \n", 567 | " \n", 568 | " \n", 569 | " \n", 570 | " \n", 571 | " \n", 572 | " \n", 573 | " \n", 574 | " \n", 575 | " \n", 576 | " \n", 577 | " \n", 578 | " \n", 579 | " \n", 580 | " \n", 581 | " \n", 582 | " \n", 583 | " \n", 584 | " \n", 585 | " \n", 586 | " \n", 587 | " \n", 588 | " \n", 589 | " \n", 590 | " \n", 591 | " \n", 592 | " \n", 593 | " \n", 594 | "
mapblklotblklotblock_numlot_numfrom_addre
211030478013047801304780132940
211190478001047800104780011101
211200478001B0478001B0478001B2855
211210478001C0478001C0478001C2845
211220478002A0478002A0478002A2821
\n", 595 | "
" 596 | ], 597 | "text/plain": [ 598 | " mapblklot blklot block_num lot_num from_addre\n", 599 | "21103 0478013 0478013 0478 013 2940\n", 600 | "21119 0478001 0478001 0478 001 1101\n", 601 | "21120 0478001B 0478001B 0478 001B 2855\n", 602 | "21121 0478001C 0478001C 0478 001C 2845\n", 603 | "21122 0478002A 0478002A 0478 002A 2821" 604 | ] 605 | }, 606 | "execution_count": 16, 607 | "metadata": {}, 608 | "output_type": "execute_result" 609 | } 610 | ], 611 | "source": [ 612 | "subset_gdf = parcels_gdf[parcels_gdf['blklot'].isin(id_list)]\n", 613 | "subset_gdf.iloc[:5, :5]" 614 | ] 615 | }, 616 | { 617 | "cell_type": "markdown", 618 | "metadata": { 619 | "id": "rFuW7npBdWBb" 620 | }, 621 | "source": [ 622 | "We have successfully selected the subset of parcels. We are ready to save the resulting GeoDataFrame as a shapefile. We define the output file path and save the `subset_gdf`." 623 | ] 624 | }, 625 | { 626 | "cell_type": "code", 627 | "execution_count": 17, 628 | "metadata": { 629 | "id": "2adR9beyqSbw" 630 | }, 631 | "outputs": [], 632 | "source": [ 633 | "output_file = 'subset.shp'\n", 634 | "output_path = os.path.join(output_folder, output_file)" 635 | ] 636 | }, 637 | { 638 | "cell_type": "code", 639 | "execution_count": 18, 640 | "metadata": { 641 | "id": "kyHcRUOfqmgV" 642 | }, 643 | "outputs": [], 644 | "source": [ 645 | "subset_gdf.to_file(output_path)" 646 | ] 647 | }, 648 | { 649 | "cell_type": "markdown", 650 | "metadata": { 651 | "id": "TWr7mRiQdvAa" 652 | }, 653 | "source": [ 654 | "For ease of data sharing, let's zip all the shapefile parts into a single archive. We again use the `zipfile` module and use the `write()` method to add each sidecar file for the shapefile. The `arcname` parameter is used to avoid creating a sub-folder inside the archive." 
655 | ] 656 | }, 657 | { 658 | "cell_type": "code", 659 | "execution_count": 19, 660 | "metadata": { 661 | "id": "AhD3TXNfqop0" 662 | }, 663 | "outputs": [], 664 | "source": [ 665 | "output_zip = 'subset.zip'\n", 666 | "output_zip_path = os.path.join(output_folder, output_zip)\n", 667 | "\n", 668 | "with zipfile.ZipFile(output_zip_path, 'w') as output_zf:\n", 669 | " for ext in ['.shp', '.shx', '.prj', '.dbf']:\n", 670 | " filename = 'subset' + ext\n", 671 | " filepath = os.path.join(output_folder, filename)\n", 672 | " output_zf.write(filepath, arcname=filename)" 673 | ] 674 | }, 675 | { 676 | "cell_type": "markdown", 677 | "metadata": {}, 678 | "source": [ 679 | "----\n", 680 | "\n", 681 | "If you want to give feedback or share your experience with this tutorial, please comment below. (requires GitHub account)\n", 682 | "\n", 683 | "\n", 684 | "" 691 | ] 692 | } 693 | ], 694 | "metadata": { 695 | "colab": { 696 | "authorship_tag": "ABX9TyNN8jPiJlLRGqVPDfy1xf1X", 697 | "provenance": [] 698 | }, 699 | "kernelspec": { 700 | "display_name": "Python 3 (ipykernel)", 701 | "language": "python", 702 | "name": "python3" 703 | }, 704 | "language_info": { 705 | "codemirror_mode": { 706 | "name": "ipython", 707 | "version": 3 708 | }, 709 | "file_extension": ".py", 710 | "mimetype": "text/x-python", 711 | "name": "python", 712 | "nbconvert_exporter": "python", 713 | "pygments_lexer": "ipython3", 714 | "version": "3.13.1" 715 | } 716 | }, 717 | "nbformat": 4, 718 | "nbformat_minor": 4 719 | } 720 | -------------------------------------------------------------------------------- /notebooks/openai_mapping_news_articles.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "y4i88eriHdD9" 7 | }, 8 | "source": [ 9 | "# Natural Language Processing using OpenAI API\n", 10 | "\n", 11 | "## Introduction\n", 12 | "\n", 13 | "Large Language Models (LLMs) are great at 
understanding and interpreting natural language text. These can be leveraged to perform text extraction with very high accuracy. In this tutorial, we will use the [OpenAI API](https://platform.openai.com/docs/overview) to extract location information from news articles and display the results on a map. The tutorial also shows how we can develop prompts suitable for data processing pipelines that can return structured data from the models. The notebook is based on the excellent course [ChatGPT Prompt Engineering for Developers](https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/) by Andrew Ng.\n", 14 | "\n", 15 | "## Overview of the Task\n", 16 | "\n", 17 | "We will take 3 news articles about human-elephant conflict in India, extract information about each incident using an LLM, and geocode the results to create a map.\n", 18 | "\n", 19 | "\n", 20 | "**Input Data**:\n", 21 | "\n", 22 | "* `article1.txt`, `article2.txt`, `article3.txt`: Sample news articles\n", 23 | "\n", 24 | "**Output Layers**:\n", 25 | "* An interactive map of locations and data extracted from the articles.\n" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": { 31 | "id": "gBPoBAwuIg5n" 32 | }, 33 | "source": [ 34 | "## Setup and Data Download" 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": null, 40 | "metadata": { 41 | "id": "B9II9CUJsVer" 42 | }, 43 | "outputs": [], 44 | "source": [ 45 | "%%capture\n", 46 | "if 'google.colab' in str(get_ipython()):\n", 47 | "    !pip install openai mapclassify" 48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "execution_count": null, 53 | "metadata": { 54 | "id": "tHaljj6OscCv" 55 | }, 56 | "outputs": [], 57 | "source": [ 58 | "from folium import Figure\n", 59 | "from geopy.extra.rate_limiter import RateLimiter\n", 60 | "from geopy.geocoders import GoogleV3\n", 61 | "import folium\n", 62 | "import geopandas as gpd\n", 63 | "import json\n", 64 | "import openai\n", 65 | 
"import os\n", 66 | "import pandas as pd\n", 67 | "import textwrap" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": { 73 | "id": "tGhy4NBdH2nV" 74 | }, 75 | "source": [ 76 | "Add your OpenAI API Key below. You need to [sign up](https://platform.openai.com/signup) and obtain a key. This requires setting up a billing account. If you want to experiment, you can use the free environment provided by the [ChatGPT Prompt Engineering for Developers](https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/) course.\n", 77 | "\n", 78 | "Add your Google Maps API Key below. This requires [signing up](https://console.cloud.google.com/) using the Google Cloud Console and setting up a billing account. Once done, make sure to enable the Geocoding API and get a key." 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": null, 84 | "metadata": { 85 | "id": "-sF6fo2NIfBG" 86 | }, 87 | "outputs": [], 88 | "source": [ 89 | "openai.api_key = ''\n", 90 | "google_maps_api_key = ''" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": { 96 | "id": "TKJpusMOImRx" 97 | }, 98 | "source": [ 99 | "Initialize the model."
100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": null, 105 | "metadata": { 106 | "id": "CKiYuhNPsjEl" 107 | }, 108 | "outputs": [], 109 | "source": [ 110 | "client = openai.OpenAI(api_key=openai.api_key)\n", 111 | "\n", 112 | "def get_completion(prompt, model='gpt-3.5-turbo'):\n", 113 | " messages = [{'role': 'user', 'content': prompt}]\n", 114 | " response = client.chat.completions.create(\n", 115 | " model=model,\n", 116 | " messages=messages,\n", 117 | " temperature=0, # This is the degree of randomness of the model's output\n", 118 | " )\n", 119 | " return response.choices[0].message.content" 120 | ] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "metadata": { 125 | "id": "4OEfB4jZIpMZ" 126 | }, 127 | "source": [ 128 | "## Load Data" 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": null, 134 | "metadata": { 135 | "id": "aiQnF2l_ItZh" 136 | }, 137 | "outputs": [], 138 | "source": [ 139 | "data_folder = 'data'\n", 140 | "output_folder = 'output'\n", 141 | "\n", 142 | "if not os.path.exists(data_folder):\n", 143 | " os.mkdir(data_folder)\n", 144 | "if not os.path.exists(output_folder):\n", 145 | " os.mkdir(output_folder)" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": null, 151 | "metadata": { 152 | "colab": { 153 | "base_uri": "https://localhost:8080/" 154 | }, 155 | "id": "OKAv4jXVJRpE", 156 | "outputId": "5d62e15c-4ef8-4bd3-d302-cd6e46be41eb" 157 | }, 158 | "outputs": [ 159 | { 160 | "output_type": "stream", 161 | "name": "stdout", 162 | "text": [ 163 | "Downloaded data/article1.txt\n", 164 | "Downloaded data/article2.txt\n", 165 | "Downloaded data/article3.txt\n" 166 | ] 167 | } 168 | ], 169 | "source": [ 170 | "def download(url):\n", 171 | " filename = os.path.join(data_folder, os.path.basename(url))\n", 172 | " if not os.path.exists(filename):\n", 173 | " from urllib.request import urlretrieve\n", 174 | " local, _ = urlretrieve(url, filename)\n", 175 | " 
print('Downloaded ' + local)\n", 176 | "\n", 177 | "data_url = 'https://github.com/spatialthoughts/geopython-tutorials/releases/download/data/'\n", 178 | "\n", 179 | "articles = ['article1.txt', 'article2.txt', 'article3.txt']\n", 180 | "\n", 181 | "for article in articles:\n", 182 | " download(data_url + article)\n" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": { 188 | "id": "d8_jtZ03JkaU" 189 | }, 190 | "source": [ 191 | "## Get AI Predictions" 192 | ] 193 | }, 194 | { 195 | "cell_type": "markdown", 196 | "metadata": { 197 | "id": "ed4tnA4sJv9E" 198 | }, 199 | "source": [ 200 | "Read the data." 201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "execution_count": null, 206 | "metadata": { 207 | "id": "U1R6kcHcJy1c" 208 | }, 209 | "outputs": [], 210 | "source": [ 211 | "articles_texts = []\n", 212 | "\n", 213 | "for article in articles:\n", 214 | " path = os.path.join(data_folder, article)\n", 215 | " f = open(path, 'r')\n", 216 | " articles_texts.append(f.read())" 217 | ] 218 | }, 219 | { 220 | "cell_type": "markdown", 221 | "metadata": { 222 | "id": "tl6ypBU0KjDO" 223 | }, 224 | "source": [ 225 | "Display the first article." 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": null, 231 | "metadata": { 232 | "colab": { 233 | "base_uri": "https://localhost:8080/" 234 | }, 235 | "id": "3f8Nh3R9KaoH", 236 | "outputId": "1b23e1e0-6fd1-4662-d847-84c43593946c" 237 | }, 238 | "outputs": [ 239 | { 240 | "output_type": "stream", 241 | "name": "stdout", 242 | "text": [ 243 | "Title: 2 Persons Trampled To Death By Elephants In 2 Days In Odisha’s Dhenkanal\n", 244 | "Description: Dhenkanal: Human casualty due to elephant attack continued in\n", 245 | "Odisha’s Dhenkanal district as a man was trampled to death by a herd on\n", 246 | "Saturday. 
According to sources, the incident tool place when the victim, Khirod\n", 247 | "Samal of Neulapoi village under Sadangi forest range, had gone to collect cashew\n", 248 | "nuts from a nearby orchard in the morning. He came face to face with 3 elephants\n", 249 | "who had separated from a herd and were creating a rampage in the area. Though\n", 250 | "Khirod tried to escape from the place, the elephants caught hold of him and\n", 251 | "trampled him to death. It took place hardly 100 metre from the panchayat office\n", 252 | "in the area. On being informed, forester Madhusita Pati from Joronda went to\n", 253 | "the spot along with a team of Forest officials. She sent the body for post-\n", 254 | "mortem and advised the villagers not to venture into the forest till the Forest\n", 255 | "officials send the elephants back. In a similar incident on Friday, one person\n", 256 | "was killed in elephant attack in the district. The deceased was identified as\n", 257 | "Lakshmidhar Sahu of Bali Kiari village under Angat Jarda Panchayat in Hindol\n", 258 | "forest range. He was attacked by the elephant in the morning when he had gone to\n", 259 | "the village pond.\n" 260 | ] 261 | } 262 | ], 263 | "source": [ 264 | "wrapped_article = textwrap.fill(articles_texts[0], width=80)\n", 265 | "print(wrapped_article)" 266 | ] 267 | }, 268 | { 269 | "cell_type": "markdown", 270 | "metadata": { 271 | "id": "FNI4_vxqJxxl" 272 | }, 273 | "source": [ 274 | "\n", 275 | "We design a prompt to extract specific information from the news article in JSON format." 
276 |    ] 277 |   }, 278 |   { 279 |    "cell_type": "code", 280 |    "execution_count": null, 281 |    "metadata": { 282 |     "id": "Oh20xjYDwTs3" 283 |    }, 284 |    "outputs": [], 285 |    "source": [ 286 |     "results = []\n", 287 |     "\n", 288 |     "for article_text in articles_texts:\n", 289 |     "  prompt = f\"\"\"\n", 290 |     "  Identify the following items from the news article\n", 291 |     "  - Location of the incident\n", 292 |     "  - Number of people killed\n", 294 |     "  - Short summary\n", 295 |     "\n", 296 |     "  The news article is delimited with triple backticks.\n", 297 |     "  Format your response as a JSON object with 'location', 'num_killed' and \\\n", 298 |     "  'summary' as the keys.\n", 299 |     "  If the information isn't present, use 'unknown' as the value.\n", 300 |     "  Make your response as short as possible.\n", 301 |     "\n", 302 |     "  News article: '''{article_text}'''\n", 303 |     "  \"\"\"\n", 304 |     "  response = get_completion(prompt)\n", 305 |     "  results.append(json.loads(response))" 306 |    ] 307 |   }, 308 |   { 309 |    "cell_type": "markdown", 310 |    "source": [ 311 |     "We can turn the list of JSON responses into a Pandas DataFrame." 312 |    ], 313 |    "metadata": { 314 |     "id": "xZ1yUSmxS9tv" 315 |    } 316 |   }, 317 |   { 318 |    "cell_type": "code", 319 |    "execution_count": null, 320 |    "metadata": { 321 |     "id": "WOM9IKD1xiBC", 322 |     "colab": { 323 |      "base_uri": "https://localhost:8080/", 324 |      "height": 143 325 |     }, 326 |     "outputId": "d2438bc4-6077-4eb7-eb46-2da6ae4ddc40" 327 |    }, 328 |    "outputs": [ 329 |     { 330 |      "output_type": "execute_result", 331 |      "data": { 332 |       "text/plain": [ 333 |        "                              location num_killed  \\\n", 334 |        "0                    Dhenkanal, Odisha          2   \n", 335 |        "1         Jharkhand's Latehar district          3   \n", 336 |        "2  Perumugai in the T.N. Palayam block          1   \n", 337 |        "\n", 338 |        "                                             summary  \n", 339 |        "0  2 persons trampled to death by elephants in 2 ...  \n", 340 |        "1  Three members of a family, including a 3-year-...  \n", 341 |        "2  Wild elephant Karuppan trampled a 48-year-old ... 
" 342 | ], 343 | "text/html": [ 344 | "\n", 345 | "
\n", 346 | "
\n", 347 | "\n", 360 | "\n", 361 | " \n", 362 | " \n", 363 | " \n", 364 | " \n", 365 | " \n", 366 | " \n", 367 | " \n", 368 | " \n", 369 | " \n", 370 | " \n", 371 | " \n", 372 | " \n", 373 | " \n", 374 | " \n", 375 | " \n", 376 | " \n", 377 | " \n", 378 | " \n", 379 | " \n", 380 | " \n", 381 | " \n", 382 | " \n", 383 | " \n", 384 | " \n", 385 | " \n", 386 | " \n", 387 | " \n", 388 | " \n", 389 | "
locationnum_killedsummary
0Dhenkanal, Odisha22 persons trampled to death by elephants in 2 ...
1Jharkhand's Latehar district3Three members of a family, including a 3-year-...
2Perumugai in the T.N. Palayam block1Wild elephant Karuppan trampled a 48-year-old ...
\n", 390 | "
\n", 391 | "
\n", 392 | "\n", 393 | "
\n", 394 | " \n", 402 | "\n", 403 | " \n", 443 | "\n", 444 | " \n", 468 | "
\n", 469 | "\n", 470 | "\n", 471 | "
\n", 472 | " \n", 483 | "\n", 484 | "\n", 573 | "\n", 574 | " \n", 596 | "
\n", 597 | "\n", 598 | "
\n", 599 | " \n", 630 | " \n", 639 | " \n", 651 | "
\n", 652 | "\n", 653 | "
\n", 654 | "
\n" 655 | ], 656 | "application/vnd.google.colaboratory.intrinsic+json": { 657 | "type": "dataframe", 658 | "variable_name": "df", 659 | "summary": "{\n \"name\": \"df\",\n \"rows\": 3,\n \"fields\": [\n {\n \"column\": \"location\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Dhenkanal, Odisha\",\n \"Jharkhand's Latehar district\",\n \"Perumugai in the T.N. Palayam block\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"num_killed\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1,\n \"min\": 1,\n \"max\": 3,\n \"num_unique_values\": 3,\n \"samples\": [\n 2,\n 3,\n 1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"summary\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"2 persons trampled to death by elephants in 2 days in Odisha's Dhenkanal district.\",\n \"Three members of a family, including a 3-year-old girl, were trampled to death by elephants in Jharkhand's Latehar district.\",\n \"Wild elephant Karuppan trampled a 48-year-old daily wage worker to death, prompting the Forest Department to plan its capture and translocation.\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" 660 | } 661 | }, 662 | "metadata": {}, 663 | "execution_count": 24 664 | } 665 | ], 666 | "source": [ 667 | "df = pd.DataFrame.from_dict(results)\n", 668 | "df" 669 | ] 670 | }, 671 | { 672 | "cell_type": "markdown", 673 | "metadata": { 674 | "id": "xN603rMXmCmB" 675 | }, 676 | "source": [ 677 | "## Geocode Locations\n", 678 | "\n", 679 | "We were able to extract the descriptive location name from the article. Now we can use a geocoding service to map the location to coordinates." 
680 | ] 681 | }, 682 | { 683 | "cell_type": "code", 684 | "execution_count": null, 685 | "metadata": { 686 | "id": "lgyYJVXOL311" 687 | }, 688 | "outputs": [], 689 | "source": [ 690 | "locator = GoogleV3(api_key=google_maps_api_key)\n", 691 | "geocode_fn = RateLimiter(locator.geocode, min_delay_seconds=2)\n", 692 | "\n", 693 | "df['geocoded'] = df['location'].apply(geocode_fn)" 694 | ] 695 | }, 696 | { 697 | "cell_type": "code", 698 | "execution_count": null, 699 | "metadata": { 700 | "colab": { 701 | "base_uri": "https://localhost:8080/", 702 | "height": 178 703 | }, 704 | "id": "FNOP3eZWNRap", 705 | "outputId": "22441604-4d1e-4c04-c0d1-aaf706d11c29" 706 | }, 707 | "outputs": [ 708 | { 709 | "output_type": "execute_result", 710 | "data": { 711 | "text/plain": [ 712 | "0 (Dhenkanal, Odisha, India, (20.6504753, 85.598...\n", 713 | "1 (Latehar, Jharkhand, India, (23.7555791, 84.35...\n", 714 | "2 (Perumugai, Tamil Nadu 632009, India, (12.9376...\n", 715 | "Name: geocoded, dtype: object" 716 | ], 717 | "text/html": [ 718 | "
\n", 719 | "\n", 732 | "\n", 733 | " \n", 734 | " \n", 735 | " \n", 736 | " \n", 737 | " \n", 738 | " \n", 739 | " \n", 740 | " \n", 741 | " \n", 742 | " \n", 743 | " \n", 744 | " \n", 745 | " \n", 746 | " \n", 747 | " \n", 748 | " \n", 749 | " \n", 750 | " \n", 751 | " \n", 752 | " \n", 753 | "
geocoded
0(Dhenkanal, Odisha, India, (20.6504753, 85.598...
1(Latehar, Jharkhand, India, (23.7555791, 84.35...
2(Perumugai, Tamil Nadu 632009, India, (12.9376...
\n", 754 | "

" 755 | ] 756 | }, 757 | "metadata": {}, 758 | "execution_count": 32 759 | } 760 | ], 761 | "source": [ 762 | "df['geocoded']" 763 | ] 764 | }, 765 | { 766 | "cell_type": "markdown", 767 | "source": [ 768 | "We extract the latitude and longitude from the geocoded response." 769 | ], 770 | "metadata": { 771 | "id": "3-mtiDCwTly1" 772 | } 773 | }, 774 | { 775 | "cell_type": "code", 776 | "execution_count": null, 777 | "metadata": { 778 | "id": "o-wbJRTfO0_A", 779 | "colab": { 780 | "base_uri": "https://localhost:8080/", 781 | "height": 143 782 | }, 783 | "outputId": "874896e6-ed52-43d2-96bc-64a0b482e454" 784 | }, 785 | "outputs": [ 786 | { 787 | "output_type": "execute_result", 788 | "data": { 789 | "text/plain": [ 790 | " location num_killed \\\n", 791 | "0 Dhenkanal, Odisha 2 \n", 792 | "1 Jharkhand's Latehar district 3 \n", 793 | "2 Perumugai in the T.N. Palayam block 1 \n", 794 | "\n", 795 | " summary latitude longitude \n", 796 | "0 2 persons trampled to death by elephants in 2 ... 20.650475 85.598122 \n", 797 | "1 Three members of a family, including a 3-year-... 23.755579 84.354205 \n", 798 | "2 Wild elephant Karuppan trampled a 48-year-old ... 12.937608 79.185825 " 799 | ], 800 | "text/html": [ 801 | "\n", 802 | "
\n", 803 | "
\n", 804 | "\n", 817 | "\n", 818 | " \n", 819 | " \n", 820 | " \n", 821 | " \n", 822 | " \n", 823 | " \n", 824 | " \n", 825 | " \n", 826 | " \n", 827 | " \n", 828 | " \n", 829 | " \n", 830 | " \n", 831 | " \n", 832 | " \n", 833 | " \n", 834 | " \n", 835 | " \n", 836 | " \n", 837 | " \n", 838 | " \n", 839 | " \n", 840 | " \n", 841 | " \n", 842 | " \n", 843 | " \n", 844 | " \n", 845 | " \n", 846 | " \n", 847 | " \n", 848 | " \n", 849 | " \n", 850 | " \n", 851 | " \n", 852 | " \n", 853 | " \n", 854 | "
locationnum_killedsummarylatitudelongitude
0Dhenkanal, Odisha22 persons trampled to death by elephants in 2 ...20.65047585.598122
1Jharkhand's Latehar district3Three members of a family, including a 3-year-...23.75557984.354205
2Perumugai in the T.N. Palayam block1Wild elephant Karuppan trampled a 48-year-old ...12.93760879.185825
\n", 855 | "
\n", 856 | "
\n", 857 | "\n", 858 | "
\n", 859 | " \n", 867 | "\n", 868 | " \n", 908 | "\n", 909 | " \n", 933 | "
\n", 934 | "\n", 935 | "\n", 936 | "
\n", 937 | " \n", 948 | "\n", 949 | "\n", 1038 | "\n", 1039 | " \n", 1061 | "
\n", 1062 | "\n", 1063 | "
\n", 1064 | " \n", 1095 | " \n", 1104 | " \n", 1116 | "
\n", 1117 | "\n", 1118 | "
\n", 1119 | "
\n" 1120 | ], 1121 | "application/vnd.google.colaboratory.intrinsic+json": { 1122 | "type": "dataframe", 1123 | "variable_name": "df_output", 1124 | "summary": "{\n \"name\": \"df_output\",\n \"rows\": 3,\n \"fields\": [\n {\n \"column\": \"location\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Dhenkanal, Odisha\",\n \"Jharkhand's Latehar district\",\n \"Perumugai in the T.N. Palayam block\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"num_killed\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1,\n \"min\": 1,\n \"max\": 3,\n \"num_unique_values\": 3,\n \"samples\": [\n 2,\n 3,\n 1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"summary\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"2 persons trampled to death by elephants in 2 days in Odisha's Dhenkanal district.\",\n \"Three members of a family, including a 3-year-old girl, were trampled to death by elephants in Jharkhand's Latehar district.\",\n \"Wild elephant Karuppan trampled a 48-year-old daily wage worker to death, prompting the Forest Department to plan its capture and translocation.\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"latitude\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 5.570135999485992,\n \"min\": 12.937608,\n \"max\": 23.7555791,\n \"num_unique_values\": 3,\n \"samples\": [\n 20.6504753,\n 23.7555791,\n 12.937608\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"longitude\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 3.4004174577068067,\n \"min\": 79.1858252,\n \"max\": 85.5981223,\n \"num_unique_values\": 3,\n \"samples\": [\n 85.5981223,\n 84.3542049,\n 79.1858252\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" 1125 | } 1126 | }, 1127 | "metadata": {}, 1128 | "execution_count": 
34 1129 |     } 1130 |    ], 1131 |    "source": [ 1132 |     "df['point'] = df['geocoded'].apply(lambda loc: tuple(loc.point) if loc else None)\n", 1133 |     "df[['latitude', 'longitude', 'altitude']] = pd.DataFrame(df['point'].tolist(), index=df.index)\n", 1134 |     "df_output = df[['location', 'num_killed', 'summary', 'latitude', 'longitude']].copy()\n", 1135 |     "df_output" 1136 |    ] 1137 |   }, 1138 |   { 1139 |    "cell_type": "markdown", 1140 |    "source": [ 1141 |     "Turn the Pandas DataFrame into a GeoPandas GeoDataFrame so we can display the results on a map." 1142 |    ], 1143 |    "metadata": { 1144 |     "id": "-a1zIiUtUQMa" 1145 |    } 1146 |   }, 1147 |   { 1148 |    "cell_type": "code", 1149 |    "execution_count": null, 1150 |    "metadata": { 1151 |     "id": "SxscanjIQHp-" 1152 |    }, 1153 |    "outputs": [], 1154 |    "source": [ 1155 |     "geometry = gpd.points_from_xy(df_output.longitude, df_output.latitude)\n", 1156 |     "gdf = gpd.GeoDataFrame(df_output, crs='EPSG:4326', geometry=geometry)" 1157 |    ] 1158 |   }, 1159 |   { 1160 |    "cell_type": "code", 1161 |    "execution_count": null, 1162 |    "metadata": { 1163 |     "colab": { 1164 |      "base_uri": "https://localhost:8080/", 1165 |      "height": 421 1166 |     }, 1167 |     "id": "jhS6D6RUO3m2", 1168 |     "outputId": "f4c6590d-e3a3-472d-f0c4-7ec6d6404f4f" 1169 |    }, 1170 |    "outputs": [ 1171 |     { 1172 |      "output_type": "execute_result", 1173 |      "data": { 1174 |       "text/plain": [ 1175 |        "" 1176 |       ], 1177 |       "text/html": [ 1178 |        "" 1411 |       ] 1412 |      }, 1413 |      "metadata": {}, 1414 |      "execution_count": 40 1415 |     } 1416 |    ], 1417 |    "source": [ 1418 |     "bounds = gdf.total_bounds\n", 1419 |     "\n", 1420 |     "fig = Figure(width=700, height=400)\n", 1421 |     "\n", 1422 |     "m = folium.Map()\n", 1423 |     "m.fit_bounds([[bounds[1],bounds[0]], [bounds[3],bounds[2]]])\n", 1424 |     "\n", 1425 |     "gdf.explore(\n", 1426 |     "    m=m,\n", 1427 |     "    tooltip=['location', 'num_killed'],\n", 1428 |     "    popup=['location', 'num_killed'],\n", 1429 |     "    marker_kwds=dict(radius=5))\n", 1430 |     "\n", 1431 |     "fig.add_child(m)" 1432 |    ] 1433 |   }, 1434 |   { 1435 
| "cell_type": "markdown", 1436 |    "source": [ 1437 |     "----\n", 1438 |     "\n", 1439 |     "If you want to give feedback or share your experience with this tutorial, please comment below. (requires GitHub account)\n", 1440 |     "\n", 1441 |     "\n", 1442 |     "" 1449 |    ], 1450 |    "metadata": { 1451 |     "id": "3tajc-oHQbWL" 1452 |    } 1453 |   } 1454 |  ], 1455 |  "metadata": { 1456 |   "colab": { 1457 |    "provenance": [] 1458 |   }, 1459 |   "kernelspec": { 1460 |    "display_name": "Python 3 (ipykernel)", 1461 |    "language": "python", 1462 |    "name": "python3" 1463 |   }, 1464 |   "language_info": { 1465 |    "codemirror_mode": { 1466 |     "name": "ipython", 1467 |     "version": 3 1468 |    }, 1469 |    "file_extension": ".py", 1470 |    "mimetype": "text/x-python", 1471 |    "name": "python", 1472 |    "nbconvert_exporter": "python", 1473 |    "pygments_lexer": "ipython3", 1474 |    "version": "3.13.1" 1475 |   } 1476 |  }, 1477 |  "nbformat": 4, 1478 |  "nbformat_minor": 0 1479 | } -------------------------------------------------------------------------------- /notebooks/xarray_create_raster.ipynb: -------------------------------------------------------------------------------- 1 | { 2 |  "cells": [ 3 |   { 4 |    "cell_type": "markdown", 5 |    "id": "7999aec0-7a91-49b9-8c5a-062ea83f4ac2", 6 |    "metadata": { 7 |     "id": "7999aec0-7a91-49b9-8c5a-062ea83f4ac2" 8 |    }, 9 |    "source": [ 10 |     "# Create a Raster from Array\n", 11 |     "\n", 12 |     "## Introduction\n", 13 |     "\n", 14 |     "It is sometimes useful to create a test image of known pixel values for testing geospatial algorithms. Other times, you may have an array of pixels from a non-spatial package that needs to be turned into a georeferenced raster. 
In this tutorial, we will see how XArray and `rioxarray` can be used to accomplish this.\n", 15 | "\n", 16 | "## Overview of the Task\n", 17 | "\n", 18 | "We use xarray to create a DataArray from an array of pixels and use `rioxarray` to assign a CRS and create a GeoTIFF image.\n", 19 | "\n", 20 | "**Inputs**:\n", 21 | "* An array of pixel values, coordinates of the upper left pixel, pixel resolution and the CRS.\n", 22 | "\n", 23 | "**Output**:\n", 24 | "* `image.tif` : A georeferenced image with the supplied pixel values.\n" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "id": "758a4aff-2602-437d-a2ff-ce18e0c51a73", 30 | "metadata": { 31 | "id": "758a4aff-2602-437d-a2ff-ce18e0c51a73" 32 | }, 33 | "source": [ 34 | "## Setup and Data Download\n", 35 | "\n", 36 | "The following blocks of code will install the required packages and download the datasets to your Colab environment." 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 3, 42 | "id": "d41381b0-fc9e-46f9-97b1-c959e693f940", 43 | "metadata": { 44 | "id": "d41381b0-fc9e-46f9-97b1-c959e693f940" 45 | }, 46 | "outputs": [], 47 | "source": [ 48 | "%%capture\n", 49 | "if 'google.colab' in str(get_ipython()):\n", 50 | " !pip install rioxarray" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 4, 56 | "id": "3202cc14-9826-4644-b480-84aeb423487f", 57 | "metadata": { 58 | "id": "3202cc14-9826-4644-b480-84aeb423487f" 59 | }, 60 | "outputs": [], 61 | "source": [ 62 | "import os\n", 63 | "import numpy as np\n", 64 | "import xarray as xr\n", 65 | "import rioxarray as rxr" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 5, 71 | "id": "57146443-2e36-47dd-94a7-7ae8c68fff18", 72 | "metadata": { 73 | "id": "57146443-2e36-47dd-94a7-7ae8c68fff18" 74 | }, 75 | "outputs": [], 76 | "source": [ 77 | "output_folder = 'output'\n", 78 | "\n", 79 | "if not os.path.exists(output_folder):\n", 80 | " os.mkdir(output_folder)" 81 | ] 82 | }, 83 | { 84 | "cell_type": 
"markdown", 85 |    "id": "cbc5fb0f-dee4-47a0-b710-c17d4ce59fc8", 86 |    "metadata": { 87 |     "id": "cbc5fb0f-dee4-47a0-b710-c17d4ce59fc8" 88 |    }, 89 |    "source": [ 90 |     "We want to create an image with a resolution of `1000` meters (1 km) with the upper-left pixel at coordinates `(780850,1432187)` in the CRS `EPSG:32643` (WGS84 / UTM Zone 43N)." 91 |    ] 92 |   }, 93 |   { 94 |    "cell_type": "code", 95 |    "execution_count": 16, 96 |    "id": "21dd61df-2726-4792-b74d-05faa7d9ed08", 97 |    "metadata": { 98 |     "id": "21dd61df-2726-4792-b74d-05faa7d9ed08" 99 |    }, 100 |    "outputs": [], 101 |    "source": [ 102 |     "upper_left_x, upper_left_y = (780850,1432187)\n", 103 |     "resolution = 1000 # 1 km\n", 104 |     "crs = 'EPSG:32643'" 105 |    ] 106 |   }, 107 |   { 108 |    "cell_type": "markdown", 109 |    "id": "c2185e8d-f6c0-4d90-a8cc-1b24d4a6860b", 110 |    "metadata": { 111 |     "id": "c2185e8d-f6c0-4d90-a8cc-1b24d4a6860b" 112 |    }, 113 |    "source": [ 114 |     "If we want the resulting image to have known pixel values, we can define a 2-dimensional array. Since we are storing small integer values, we set the data type to **Byte** (`uint8`). If you have larger integers or floating-point numbers, use an appropriate data type." 115 |    ] 116 |   }, 117 |   { 118 |    "cell_type": "code", 119 |    "execution_count": 18, 120 |    "id": "9dc24184-de76-45cf-93fb-227afcafc1dd", 121 |    "metadata": { 122 |     "id": "9dc24184-de76-45cf-93fb-227afcafc1dd" 123 |    }, 124 |    "outputs": [], 125 |    "source": [ 126 |     "array = np.array([\n", 127 |     "    [0, 0, 1, 1],\n", 128 |     "    [0, 0, 1, 1],\n", 129 |     "    [0, 2, 2, 2],\n", 130 |     "    [2, 2, 3, 3]\n", 131 |     "], dtype=np.uint8)" 132 |    ] 133 |   }, 134 |   { 135 |    "cell_type": "markdown", 136 |    "id": "eVth5plQiEv5", 137 |    "metadata": { 138 |     "id": "eVth5plQiEv5" 139 |    }, 140 |    "source": [ 141 |     "Another option is to create an array of random values. 
The following block creates an array of 100 x 100 pixels with random values from 0 to 3 (the upper bound of `np.random.randint` is exclusive).\n", 142 |     "\n", 143 |     "  > array = np.random.randint(\n", 144 |     "        0, 4, size=(100, 100)).astype(np.uint8)" 145 |    ] 146 |   }, 147 |   { 148 |    "cell_type": "markdown", 149 |    "id": "d9318091-99df-4200-82f4-af04f87beedd", 150 |    "metadata": { 151 |     "id": "d9318091-99df-4200-82f4-af04f87beedd" 152 |    }, 153 |    "source": [ 154 |     "Next, we need to assign X and Y coordinates for each pixel.\n", 155 |     "We use the `np.linspace` function to create a sequence of x and y coordinates for each pixel of the image." 156 |    ] 157 |   }, 158 |   { 159 |    "cell_type": "code", 160 |    "execution_count": 19, 161 |    "id": "56b19b72-cb36-4f0d-b70b-f72831418e12", 162 |    "metadata": { 163 |     "colab": { 164 |      "base_uri": "https://localhost:8080/" 165 |     }, 166 |     "id": "56b19b72-cb36-4f0d-b70b-f72831418e12", 167 |     "outputId": "e817d773-98c2-4d96-9fcd-c5f698bc7928" 168 |    }, 169 |    "outputs": [ 170 |     { 171 |      "data": { 172 |       "text/plain": [ 173 |        "(array([780850, 781850, 782850, 783850], dtype=uint64),\n", 174 |        " array([1432187, 1431187, 1430187, 1429187], dtype=uint64))" 175 |       ] 176 |      }, 177 |      "execution_count": 19, 178 |      "metadata": {}, 179 |      "output_type": "execute_result" 180 |     } 181 |    ], 182 |    "source": [ 183 |     "num_pixels = array.shape[0]\n", 184 |     "\n", 185 |     "x_coords = np.linspace(\n", 186 |     "    start=upper_left_x,\n", 187 |     "    stop=upper_left_x + (resolution*(num_pixels-1)),\n", 188 |     "    num=num_pixels, dtype=np.uint)\n", 189 |     "y_coords = np.linspace(\n", 190 |     "    start=upper_left_y,\n", 191 |     "    stop=upper_left_y - (resolution*(num_pixels-1)),\n", 192 |     "    num=num_pixels, dtype=np.uint)\n", 193 |     "x_coords, y_coords" 194 |    ] 195 |   }, 196 |   { 197 |    "cell_type": "markdown", 198 |    "id": "11aAI1gBDKik", 199 |    "metadata": { 200 |     "id": "11aAI1gBDKik" 201 |    }, 202 |    "source": [ 203 |     "Now we create a DataArray and assign `x` and `y` coordinates."
204 | ] 205 | }, 206 | { 207 | "cell_type": "code", 208 | "execution_count": 20, 209 | "id": "25a4f7f5-3c7f-4568-873f-cae8e727dd43", 210 | "metadata": { 211 | "colab": { 212 | "base_uri": "https://localhost:8080/", 213 | "height": 271 214 | }, 215 | "id": "25a4f7f5-3c7f-4568-873f-cae8e727dd43", 216 | "outputId": "62e3466e-8f77-4b80-cbf2-807deae8e0af" 217 | }, 218 | "outputs": [ 219 | { 220 | "data": { 221 | "text/html": [ 222 | "
\n", 223 | "\n", 224 | "\n", 225 | "\n", 226 | "\n", 227 | "\n", 228 | "\n", 229 | "\n", 230 | "\n", 231 | "\n", 232 | "\n", 233 | "\n", 234 | "\n", 235 | "\n", 236 | "\n", 237 | "
<xarray.DataArray (y: 4, x: 4)> Size: 16B\n",
 593 |        "array([[0, 0, 1, 1],\n",
 594 |        "       [0, 0, 1, 1],\n",
 595 |        "       [0, 2, 2, 2],\n",
 596 |        "       [2, 2, 3, 3]], dtype=uint8)\n",
 597 |        "Coordinates:\n",
 598 |        "  * y        (y) uint64 32B 1432187 1431187 1430187 1429187\n",
 599 |        "  * x        (x) uint64 32B 780850 781850 782850 783850
" 603 | ], 604 | "text/plain": [ 605 | " Size: 16B\n", 606 | "array([[0, 0, 1, 1],\n", 607 | " [0, 0, 1, 1],\n", 608 | " [0, 2, 2, 2],\n", 609 | " [2, 2, 3, 3]], dtype=uint8)\n", 610 | "Coordinates:\n", 611 | " * y (y) uint64 32B 1432187 1431187 1430187 1429187\n", 612 | " * x (x) uint64 32B 780850 781850 782850 783850" 613 | ] 614 | }, 615 | "execution_count": 20, 616 | "metadata": {}, 617 | "output_type": "execute_result" 618 | } 619 | ], 620 | "source": [ 621 | "da = xr.DataArray(\n", 622 | " data=array,\n", 623 | " coords={\n", 624 | " 'y': y_coords,\n", 625 | " 'x': x_coords\n", 626 | " }\n", 627 | ")\n", 628 | "da" 629 | ] 630 | }, 631 | { 632 | "cell_type": "markdown", 633 | "id": "234aea67-cb18-431d-8d6a-91b846fb325b", 634 | "metadata": { 635 | "id": "234aea67-cb18-431d-8d6a-91b846fb325b" 636 | }, 637 | "source": [ 638 | "Next, we assign a CRS. The `rioxarray` extension provides a `rio` accessor that allows us to set a CRS.\n", 639 | "\n", 640 | "> Even though we are not using `rioxarray` directly, we still need to import it which activates the `rio` accessor in xarray." 641 | ] 642 | }, 643 | { 644 | "cell_type": "code", 645 | "execution_count": 21, 646 | "id": "aa854623-54d5-4d42-8b10-104561582c31", 647 | "metadata": { 648 | "colab": { 649 | "base_uri": "https://localhost:8080/", 650 | "height": 292 651 | }, 652 | "id": "aa854623-54d5-4d42-8b10-104561582c31", 653 | "outputId": "e335f9a9-60ec-49f7-d176-b1118b3cc664" 654 | }, 655 | "outputs": [ 656 | { 657 | "data": { 658 | "text/html": [ 659 | "
\n", 660 | "\n", 661 | "\n", 662 | "\n", 663 | "\n", 664 | "\n", 665 | "\n", 666 | "\n", 667 | "\n", 668 | "\n", 669 | "\n", 670 | "\n", 671 | "\n", 672 | "\n", 673 | "\n", 674 | "
<xarray.DataArray (y: 4, x: 4)> Size: 16B\n",
1030 |        "array([[0, 0, 1, 1],\n",
1031 |        "       [0, 0, 1, 1],\n",
1032 |        "       [0, 2, 2, 2],\n",
1033 |        "       [2, 2, 3, 3]], dtype=uint8)\n",
1034 |        "Coordinates:\n",
1035 |        "  * y            (y) uint64 32B 1432187 1431187 1430187 1429187\n",
1036 |        "  * x            (x) uint64 32B 780850 781850 782850 783850\n",
1037 |        "    spatial_ref  int64 8B 0
" 1041 | ], 1042 | "text/plain": [ 1043 | " Size: 16B\n", 1044 | "array([[0, 0, 1, 1],\n", 1045 | " [0, 0, 1, 1],\n", 1046 | " [0, 2, 2, 2],\n", 1047 | " [2, 2, 3, 3]], dtype=uint8)\n", 1048 | "Coordinates:\n", 1049 | " * y (y) uint64 32B 1432187 1431187 1430187 1429187\n", 1050 | " * x (x) uint64 32B 780850 781850 782850 783850\n", 1051 | " spatial_ref int64 8B 0" 1052 | ] 1053 | }, 1054 | "execution_count": 21, 1055 | "metadata": {}, 1056 | "output_type": "execute_result" 1057 | } 1058 | ], 1059 | "source": [ 1060 | "da = da.rio.write_crs(crs)\n", 1061 | "da" 1062 | ] 1063 | }, 1064 | { 1065 | "cell_type": "markdown", 1066 | "id": "26fe6315-3cdc-4f22-a9ad-7e1dedbe9f53", 1067 | "metadata": { 1068 | "id": "26fe6315-3cdc-4f22-a9ad-7e1dedbe9f53" 1069 | }, 1070 | "source": [ 1071 | "Now we can save the DataArray in any of the supported raster formats." 1072 | ] 1073 | }, 1074 | { 1075 | "cell_type": "code", 1076 | "execution_count": 22, 1077 | "id": "33d724ae-dbd3-4b2c-9307-1af6b68b7fe6", 1078 | "metadata": { 1079 | "id": "33d724ae-dbd3-4b2c-9307-1af6b68b7fe6" 1080 | }, 1081 | "outputs": [], 1082 | "source": [ 1083 | "output_file = 'image.tif'\n", 1084 | "output_path = os.path.join(output_folder, output_file)\n", 1085 | "da.rio.to_raster(output_path)" 1086 | ] 1087 | }, 1088 | { 1089 | "cell_type": "markdown", 1090 | "id": "d6652700-bd98-4024-a7a9-cdb66f4e65c2", 1091 | "metadata": {}, 1092 | "source": [ 1093 | "----\n", 1094 | "\n", 1095 | "If you want to give feedback or share your experience with this tutorial, please comment below. 
(requires GitHub account)\n", 1096 | "\n", 1097 | "\n", 1098 | "" 1105 | ] 1106 | } 1107 | ], 1108 | "metadata": { 1109 | "colab": { 1110 | "provenance": [] 1111 | }, 1112 | "kernelspec": { 1113 | "display_name": "Python 3 (ipykernel)", 1114 | "language": "python", 1115 | "name": "python3" 1116 | }, 1117 | "language_info": { 1118 | "codemirror_mode": { 1119 | "name": "ipython", 1120 | "version": 3 1121 | }, 1122 | "file_extension": ".py", 1123 | "mimetype": "text/x-python", 1124 | "name": "python", 1125 | "nbconvert_exporter": "python", 1126 | "pygments_lexer": "ipython3", 1127 | "version": "3.13.1" 1128 | } 1129 | }, 1130 | "nbformat": 4, 1131 | "nbformat_minor": 5 1132 | } 1133 | -------------------------------------------------------------------------------- /notebooks/xarray_mosaic_and_clip.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "364040d8-fefb-46a2-b134-a9f1f4d57603", 6 | "metadata": { 7 | "id": "364040d8-fefb-46a2-b134-a9f1f4d57603" 8 | }, 9 | "source": [ 10 | "# Raster Mosaicing and Clipping" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "id": "f6134d7a-b039-4960-ad9f-c9843c3cb79e", 16 | "metadata": { 17 | "id": "f6134d7a-b039-4960-ad9f-c9843c3cb79e" 18 | }, 19 | "source": [ 20 | "## Introduction\n", 21 | "\n", 22 | "The [`rioxarray`](https://corteva.github.io/rioxarray/stable/) extension adds support for working with raster data using XArray. In this tutorial, we explore basic raster processing of reading, mosaicing and clipping raster data using XArray.\n", 23 | "\n", 24 | "## Overview of the Task\n", 25 | "\n", 26 | "We will work with elevation data for Sri Lanka in the form of individual SRTM tiles, merge them and clip the resulting mosaic to the country boundary. 
We will also save the resulting raster as a Cloud-Optimized GeoTIFF (COG).\n", 27 | "\n", 28 | "\n", 29 | "**Input Layers**:\n", 30 | "* `[NXXEYYY].SRTMGL1.hgt.zip`: Zipped raster tiles in SRTM HGT format\n", 31 | "* `ne_10m_admin_0_countries_ind.zip`: A shapefile of country boundaries\n", 32 | "\n", 33 | "**Output**:\n", 34 | "* `clipped.tif`: A clipped and mosaiced GeoTIFF file for Sri Lanka.\n", 35 | "\n", 36 | "**Data Credit**:\n", 37 | "- NASA Shuttle Radar Topography Mission Global 1 arc second provided by The Land Processes Distributed Active Archive Center (LP DAAC). Downloaded using the [30-Meter SRTM Tile Downloader](https://dwtkns.com/srtm30m/).\n", 38 | "- Made with Natural Earth. Free vector and raster map data @ naturalearthdata.com.\n", 39 | "\n", 40 | "\n", 41 | "**Watch Video Walkthrough** " 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "id": "85dbaeb4-655e-4743-809f-823bc75400c4", 47 | "metadata": { 48 | "id": "85dbaeb4-655e-4743-809f-823bc75400c4", 49 | "tags": [] 50 | }, 51 | "source": [ 52 | "## Setup and Data Download\n", 53 | "\n", 54 | "The following blocks of code will install the required packages and download the datasets to your Colab environment." 
55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": null, 60 | "id": "27ab242e-e140-4afe-a6e8-c9f044bec425", 61 | "metadata": { 62 | "id": "27ab242e-e140-4afe-a6e8-c9f044bec425" 63 | }, 64 | "outputs": [], 65 | "source": [ 66 | "%%capture\n", 67 | "if 'google.colab' in str(get_ipython()):\n", 68 | " !pip install rioxarray" 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": null, 74 | "id": "3decd0de-ae44-4d80-80b6-51223cc36ffe", 75 | "metadata": { 76 | "id": "3decd0de-ae44-4d80-80b6-51223cc36ffe" 77 | }, 78 | "outputs": [], 79 | "source": [ 80 | "import geopandas as gpd\n", 81 | "import matplotlib.pyplot as plt\n", 82 | "import os\n", 83 | "import rioxarray as rxr\n", 84 | "from rioxarray.merge import merge_arrays\n", 85 | "import shapely\n", 86 | "import xarray as xr" 87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": null, 92 | "id": "e1c0ed63-39db-4689-9a65-f6eb7de98f19", 93 | "metadata": { 94 | "id": "e1c0ed63-39db-4689-9a65-f6eb7de98f19" 95 | }, 96 | "outputs": [], 97 | "source": [ 98 | "data_folder = 'data'\n", 99 | "output_folder = 'output'\n", 100 | "\n", 101 | "if not os.path.exists(data_folder):\n", 102 | " os.mkdir(data_folder)\n", 103 | "if not os.path.exists(output_folder):\n", 104 | " os.mkdir(output_folder)" 105 | ] 106 | }, 107 | { 108 | "cell_type": "code", 109 | "execution_count": null, 110 | "id": "782466e6-1f57-437f-a564-dda8caeb87db", 111 | "metadata": { 112 | "id": "782466e6-1f57-437f-a564-dda8caeb87db" 113 | }, 114 | "outputs": [], 115 | "source": [ 116 | "def download(url):\n", 117 | " filename = os.path.join(data_folder, os.path.basename(url))\n", 118 | " if not os.path.exists(filename):\n", 119 | " from urllib.request import urlretrieve\n", 120 | " local, _ = urlretrieve(url, filename)\n", 121 | " print('Downloaded ' + local)\n", 122 | "\n", 123 | "srtm_tiles = [\n", 124 | " 'N05E080.SRTMGL1.hgt.zip',\n", 125 | " 'N06E079.SRTMGL1.hgt.zip',\n", 126 | " 
'N06E080.SRTMGL1.hgt.zip',\n", 127 | " 'N06E081.SRTMGL1.hgt.zip',\n", 128 | " 'N07E079.SRTMGL1.hgt.zip',\n", 129 | " 'N07E080.SRTMGL1.hgt.zip',\n", 130 | " 'N07E081.SRTMGL1.hgt.zip',\n", 131 | " 'N08E079.SRTMGL1.hgt.zip',\n", 132 | " 'N08E080.SRTMGL1.hgt.zip',\n", 133 | " 'N08E081.SRTMGL1.hgt.zip',\n", 134 | " 'N09E080.SRTMGL1.hgt.zip',\n", 135 | " 'N09E079.SRTMGL1.hgt.zip'\n", 136 | "]\n", 137 | "\n", 138 | "shapefile = 'ne_10m_admin_0_countries_ind.zip'\n", 139 | "\n", 140 | "data_url = 'https://github.com/spatialthoughts/geopython-tutorials/releases/download/data/'\n", 141 | "\n", 142 | "for tile in srtm_tiles:\n", 143 | " url = '{}/{}'.format(data_url, tile)\n", 144 | " download(url)\n", 145 | "\n", 146 | "download('{}/{}'.format(data_url,shapefile))" 147 | ] 148 | }, 149 | { 150 | "cell_type": "markdown", 151 | "id": "31760f89-975f-42df-9599-13466772da61", 152 | "metadata": { 153 | "id": "31760f89-975f-42df-9599-13466772da61" 154 | }, 155 | "source": [ 156 | "## Procedure" 157 | ] 158 | }, 159 | { 160 | "cell_type": "markdown", 161 | "id": "WaBA4q4FFupB", 162 | "metadata": { 163 | "id": "WaBA4q4FFupB" 164 | }, 165 | "source": [ 166 | "For this tutorial, we want to mosaic the source tiles and clip them to the boundary of Sri Lanka. We read the Natural Earth administrative regions shapefile." 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": null, 172 | "id": "HbDnBfLpGH4Q", 173 | "metadata": { 174 | "id": "HbDnBfLpGH4Q" 175 | }, 176 | "outputs": [], 177 | "source": [ 178 | "shapefile_path = os.path.join(data_folder, shapefile)\n", 179 | "boundaries_gdf = gpd.read_file(shapefile_path)" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "id": "CV7AAgNuGR4R", 185 | "metadata": { 186 | "id": "CV7AAgNuGR4R" 187 | }, 188 | "source": [ 189 | "We filter the dataframe using the ADM0_A3 column and extract the geometry." 
190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": null, 195 | "id": "81gdSJcYGQ7i", 196 | "metadata": { 197 | "id": "81gdSJcYGQ7i" 198 | }, 199 | "outputs": [], 200 | "source": [ 201 | "filtered_gdf = boundaries_gdf[boundaries_gdf['ADM0_A3'] == 'LKA']\n", 202 | "geometry = filtered_gdf.geometry\n", 203 | "geometry" 204 | ] 205 | }, 206 | { 207 | "cell_type": "markdown", 208 | "id": "36a7fe8d-e3a9-4137-a3f1-87898c4b6301", 209 | "metadata": { 210 | "id": "36a7fe8d-e3a9-4137-a3f1-87898c4b6301" 211 | }, 212 | "source": [ 213 | "Next, we read the zipped SRTM tiles using rioxarray. rioxarray uses GDAL to read raster datasets, and can read zipped SRTM files directly. We also specify `mask_and_scale=False` so the nodata values from the input rasters are preserved and not set to NaN." 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": null, 219 | "id": "e6428cb4-0d19-4ffd-bea4-e3109dae33d3", 220 | "metadata": { 221 | "id": "e6428cb4-0d19-4ffd-bea4-e3109dae33d3" 222 | }, 223 | "outputs": [], 224 | "source": [ 225 | "datasets = []\n", 226 | "for tile in srtm_tiles:\n", 227 | " zipfile = os.path.join(data_folder, tile)\n", 228 | " ds = rxr.open_rasterio(zipfile, mask_and_scale=False)\n", 229 | " datasets.append(ds)" 230 | ] 231 | }, 232 | { 233 | "cell_type": "markdown", 234 | "source": [ 235 | "We can get the bounding box of each image and create a GeoDataFrame to visualize the extent of the coverage along with the chosen region." 
236 | ], 237 | "metadata": { 238 | "id": "5gRXiB8s8XpP" 239 | }, 240 | "id": "5gRXiB8s8XpP" 241 | }, 242 | { 243 | "cell_type": "code", 244 | "source": [ 245 | "bboxes = []\n", 246 | "for ds in datasets:\n", 247 | " bounds = ds.rio.bounds()\n", 248 | " bbox = shapely.box(*bounds) # Create a shapely box object\n", 249 | " bboxes.append(bbox)\n", 250 | "\n", 251 | "gdf = gpd.GeoDataFrame(geometry=bboxes, crs=datasets[0].rio.crs)" 252 | ], 253 | "metadata": { 254 | "id": "kXbQ-J5o7PmN" 255 | }, 256 | "id": "kXbQ-J5o7PmN", 257 | "execution_count": null, 258 | "outputs": [] 259 | }, 260 | { 261 | "cell_type": "markdown", 262 | "source": [ 263 | "Plot the bounding boxes and the chosen admin boundary." 264 | ], 265 | "metadata": { 266 | "id": "z0AQ-fkI9MWg" 267 | }, 268 | "id": "z0AQ-fkI9MWg" 269 | }, 270 | { 271 | "cell_type": "code", 272 | "source": [ 273 | "fig, ax = plt.subplots(1, 1)\n", 274 | "fig.set_size_inches(5,5)\n", 275 | "gdf.plot(\n", 276 | " ax=ax,\n", 277 | " facecolor='none',\n", 278 | " edgecolor='black',\n", 279 | " alpha=0.5)\n", 280 | "\n", 281 | "filtered_gdf.plot(\n", 282 | " ax=ax,\n", 283 | " facecolor='blue',\n", 284 | " alpha=0.5\n", 285 | ")\n", 286 | "ax.set_axis_off()\n", 287 | "plt.show()\n" 288 | ], 289 | "metadata": { 290 | "colab": { 291 | "base_uri": "https://localhost:8080/", 292 | "height": 422 293 | }, 294 | "id": "3EfywFl39UFt", 295 | "outputId": "dbf93e5a-ae27-4e40-a5d9-a2e050da1ff9" 296 | }, 297 | "id": "3EfywFl39UFt", 298 | "execution_count": null, 299 | "outputs": [ 300 | { 301 | "output_type": "display_data", 302 | "data": { 303 | "text/plain": [ 304 | "
" 305 | ], 306 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAPgAAAGVCAYAAADTzDw7AAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjAsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvlHJYcgAAAAlwSFlzAAAPYQAAD2EBqD+naQAAIwFJREFUeJzt3fl3k9edBvBHsiwZS94tG4NZzBaM2UzAQwcIEKalhDQEJqFJugxtmjkzyeF0zszfMD/PnJPOnGnTpNMmbUIbSJM2DdkaCBCIiZ3EbGEzBHBsed8k27ItzQ/feGQbY7S80n3fq+dzjg+WLV59MX5073vf+95rC4fDYRCRluyqCyCi5GHAiTTGgBNpjAEn0hgDTqQxBpxIYww4kcYYcCKNMeBEGmPAiTTGgBNpjAEn0hgDTqQxBpxIYww4kcYYcCKNMeBEGmPAiTTGgBNpjAEn0hgDTqQxBpxIYww4kcYYcCKNMeBEGmPAiTTmMPqAHR0dCAaDRh/W8vr7++HxeFSXYSpOpxNFRUWqy9CaoQHv6OjAs88+a+QhtRAIBFBbW4uamhpkZ2erLsdU9u/fz5AnkaEBH2u59+zZA6/Xa+ShLe38+fOora3F9u3bsWzZMtXlmEJbWxsOHTrE3l6SGd5FBwCv14uysrJkHNqSfD4fAKCoqIg/F0opDrIRaYwBJ9IYA06kMQacSGMMOJHGGHAijTHgRBpjwIk0lpSJLhQxOgoEgzaEw9E9f2gIaGkB2tuBgQEgGARCIcDrBaqqAAf/xygG/HVJorNngXfeAS5dKsWNGz/Ab3/rxenTgNsNuFxAXx/Q2wsMDwPf/jbQ0QEcOyaPp/LJJ8CTT6b230DWxoAnyalTwOHDkcfhcAb6+jLQ1DT1869cAbKy7hxuALh5E+jqAgoKjK2V9MVzcIOFw8AHH0wMdzRsNuBb3wL27gXWrAGKi6d+3qVLiddI6YMtuIFaWoA//xm4dSv2v9vQAPT3A+XlwNq1wEMPAX4/cPkycPSotNyAfL5sGZCTY2ztpCcGPAGhEHDkCODzAd/4hnSd29riO9bIiLTOly4Bf/0rsG2bnKMPDgL/8A/AxYtyPh8IAC+/DOzbBzidRv5rSEfsoifgxg3gww8lfH4/kJcH7NplzLHffx+orZWW/be/BVasAHbvlu999RVw6BCiHpmn9MWAJ6C5OfL5/Pny57JlwJIlxr5OWxvwu9/Jcbdtk6998QXw3nvGvg7phwFPwPhz7fHd5aqq5LzWgQPA3/6tDMIBwIkTQF2d8a9F+mDA49TeDpw7F3nc0xP5fO5cGRU32tWrwGuvAQ88ACxcKF97802gsdH41yI9MOBxKioCSkoij0+ckHPicFgG2558Evj+94GZM4193bHJM3v3AqWlMtD3+9/HP7hHemPA42SzAVu2RB5/+inw7/8O/Od/yuPycmDRIuCf/gl49NF2uFzGJbC2Fjh5Evje92QK6+CgnKP7/Ya9BGmCAU9AZaW0omNGRmTqaXv7xOcVFY1g5szDWLo0YNhrHzkio/c/+YkMvnV1Aa+8IjUQjWHAE2CzATt3TjzfDoeB//kfOTfu7h7/3FFs2dKL3bsnTlLxeoHc3InHLS4G9u+XwbTpzuX/8heZCPPYY8DGjTKV9fXXefmMIhjwBM2dC9x/v3yekQGsWyet+unTwH/9l9wRNt6qVRLeLVuAzExp7TdtApYvjzzH75fz+IceAn76Uxk5z8q6/bXDYRl0a2iQy2d///fAhQsy240I4Ew2Q2zaJN3krCyZ7BIOyzXywcGpg+l0SsDvvVfeCD78UN4YNm4Ejh+XN4VDh2TSTH6+zFHfsgX47DPgo48m9gxGR4E//lFCvnMn8OMfS1fdbpfj2fkWntYYcIOMPxe32YBZs+783J4emXfucgFbt8pHb6903WfNkllsZ8/KCPzGjfJ3nE6gpgZYvVpuZjl1amJXvLER+NnPZ
KLNgw8CZ84AL74ot6GOr43SCwOugN0urWxfn5yDr10rrbndHpkJ99FH8vlkTiewfbs858AB6SWMCYfl2vy5c0BFhTznrbfkkt62bQC3RUs/7MApkJMjo99VVXL9+q235Hz9wgUJqcMB3HcfUFh452NUVMi19vz8qb9/7Rrw9ttAU5PMdvv5z+VuN0ovDDgi58yplJcHPPqo3ClWUiKXuQ4ckG51a2t0x/B65Y1i9uw7P2dsAYmeHuD556XrT+mDAYd0lZ9/Xv5MtYoKmQzzwAMyINfYKJfZfvUrOde+fn36a9sej9w6unTp3V9reBh49VU5xw+FjPoXkJnxHBzyyz4yAvzylzL7LDdXrkGnalEFu10G0JYtAw4elO71l1/Kx9Gj0mVfv14G4zIybv/7mZkydfWTT2RdN4dDBvHOngWqq+Xf0dUlo/CArPvW0iKX1aYa5Sd9MODj9PRE7s7q7jbu3u5oeTzAD34AvPFGJIyAvPkcPy7n09/97tShHHuTGG/btomTaJYvl0tq/f0yQea554DHH7/z8lBkfeyiQ1Y5LSqa+LXGRjXdWLtdJriM3S023rVrsppLNDPVWlrk2nhdnbwxhELSO/nXfwWefhrYvFlWh3nuOa7zpjO24JAu7o9/LJeXmprkUtT69eomidjt0nt49tnbV1mtqrrz9NVgULrlY6Eeb8YMYMECYPFiORXYulWWmTp8WN407r9frrkn4zZXUocB/5rbfXsXV5WBAQmq2z1x1hpwe+sdDktrXVcnk1uGhu58zLFr5G+/Ldfea2qAhx+WAbo//UmOs2sX13rTCQNuMl1dwAsv3HlE/8QJCf7QkAzCXb8us+BiMTAgA20ffSRrvS1fLpfrPvhAXvuxx+58fZ2shQE3kZER4A9/mP5yXW+vXOoywuioDOaNDejl5Ej4f/ELGZUfW2eOrIuDbCYQCEhrfPCgrJiqSl+fvMkEAsBvfiMLS/DWU2tjC67I+Nbz5k3V1dwuFJL7zVtaZBIONz20Jv63KdDdDfz615HdSsysvl7my+/dy91UrIhddAVOnbJGuMfcvCnn5XfaOJHMiwFXYPISTVbQ1yfz48fPsCPzY8AVmO7uLzMbGZGprh9/rLoSihYDrsCcOXIt26ref5/rsFsFA66A3S7rrFl1WmgwKOuwT55lR+bDgCuyahXwzDMyt9xKCyNmZMhKssGgzHrr7FRdEU3HQr9a+ikullVdnnlGdSXRC4dl9dZ/+zegrEy2TRodVV0V3Qmvg5vA+IUTzS4Ukmv43/se8Mgj0k2fahEKMge24BSzQEBC3tQk68KdOSM7uXBaq/kw4BSXYBB46SXZH83nkw0cfD7VVdFkDDjFbWRE1ndvbZWBQq7Yaj4MuAmoWM3VKOGwLPkUCsmiE1YaT0gHDLgJfPml6gqMMTDAWW5mw4CbwPXrqiswzsmTbMXNhAFXrKsr9buqJNPgIFtxM2HAFdPx7iy24ubBgCsUCgGffqq6CuM4HLJLam4uW3Gz4Ew2hRobY18R1cy8XtkVNTeXN6KYBVtwherrVVdgrOZm4D/+QzZPJHNgwBUJBGQWmI58PuDGDeDzz7mLqWrsok8yOiqriXZ0RL6Wny/b/FZUGLfc0s2bet+FdeSIXCHweoFZs1RXk74Y8EmGhuTmiWBw4tfHRrvnzpUteZctA1yu+F9Hp0tjUxlbVPLMGQZcJQZ8kuxs4F/+RQLY2Sk7el6+HNkE8MYN+XjrLWDTJtnAL9Y1w8Nh4MoVw0s3pfp6YMuWxN4MKX4M+BSys2X73oULgXXrZApmXZ1c+hmbNx4Mytpk9fWyMcDixdEf/8svgVu3klO72QwNyX5q99+vupL0xEG2KMyYIVvr/vSnwMqVka+7XHJO/vLLsV3Pbmgwvkaz8XgiS1EdPy47pFDqMeAxcDiA3buBDRvk8dAQUFkJPP20zCeP5nbJYDA9uufBIPDDHwJ79sh4xYEDcrpDqWV4Fz0QCMCn+Z3/y5cDI
yOZ+PBDJ1591YbvfGcQ69ePIByeevCsra0NgUAA58714t13u9HSkh5rHL3wQhiPPDKADRtCqKvLxM9+5sKTT/pRUBCGz+dDR0eH9r8rsXI6nSgqKjLseLZw2LiFds6dO4dnnnkGlZWVyM7ONuqwpjU0lIVz59YhGHShquoT5OR0T/m8q1fb8M472aio2IZcK25rkgCHYwSVlfXweHrQ2elFYWErbDagvb0dR48exQMPPICSkhLVZZrK/v37DQs5B9kS4HINYunSenzxRTW++KIaq1cfQ2bmyITnBAJuXL++EqFQM2xWXQg9ASMjDty4sQhVVZ+gqKgVgFxFGGtXduzYgbVr16os0TTa2tpw6NAhBCdfo02AoQEvLCxETU0NnnjiCZSWlhp5aFMbGgJ8vgzk5q5Ffn6kQ9TUZMdrr83AihVtaG//ANXV1WnZWuXkhPHoo1tRWCg/m3ffdSE7uwGNjY2YM2cOysrKFFeoL8Nb8OzsbJSWlqbdf9r8+RMfX7wIHD4MZGYCbncADocDbrcbOWm4B29eHlBWlovCQqC9XQYk/f5FsNn0P41TjV10A42OyjXfa9fkl5jLCIt164DCQunpvP66zE/v6XGgvX0z56onGQNuILsdOH+e13wne+89WZixp0c+xgwMzMa5c9lYs0ZdbbrjdXAD2WzA+vWqqzCnGzcmhnvM6dMeBAKpryddMOAG83pVV2AtwaAdf/2r6ir0xYAbzONRXYH11NUBX32lugo9MeCkXDgs8/mvX+cCEUbjIJvBRkbu/hy6XV8f8L//K6Ptjz0GpOF0gaRgC24wnVdpSYXOTuDPf1ZdhT4YcIMx4Im7cYM7lRqFATfY2MovlJjTp1VXoAcG3GA8BzdGQwPXVjcCA26wefN4qcwIwaAMup05w9OeRDDgBsvIAFatUl2FHrq7gYMH5YPiw4AnQXW16gr0cuGC3IVGsWPAk6C4GJg9W3UV+giHo1vvjm7HgCdJrGul0/QaG1VXYE0MeJL096uuQC+3bsmClvy5xoYBTxK/X3UFegmFgJdeAt58U3Ul1sKAJ8HICDA4qLoK/fj9snAEu+vRY8CTgK138oyOAq+8wttLo8WAJwHPE5MrGARefZWzBqPBgCcBp1gmX2fn1EtA0UQMeBLwRonkW7hQNn6k6THgBuvqkpVJKHnmzwcWLJA152l6nI5hsM8/V12B/q5fl49584DyctXVmBtbcIOdP6+6gvRx4oTqCsyPLbiBzpwBWltVV6G/mTOBoiJ5M+3uBvLzVVdkXmzBDRAKAcePA6+9prqS9LBhA7Bzp9yEwjfU6bEFT1A4LME+c0Z1Jemjq0v2Odu+nauv3g0DnqCWFoY7mdxuYMcO2THmwgXpKR07JhsafuMbqqszP3bRE8Qpk8m1axewdClw6hRQWgr88z8Dc+fe+RLZzZuy2WFDA3d3BdiCJ4y/RMl16hTwzjuyoktDA/DUU8D3vy8bPY4XDgO//vXEOQg3b8q5ejpjC06m1tgYWa5pdFTmoE+1G6nNBuTlTfza6dPAlSvJr9HM2IInaGhIdQXppb0d+NWvZCR9zhz5WigkoZ9q3bZLl4BFi1Jbo5kw4AkYHpaBH0qt9nbg9deje266X0ZjFz0Bhw/LUkJkXj5feo+TMOAJ4E0l5jcwADQ1qa5CHQY8AVzYwRouXVJdgToMeJyCQQ6wmd2qVTJv/eJF1ZWow4DHia23+WVlyXRWny99l3diwOPEgJtfR4csDrFmTfpu68zLZHHq61NdAd3NrVsS7IceSt+RdAY8CsGgTKbIyop8jQE3v8FB4Oc/l+WdKiqAZctUV5R67KJH4erVieEGZBsdMr+ODpmyeviw6krUYMDvoqnp9hsbAE5wsZre3vS86sGA38XFi8A990z82sCAtAxkLS0tqitIPQZ8GgMDwOLFt7fgXHDfmtJxTzMGfBpXr049zbGtLfW1UOLS8dZRjqLfQTgsy/I2N8vqnTt2AGVl0qq//77q6
ige7e3y/zrVmIqu2ILfQW1tZKTc55M1wQDgww+595hVDQ2l39UPBnyScFgWUXz77cjXVq0CHF/3dTo71dRFxrh8WXUFqcWAT9LeDhw8KBNbAFlUf+PGyPe5kqe1pdudZQz4JMXFwIwZkcdPPTVxF8vJE17IWpqa0us+AgZ8EpsNyMmJPHZMGoYMBlNbDxmjpgaYPVs+T6duOgM+Bbc78vno6J2/R9aRnS1jKYBc/kwXDPgUxkKcnz+xuw7IpneVlSkviRIUDgNVVYDdnl4LMTLgUxg7z66snPqaaXV1auuhxNnt8sY9Z44MpE7umemKAZ+kvT2yHdHSpVM/58aN1NVDxjh1Cjh7Fpg3Tx5PtXmCjhjwSbq7JeBZWZGF9cfr6wM+/jjlZVGCBgaAo0dlX7PNm2WwNB2mrjLgk9i//ok4HJHPxzt2LH2X/7G6tjZpxdeulT/HT2bSFeeiT1JaKn8ODMhkl/Eh7+oC6urU1EXG+Owz+Rjj9+t9ZYQt+CRuN1BQIIMw77wz8XsNDekzOJMubt5UXUFyMeBTKC+XP0+dAs6di3y9ujryPdLDqVN6L8jIgE9h/OBaSUnk89xc4PHHJ850I2u7fl3v238Z8CmMtdKVlZHbRMe43cD996e+Jkqe48f13b/M8EG2QCAAn89n9GFTKhQCBgc9KCoaRHPz7VtieL3A0JAbwWB0Kwf4/X4MDw/D7/ejj+stAzDfz+TNN0fw7W8PIhi0we1W02f3+XwIGHyB3tCAd3Z2ora2Fn19fcjOzjby0Cl39uw6HDuWh7lzL6Os7MsJM9r6+vJx4UI1RkYyozpWT08POjo6UF9fj9zxt6alMbP9TE6eDOMPf+jDnDmX4XINIitrAHZ7KKU19Pf34+LFi+js7ERZWZkhx+RlsjvIze1Cb28Brl+/Bz5fOVyuQTgcQdhsYbS3lyEcjn7dH9vX7w62dFor6C7M9jMJh23o78/FtWuVGBrKRlXVaeTmdqW0BvvX12SdTqdhxzQ04IWFhaipqcETTzyB0rELyhbV2WnDCy9MfYF08eLYjtXa2oru7m5UV1ejZPyoXRoz88/E4QCeeWYlMqProBnG5/PhwIED8Hg8hh3T8BY8OzsbpaWlhnUxVCkrA1askFHWRPn9fmRmZsLtdiOHQ/AAzP0zsdsBrzfntjsJU8Hlchl6PI6iT2PlStUVkAqhEPDmm3pMSWbApzFrluoKSJWzZ4EXXrB+yBnwaXi9QEaG6ipIFbfb+kt0MeDTyMiI3HxC6WfrVuvfiMKA30VVleoKSBUDB7OVYcDvYu1awOCBTbKICxdUV5A4BvwuXK6pV3Yh/X3+eWQDDKtiwKPA0fT01NwMHD5s7dtJOVU1Cgx4+qqtlUUhurpkh9mxtdWtgi14FBjw9NbcDAwOSpfdaq05Ax6FnBw9RlQpMY2N1ht4Y8CjYLOxFSdx6JAx9yekCgMeJQacAGBkBPjd74Bbt1RXEh0GPEoMePrasAF45BFg2TJ5HAwCL70EWGHhIgY8Sia7ZZlSpKIC+OY3geXLgb175XNABt0OHpRltEduX9XLNBjwKPGmk/RUWioj5z09wO9/L+sEjF0qa22VZbUvXzZva87r4ETTOHVKNrwIBqWlPn8eePppoLNTQh0Oy58XLgB79qiu9nYMeJSysmQ03WrXQSlxkxc6PX8e2LdPfhccDuCNN6Ql/7u/k7XzzYRd9ChlZnKgjcSRI8B770VO2+65R87F33pLaVlTYsBjcN99qisgszh5UnaaBYAlS4DCQuCLL2R7aTNhwGOwYIHqCshMjhyRLYltNlk3IByeuHOpGTDgMejvV10BmUkoJL8Tzc3A5s2yxNfx4+ZqxRnwGFhl9hKlzpUrwHPPAR9/DDz8sCzSaKZzcQY8SuEw0NKiugoymxMnpCV/913prm/cKKPsFy+qrkww4FE6eRKor1ddBZlZba1May0vB/7yF3OsBsOAR2nJEmBgQHUVZGZffSUTYLZtk5lvH
R2qK2LAo1ZQAJhknzwysYaGyGQXM5zSMeBRysgA8vJUV0Fm5/FENktoblZbC8CAx6SgQHUFZHYzZ8qdZgADbjmFhaorILNrbIyM1bS0qL93gQGPAVtwmk5WFlBZGWnBBwaA3l61NTHgMWALTtPJzZX7xbOyIl8bHVVXD8DbRWPCFpym09oK1NUB69bJxoU9PeoHZtmCx4ABp7upr5dR9E2bgO3bgaNH1Z6HM+AxyMoCsrNVV0FmlpMD/OY3MsA2OCg3n7S3q6uHXfQYFRTcvsIH0ZjLl+XPlhb5CIVkdpvXq6YetuAx4kAbRcPnkz3NAOm2q+qmM+Ax4nk4RaOrK3KJ7OJF4NIlNXUw4DFiC07RuHQJ8Psjjxsa1NTBgMeILTjF4+pVNdfEGfAYsQWneAwOqlkRiAGPkccjSygTxUrFeTgDHiObjd10is/YJbRUYsDjwIBTPFpbU7/iKgMeB56HU7xSvRMpAx4HtuAUr1TvUsuAx4EtOMXLnuLEMeBxYAtO8WILbgH5+VxhleLDFtwCuMIqxYstuAWoXkiPrIstuAU0NQHd3aqrIKux2RhwS3C7gVWruLoLxSbV4QYY8JgFArL29e7dwL59qqshK0n1+TfAgMfM6ZR1toaGgJISufmEKBoMuAU4HEBxMXDhgjx2udTWQ9bBLrpFLFoEXLkio+kcUadosQW3iEWL5Na/Tz+VFTOJosEW3CIKC2UE/Y03VFdCVsIW3CJsNmnFiWLBFtxCGHCK1dBQ6l+TAY/T/PlqulxkXb29QH9/al+TAY+TywXMnau6CrKaq1dT+3oMeALYTadYXbmS2tdjwBPAgFOsenpS+3qG7y4aCATg8/mMPqwphcPA6KgbgcD0qz/4/X4MDw/D7/ejL9XLappUuv5MenpG0dw8MOX3fD4fAgZvXWtowDs7O1FbW4u+vj5kp8mtVvX1mzA4OGPa5/T09KCjowP19fXIzc1NUWXmlq4/k5KSJty6dW7K7/X39+PixYvo7OxEWVmZIa/HLnoChoay7hpuALB9vb6Tjes8/b90/ZlkZd25hbZ/faHc6XQa9nqGtuCFhYWoqanBE088gdLSUiMPbUrHjjnhct39P6O1tRXd3d2orq5GSUlJCiozv3T9mTz2WDXKy6fehdDn8+HAgQPwGHiLouHn4NnZ2SgtLTWsi2Fmvb1ATs7dn+f3+5GZmQm3242caP5CGkjHn0lWFrBmTc608ydcBt+eyC46UYosXMhFF4m0peKyKgNOlCIMOJGmysqiG68xGgOeAC7XRNFavFjN6zLgCUijqzuUIAbcgtLgUj8ZYMYMYPZsNa/NgCeAAadoLFyoZjUXgAFPSHExdxmlu1PVPQcY8IRkZgJFRaqrILObM0fdazPgCVL57kzWoPJmOQY8QdXVqisgM8vIULt2HwOeoJISoLxcdRVkVqrHaRhwA6xZo7oCMqvVq9W+PgNugKoq2XWUaLycHGDtWrU1MOAGcLkk5ETjVVbKlRaVGHCDsJtOk91zj+oKGHDDlJfLgAoRIJtiLFigugoG3DA2G1txEhkZwHe+Y45Zjgy4gVatUjfnmMxj0ybA61VdheCvo4HcbnOcd5FaZpoXwYAbjN10unZNdQURDLjBFi5UO/eY1DPTiuEMuMHsds5PT2czZgBLl6quIoIBTwLV0xNJnW3bAIfh24nEjwFPgoICc1wDpdSaOxe4917VVUzEgCcJB9vSS0YG8OCD5rj2PR4DniRLl8r5GKWHDRvMucouA54kDgewcqXqKigVioqA++5TXcXUGPAkWrhQdQWUCg8+aK6BtfEY8CTigoz6W70aqKhQXcWdMeBJlJ9vvkEXMk52NvCtb6muYnoMeBLZbBJy0tP27RJyM2PAk+j114GuLtVVUDIsWGCNQVQGPIkeeIDbG+mooADYtcsap18MeBK5XNwYQTcFBcC+fUBenupKomPSwX19zJ+vugIyyuzZwN691gk3wIAn3Zw5gM0WVl0GJcBmk
1VaNm9Wu0tJPBjwJHO5gOLiEdVlUJzy8oA9e4B581RXEh8GPAWWL/erLoFiVF4u9/VXVQFZWaqriR8DngJLlgzC6exQXQZFIScH2LlT1tazwij53TDgKWCzAbm5F1SXQXcxbx7w6KOAx6O6EuMw4CnidHaqLoGm8Td/I9NOrTaIdjcMeIpkZvZwNN2EMjNlkwIrzEqLBwOeIjbbKLzeYdVl0DgFBcB3vwvMnKm6kuThTLYUWrEioLoE+tqiRcA//qPe4QbYgqdUcTFbcDO47z5gy5b02GaKAU+hzEyeg6vkcgG7d5tr3fJkS4P3MPPweEJcTlkRrxd46qn0CjfAgKfcxo2qK0g/y5YBP/lJeu7fzi56ilVUyF1JTU2qK9GfwyHn2hs26DErLR4MeIrZbNKKHziguhJ92WxyXXvrVi6ZxYArsHSpdBfb21VXopf8fAn2ypXp2R2fCgOuwFgr/sc/qq7EeubNA778MvI4O1vu+Fq5Uu4AS9eu+J0w4IqsWAEcPcpFGWPh9QI/+hHwy19Ka71ihUxY0W3+uJEYcEUyMoBHHgGefx4IhVRXYw3l5fLnj37EUEeLAVdo9mxZdbW5WXUl5rZggYyGz5kjjxnu6DHgim3fDrz8MjA0pLoS8ykrk7XQKit5bh0vBlyx+fOBxx8HXnoJGEnjpdvsdunNzJ4tXfHyctnbjcFODANuAvPny3K8r7ySPufjdnsYM2bcxPr1vdi6dSZmzZJ7s8lYnKpqEkuWyOqdOrdYmZmypfKuXcAPf9iK0tL3sXp1APPmMdzJwhbcRJYvl3PxP/1JdSXGKCyUa9RlZdL9LiyMvIF99hnvrEsFBtxk7r0XGBwE3n1XdSWJ2bEDqKnRu0diBQy4CW3YAAwPA0eOqK4keuXlMvFkYAAYHWW4zYIBN6ktW2Q+tcMB1NYCjY2qK5rajBnAww/LOuJkPgx4igwPD6OtrQ3NMcxqKSqSP7dtA7KynKivd2LYRKs+zZ49ip07B5GbG455sk5bWxuGzfSP0VRSAt7W1paMw1rW5cuX0dTUhBdffBFerzfu44RCNgwMeNDdXYy2tlkIBNwGVhmb8vJrCIWu4Be/iG+wrLW1FU1NTWhsbEQpN1EHgJje/KNlaMCdTicA4NChQ0Ye1vJaW1sBAJkJXguy28Nwu/vgdvdh1qxr8Ptz0NIyD62ts4woMyqZmUEsWnQGBQWJbcXkcMiv3gcffICGhgYjStOGx8CtVQwNeFFREfbv349gMGjkYS3P5/PB6XRi3759hrdW4TBQV5eJJUtGUFfnRF1d5E3EbgfKykbR1GTM5O2srDAefzyAoqK1CR/L5/MhLy8vKT8TK/N4PMjJyTHseIZ30YvGThxpgvz8fJSWlqKsrMzwY8/6ugG/5x65Q62rC+jvl3P4nBzgv/8b6Iixwa2okLXM8vJkRH9wUK5nz5qVa1jdyfyZkOAgm2acTplUMr5R3LNHrqsHAhLYsY+REeDcuYkry3g8Miq+aFHKS6ckYMDTwOzZwL59U39v82bg/Hng008l3N/8pl67a6Y7BjzN2e0yRXb5ctWVUDLwZhMijTHgRBpjwIk0xoATaYwBJ9IYA06kMQacSGMMOJHGGHAijTHgRBpjwIk0xoATaYwBJ9IYA06kMQacSGMMOJHGGHAijTHgRBpjwIk0xoATaYwBJ9IYA06kMQacSGMMOJHGGHAijTHgRBpjwIk0xoATaYwBJ9IYA06kMQacSGMMOJHGGHAijTHgRBpjwIk0xoATaYwBJ9IYA06kMQacSGMMOJHGGHAijTHgRBpjwIk0xoATaYwBJ9IYA06kMQacSGMMOJHGGHAijTHgRBpjwIk0xoATaYwBJ9IYA06kMQacSGMMOJHGGHAijTHgRBpjwIk0xoATaYwBJ9IYA06kMQacSGMO1QWkk7a2NtUlmEZzc7PqEtICA54CTqcTAHDo0CHFlZiPx+NRXYLWbOFwOKy6iHTQ0dGBYDCougxT8Xg8y
MnJUV2G1hhwIo1xkI1IYww4kcYYcCKNMeBEGmPAiTTGgBNpjAEn0hgDTqQxBpxIYww4kcYYcCKNMeBEGmPAiTTGgBNpjAEn0hgDTqQxBpxIYww4kcYYcCKNMeBEGmPAiTTGgBNpjAEn0hgDTqSx/wPxwrnbXoQFBAAAAABJRU5ErkJggg==\n" 307 | }, 308 | "metadata": {} 309 | } 310 | ] 311 | }, 312 | { 313 | "cell_type": "markdown", 314 | "id": "9FcN19D1F7H1", 315 | "metadata": { 316 | "id": "9FcN19D1F7H1" 317 | }, 318 | "source": [ 319 | "Now that we have a list of XArray datasets, we can use the `merge_arrays` function from `rioxarray` to merge them into a mosaic." 320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": null, 325 | "id": "9a963d20-5b86-4410-84c7-a11414ff1d8f", 326 | "metadata": { 327 | "id": "9a963d20-5b86-4410-84c7-a11414ff1d8f" 328 | }, 329 | "outputs": [], 330 | "source": [ 331 | "merged = merge_arrays(datasets)\n", 332 | "merged" 333 | ] 334 | }, 335 | { 336 | "cell_type": "markdown", 337 | "id": "sXDViQXSHdnH", 338 | "metadata": { 339 | "id": "sXDViQXSHdnH" 340 | }, 341 | "source": [ 342 | "Now we clip the merged raster using the `clip` function from `rioxarray`. For XArray datasets, we can use the `rio` accessor to run the `rioxarray` functions." 343 | ] 344 | }, 345 | { 346 | "cell_type": "code", 347 | "execution_count": null, 348 | "id": "ioN2PJdyG4ow", 349 | "metadata": { 350 | "id": "ioN2PJdyG4ow" 351 | }, 352 | "outputs": [], 353 | "source": [ 354 | "clipped = merged.rio.clip(filtered_gdf.geometry)" 355 | ] 356 | }, 357 | { 358 | "cell_type": "markdown", 359 | "id": "8maPy7tYNtZ6", 360 | "metadata": { 361 | "id": "8maPy7tYNtZ6" 362 | }, 363 | "source": [ 364 | "Last step is to save the results to disk as GeoTiff files. We use [Cloud-Optimized GeoTIFF (COG)](https://gdal.org/drivers/raster/cog.html) driver and specify additional GDAL [compression options](https://rasterio.readthedocs.io/en/stable/topics/image_options.html#creation-options)." 
365 | ] 366 | }, 367 | { 368 | "cell_type": "code", 369 | "execution_count": null, 370 | "id": "UQk5MB6JMsuH", 371 | "metadata": { 372 | "id": "UQk5MB6JMsuH" 373 | }, 374 | "outputs": [], 375 | "source": [ 376 | "output_dem = 'clipped.tif'\n", 377 | "output_dem_path = os.path.join(output_folder, output_dem)\n", 378 | "clipped.rio.to_raster(\n", 379 | "    output_dem_path, driver='COG', dtype='int16',\n", 380 | "    compress='DEFLATE', predictor='YES')" 381 | ] 382 | }, 383 | { 384 | "cell_type": "markdown", 385 | "id": "3e1aaf19-e90b-4113-97a2-fc40a329654a", 386 | "metadata": { 387 | "id": "3e1aaf19-e90b-4113-97a2-fc40a329654a" 388 | }, 389 | "source": [ 390 | "----\n", 391 | "\n", 392 | "If you want to give feedback or share your experience with this tutorial, please comment below. (requires GitHub account)\n", 393 | "\n", 394 | "\n", 395 | "" 402 | ] 403 | } 404 | ], 405 | "metadata": { 406 | "colab": { 407 | "name": "raster_mosaicing_and_clipping.ipynb", 408 | "provenance": [] 409 | }, 410 | "kernelspec": { 411 | "display_name": "Python 3 (ipykernel)", 412 | "language": "python", 413 | "name": "python3" 414 | }, 415 | "language_info": { 416 | "codemirror_mode": { 417 | "name": "ipython", 418 | "version": 3 419 | }, 420 | "file_extension": ".py", 421 | "mimetype": "text/x-python", 422 | "name": "python", 423 | "nbconvert_exporter": "python", 424 | "pygments_lexer": "ipython3", 425 | "version": "3.13.1" 426 | } 427 | }, 428 | "nbformat": 4, 429 | "nbformat_minor": 5 430 | } -------------------------------------------------------------------------------- /references.bib: -------------------------------------------------------------------------------- 1 | --- 2 | --- 3 | 4 | @inproceedings{holdgraf_evidence_2014, 5 | address = {Brisbane, Australia}, 6 | title = {Evidence for {Predictive} {Coding} in {Human} {Auditory} {Cortex}}, 7 | booktitle = {International {Conference} on {Cognitive} {Neuroscience}}, 8 | publisher = {Frontiers in Neuroscience}, 9 | author = 
{Holdgraf, Christopher Ramsay and de Heer, Wendy and Pasley, Brian N. and Knight, Robert T.}, 10 | year = {2014} 11 | } 12 | 13 | @article{holdgraf_rapid_2016, 14 | title = {Rapid tuning shifts in human auditory cortex enhance speech intelligibility}, 15 | volume = {7}, 16 | issn = {2041-1723}, 17 | url = {http://www.nature.com/doifinder/10.1038/ncomms13654}, 18 | doi = {10.1038/ncomms13654}, 19 | number = {May}, 20 | journal = {Nature Communications}, 21 | author = {Holdgraf, Christopher Ramsay and de Heer, Wendy and Pasley, Brian N. and Rieger, Jochem W. and Crone, Nathan and Lin, Jack J. and Knight, Robert T. and Theunissen, Frédéric E.}, 22 | year = {2016}, 23 | pages = {13654}, 24 | file = {Holdgraf et al. - 2016 - Rapid tuning shifts in human auditory cortex enhance speech intelligibility.pdf:C\:\\Users\\chold\\Zotero\\storage\\MDQP3JWE\\Holdgraf et al. - 2016 - Rapid tuning shifts in human auditory cortex enhance speech intelligibility.pdf:application/pdf} 25 | } 26 | 27 | @inproceedings{holdgraf_portable_2017, 28 | title = {Portable learning environments for hands-on computational instruction using container-and cloud-based technology to teach data science}, 29 | volume = {Part F1287}, 30 | isbn = {978-1-4503-5272-7}, 31 | doi = {10.1145/3093338.3093370}, 32 | abstract = {© 2017 ACM. There is an increasing interest in learning outside of the traditional classroom setting. This is especially true for topics covering computational tools and data science, as both are challenging to incorporate in the standard curriculum. These atypical learning environments offer new opportunities for teaching, particularly when it comes to combining conceptual knowledge with hands-on experience/expertise with methods and skills. Advances in cloud computing and containerized environments provide an attractive opportunity to improve the effciency and ease with which students can learn. 
This manuscript details recent advances towards using commonly-available cloud computing services and advanced cyberinfrastructure support for improving the learning experience in bootcamp-style events. We cover the benefits (and challenges) of using a server hosted remotely instead of relying on student laptops, discuss the technology that was used in order to make this possible, and give suggestions for how others could implement and improve upon this model for pedagogy and reproducibility.}, 33 | booktitle = {{ACM} {International} {Conference} {Proceeding} {Series}}, 34 | author = {Holdgraf, Christopher Ramsay and Culich, A. and Rokem, A. and Deniz, F. and Alegro, M. and Ushizima, D.}, 35 | year = {2017}, 36 | keywords = {Teaching, Bootcamps, Cloud computing, Data science, Docker, Pedagogy} 37 | } 38 | 39 | @article{holdgraf_encoding_2017, 40 | title = {Encoding and decoding models in cognitive electrophysiology}, 41 | volume = {11}, 42 | issn = {16625137}, 43 | doi = {10.3389/fnsys.2017.00061}, 44 | abstract = {© 2017 Holdgraf, Rieger, Micheli, Martin, Knight and Theunissen. Cognitive neuroscience has seen rapid growth in the size and complexity of data recorded from the human brain as well as in the computational tools available to analyze this data. This data explosion has resulted in an increased use of multivariate, model-based methods for asking neuroscience questions, allowing scientists to investigate multiple hypotheses with a single dataset, to use complex, time-varying stimuli, and to study the human brain under more naturalistic conditions. These tools come in the form of “Encoding” models, in which stimulus features are used to model brain activity, and “Decoding” models, in which neural features are used to generate a stimulus output. Here we review the current state of encoding and decoding models in cognitive electrophysiology and provide a practical guide toward conducting experiments and analyses in this emerging field. 
Our examples focus on using linear models in the study of human language and audition. We show how to calculate auditory receptive fields from natural sounds as well as how to decode neural recordings to predict speech. The paper aims to be a useful tutorial to these approaches, and a practical introduction to using machine learning and applied statistics to build models of neural activity. The data analytic approaches we discuss may also be applied to other sensory modalities, motor systems, and cognitive systems, and we cover some examples in these areas. In addition, a collection of Jupyter notebooks is publicly available as a complement to the material covered in this paper, providing code examples and tutorials for predictive modeling in python. The aim is to provide a practical understanding of predictive modeling of human brain data and to propose best-practices in conducting these analyses.}, 45 | journal = {Frontiers in Systems Neuroscience}, 46 | author = {Holdgraf, Christopher Ramsay and Rieger, J.W. and Micheli, C. and Martin, S. and Knight, R.T. and Theunissen, F.E.}, 47 | year = {2017}, 48 | keywords = {Decoding models, Encoding models, Electrocorticography (ECoG), Electrophysiology/evoked potentials, Machine learning applied to neuroscience, Natural stimuli, Predictive modeling, Tutorials} 49 | } 50 | 51 | @book{ruby, 52 | title = {The Ruby Programming Language}, 53 | author = {Flanagan, David and Matsumoto, Yukihiro}, 54 | year = {2008}, 55 | publisher = {O'Reilly Media} 56 | } 57 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | jupyter-book 2 | ghp-import 3 | --------------------------------------------------------------------------------