├── .gitignore ├── CONDUCT.md ├── LICENSE ├── README.md ├── _config.yml ├── _toc.yml ├── assignments ├── a1.md ├── a2.md ├── a3.md ├── a4.md ├── a5.md ├── a6.md └── a7.md ├── docs ├── assessment.md ├── license.md ├── policies.md ├── references.md ├── resources.md ├── schedule.md ├── syllabus.md └── team.md ├── files ├── 07_crs.zip └── landcover_chips.csv ├── intro.md ├── lectures ├── 01_unix.md ├── 02_git.md ├── 03_python-env.md ├── 04_jupyter.md ├── 05_landscape.md ├── 06_crs.md ├── 07_crs_python.ipynb ├── 08_docker.md ├── 09_earth_search_tutorial.ipynb ├── 10_dask_intro.ipynb ├── 11_stackstac.ipynb ├── 12_dask_dataframe.ipynb ├── 13_raster.md ├── 14_raster_analysis.ipynb ├── 15_raster_processing.ipynb ├── 16_vector.md ├── 17_vector_analysis.ipynb ├── 18_dask_geopandas_intro.ipynb ├── 19_scalable_vector_analysis.ipynb ├── 20_xarray_fundamentals.ipynb ├── 21_xarray_advanced.ipynb ├── 22_workflows_intro.md ├── 23_workflows_pseudocode.md ├── 24_workflows_best_practices.ipynb └── figures │ ├── EOSDIS-archive-2023.png │ ├── Tissot_mercator.png │ ├── change-drivers.png │ ├── docker-engine.svg │ ├── final-doc-phd-comics.gif │ ├── geoid-ellipsoid.png │ ├── git-add-commit.png │ ├── git-url.png │ ├── jupyterlab.png │ ├── local-datum.png │ ├── maxar-umbra.png │ ├── miniconda-vs-anaconda.png │ ├── orange-peel-earth.jpg │ ├── raster_resolution.png │ ├── rasters.png │ ├── unix_files.png │ ├── vectors.png │ ├── what-is-a-crs.png │ └── zarr.png ├── logo.png ├── references.bib └── requirements.txt /.gitignore: -------------------------------------------------------------------------------- 1 | **/.DS_Store 2 | _build/* 3 | .vscode/ 4 | -------------------------------------------------------------------------------- /CONDUCT.md: -------------------------------------------------------------------------------- 1 | 2 | # Code of Conduct 3 | 4 | ## Our Pledge 5 | 6 | In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation. 7 | 8 | ## Our Standards 9 | 10 | Examples of behavior that contributes to creating a positive environment include: 11 | 12 | * Using welcoming and inclusive language 13 | * Being respectful of differing viewpoints and experiences 14 | * Gracefully accepting constructive criticism 15 | * Focusing on what is best for the community 16 | * Showing empathy towards other community members 17 | 18 | Examples of unacceptable behavior by participants include: 19 | 20 | * The use of sexualized language or imagery and unwelcome sexual attention or advances 21 | * Trolling, insulting/derogatory comments, and personal or political attacks 22 | * Public or private harassment 23 | * Publishing others' private information, such as a physical or electronic address, without explicit permission 24 | * Other conduct which could reasonably be considered inappropriate in a professional setting 25 | 26 | ## Our Responsibilities 27 | 28 | Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior. 
29 | 30 | Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. 31 | 32 | ## Scope 33 | 34 | This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers. 35 | 36 | ## Enforcement 37 | 38 | Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately. 39 | 40 | Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership. 41 | 42 | ## Attribution 43 | 44 | This Code of Conduct is adapted from the [Contributor Covenant, version 1.4](http://contributor-covenant.org/version/1/4). 45 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Attribution 4.0 International 2 | 3 | ======================================================================= 4 | 5 | Creative Commons Corporation ("Creative Commons") is not a law firm and 6 | does not provide legal services or legal advice. Distribution of 7 | Creative Commons public licenses does not create a lawyer-client or 8 | other relationship. Creative Commons makes its licenses and related 9 | information available on an "as-is" basis. Creative Commons gives no 10 | warranties regarding its licenses, any material licensed under their 11 | terms and conditions, or any related information. Creative Commons 12 | disclaims all liability for damages resulting from their use to the 13 | fullest extent possible. 14 | 15 | Using Creative Commons Public Licenses 16 | 17 | Creative Commons public licenses provide a standard set of terms and 18 | conditions that creators and other rights holders may use to share 19 | original works of authorship and other material subject to copyright 20 | and certain other rights specified in the public license below. The 21 | following considerations are for informational purposes only, are not 22 | exhaustive, and do not form part of our licenses. 23 | 24 | Considerations for licensors: Our public licenses are 25 | intended for use by those authorized to give the public 26 | permission to use material in ways otherwise restricted by 27 | copyright and certain other rights. Our licenses are 28 | irrevocable. Licensors should read and understand the terms 29 | and conditions of the license they choose before applying it. 30 | Licensors should also secure all rights necessary before 31 | applying our licenses so that the public can reuse the 32 | material as expected. 
Licensors should clearly mark any 33 | material not subject to the license. This includes other CC- 34 | licensed material, or material used under an exception or 35 | limitation to copyright. More considerations for licensors: 36 | wiki.creativecommons.org/Considerations_for_licensors 37 | 38 | Considerations for the public: By using one of our public 39 | licenses, a licensor grants the public permission to use the 40 | licensed material under specified terms and conditions. If 41 | the licensor's permission is not necessary for any reason--for 42 | example, because of any applicable exception or limitation to 43 | copyright--then that use is not regulated by the license. Our 44 | licenses grant only permissions under copyright and certain 45 | other rights that a licensor has authority to grant. Use of 46 | the licensed material may still be restricted for other 47 | reasons, including because others have copyright or other 48 | rights in the material. A licensor may make special requests, 49 | such as asking that all changes be marked or described. 50 | Although not required by our licenses, you are encouraged to 51 | respect those requests where reasonable. More considerations 52 | for the public: 53 | wiki.creativecommons.org/Considerations_for_licensees 54 | 55 | ======================================================================= 56 | 57 | Creative Commons Attribution 4.0 International Public License 58 | 59 | By exercising the Licensed Rights (defined below), You accept and agree 60 | to be bound by the terms and conditions of this Creative Commons 61 | Attribution 4.0 International Public License ("Public License"). To the 62 | extent this Public License may be interpreted as a contract, You are 63 | granted the Licensed Rights in consideration of Your acceptance of 64 | these terms and conditions, and the Licensor grants You such rights in 65 | consideration of benefits the Licensor receives from making the 66 | Licensed Material available under these terms and conditions. 67 | 68 | 69 | Section 1 -- Definitions. 70 | 71 | a. Adapted Material means material subject to Copyright and Similar 72 | Rights that is derived from or based upon the Licensed Material 73 | and in which the Licensed Material is translated, altered, 74 | arranged, transformed, or otherwise modified in a manner requiring 75 | permission under the Copyright and Similar Rights held by the 76 | Licensor. For purposes of this Public License, where the Licensed 77 | Material is a musical work, performance, or sound recording, 78 | Adapted Material is always produced where the Licensed Material is 79 | synched in timed relation with a moving image. 80 | 81 | b. Adapter's License means the license You apply to Your Copyright 82 | and Similar Rights in Your contributions to Adapted Material in 83 | accordance with the terms and conditions of this Public License. 84 | 85 | c. Copyright and Similar Rights means copyright and/or similar rights 86 | closely related to copyright including, without limitation, 87 | performance, broadcast, sound recording, and Sui Generis Database 88 | Rights, without regard to how the rights are labeled or 89 | categorized. For purposes of this Public License, the rights 90 | specified in Section 2(b)(1)-(2) are not Copyright and Similar 91 | Rights. 92 | 93 | d. 
Effective Technological Measures means those measures that, in the 94 | absence of proper authority, may not be circumvented under laws 95 | fulfilling obligations under Article 11 of the WIPO Copyright 96 | Treaty adopted on December 20, 1996, and/or similar international 97 | agreements. 98 | 99 | e. Exceptions and Limitations means fair use, fair dealing, and/or 100 | any other exception or limitation to Copyright and Similar Rights 101 | that applies to Your use of the Licensed Material. 102 | 103 | f. Licensed Material means the artistic or literary work, database, 104 | or other material to which the Licensor applied this Public 105 | License. 106 | 107 | g. Licensed Rights means the rights granted to You subject to the 108 | terms and conditions of this Public License, which are limited to 109 | all Copyright and Similar Rights that apply to Your use of the 110 | Licensed Material and that the Licensor has authority to license. 111 | 112 | h. Licensor means the individual(s) or entity(ies) granting rights 113 | under this Public License. 114 | 115 | i. Share means to provide material to the public by any means or 116 | process that requires permission under the Licensed Rights, such 117 | as reproduction, public display, public performance, distribution, 118 | dissemination, communication, or importation, and to make material 119 | available to the public including in ways that members of the 120 | public may access the material from a place and at a time 121 | individually chosen by them. 122 | 123 | j. Sui Generis Database Rights means rights other than copyright 124 | resulting from Directive 96/9/EC of the European Parliament and of 125 | the Council of 11 March 1996 on the legal protection of databases, 126 | as amended and/or succeeded, as well as other essentially 127 | equivalent rights anywhere in the world. 128 | 129 | k. You means the individual or entity exercising the Licensed Rights 130 | under this Public License. Your has a corresponding meaning. 131 | 132 | 133 | Section 2 -- Scope. 134 | 135 | a. License grant. 136 | 137 | 1. Subject to the terms and conditions of this Public License, 138 | the Licensor hereby grants You a worldwide, royalty-free, 139 | non-sublicensable, non-exclusive, irrevocable license to 140 | exercise the Licensed Rights in the Licensed Material to: 141 | 142 | a. reproduce and Share the Licensed Material, in whole or 143 | in part; and 144 | 145 | b. produce, reproduce, and Share Adapted Material. 146 | 147 | 2. Exceptions and Limitations. For the avoidance of doubt, where 148 | Exceptions and Limitations apply to Your use, this Public 149 | License does not apply, and You do not need to comply with 150 | its terms and conditions. 151 | 152 | 3. Term. The term of this Public License is specified in Section 153 | 6(a). 154 | 155 | 4. Media and formats; technical modifications allowed. The 156 | Licensor authorizes You to exercise the Licensed Rights in 157 | all media and formats whether now known or hereafter created, 158 | and to make technical modifications necessary to do so. The 159 | Licensor waives and/or agrees not to assert any right or 160 | authority to forbid You from making technical modifications 161 | necessary to exercise the Licensed Rights, including 162 | technical modifications necessary to circumvent Effective 163 | Technological Measures. For purposes of this Public License, 164 | simply making modifications authorized by this Section 2(a) 165 | (4) never produces Adapted Material. 166 | 167 | 5. Downstream recipients. 
168 | 169 | a. Offer from the Licensor -- Licensed Material. Every 170 | recipient of the Licensed Material automatically 171 | receives an offer from the Licensor to exercise the 172 | Licensed Rights under the terms and conditions of this 173 | Public License. 174 | 175 | b. No downstream restrictions. You may not offer or impose 176 | any additional or different terms or conditions on, or 177 | apply any Effective Technological Measures to, the 178 | Licensed Material if doing so restricts exercise of the 179 | Licensed Rights by any recipient of the Licensed 180 | Material. 181 | 182 | 6. No endorsement. Nothing in this Public License constitutes or 183 | may be construed as permission to assert or imply that You 184 | are, or that Your use of the Licensed Material is, connected 185 | with, or sponsored, endorsed, or granted official status by, 186 | the Licensor or others designated to receive attribution as 187 | provided in Section 3(a)(1)(A)(i). 188 | 189 | b. Other rights. 190 | 191 | 1. Moral rights, such as the right of integrity, are not 192 | licensed under this Public License, nor are publicity, 193 | privacy, and/or other similar personality rights; however, to 194 | the extent possible, the Licensor waives and/or agrees not to 195 | assert any such rights held by the Licensor to the limited 196 | extent necessary to allow You to exercise the Licensed 197 | Rights, but not otherwise. 198 | 199 | 2. Patent and trademark rights are not licensed under this 200 | Public License. 201 | 202 | 3. To the extent possible, the Licensor waives any right to 203 | collect royalties from You for the exercise of the Licensed 204 | Rights, whether directly or through a collecting society 205 | under any voluntary or waivable statutory or compulsory 206 | licensing scheme. In all other cases the Licensor expressly 207 | reserves any right to collect such royalties. 208 | 209 | 210 | Section 3 -- License Conditions. 211 | 212 | Your exercise of the Licensed Rights is expressly made subject to the 213 | following conditions. 214 | 215 | a. Attribution. 216 | 217 | 1. If You Share the Licensed Material (including in modified 218 | form), You must: 219 | 220 | a. retain the following if it is supplied by the Licensor 221 | with the Licensed Material: 222 | 223 | i. identification of the creator(s) of the Licensed 224 | Material and any others designated to receive 225 | attribution, in any reasonable manner requested by 226 | the Licensor (including by pseudonym if 227 | designated); 228 | 229 | ii. a copyright notice; 230 | 231 | iii. a notice that refers to this Public License; 232 | 233 | iv. a notice that refers to the disclaimer of 234 | warranties; 235 | 236 | v. a URI or hyperlink to the Licensed Material to the 237 | extent reasonably practicable; 238 | 239 | b. indicate if You modified the Licensed Material and 240 | retain an indication of any previous modifications; and 241 | 242 | c. indicate the Licensed Material is licensed under this 243 | Public License, and include the text of, or the URI or 244 | hyperlink to, this Public License. 245 | 246 | 2. You may satisfy the conditions in Section 3(a)(1) in any 247 | reasonable manner based on the medium, means, and context in 248 | which You Share the Licensed Material. For example, it may be 249 | reasonable to satisfy the conditions by providing a URI or 250 | hyperlink to a resource that includes the required 251 | information. 252 | 253 | 3. 
If requested by the Licensor, You must remove any of the 254 | information required by Section 3(a)(1)(A) to the extent 255 | reasonably practicable. 256 | 257 | 4. If You Share Adapted Material You produce, the Adapter's 258 | License You apply must not prevent recipients of the Adapted 259 | Material from complying with this Public License. 260 | 261 | 262 | Section 4 -- Sui Generis Database Rights. 263 | 264 | Where the Licensed Rights include Sui Generis Database Rights that 265 | apply to Your use of the Licensed Material: 266 | 267 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right 268 | to extract, reuse, reproduce, and Share all or a substantial 269 | portion of the contents of the database; 270 | 271 | b. if You include all or a substantial portion of the database 272 | contents in a database in which You have Sui Generis Database 273 | Rights, then the database in which You have Sui Generis Database 274 | Rights (but not its individual contents) is Adapted Material; and 275 | 276 | c. You must comply with the conditions in Section 3(a) if You Share 277 | all or a substantial portion of the contents of the database. 278 | 279 | For the avoidance of doubt, this Section 4 supplements and does not 280 | replace Your obligations under this Public License where the Licensed 281 | Rights include other Copyright and Similar Rights. 282 | 283 | 284 | Section 5 -- Disclaimer of Warranties and Limitation of Liability. 285 | 286 | a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE 287 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS 288 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF 289 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, 290 | IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, 291 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR 292 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, 293 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT 294 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT 295 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. 296 | 297 | b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE 298 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, 299 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, 300 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, 301 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR 302 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN 303 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR 304 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR 305 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. 306 | 307 | c. The disclaimer of warranties and limitation of liability provided 308 | above shall be interpreted in a manner that, to the extent 309 | possible, most closely approximates an absolute disclaimer and 310 | waiver of all liability. 311 | 312 | 313 | Section 6 -- Term and Termination. 314 | 315 | a. This Public License applies for the term of the Copyright and 316 | Similar Rights licensed here. However, if You fail to comply with 317 | this Public License, then Your rights under this Public License 318 | terminate automatically. 319 | 320 | b. Where Your right to use the Licensed Material has terminated under 321 | Section 6(a), it reinstates: 322 | 323 | 1. 
automatically as of the date the violation is cured, provided 324 | it is cured within 30 days of Your discovery of the 325 | violation; or 326 | 327 | 2. upon express reinstatement by the Licensor. 328 | 329 | For the avoidance of doubt, this Section 6(b) does not affect any 330 | right the Licensor may have to seek remedies for Your violations 331 | of this Public License. 332 | 333 | c. For the avoidance of doubt, the Licensor may also offer the 334 | Licensed Material under separate terms or conditions or stop 335 | distributing the Licensed Material at any time; however, doing so 336 | will not terminate this Public License. 337 | 338 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public 339 | License. 340 | 341 | 342 | Section 7 -- Other Terms and Conditions. 343 | 344 | a. The Licensor shall not be bound by any additional or different 345 | terms or conditions communicated by You unless expressly agreed. 346 | 347 | b. Any arrangements, understandings, or agreements regarding the 348 | Licensed Material not stated herein are separate from and 349 | independent of the terms and conditions of this Public License. 350 | 351 | 352 | Section 8 -- Interpretation. 353 | 354 | a. For the avoidance of doubt, this Public License does not, and 355 | shall not be interpreted to, reduce, limit, restrict, or impose 356 | conditions on any use of the Licensed Material that could lawfully 357 | be made without permission under this Public License. 358 | 359 | b. To the extent possible, if any provision of this Public License is 360 | deemed unenforceable, it shall be automatically reformed to the 361 | minimum extent necessary to make it enforceable. If the provision 362 | cannot be reformed, it shall be severed from this Public License 363 | without affecting the enforceability of the remaining terms and 364 | conditions. 365 | 366 | c. No term or condition of this Public License will be waived and no 367 | failure to comply consented to unless expressly agreed to by the 368 | Licensor. 369 | 370 | d. Nothing in this Public License constitutes or may be interpreted 371 | as a limitation upon, or waiver of, any privileges and immunities 372 | that apply to the Licensor or You, including from the legal 373 | processes of any jurisdiction or authority. 374 | 375 | 376 | ======================================================================= 377 | 378 | Creative Commons is not a party to its public 379 | licenses. Notwithstanding, Creative Commons may elect to apply one of 380 | its public licenses to material it publishes and in those instances 381 | will be considered the “Licensor.” The text of the Creative Commons 382 | public licenses is dedicated to the public domain under the CC0 Public 383 | Domain Dedication. Except for the limited purpose of indicating that 384 | material is shared under a Creative Commons public license or as 385 | otherwise permitted by the Creative Commons policies published at 386 | creativecommons.org/policies, Creative Commons does not authorize the 387 | use of the trademark "Creative Commons" or any other trademark or logo 388 | of Creative Commons without its prior written consent including, 389 | without limitation, in connection with any unauthorized modifications 390 | to any of its public licenses or any other arrangements, 391 | understandings, or agreements concerning use of licensed material. For 392 | the avoidance of doubt, this paragraph does not form part of the 393 | public licenses. 
394 | 395 | Creative Commons may be contacted at creativecommons.org. 396 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Advanced Geospatial Analytics with Python 2 | 3 | This book is designed for those interested in learning advanced geospatial analytics with Python, along with the basics of developing reproducible code. The content is developed as part of the **GEOG 213/313 Advanced Geospatial Analytics with Python** course taught at Clark University by Dr. [Hamed Alemohammad](https://hamedalemo.github.io/). 4 | 5 | ## Usage 6 | 7 | All the content in this book is published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, which in simple terms means you can use the content in any work as long as you cite the source. 8 | 9 | ## Contribute 10 | 11 | If you use the book content and want to contribute to it, or simply want to report a bug or error, please submit an issue in this repo. 12 | 13 | ## Note to Developers 14 | 15 | ### Building the book 16 | 17 | If you'd like to develop and/or build the Advanced Geospatial Analytics with Python book, you should: 18 | 19 | 1. Clone this repository 20 | 2. Run `pip install -r requirements.txt` (it is recommended you do this within a virtual environment) 21 | 3. (Optional) Edit the book's source files located in the `advanced-geo-python/` directory 22 | 4. Run `jupyter-book clean advanced-geo-python/` to remove any existing builds 23 | 5. Run `jupyter-book build advanced-geo-python/` 24 | 25 | A fully rendered HTML version of the book will be built in `advanced-geo-python/_build/html/`. 26 | 27 | ### Hosting the book 28 | 29 | Please see the [Jupyter Book documentation](https://jupyterbook.org/publish/web.html) to discover options for deploying a book online using services such as GitHub, GitLab, or Netlify. 30 | 31 | For GitHub and GitLab deployment specifically, the [cookiecutter-jupyter-book](https://github.com/executablebooks/cookiecutter-jupyter-book) includes templates for, and information about, optional continuous integration (CI) workflow files to help easily and automatically deploy books online with GitHub or GitLab. For example, if you chose `github` for the `include_ci` cookiecutter option, your book template was created with a GitHub Actions workflow file that, once pushed to GitHub, automatically renders and pushes your book to the `gh-pages` branch of your repo and hosts it on GitHub Pages when a push or pull request is made to the main branch. 32 | 33 | 34 | ## Credits 35 | 36 | This book is created using the excellent open source [Jupyter Book project](https://jupyterbook.org/) and the [executablebooks/cookiecutter-jupyter-book template](https://github.com/executablebooks/cookiecutter-jupyter-book). 37 | -------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | ####################################################################################### 2 | # A default configuration that will be loaded for all jupyter books 3 | # See the documentation for help and more options: 4 | # https://jupyterbook.org/customize/config.html 5 | 6 | ####################################################################################### 7 | # Book settings 8 | title : Advanced Geospatial Analytics with Python # The title of the book. Will be placed in the left navbar.
9 | author : Hamed Alemohammad # The author of the book 10 | copyright : "2024" # Copyright year to be placed in the footer 11 | logo : logo.png # A path to the book logo 12 | # Patterns to skip when building the book. Can be glob-style (e.g. "*skip.ipynb") 13 | exclude_patterns: [_build, .DS_Store, "**.ipynb_checkpoints"] 14 | # Auto-exclude files not in the toc 15 | only_build_toc_files: true 16 | 17 | # Force re-execution of notebooks on each build. 18 | # See https://jupyterbook.org/content/execute.html 19 | execute: 20 | execute_notebooks: "off" 21 | 22 | # Define the name of the latex output file for PDF builds 23 | latex: 24 | latex_documents: 25 | targetname: book.tex 26 | 27 | # Add a bibtex file so that we can create citations 28 | bibtex_bibfiles: 29 | - references.bib 30 | 31 | # Information about where the book exists on the web 32 | repository: 33 | url: https://github.com/HamedAlemo/advanced-geo-python # Online location of your book 34 | path_to_book: docs # Optional path to your book, relative to the repository root 35 | branch: main # Which branch of the repository should be used when creating links (optional) 36 | 37 | # Add GitHub buttons to your book 38 | # See https://jupyterbook.org/customize/config.html#add-a-link-to-your-repository 39 | html: 40 | use_issues_button: true 41 | use_repository_button: true -------------------------------------------------------------------------------- /_toc.yml: -------------------------------------------------------------------------------- 1 | format: jb-book 2 | root: intro 3 | title: Home 4 | parts: 5 | - caption: Introduction 6 | chapters: 7 | - file: docs/syllabus 8 | - file: docs/team 9 | - file: docs/schedule 10 | - file: docs/assessment 11 | title: Assessment 12 | - file: docs/resources 13 | - file: docs/policies 14 | - caption: Lectures 15 | numbered: True 16 | chapters: 17 | - file: lectures/01_unix 18 | title: Intro to Unix and Bash 19 | - file: lectures/02_git 20 | title: Intro to Version Control and Git 21 | - file: lectures/03_python-env.md 22 | - file: lectures/04_jupyter.md 23 | title: Intro to JupyterLab 24 | - file: lectures/05_landscape.md 25 | title: Geospatial Landscape 26 | - file: lectures/06_crs 27 | title: Review of CRS 28 | - file: lectures/07_crs_python 29 | title: CRS and Projections in Python 30 | - file: lectures/08_docker.md 31 | title: Intro to Docker 32 | - file: lectures/09_earth_search_tutorial 33 | title: Searching a STAC Catalog 34 | - file: lectures/10_dask_intro 35 | title: Intro to Dask & Dask Array 36 | - file: lectures/11_stackstac 37 | title: Scaling stackstac with Dask 38 | - file: lectures/12_dask_dataframe 39 | title: Intro to Dask DataFrame 40 | - file: lectures/13_raster 41 | title: Review of Raster Data 42 | - file: lectures/14_raster_analysis 43 | title: Raster Data in Python 44 | - file: lectures/15_raster_processing 45 | title: Raster Data Processing 46 | - file: lectures/16_vector 47 | title: Review of Vector Data 48 | - file: lectures/17_vector_analysis 49 | title: Vector Data in Python 50 | - file: lectures/18_dask_geopandas_intro 51 | title: Intro to Dask-GeoPandas 52 | - file: lectures/19_scalable_vector_analysis 53 | title: Scalable Vector Analysis 54 | - file: lectures/20_xarray_fundamentals 55 | title: Xarray Fundamentals 56 | - file: lectures/21_xarray_advanced 57 | title: Advanced Xarray Operations 58 | - file: lectures/22_workflows_intro 59 | title: Workflows in Python 60 | - file: lectures/23_workflows_pseudocode 61 | title: Writing Pseudocode 62 | - file: 
lectures/24_workflows_best_practices 63 | title: Workflow Best Practices 64 | - caption: Assignments 65 | chapters: 66 | - file: assignments/a1 67 | - file: assignments/a2 68 | - file: assignments/a3 69 | - file: assignments/a4 70 | - file: assignments/a5 71 | - file: assignments/a6 72 | - file: assignments/a7 73 | - caption: Appendices 74 | chapters: 75 | - file: docs/license 76 | - file: docs/references -------------------------------------------------------------------------------- /assignments/a1.md: -------------------------------------------------------------------------------- 1 | # Assignment 1 2 | 3 | 4 | **Due: Monday Sep 16th at 11:59 pm ET** 5 | 6 | The goal of this assignment is to learn the principles of Open Science. For this purpose, you have to enroll in the NASA TOPS Open Science 101 online course and complete modules 1, 2, and 3. The modules are designed in a way that you have to complete them in sequence. Make sure to complete the **regular modules and NOT the fast track ones**. 7 | 8 | Steps to enroll, complete, and report back: 9 | - Navigate to the NASA TOPS Open Science 101 page [here](https://nasa.github.io/Transform-to-Open-Science/take-os101/). 10 | - Click on "Register Now" under the "SELF-PACED VIRTUAL TRAINING" section. 11 | - Click on "Register". 12 | - Create an account if you don't have one already. If you have an ORCID account, it's best to use it for logging in. 13 | - After registering, make sure to complete your profile by entering your Full Name and Location. 14 | - After completing the first three modules, submit on Canvas a screenshot of your profile that shows your full name and the badges for these three modules. 15 | -------------------------------------------------------------------------------- /assignments/a2.md: -------------------------------------------------------------------------------- 1 | # Assignment 2 2 | 3 | 4 | **Due: Monday Sep 23rd at 11:59 pm ET** 5 | 6 | The goal of this assignment is to work with Bash, git, and GitHub. If you finish the main part of the assignment, continue with the optional one. 7 | 8 | ## Create a Dummy Resume Repository 9 | 10 | Open your terminal and do the following tasks: 11 | 12 | 1. Create a new directory called `resume` within your home directory 13 | 2. Create an empty file within this directory called `README.md` 14 | 15 | Now use your text editor (VS Code is recommended) to edit the file: 16 | 17 | 1. Open your `resume` folder through VS Code. 18 | 2. Open `README.md` in the text editor 19 | 3. Open `README.md` in Markdown Preview 20 | - You can arrange these files side-by-side so you can see your document rendered live. 21 | 4. Edit the file in the editor. Add the following information: 22 | 23 | - Top-level heading with your name 24 | - An image. It can be a photo of you or, if you prefer, a photo of your favorite animal. 25 | - Secondary heading titled "Education" 26 | - A list of schools you attended, hyperlinked to the websites of those institutions 27 | 28 | 5. Save the file 29 | 30 | Now go back to the terminal and do the following: 31 | 32 | 1. Initialize a new git repository in the `resume` directory 33 | 2. Add the `README.md` file to the git staging area 34 | 3. Create a new commit with a commit message 35 | 4. Check the git log to see your commit history 36 | 5. Go to GitHub and create a [new public repository](https://github.com/new) entitled `resume` 37 | 6. Push your local resume repository to GitHub following the instructions. 38 | 7. 
View your online resume at `http://github.com/<your-username>/resume` 39 | 40 | Finally, go back to the editor and add a new subsection called "Research Interests" to your `README.md` file. Update your local git repository and push your changes to GitHub. Verify that the remote repository is updated. 41 | 42 | To hand in this part of the assignment, put a link to it in the `README.md` file in the next part. 43 | 44 | 45 | ## Create a Repository for Your Assignments 46 | 47 | Now that you know how to create a git repository, you should create your assignments repository. 48 | 49 | 1. Create a new directory called `geog213-assignments` (if you are registered in GEOG213) or `geog313-assignments` (if you are registered in GEOG313) in your home directory. 50 | 2. Create a `README.md` markdown file that contains your name and a link to your "resume" repo. 51 | 3. Initialize a new git repository 52 | 4. Add the file and make your first commit 53 | 5. Create a new **private** repository on GitHub called `geog213-assignments` or `geog313-assignments`. (Call it exactly like that. Do not vary the spelling, capitalization, or punctuation.) 54 | 6. Push your changes to the GitHub repository 55 | 7. Navigate to your repository on GitHub, go to "Settings" -> "Collaborators" -> "Add People" and add `hamedalemo` and `kluchman` as collaborators. 56 | 8. Push new commits to this repository whenever you are ready to hand in your assignments 57 | 58 | ## [Optional] Undo Changes in a Git Repository 59 | It might happen that you commit new changes to your git repository and later decide to undo them. There are two options for undoing your changes, namely `git revert` and `git reset`. In this exercise you will explore their differences. 60 | 61 | 1. Create a new directory called `git-explore` in your home directory 62 | 2. Create the following four files in the new directory: `README.md`, `cv.md`, `address.md`, and `phone.md` 63 | 3. In four different commits, add and commit each of the four files to your Git repository (e.g., the first commit would be for `README.md`, the second for `cv.md`, and so on). Make sure to use a commit message that indicates which file is being added. 64 | 4. Use the `git reflog` or `git log` command to print out the history of your git commands. You can see the ID associated with each commit. 65 | 66 | Now you can try two things: 67 | 68 | 5. Use `git revert <commit-ID>` to remove the changes associated with a specific commit. This command does not remove any changes committed after the commit ID you are using. Use this command to remove the third commit you used to add a file, and then check the status of files in your repo as well as `git status` to see what changes are made to your git repository. 69 | 6. Use `git reset <commit-ID>` to remove ALL commits after the commit ID cumulatively. Use this to remove all the commits you have made after the first commit to add the first file. Check the status of files in your repo as well as `git status` to see what changes are made to your git repository. 70 | 71 |
-------------------------------------------------------------------------------- /assignments/a3.md: -------------------------------------------------------------------------------- 1 | # Assignment 3 2 | 3 | 4 | **Due: Monday Sep 30th at 11:59 pm ET** 5 | 6 | This assignment is a follow-up to Assignment 1, and the goal is to complete the NASA TOPS Open Science 101 online course by finishing modules 4 and 5. The modules are designed in a way that you have to complete them in sequence. Make sure to complete the **regular modules and NOT the fast track ones**. After finishing all 5 modules, you will receive an online certificate from NASA that you can reference in your resume or online profiles like LinkedIn. 7 | 8 | Steps to complete and report back: 9 | - Navigate to the NASA TOPS Open Science 101 page [here](https://nasa.github.io/Transform-to-Open-Science/take-os101/). 10 | - Log in using your existing account from Assignment 1. 11 | - After completing modules 4 and 5, submit on Canvas a screenshot of your profile that shows your full name and the badges for these modules. 12 | 13 | **Optional but highly recommended**: You can connect your account with your ORCID account to display your certificate on your ORCID profile. By doing this, you will also have the chance to receive a paper certificate for this course from NASA together with some stickers. Make sure to share your ORCID ID when submitting your assignment on Canvas. 14 | -------------------------------------------------------------------------------- /assignments/a4.md: -------------------------------------------------------------------------------- 1 | # Assignment 4 2 | 3 | 4 | **Due: Wednesday Oct 9th at 11:59 pm ET** 5 | 6 | The goal of this assignment is to work with Conda, Docker, and JupyterLab. 7 | 8 | You should submit this assignment to your existing `geog213-assignments` or `geog313-assignments` GitHub repository under a new directory named `assignment-4`. 9 | 10 | ## Create a Dockerized Conda Environment 11 | In this part, you will develop a Dockerfile that creates a conda environment using an `environment.yml` file. Your Dockerfile should: 12 | 13 | 1. Start from a miniconda parent image. 14 | 1. Create a new conda environment named `a4-env` using an `environment.yml` file and install the following packages along with Python version `3.12.5`: `numpy=1.26.4`, `scipy=1.13.1`, `matplotlib=3.9.2`, and `jupyterlab=4.2.5`. 15 | 1. Activate the `a4-env` environment, and launch JupyterLab from `/home/assignment/` when the container is run. 16 | 17 | Make a new commit, and push this part of the assignment to your GitHub repository. 18 | 19 | ## Create a Notebook 20 | In this part, you will create a Jupyter notebook that visualizes some data. Your Jupyter notebook should run inside a container based on the Dockerfile from the previous section (the notebook should be accessed using the mount option, and should not be included in the Docker image). 21 | 22 | 1. Create a Jupyter notebook. 23 | 2. Write appropriate code in this notebook to do the following: 24 | - Generate array `x` that contains a sequence of numbers from -100 to 100 (inclusive) with steps of 0.5 25 | - Generate array `y` that is the `cosine` of `x` 26 | - Generate array `z` that is the `sine` of `x` 27 | 3. 
Create a new plot with the following: 28 | - Plot both `z` and `y` vs `x` on the same axis 29 | - Add a legend to the plot to name `z` and `y` 30 | - Add appropriate x- and y-axis labels 31 | 32 | Make a new commit, and push this part of the assignment to your GitHub repository. Make sure you run all cells in your notebook and save it before committing it, so that the plot is included in your commit and renders on GitHub. (A minimal, hedged sketch of this plotting code appears at the end of this page.) 33 | 34 | ## Documentation 35 | 36 | Finally, create a `README.md` file inside the `assignment-4` directory that provides a step-by-step guide on how to run a container using your Dockerfile and execute the notebook you created. Commit and push this file to GitHub. 37 | 38 |
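If you are unsure where to start with the plotting code, the following is a minimal sketch of the NumPy/Matplotlib pattern the notebook section above calls for. The array names `x`, `y`, and `z` come from the assignment; the axis labels and legend entries are illustrative assumptions, not requirements.

```python
import numpy as np
import matplotlib.pyplot as plt

# Sequence from -100 to 100 (inclusive) in steps of 0.5; the stop value is
# 100.5 because np.arange excludes its endpoint.
x = np.arange(-100, 100.5, 0.5)
y = np.cos(x)
z = np.sin(x)

fig, ax = plt.subplots()
ax.plot(x, y, label="y = cos(x)")
ax.plot(x, z, label="z = sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("cos(x) / sin(x)")
ax.legend()
plt.show()
```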
-------------------------------------------------------------------------------- /assignments/a5.md: -------------------------------------------------------------------------------- 1 | # Assignment 5 2 | 3 | 4 | **Due: Wednesday Oct 16th at 11:59 pm ET** 5 | 6 | **This assignment is only required for those registered at the 200-level. If you are a 300-level student, you are highly encouraged to complete the assignment and ask any questions during class or office hours.** 7 | 8 | The goal of this assignment is to query a STAC API and find satellite imagery given some search parameters. 9 | 10 | You should submit this assignment to your existing `geog213-assignments` GitHub repository under a new directory named `assignment-5`. 11 | 12 | Create a Dockerfile with all required packages to complete the following task, following the best practices covered in class. After you complete your notebook, (re)build the Docker image and publish it on your Docker Hub account. 13 | 14 | ## Search for Satellite Imagery 15 | 16 | For this part, you will use the Earth Search STAC API available at: `https://earth-search.aws.element84.com/v1`. Develop a Jupyter Notebook (name it `s2_query.ipynb`) that implements the following using this API: 17 | 18 | 1. Connects to the API and prints out the `title` property of all the available collections. 19 | 2. Retrieves the number of scenes from the `sentinel-2-l2a` collection that intersect with the following point and are acquired between *January 1st, 2024 and September 30th, 2024*: 20 | 21 | `point_of_interest_latitude = -2.1334` 22 | 23 | `point_of_interest_longitude = 33.8663` 24 | 25 | 3. Plots a histogram of the percent vegetation cover present in each scene across all the scenes from the step 2 query. Use 5 as the number of bins for the histogram plot. (The percent vegetation cover is recorded in the `s2:vegetation_percentage` property in the STAC catalog.) 26 | 4. Returns the `id` of all scenes that match the query parameters identified in step 2 and have less than 5% of pixels covered by clouds and more than 25% of pixels classified as water. 27 | 28 | ## Documentation 29 | 30 | Include a `README` in your `assignment-5` directory with all the steps a user needs to follow to pull your Docker image from Docker Hub, run a container, and execute the notebook to complete the query. -------------------------------------------------------------------------------- /assignments/a6.md: -------------------------------------------------------------------------------- 1 | # Assignment 6 2 | 3 | **Due: Monday Nov 4th at 11:59 pm ET** 4 | 5 | In this assignment, you will use Dask and stackstac to retrieve a time series of NDWI values from Sentinel-2 imagery. 6 | 7 | You should submit this assignment to your existing `geog213-assignments` or `geog313-assignments` GitHub repository under a new directory named `assignment-6`. 8 | 9 | Create a Dockerfile with all required packages to complete the task, following the best practices covered in class. Include a `README` in your `assignment-6` directory with all the steps a user needs to follow to reproduce your results. 10 | 11 | You should write a Python function (let's call it `main`) that receives the following inputs: 12 | 1. A bbox in the form of a tuple 13 | 1. A start date for searching scenes from a STAC API 14 | 1. An end date for searching scenes from a STAC API 15 | 16 | and plots the mean NDWI values vs time for all Sentinel-2 scenes returned from the search.
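To make the expected structure concrete, here is a minimal structural sketch of such a `main` function -- a sketch, not a complete solution. Every name other than `main` is an illustrative assumption, the bodies are intentionally left as stubs, and the `"green"`/`"nir"` asset keys assume the Earth Search Sentinel-2 naming; the four stubs correspond to the breakdown described next.

```python
def main(bbox, start_date, end_date):
    """Plot mean NDWI vs time for all Sentinel-2 scenes returned from the search."""

    def search_items(bbox, start_date, end_date):
        """Query the STAC API and return the resulting item collection."""
        ...

    def stack_items(items, assets, bbox):
        """Stack the requested assets with stackstac, clipped to the bbox,
        and return a lazy xarray object."""
        ...

    def mean_ndwi(stack):
        """Return the (still lazy) mean NDWI per scene, computed from the
        green and NIR bands: (green - nir) / (green + nir)."""
        ...

    def plot_ndwi(ndwi):
        """Plot mean NDWI vs time as points; only here is the Dask
        computation actually triggered."""
        ...

    items = search_items(bbox, start_date, end_date)
    stack = stack_items(items, assets=["green", "nir"], bbox=bbox)
    ndwi = mean_ndwi(stack)
    plot_ndwi(ndwi)
```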
17 | 18 | You should break your pipeline into multiple functions inside the `main` function, as follows: 19 | 1. A function to search and retrieve items from a STAC API. This function should return the collection of items from the search. 20 | 2. A function that receives the following as inputs 21 | - a STAC item collection 22 | - a list of assets requested by the user 23 | - a bbox for clipping the scenes 24 | 25 | and returns an xarray object using stackstac with the requested assets stacked and clipped to the bbox. 26 | 27 | > **Note** You can use the argument `assets` in `stackstac.stack` to only stack specific assets from the item collection (check the documentation [here](https://stackstac.readthedocs.io/en/latest/api/main/stackstac.stack.html#stackstac.stack.params.assets)). This is a good practice to reduce the size of your dask array. 28 | 29 | 3. A function that receives an xarray object with the bands needed for NDWI included, and returns the mean NDWI for each scene. 30 | 4. A function that receives the mean NDWI and time stamps for each scene, and plots NDWI vs time as a point plot. 31 | 32 | All of the steps above should be carried out using Dask lazy computation until the plot computation is executed. 33 | 34 | 35 | Now that you have prepared your pipeline, create a Jupyter Notebook and do the following: 36 | 1. Start a local Dask cluster. 37 | 1. Import the `main` function you have developed. 38 | 1. Plot mean NDWI for the following bbox between *Jan 1st, 2017 and Dec 31st, 2023* using `main`: 39 | 40 | `bbox = (32.16033157094111, 22.852984440390316, 33.288140399555346, 23.806193394664234)` 41 | 42 | 43 | 44 | ## [Optional] Filter NDWI Time Series (Extra 3 points) 45 | 46 | We know that some pixels from our previous NDWI calculation will be cloudy and have an incorrect NDWI value. To improve your NDWI plot, create a new function that masks cloudy pixels using the Scene Classification Layer (SCL) band from Sentinel-2. (Check out Table 6 [here](https://sentiwiki.copernicus.eu/web/s2-processing#S2-Processing-Scene-Classification) to learn about the different values in the SCL layer. You need to exclude any pixel that has an SCL value of 3, 8, 9, or 10.) 47 | -------------------------------------------------------------------------------- /assignments/a7.md: -------------------------------------------------------------------------------- 1 | # Assignment 7 2 | 3 | **Due: Wednesday Nov 27th at 11:59 pm ET** 4 | 5 | The goal of this assignment is to work with Dask-GeoPandas and Source Cooperative, and to analyze vector datasets. 6 | 7 | You should submit this assignment to your existing `geog213-assignments` or `geog313-assignments` GitHub repository under a new directory named `assignment-7`. 8 | 9 | Create a Dockerfile with all required packages to complete the task, following the best practices covered in class. Include a `README` in your `assignment-7` directory with all the steps a user needs to follow to reproduce your results. 10 | 11 | ## Download Building Footprint Data 12 | 13 | For this assignment you will be working with the [Google-Microsoft Open Buildings Dataset - Combined by VIDA](https://source.coop/repositories/vida/google-microsoft-open-buildings/description), which is available on Source Cooperative. Check out the [Read Me](https://source.coop/repositories/vida/google-microsoft-open-buildings/description) of the dataset to understand the dataset and familiarize yourself with its metadata.
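Before diving into the tasks below, it may help to see the basic `dask_geopandas` pattern for lazily reading partitioned GeoParquet. This is a sketch under stated assumptions: the file path is a placeholder for wherever you store the downloaded data, and EPSG:32618 is simply one UTM zone covering much of Haiti -- choose a projected CRS appropriate for your country before computing areas.

```python
import dask_geopandas

# Placeholder path -- point this at the GeoParquet file(s) you downloaded.
buildings = dask_geopandas.read_parquet("data/haiti/*.parquet")

print(buildings.npartitions)  # the data is split into lazy partitions

# Areas computed in a geographic CRS (degrees) are not meaningful;
# reproject to a projected CRS first.
areas = buildings.geometry.to_crs(epsg=32618).area  # still lazy

print(areas.mean().compute())  # computation happens only here
```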
14 | 15 | ## Download Data 16 | 17 | - Write a function that receives the ISO code for a country and downloads the corresponding geoparquet file or files for that country. 18 | - Use this function to download the data for Haiti. 19 | 20 | 21 | ## Load Geoparquet Data 22 | 23 | - Write a function that loads the building footprints for the country of interest from the downloaded geoparquet file(s). The geoparquet file(s) in this dataset might be large depending on the country you are working with, so ideally you should take advantage of `dask_geopandas` functionality. 24 | - Use this function to load building footprints for Haiti. 25 | 26 | 27 | ## Analyze the Data 28 | 29 | In this section, you will analyze the data using the functionality provided by Dask: 30 | 31 | - Plot the histogram of the area of all buildings provided by Microsoft as the source. 32 | - Note: this might have a very skewed distribution. Try passing arguments to your histogram function to create a more even histogram plot, and explain your approach. 33 | - Count the number of building footprints that `intersect` with each other. 34 | - From the intersecting building footprints, calculate how many: 35 | - Google building footprints intersect another Google building footprint 36 | - Microsoft building footprints intersect another Microsoft building footprint 37 | - Google building footprints intersect a Microsoft building footprint -------------------------------------------------------------------------------- /docs/assessment.md: -------------------------------------------------------------------------------- 1 | # Student Responsibilities and Assessment Criteria 2 | Students will be responsible for and assessed according to the following criteria: 3 | 4 | 1. **Participation**: This will be an active, hands-on course in which we learn and develop new packages and methods during the semester. Student contributions to problem solving and to class discussions represent **10% of the final grade (200- and 300-level)**. 5 | 2. **Assignments**: There will be a total of 10 assignments for 200-level students and 8 assignments for 300-level students. This will be **40% (300-level) to 50% (200-level) of the final grade**. 6 | 3. **Final project**: The final project will be undertaken in teams of up to 2 students, and will consist of two parts: 7 | 8 | * A _project proposal_, providing a summary of the proposed work, the datasets and methodology that will be used, the delineation of specific tasks undertaken by each team member, and the expected results. This component will account for **10% of the final grade (200- and 300-level)**. 9 | 10 | * The _final project repository and report_. The repository will contain all the code and analytical steps used to perform the analysis (including documentation of the content and the steps to reproduce the results), with the report submitted as a Jupyter notebook providing a summary of the project rationale, analyses undertaken, key results (illustrated with key figures, graphs, and tables), and discussion, including unexpected results, challenges encountered, and suggested improvements. This component will be worth **30% (200-level) to 40% (300-level) of the final grade**. 11 | 12 | 13 | Assignments and the final project will be assessed against four main metrics: Quality and Accuracy; Clarity; Documentation; and Reproducibility.
Numerical grades will be converted to letter grades as follows: 14 | 15 | A: 93-100; A-: 90-93 16 | 17 | B+: 88-90; B: 83-88; B-: 80-83 18 | 19 | C+: 78-80; C: 73-78; C-: 70-73 20 | 21 | D+: 68-70; D: 63-68; D-: 60-63 22 | 23 | F: <60 24 |
25 | 26 | -------------------------------------------------------------------------------- /docs/license.md: -------------------------------------------------------------------------------- 1 | # License 2 | 3 | All the content for this course that is provided on this website is licensed under the Creative Commons Attribution 4.0 International ([CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)) license. 4 |
5 | -------------------------------------------------------------------------------- /docs/policies.md: -------------------------------------------------------------------------------- 1 | # Policies 2 | 3 | ## __Academic integrity__ 4 | 5 | All students are expected to adhere to Clark’s standards of academic integrity; this means that all work must be entirely your own and entirely unique to this course. Plagiarism and other forms of cheating will not be tolerated or excused. All students are required to complete assignments individually, and group delivery of assignments is not allowed. You are not allowed to share any part of your code/notebook with another classmate. However, you may consult/brainstorm with another classmate on an assignment, provided that each individual prepares their own unique submission and reports at the time of submission that they worked together. 6 | 7 | For more information, please refer to the university’s policy on this issue, available on the [Academic Policies page](https://catalog.clarku.edu/content.php?catoid=32&navoid=2735#academic-integrity) or in the student handbook. If you have any questions about proper citation or other related issues, please don’t hesitate to come see me. 8 | 9 | ## Extension Policy 10 | 11 | Each student is allowed to request a **one-time extension** to the deadline for submitting one of their assignments during the semester. The extension can be for up to one week past the deadline, and the request should be submitted at least 2 days prior to the deadline. You do not need to provide any reason for requesting this extension. To submit your request, send an email to your instructor and TA. 12 | 13 | ## __Use of Artificial Intelligence (AI)__ 14 | This policy covers any generative AI tool, such as ChatGPT, GitHub Copilot, Elicit, DALL·E, etc., and applies to text and artwork/graphics/video/audio content. 15 | 1. You are discouraged from using AI tools unless under direct instruction from the instructor to do so. Please contact your instructor if you are unsure or have questions before using AI for any assignment or project. 16 | - You should note that the material generated by these programs may be inaccurate, incomplete, or otherwise problematic. Beware that use may also stifle your own independent thinking and creativity. 17 | 2. If AI is permitted to be used, you must indicate/cite what part of the assignment/project was written by AI and what was written by you. No more than 25% of an assignment/project should be created with AI if the instructor gives permission for its use. 18 | 19 | ## __Student Accessibility__ 20 | Clark University is committed to providing students with documented disabilities equal access to all university programs and facilities. Students are encouraged to register with Student Accessibility Services (SAS) to explore and access accommodations that may support their success in their coursework. SAS is located on the second floor of the Shaich Family Alumni and Student Engagement Center (ASEC). Please contact SAS at accessibilityservices@clarku.edu with questions or to initiate the registration process.
For additional information, please visit the SAS website at: https://www.clarku.edu/offices/student-accessibility-services/ 21 | 22 | ## __FERPA__ 23 | The link to Clark’s policy regarding student privacy under the Family Educational Rights and Privacy Act is available here: https://www.clarku.edu/offices/registrar/ferpa/ 24 | 25 | ## __Title IX__ 26 | Clark University and its faculty are committed to creating a safe and open learning environment for all students. Clark University encourages all members of the community to seek support and report incidents of sexual harassment to the Title IX office (titleix@clarku.edu). If you or someone you know has experienced any sexual harassment, including sexual assault, dating or domestic violence, or stalking, help and support are available. 27 | 28 | Please be aware that all Clark University faculty and teaching assistants are considered responsible employees, which means that if you tell me about a situation involving the aforementioned offenses, I must share that information with the Title IX Coordinator, Brittany Rende (titleix@clarku.edu). Although I have to make that notification, you will, for the most part, control how your case will be handled, including whether or not you wish to pursue a formal complaint. Our goal is to make sure you are aware of the range of options available to you and have access to the resources you need. 29 | 30 | If you wish to speak to a confidential resource who does not have this reporting responsibility, you can contact Clark’s Center for Counseling and Professional Growth (508-793-7678), Clark’s Health Center (508-793-7467), or confidential resource providers on campus: Prof. Stewart (als.confidential@clarku.edu), Prof. Palm Reed (kpr.confidential@clarku.edu), and Prof. Cordova (jvc.confidential@clarku.edu). 31 | 32 | ## __Graduate Students__ 33 | Graduate students can contact Sara Simeone, Assistant Dean of Graduate Studies for Arts and Sciences, at GradSchool@clarku.edu with any questions regarding academic affairs. 34 | 35 |
36 | -------------------------------------------------------------------------------- /docs/references.md: -------------------------------------------------------------------------------- 1 | # References 2 | 3 | The contents for this course are partially derived from the following resources, and we are thankful to their authors for sharing them. 4 | 5 | 1. Introduction to Earth Data Science Textbook, Earth Lab CU Boulder ([link](https://www.earthdatascience.org/courses/intro-to-earth-data-science/)). 6 | 1. An Introduction to Earth and Environmental Data Science, Columbia University ([link](https://earth-env-data-science.github.io/)). 7 | 1. The Unix Shell, Software Carpentry ([link](https://swcarpentry.github.io/shell-novice/index.html)). 8 | 1. Introduction to Geospatial Raster and Vector Data with Python, Software Carpentry ([link](https://carpentries-incubator.github.io/geospatial-python/index.html)). 9 | 1. Introduction to Conda for (Data) Scientists, The Carpentries Incubator ([link](https://carpentries-incubator.github.io/introduction-to-conda-for-data-scientists/)). 10 | 1. In writing a few chapters of this book, I have used [OpenAI's ChatGPT](https://chat.openai.com/) to generate text based on an outline that I had envisioned. All of the content has been reviewed, revised, and edited as needed to ensure accuracy. 11 | 12 |
13 | 14 | -------------------------------------------------------------------------------- /docs/resources.md: -------------------------------------------------------------------------------- 1 | # Resources 2 | 3 | ## Book(s) 4 | You can consult the following book for most of the content presented in this course. We will provide more specific resources for each lecture throughout the semester. 5 | 6 | McClain, B. (2022). Python for Geospatial Data Analysis: Theory, Tools, and Practice for Location Intelligence. United States: O'Reilly Media, Incorporated. (Available [online](https://learning.oreilly.com/library/view/python-for-geospatial/9781098104788/). You can sign in using your Clark credentials to freely access the book.) 7 | 8 | ## Canvas 9 | You can access the materials on the Clark University [Canvas page](https://canvas.clarku.edu/courses/13802). 10 | 11 | ## Slack 12 | We will be using Slack to facilitate communication during the semester. If you are new to Slack, check out [this page](https://slack.com/help/articles/360059928654-How-to-use-Slack--your-quick-start-guide) to learn about it. 13 | 14 | You will be invited to join the Slack channel for this course with your Clark email address (check your spam folder to make sure you don't miss the email). If you have not received the invitation yet, reach out to the course instructor. 15 | 16 | ## Software 17 | - Code Editor: [Visual Studio Code](https://code.visualstudio.com/) 18 | - Containerization: [Docker](https://www.docker.com/) 19 | 20 | ## Python Resources 21 | If you need a refresher on Python basics, check out the following resources: 22 | - [Kaggle Python](https://www.kaggle.com/learn/python) 23 | - [Introduction to Python](https://introtopython.org/) 24 | - [Python Crash Course](https://github.com/ehmatthes/pcc_2e) 25 | 26 | ## Datasets and Projects 27 | Use this section to learn about existing open-access datasets and projects. These can be helpful in defining your final projects. 28 | 29 | ### Datasets 30 | 1. Source Cooperative [[link](https://source.coop/)] 31 | 1. Microsoft Planetary Computer Data Catalog [[link](https://planetarycomputer.microsoft.com/catalog)] 32 | 1. Registry of Open Data on AWS [[link](https://registry.opendata.aws/)] 33 | 1. World Terrestrial Ecosystems (WTE) 2015 and 2050 [[link](https://www.arcgis.com/home/item.html?id=e247d15898804e42b9a8f2aa7128b2b6)][[blog](https://www.esri.com/arcgis-blog/products/arcgis-living-atlas/announcements/2050-projections-of-world-terrestrial-ecosystems-are-now-available/)] 34 | 1. Public Data on Google Cloud [[link](https://cloud.google.com/storage/docs/public-datasets)] 35 | 1. Umbra Open Data Program [[link](https://umbra.space/open-data)] 36 | 1. Maxar Open Data Program [[link](https://www.maxar.com/open-data/)] 37 | 1. Planet Education and Research Data [[link](https://www.planet.com/markets/education-and-research/)] 38 | 1. Google Earth Engine Data Catalog [[link](https://developers.google.com/earth-engine/datasets)] 39 | 1. Global River Widths from Landsat (GRWL) Database [[link](https://zenodo.org/record/1297434)] 40 | 1. Open Topography [[link](https://opentopography.org/)] 41 | 1. Nigeria Geodata [[link](https://github.com/jeafreezy/nigeria_geodata)] 42 | 1. Multi-year crop field boundary labels for Africa [[link](https://zenodo.org/records/11060871)] 43 | 1.
Indiana Statewide Digital Aerial Imagery Catalog [[link](https://registry.opendata.aws/in-imagery/)][[Tutorial](https://docs.google.com/document/d/1tVXt4MctYVsz5UJXhlU_FM6_uJt5zR2kPO7AiaHaFvY/edit?usp=sharing)] 44 | 1. U.S. Census Bureau American Community Survey (ACS) Public Use Microdata Sample (PUMS) [[link](https://registry.opendata.aws/census-dataworld-pums/)] 45 | 1. US Structures from Oak Ridge National Laboratory [[link](https://source.coop/repositories/wherobots/usa-structures/description)] 46 | 47 | ### Projects 48 | 1. Climate change threatens the world’s olive legacy: How GIS can help understand crops at risk by 2050 [[link](https://www.esri.com/arcgis-blog/products/arcgis-living-atlas/analytics/climate-change-threatens-the-worlds-olive-legacy-how-gis-can-help-understand-crops-at-risk-by-2050/)] 49 | 1. World Bank’s Open Night Lights [[link](https://worldbank.github.io/OpenNightLights/welcome.html)] 50 | 1. A Guidebook on Mapping Poverty through Data Integration and Artificial Intelligence [[link](https://www.adb.org/sites/default/files/publication/698091/guidebook-mapping-poverty-data-integration-ai.pdf)] 51 | 1. Cookiecutter Data Science [[link](https://drivendata.github.io/cookiecutter-data-science/)] 52 | 1. Analyze NLCD annual time series [[link](https://www.usgs.gov/centers/eros/science/annual-national-land-cover-database)] 53 | 54 | 55 | 56 |
57 | -------------------------------------------------------------------------------- /docs/schedule.md: -------------------------------------------------------------------------------- 1 | # Schedule 2 | 3 | ## Lectures 4 | Mondays and Thursdays 2:50 - 4:05 pm in BP326 5 | 6 | ## Office Hours 7 | **Hamed's:**
8 | Mondays 11am - 1pm in Jeff 221
9 |
10 | 11 | **Caleb's:**
12 | Wednesdays and Fridays 10am - 12pm in Jeff 220a 13 | 14 | To schedule a meeting with either Hamed or Caleb outside the regular office hours, please send them an email. 15 | 16 |
17 | -------------------------------------------------------------------------------- /docs/syllabus.md: -------------------------------------------------------------------------------- 1 | # Syllabus 2 | The course will cover the following topics during the semester. Part of each week’s lecture will be interactive programming time during which students will replicate exercises covered in the lecture or execute new ones. 3 | 4 | There will be 7 assignments during the semester (see the schedule below), and students can join office hours to ask questions about the assignments or general questions about the lectures. 5 | 6 | | **Week** | **Date** | **Topic** | **Assignment**
*Released on Tue* | 7 | | -------- | -------- | ---------------------------------------------------------------- | ----------- | 8 | | 1 | 8/26/24 | **No Classes** | | 9 | | 1 | 8/29/24 | **No Classes** | | 10 | | 2 | 9/2/24 | **Labor Day (No Classes)** | | 11 | | 2 | 9/5/24 | Workflows for Open Reproducible Science | Assignment 1 \* | 12 | | 3 | 9/9/24 | Introduction to Unix, Version Control and Git | | 13 | | 3 | 9/12/24 | Intro to Markdown and Basics of Python Environments | | 14 | | 4 | 9/16/24 | Intro to JupyterLab | Assignment 2 | 15 | | 4 | 9/19/24 | Introduction to Geospatial Python Landscape | | 16 | | 5 | 9/23/24 | Review of CRS and Projections in Python | Assignment 3 | 17 | | 5 | 9/26/24 | CGA Collaborative Workshop | | 18 | | 6 | 9/30/24 | Introduction to Containers and Docker | | 19 | | 6 | 10/3/24 | Introduction to Containers and Docker | Assignment 4 | 20 | | 7 | 10/7/24 | Introduction to STAC and Geospatial Data on the Cloud | | 21 | | 7 | 10/10/24 | Introduction to Dask | Assignment 5 \** | 22 | | 8 | 10/14/24 | **Fall Break (No Classes)** | | 23 | | 8 | 10/17/24 | Introduction to Dask DataFrame | | 24 | | 9 | 10/21/24 | Introduction to Dask DataFrame | | 25 | | 9 | 10/24/24 | Working with Raster Data in Python | Assignment 6 | 26 | | 10 | 10/28/24 | Scaling Raster Data Analytics | | 27 | | 10 | 10/31/24 | Working with Vector Data in Python | | 28 | | 11 | 11/4/24 | Scaling Vector Data Analytics | | 29 | | 11 | 11/7/24 | Working with Multi-Dimensional Arrays in Python | | 30 | | 12 | 11/11/24 | Working with Multi-Dimensional Arrays in Python | | 31 | | 12 | 11/14/24 | Data Visualization | Assignment 7 | 32 | | 13 | 11/18/24 | Geospatial Workflows | | 33 | | 13 | 11/21/24 | Make-up Lecture | | 34 | | 14 | 11/25/24 | Projects | Project Proposal | 35 | | 14 | 11/28/24 | **Thanksgiving Break (No Classes)** | | 36 | | 15 | 12/2/24 | Projects | | 37 | | 15 | 12/5/24 | Projects | | 38 | | 16 | 12/9/24 | Projects | | 39 | 40 | \* Assignment 1 is released on Thu Sep 5th, and it's due on Monday Sep 16th.
41 | \** Not required for **300-level** students. 42 | 43 | Assignments will be released on Tuesday of each week and are due the following Monday. The deadline for submissions is 11:59 pm ET on Monday. For example, an assignment released during the 3rd week of class is due on Monday at 11:59 pm during the 4th week. 44 | 45 | ## Final Project 46 | Final projects are due by the end of the day on Dec 18, 2024 (this won't be extended due to the grades deadline set by the University). No presentation is required for your projects. 47 | 48 | ## __Engaged Hours__ 49 | 50 | The following provides an approximate breakdown of the hours of effort required for this class: 51 | - Class lectures (2 X 1hr 15min per week): **35 hrs** 52 | - Required readings, assignments, and coding work: **85 hrs** 53 | - Analytical work for the final project: **60 hrs** 54 | 55 | Total: **180 hrs** 56 | 57 |
58 | -------------------------------------------------------------------------------- /docs/team.md: -------------------------------------------------------------------------------- 1 | # Team 2 | 3 | ## Instructor 4 | Prof. [Hamed Alemohammad](https://hamedalemo.github.io/)
5 | Associate Professor, Graduate School of Geography
6 | Director, Center for Geospatial Analytics
7 | (halemohammad [at] clarku [dot] edu) 8 | 9 | ## TA 10 | Caleb Kluchman
11 | MS GIS Student, Graduate School of Geography
12 | (ckluchman [at] clarku [dot] edu) 13 |
14 | -------------------------------------------------------------------------------- /files/07_crs.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HamedAlemo/advanced-geo-python/adf939d2cc1e722329ee64ff0eb701c8b3919837/files/07_crs.zip -------------------------------------------------------------------------------- /files/landcover_chips.csv: -------------------------------------------------------------------------------- 1 | id,landcover,datetime 2 | 1,"[""Sea and ocean""]",2017-09-06T10:10:19+0000 3 | 2,"[""Discontinuous urban fabric"", ""Coniferous forest"", ""Mixed forest"", ""Transitional woodland/shrub""]",2018-02-05T10:02:11+0000 4 | 3,"[""Sea and ocean""]",2017-12-01T11:24:31+0000 5 | 4,"[""Pastures"", ""Complex cultivation patterns"", ""Broad-leaved forest"", ""Transitional woodland/shrub""]",2018-05-09T09:20:29+0000 6 | 5,"[""Non-irrigated arable land"", ""Pastures"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Natural grassland"", ""Sclerophyllous vegetation"", ""Transitional woodland/shrub""]",2017-10-02T11:21:12+0000 7 | 6,"[""Discontinuous urban fabric"", ""Industrial or commercial units"", ""Non-irrigated arable land"", ""Complex cultivation patterns""]",2018-05-10T09:40:31+0000 8 | 7,"[""Coniferous forest"", ""Mixed forest"", ""Water bodies""]",2017-11-01T09:41:31+0000 9 | 8,"[""Coniferous forest"", ""Mixed forest"", ""Water bodies""]",2018-02-05T10:02:11+0000 10 | 9,"[""Non-irrigated arable land""]",2017-10-15T09:50:31+0000 11 | 10,"[""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest"", ""Coniferous forest"", ""Mixed forest""]",2018-05-22T09:30:29+0000 12 | 11,"[""Water bodies""]",2018-02-04T09:41:56+0000 13 | 12,"[""Olive groves"", ""Complex cultivation patterns"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest""]",2018-05-15T11:21:10+0000 14 | 13,"[""Sea and ocean""]",2018-05-11T10:00:29+0000 15 | 14,"[""Coniferous forest""]",2018-04-17T10:20:19+0000 16 | 15,"[""Complex cultivation patterns"", ""Broad-leaved forest"", ""Transitional woodland/shrub""]",2017-08-25T09:30:29+0000 17 | 16,"[""Sea and ocean""]",2018-05-21T10:00:29+0000 18 | 17,"[""Non-irrigated arable land"", ""Mixed forest"", ""Water bodies""]",2017-07-01T09:30:31+0000 19 | 18,"[""Non-irrigated arable land"", ""Complex cultivation patterns"", ""Agro-forestry areas"", ""Broad-leaved forest"", ""Transitional woodland/shrub""]",2018-05-15T11:21:10+0000 20 | 19,"[""Broad-leaved forest"", ""Transitional woodland/shrub""]",2017-08-25T09:30:29+0000 21 | 20,"[""Broad-leaved forest"", ""Coniferous forest"", ""Mixed forest"", ""Transitional woodland/shrub""]",2017-09-24T09:30:20+0000 22 | 21,"[""Pastures"", ""Coniferous forest"", ""Mixed forest"", ""Transitional woodland/shrub""]",2017-09-30T09:50:19+0000 23 | 22,"[""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Coniferous forest"", ""Mixed forest""]",2017-06-13T10:10:32+0000 24 | 23,"[""Discontinuous urban fabric"", ""Industrial or commercial units"", ""Road and rail networks and associated land"", ""Airports"", ""Construction sites"", ""Land principally occupied by agriculture, with significant areas of natural vegetation""]",2018-02-25T10:50:19+0000 25 | 24,"[""Broad-leaved forest"", ""Coniferous forest"", ""Mixed forest""]",2017-08-01T09:50:29+0000 26 | 
25,"[""Non-irrigated arable land"", ""Transitional woodland/shrub""]",2018-04-13T09:50:31+0000 27 | 26,"[""Discontinuous urban fabric"", ""Pastures""]",2018-02-25T11:43:51+0000 28 | 27,"[""Non-irrigated arable land""]",2017-12-06T09:43:49+0000 29 | 28,"[""Coniferous forest"", ""Mixed forest""]",2018-04-17T10:20:19+0000 30 | 29,"[""Sea and ocean""]",2017-08-30T10:20:19+0000 31 | 30,"[""Discontinuous urban fabric"", ""Non-irrigated arable land"", ""Complex cultivation patterns"", ""Broad-leaved forest""]",2018-04-21T10:00:29+0000 32 | 31,"[""Pastures""]",2018-02-20T11:43:39+0000 33 | 32,"[""Sea and ocean""]",2017-08-14T10:00:29+0000 34 | 33,"[""Non-irrigated arable land"", ""Pastures""]",2018-04-30T09:40:31+0000 35 | 34,"[""Non-irrigated arable land""]",2018-04-30T09:40:31+0000 36 | 35,"[""Transitional woodland/shrub"", ""Sea and ocean""]",2018-02-23T10:10:19+0000 37 | 36,"[""Discontinuous urban fabric"", ""Non-irrigated arable land"", ""Vineyards"", ""Complex cultivation patterns"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Water bodies""]",2018-05-06T10:00:31+0000 38 | 37,"[""Coniferous forest"", ""Mixed forest"", ""Water bodies""]",2018-02-04T09:41:55+0000 39 | 38,"[""Non-irrigated arable land"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Coniferous forest"", ""Mixed forest""]",2018-02-28T10:10:21+0000 40 | 39,"[""Coniferous forest"", ""Sea and ocean""]",2018-05-21T10:00:29+0000 41 | 40,"[""Pastures"", ""Coniferous forest""]",2017-11-12T11:43:39+0000 42 | 41,"[""Mixed forest"", ""Sea and ocean""]",2017-08-14T10:00:29+0000 43 | 42,"[""Continuous urban fabric"", ""Discontinuous urban fabric"", ""Industrial or commercial units"", ""Salt marshes""]",2018-03-26T11:21:09+0000 44 | 43,"[""Coniferous forest"", ""Mixed forest"", ""Transitional woodland/shrub""]",2017-07-09T09:40:29+0000 45 | 44,"[""Non-irrigated arable land"", ""Complex cultivation patterns"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest"", ""Transitional woodland/shrub""]",2018-05-02T09:30:39+0000 46 | 45,"[""Sea and ocean""]",2018-05-11T10:00:29+0000 47 | 46,"[""Mixed forest"", ""Transitional woodland/shrub"", ""Peatbogs""]",2018-05-15T09:40:29+0000 48 | 47,"[""Complex cultivation patterns"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Agro-forestry areas"", ""Broad-leaved forest"", ""Transitional woodland/shrub""]",2018-03-26T11:21:09+0000 49 | 48,"[""Permanently irrigated land"", ""Pastures"", ""Agro-forestry areas"", ""Broad-leaved forest"", ""Transitional woodland/shrub""]",2017-12-21T11:25:01+0000 50 | 49,"[""Sea and ocean""]",2018-04-21T11:43:49+0000 51 | 50,"[""Sea and ocean""]",2017-10-16T10:10:09+0000 52 | 51,"[""Sea and ocean""]",2018-05-11T10:00:29+0000 53 | 52,"[""Coniferous forest"", ""Transitional woodland/shrub"", ""Peatbogs""]",2017-08-17T10:10:19+0000 54 | 53,"[""Non-irrigated arable land"", ""Complex cultivation patterns"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest""]",2017-09-14T09:30:30+0000 55 | 54,"[""Non-irrigated arable land"", ""Agro-forestry areas"", ""Broad-leaved forest""]",2017-10-02T11:21:11+0000 56 | 55,"[""Broad-leaved forest"", ""Coniferous forest"", ""Mixed forest"", ""Transitional woodland/shrub""]",2017-08-08T09:40:29+0000 57 | 56,"[""Sea and ocean""]",2017-10-16T10:10:09+0000 58 | 57,"[""Pastures"", 
""Coniferous forest"", ""Transitional woodland/shrub""]",2018-02-25T11:43:51+0000 59 | 58,"[""Coniferous forest""]",2017-08-08T09:40:29+0000 60 | 59,"[""Sport and leisure facilities"", ""Broad-leaved forest""]",2017-07-04T11:21:11+0000 61 | 60,"[""Non-irrigated arable land"", ""Complex cultivation patterns"", ""Land principally occupied by agriculture, with significant areas of natural vegetation""]",2018-02-01T09:32:19+0000 62 | 61,"[""Sclerophyllous vegetation"", ""Sea and ocean""]",2017-10-02T11:21:11+0000 63 | 62,"[""Coniferous forest"", ""Water bodies""]",2018-05-25T09:40:30+0000 64 | 63,"[""Olive groves"", ""Complex cultivation patterns"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Natural grassland"", ""Sclerophyllous vegetation""]",2017-12-21T11:25:01+0000 65 | 64,"[""Pastures"", ""Complex cultivation patterns"", ""Broad-leaved forest"", ""Natural grassland""]",2017-09-14T09:30:29+0000 66 | 65,"[""Non-irrigated arable land"", ""Pastures"", ""Complex cultivation patterns"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest"", ""Transitional woodland/shrub""]",2017-09-24T09:30:19+0000 67 | 66,"[""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest"", ""Sparsely vegetated areas""]",2018-05-02T09:30:39+0000 68 | 67,"[""Pastures"", ""Coniferous forest"", ""Natural grassland"", ""Transitional woodland/shrub"", ""Peatbogs""]",2017-11-12T11:43:39+0000 69 | 68,"[""Continuous urban fabric""]",2018-05-15T11:21:09+0000 70 | 69,"[""Sea and ocean""]",2017-06-13T10:10:31+0000 71 | 70,"[""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Mixed forest"", ""Water bodies""]",2017-09-05T09:50:31+0000 72 | 71,"[""Coniferous forest"", ""Mixed forest"", ""Peatbogs""]",2017-11-01T09:41:32+0000 73 | 72,"[""Non-irrigated arable land"", ""Water courses""]",2017-08-03T09:40:31+0000 74 | 73,"[""Coniferous forest"", ""Mixed forest"", ""Water bodies""]",2018-05-27T09:30:41+0000 75 | 74,"[""Non-irrigated arable land"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest""]",2018-02-01T09:32:19+0000 76 | 75,"[""Industrial or commercial units"", ""Non-irrigated arable land""]",2017-12-26T09:43:59+0000 77 | 76,"[""Mixed forest"", ""Bare rock"", ""Sea and ocean""]",2017-10-16T10:10:09+0000 78 | 77,"[""Pastures"", ""Coniferous forest""]",2018-04-21T10:00:29+0000 79 | 78,"[""Complex cultivation patterns"", ""Coniferous forest"", ""Mixed forest"", ""Transitional woodland/shrub"", ""Water bodies""]",2017-09-24T09:30:20+0000 80 | 79,"[""Coniferous forest"", ""Mixed forest"", ""Water bodies""]",2018-05-08T10:40:31+0000 81 | 80,"[""Discontinuous urban fabric"", ""Non-irrigated arable land"", ""Complex cultivation patterns"", ""Broad-leaved forest""]",2018-03-18T09:30:31+0000 82 | 81,"[""Mixed forest"", ""Sea and ocean""]",2018-05-11T10:00:29+0000 83 | 82,"[""Coniferous forest"", ""Mixed forest"", ""Water bodies""]",2017-09-24T09:30:19+0000 84 | 83,"[""Non-irrigated arable land"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Coniferous forest""]",2018-05-21T10:00:29+0000 85 | 84,"[""Discontinuous urban fabric"", ""Non-irrigated arable land"", ""Pastures"", ""Complex cultivation patterns""]",2017-11-07T10:52:29+0000 86 | 85,"[""Pastures"", ""Land principally occupied by agriculture, with significant 
areas of natural vegetation"", ""Coniferous forest"", ""Mixed forest""]",2018-04-21T10:00:29+0000 87 | 86,"[""Coniferous forest"", ""Transitional woodland/shrub"", ""Water bodies""]",2018-02-19T09:40:31+0000 88 | 87,"[""Coniferous forest"", ""Mixed forest""]",2018-02-04T09:41:55+0000 89 | 88,"[""Rice fields"", ""Agro-forestry areas"", ""Broad-leaved forest"", ""Coniferous forest"", ""Transitional woodland/shrub""]",2017-08-13T11:21:21+0000 90 | 89,"[""Non-irrigated arable land"", ""Pastures"", ""Mixed forest""]",2017-11-12T11:43:39+0000 91 | 90,"[""Non-irrigated arable land""]",2018-04-22T09:30:29+0000 92 | 91,"[""Sclerophyllous vegetation"", ""Transitional woodland/shrub""]",2017-12-01T11:24:31+0000 93 | 92,"[""Non-irrigated arable land"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Coniferous forest""]",2017-06-13T10:10:31+0000 94 | 93,"[""Non-irrigated arable land"", ""Broad-leaved forest"", ""Mixed forest"", ""Transitional woodland/shrub""]",2018-05-10T09:40:31+0000 95 | 94,"[""Non-irrigated arable land"", ""Pastures"", ""Mixed forest""]",2017-11-12T11:43:39+0000 96 | 95,"[""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Coniferous forest"", ""Mixed forest"", ""Water bodies""]",2018-02-05T10:02:11+0000 97 | 96,"[""Continuous urban fabric""]",2018-05-15T11:21:09+0000 98 | 97,"[""Non-irrigated arable land"", ""Complex cultivation patterns""]",2017-10-15T09:50:31+0000 99 | 98,"[""Sea and ocean""]",2017-06-13T10:10:31+0000 100 | 99,"[""Mixed forest"", ""Transitional woodland/shrub"", ""Water bodies""]",2018-05-22T09:30:29+0000 101 | 100,"[""Water bodies""]",2017-09-24T09:30:21+0000 102 | 101,"[""Non-irrigated arable land"", ""Pastures"", ""Agro-forestry areas"", ""Transitional woodland/shrub""]",2017-08-18T11:21:09+0000 103 | 102,"[""Coniferous forest"", ""Mixed forest""]",2017-11-01T09:41:31+0000 104 | 103,"[""Non-irrigated arable land"", ""Coniferous forest"", ""Mixed forest"", ""Transitional woodland/shrub""]",2018-05-21T10:00:29+0000 105 | 104,"[""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest"", ""Coniferous forest"", ""Transitional woodland/shrub""]",2018-05-15T11:21:10+0000 106 | 105,"[""Broad-leaved forest"", ""Mixed forest"", ""Natural grassland"", ""Transitional woodland/shrub""]",2017-12-08T09:33:51+0000 107 | 106,"[""Discontinuous urban fabric"", ""Non-irrigated arable land"", ""Vineyards"", ""Complex cultivation patterns""]",2017-10-15T09:50:31+0000 108 | 107,"[""Non-irrigated arable land"", ""Broad-leaved forest"", ""Coniferous forest"", ""Mixed forest""]",2017-09-05T09:50:31+0000 109 | 108,"[""Complex cultivation patterns"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest"", ""Transitional woodland/shrub""]",2017-09-14T09:30:30+0000 110 | 109,"[""Coniferous forest"", ""Transitional woodland/shrub"", ""Peatbogs""]",2017-09-05T09:50:31+0000 111 | 110,"[""Coniferous forest"", ""Mixed forest""]",2017-09-06T10:10:20+0000 112 | 111,"[""Non-irrigated arable land"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Coniferous forest"", ""Transitional woodland/shrub""]",2018-02-04T09:41:59+0000 113 | 112,"[""Non-irrigated arable land"", ""Complex cultivation patterns"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest"", ""Mixed 
forest""]",2018-03-18T09:30:31+0000 114 | 113,"[""Coniferous forest"", ""Mixed forest"", ""Peatbogs""]",2017-07-01T09:30:31+0000 115 | 114,"[""Discontinuous urban fabric"", ""Non-irrigated arable land"", ""Complex cultivation patterns"", ""Land principally occupied by agriculture, with significant areas of natural vegetation""]",2017-12-19T09:54:09+0000 116 | 115,"[""Non-irrigated arable land"", ""Pastures"", ""Complex cultivation patterns"", ""Transitional woodland/shrub""]",2017-08-16T09:50:31+0000 117 | 116,"[""Permanently irrigated land"", ""Broad-leaved forest"", ""Natural grassland"", ""Transitional woodland/shrub""]",2017-08-13T11:21:21+0000 118 | 117,"[""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Coniferous forest"", ""Mixed forest""]",2017-08-17T10:10:19+0000 119 | 118,"[""Discontinuous urban fabric"", ""Pastures"", ""Complex cultivation patterns"", ""Coniferous forest"", ""Mixed forest""]",2017-08-08T09:40:29+0000 120 | 119,"[""Coniferous forest"", ""Mixed forest"", ""Transitional woodland/shrub"", ""Water bodies""]",2018-05-25T09:40:29+0000 121 | 120,"[""Complex cultivation patterns"", ""Broad-leaved forest"", ""Coniferous forest"", ""Mixed forest""]",2018-04-28T09:50:29+0000 122 | 121,"[""Sea and ocean""]",2017-12-01T11:24:31+0000 123 | 122,"[""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Coniferous forest"", ""Mixed forest"", ""Water bodies""]",2018-05-27T09:30:41+0000 124 | 123,"[""Broad-leaved forest"", ""Sclerophyllous vegetation"", ""Transitional woodland/shrub""]",2017-10-02T11:21:12+0000 125 | 124,"[""Continuous urban fabric""]",2018-05-15T11:21:09+0000 126 | 125,"[""Non-irrigated arable land"", ""Complex cultivation patterns"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest""]",2018-05-02T09:30:39+0000 127 | 126,"[""Complex cultivation patterns"", ""Coniferous forest"", ""Transitional woodland/shrub""]",2017-11-21T11:23:51+0000 128 | 127,"[""Non-irrigated arable land"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest"", ""Mixed forest"", ""Transitional woodland/shrub""]",2017-07-09T09:40:29+0000 129 | 128,"[""Non-irrigated arable land"", ""Mixed forest"", ""Water bodies""]",2017-07-09T09:40:29+0000 130 | 129,"[""Complex cultivation patterns"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest""]",2017-12-08T09:33:51+0000 131 | 130,"[""Non-irrigated arable land"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Coniferous forest"", ""Mixed forest"", ""Transitional woodland/shrub"", ""Water bodies""]",2017-07-09T09:40:29+0000 132 | 131,"[""Non-irrigated arable land"", ""Transitional woodland/shrub"", ""Inland marshes""]",2017-10-02T09:40:32+0000 133 | 132,"[""Coniferous forest"", ""Mixed forest"", ""Peatbogs""]",2018-05-26T10:00:31+0000 134 | 133,"[""Sea and ocean""]",2018-05-11T10:00:29+0000 135 | 134,"[""Coniferous forest"", ""Mixed forest"", ""Peatbogs""]",2018-05-26T10:00:31+0000 136 | 135,"[""Non-irrigated arable land"", ""Coniferous forest"", ""Transitional woodland/shrub""]",2017-09-24T09:30:20+0000 137 | 136,"[""Discontinuous urban fabric"", ""Complex cultivation patterns"", ""Coniferous forest"", ""Mixed forest"", ""Transitional woodland/shrub""]",2017-11-21T11:23:51+0000 138 | 137,"[""Non-irrigated arable land"", ""Pastures"", 
""Complex cultivation patterns"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest"", ""Coniferous forest"", ""Inland marshes""]",2018-04-13T09:50:31+0000 139 | 138,"[""Coniferous forest"", ""Mixed forest"", ""Transitional woodland/shrub"", ""Water bodies""]",2018-03-18T09:30:31+0000 140 | 139,"[""Non-irrigated arable land"", ""Pastures""]",2018-04-22T09:30:29+0000 141 | 140,"[""Pastures"", ""Coniferous forest"", ""Moors and heathland""]",2018-04-21T11:43:49+0000 142 | 141,"[""Non-irrigated arable land"", ""Pastures"", ""Broad-leaved forest"", ""Coniferous forest"", ""Mixed forest""]",2017-09-27T09:40:19+0000 143 | 142,"[""Non-irrigated arable land"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Coniferous forest""]",2018-05-08T10:40:31+0000 144 | 143,"[""Non-irrigated arable land"", ""Agro-forestry areas"", ""Transitional woodland/shrub""]",2017-08-13T11:21:21+0000 145 | 144,"[""Non-irrigated arable land"", ""Complex cultivation patterns"", ""Broad-leaved forest"", ""Mixed forest""]",2018-04-21T10:00:29+0000 146 | 145,"[""Non-irrigated arable land"", ""Complex cultivation patterns"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest""]",2018-05-06T10:00:31+0000 147 | 146,"[""Non-irrigated arable land""]",2018-04-30T09:40:31+0000 148 | 147,"[""Non-irrigated arable land"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest"", ""Mixed forest""]",2017-09-27T09:40:19+0000 149 | 148,"[""Non-irrigated arable land"", ""Coniferous forest"", ""Mixed forest""]",2018-02-04T09:41:55+0000 150 | 149,"[""Broad-leaved forest"", ""Coniferous forest"", ""Mixed forest"", ""Transitional woodland/shrub""]",2017-08-13T11:21:21+0000 151 | 150,"[""Continuous urban fabric""]",2018-05-15T11:21:09+0000 152 | 151,"[""Sea and ocean""]",2017-09-06T10:10:19+0000 153 | 152,"[""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest"", ""Mixed forest""]",2017-09-24T09:30:21+0000 154 | 153,"[""Non-irrigated arable land""]",2017-10-15T09:50:31+0000 155 | 154,"[""Discontinuous urban fabric"", ""Inland marshes"", ""Water bodies""]",2017-06-13T10:10:31+0000 156 | 155,"[""Road and rail networks and associated land"", ""Pastures""]",2018-02-25T11:43:51+0000 157 | 156,"[""Non-irrigated arable land"", ""Mixed forest"", ""Transitional woodland/shrub""]",2017-07-01T09:30:31+0000 158 | 157,"[""Discontinuous urban fabric"", ""Non-irrigated arable land"", ""Pastures"", ""Complex cultivation patterns"", ""Coniferous forest"", ""Transitional woodland/shrub""]",2018-05-07T09:30:41+0000 159 | 158,"[""Non-irrigated arable land"", ""Complex cultivation patterns"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest"", ""Transitional woodland/shrub""]",2017-08-16T09:50:31+0000 160 | 159,"[""Complex cultivation patterns"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Transitional woodland/shrub""]",2018-05-15T11:21:10+0000 161 | 160,"[""Discontinuous urban fabric"", ""Industrial or commercial units"", ""Broad-leaved forest"", ""Mixed forest""]",2018-02-04T09:41:59+0000 162 | 161,"[""Sea and ocean""]",2018-04-19T10:10:31+0000 163 | 162,"[""Coniferous forest"", ""Mixed forest"", ""Transitional woodland/shrub""]",2018-05-25T09:40:29+0000 164 | 
163,"[""Sea and ocean""]",2017-08-30T10:20:19+0000 165 | 164,"[""Broad-leaved forest"", ""Coniferous forest"", ""Mixed forest"", ""Transitional woodland/shrub""]",2017-09-24T09:30:19+0000 166 | 165,"[""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Sclerophyllous vegetation"", ""Transitional woodland/shrub""]",2017-08-18T11:21:09+0000 167 | 166,"[""Sea and ocean""]",2017-08-14T10:00:29+0000 168 | 167,"[""Non-irrigated arable land"", ""Mixed forest"", ""Water bodies""]",2017-09-05T09:50:31+0000 169 | 168,"[""Non-irrigated arable land"", ""Pastures"", ""Complex cultivation patterns""]",2017-06-13T10:10:31+0000 170 | 169,"[""Coniferous forest"", ""Mixed forest"", ""Transitional woodland/shrub""]",2017-11-01T09:41:32+0000 171 | 170,"[""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Mixed forest"", ""Water bodies""]",2017-11-01T09:41:31+0000 172 | 171,"[""Coniferous forest"", ""Mixed forest""]",2017-11-21T11:23:51+0000 173 | 172,"[""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest"", ""Transitional woodland/shrub""]",2018-02-01T09:32:19+0000 174 | 173,"[""Non-irrigated arable land"", ""Coniferous forest"", ""Mixed forest"", ""Transitional woodland/shrub""]",2018-05-27T09:30:41+0000 175 | 174,"[""Discontinuous urban fabric"", ""Non-irrigated arable land"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Mixed forest"", ""Water courses""]",2018-02-28T10:10:21+0000 176 | 175,"[""Coniferous forest"", ""Mixed forest""]",2017-07-09T09:40:29+0000 177 | 176,"[""Olive groves"", ""Land principally occupied by agriculture, with significant areas of natural vegetation""]",2017-11-21T11:23:51+0000 178 | 177,"[""Pastures"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest""]",2017-09-27T09:40:20+0000 179 | 178,"[""Non-irrigated arable land"", ""Complex cultivation patterns"", ""Broad-leaved forest"", ""Transitional woodland/shrub""]",2017-08-02T09:20:29+0000 180 | 179,"[""Sea and ocean""]",2017-08-14T10:00:29+0000 181 | 180,"[""Discontinuous urban fabric"", ""Non-irrigated arable land""]",2017-08-24T10:00:19+0000 182 | 181,"[""Coniferous forest"", ""Mixed forest"", ""Transitional woodland/shrub""]",2018-05-25T09:40:29+0000 183 | 182,"[""Mixed forest"", ""Transitional woodland/shrub"", ""Peatbogs""]",2018-05-26T10:00:31+0000 184 | 183,"[""Sea and ocean""]",2017-06-17T11:33:21+0000 185 | 184,"[""Pastures""]",2018-02-20T11:43:39+0000 186 | 185,"[""Non-irrigated arable land"", ""Permanently irrigated land"", ""Water bodies""]",2017-10-02T11:21:11+0000 187 | 186,"[""Pastures"", ""Coniferous forest"", ""Transitional woodland/shrub""]",2018-02-25T11:43:51+0000 188 | 187,"[""Non-irrigated arable land"", ""Complex cultivation patterns"", ""Mixed forest""]",2017-09-30T09:50:19+0000 189 | 188,"[""Coniferous forest"", ""Water bodies""]",2018-05-15T09:40:29+0000 190 | 189,"[""Non-irrigated arable land"", ""Complex cultivation patterns""]",2018-05-10T09:40:31+0000 191 | 190,"[""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Coniferous forest"", ""Mixed forest"", ""Transitional woodland/shrub"", ""Water bodies""]",2017-09-24T09:30:19+0000 192 | 191,"[""Annual crops associated with permanent crops"", ""Sclerophyllous vegetation""]",2017-12-01T11:24:31+0000 193 | 192,"[""Sea and ocean""]",2017-09-06T10:10:19+0000 194 | 
193,"[""Coniferous forest"", ""Peatbogs"", ""Water courses""]",2017-09-24T09:30:19+0000 195 | 194,"[""Water bodies""]",2017-09-24T09:30:21+0000 196 | 195,"[""Coniferous forest"", ""Mixed forest"", ""Transitional woodland/shrub""]",2018-05-22T09:30:29+0000 197 | 196,"[""Sea and ocean""]",2017-08-14T10:00:29+0000 198 | 197,"[""Broad-leaved forest"", ""Coniferous forest"", ""Mixed forest""]",2017-11-01T09:41:31+0000 199 | 198,"[""Sea and ocean""]",2017-12-01T11:24:31+0000 200 | 199,"[""Coniferous forest""]",2018-04-17T10:20:19+0000 201 | 200,"[""Complex cultivation patterns"", ""Broad-leaved forest"", ""Coniferous forest"", ""Transitional woodland/shrub""]",2017-09-14T09:30:29+0000 202 | 201,"[""Coniferous forest"", ""Mixed forest"", ""Transitional woodland/shrub""]",2017-08-17T10:10:19+0000 203 | 202,"[""Mineral extraction sites"", ""Pastures"", ""Complex cultivation patterns"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest"", ""Natural grassland"", ""Transitional woodland/shrub"", ""Bare rock"", ""Sparsely vegetated areas""]",2017-09-14T09:30:29+0000 204 | 203,"[""Discontinuous urban fabric"", ""Non-irrigated arable land"", ""Broad-leaved forest""]",2017-11-04T09:52:01+0000 205 | 204,"[""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Coniferous forest"", ""Mixed forest"", ""Transitional woodland/shrub"", ""Water bodies""]",2017-09-05T09:50:31+0000 206 | 205,"[""Broad-leaved forest"", ""Coniferous forest"", ""Mixed forest""]",2018-02-23T10:10:19+0000 207 | 206,"[""Discontinuous urban fabric"", ""Non-irrigated arable land"", ""Pastures"", ""Mixed forest""]",2018-05-08T10:40:31+0000 208 | 207,"[""Non-irrigated arable land"", ""Complex cultivation patterns"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Coniferous forest""]",2018-05-06T10:00:31+0000 209 | 208,"[""Non-irrigated arable land"", ""Vineyards"", ""Olive groves"", ""Agro-forestry areas""]",2017-08-18T11:21:09+0000 210 | 209,"[""Non-irrigated arable land"", ""Complex cultivation patterns"", ""Land principally occupied by agriculture, with significant areas of natural vegetation""]",2018-04-13T09:50:31+0000 211 | 210,"[""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest"", ""Coniferous forest"", ""Mixed forest"", ""Sea and ocean""]",2018-05-11T10:00:29+0000 212 | 211,"[""Coniferous forest"", ""Mixed forest"", ""Sea and ocean""]",2018-05-21T10:00:29+0000 213 | 212,"[""Non-irrigated arable land"", ""Complex cultivation patterns"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest"", ""Water bodies""]",2017-09-27T09:40:19+0000 214 | 213,"[""Discontinuous urban fabric"", ""Non-irrigated arable land"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest"", ""Mixed forest""]",2018-05-25T09:40:30+0000 215 | 214,"[""Pastures"", ""Moors and heathland"", ""Water bodies""]",2017-06-17T11:33:21+0000 216 | 215,"[""Discontinuous urban fabric"", ""Non-irrigated arable land"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest""]",2017-12-08T09:33:51+0000 217 | 216,"[""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Coniferous forest"", ""Mixed forest""]",2018-02-04T09:41:55+0000 218 | 
217,"[""Non-irrigated arable land""]",2018-04-30T09:40:31+0000 219 | 218,"[""Water bodies""]",2017-09-24T09:30:20+0000 220 | 219,"[""Non-irrigated arable land"", ""Broad-leaved forest"", ""Mixed forest""]",2017-09-27T09:40:20+0000 221 | 220,"[""Coniferous forest"", ""Mixed forest"", ""Water bodies""]",2017-11-01T09:41:32+0000 222 | 221,"[""Complex cultivation patterns"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest""]",2018-05-02T09:30:39+0000 223 | 222,"[""Coniferous forest"", ""Mixed forest""]",2018-02-05T10:02:11+0000 224 | 223,"[""Non-irrigated arable land"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Transitional woodland/shrub""]",2018-04-30T09:40:31+0000 225 | 224,"[""Non-irrigated arable land"", ""Land principally occupied by agriculture, with significant areas of natural vegetation""]",2017-12-08T09:33:51+0000 226 | 225,"[""Non-irrigated arable land"", ""Broad-leaved forest""]",2017-10-02T11:21:12+0000 227 | 226,"[""Complex cultivation patterns"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest""]",2018-02-01T09:32:19+0000 228 | 227,"[""Non-irrigated arable land"", ""Coniferous forest"", ""Mixed forest"", ""Transitional woodland/shrub""]",2018-02-28T10:10:21+0000 229 | 228,"[""Sea and ocean""]",2017-09-06T10:10:20+0000 230 | 229,"[""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Agro-forestry areas"", ""Transitional woodland/shrub""]",2017-07-04T11:21:11+0000 231 | 230,"[""Sea and ocean""]",2017-10-16T10:10:09+0000 232 | 231,"[""Non-irrigated arable land"", ""Complex cultivation patterns"", ""Broad-leaved forest""]",2018-04-13T09:50:31+0000 233 | 232,"[""Bare rock"", ""Sea and ocean""]",2018-04-19T10:10:31+0000 234 | 233,"[""Non-irrigated arable land"", ""Broad-leaved forest"", ""Transitional woodland/shrub"", ""Water bodies""]",2017-10-02T11:21:11+0000 235 | 234,"[""Non-irrigated arable land"", ""Transitional woodland/shrub"", ""Inland marshes"", ""Water bodies""]",2017-10-02T09:40:31+0000 236 | 235,"[""Non-irrigated arable land"", ""Olive groves"", ""Agro-forestry areas"", ""Transitional woodland/shrub""]",2018-03-26T11:21:09+0000 237 | 236,"[""Discontinuous urban fabric"", ""Complex cultivation patterns"", ""Mixed forest""]",2017-07-04T11:21:11+0000 238 | 237,"[""Complex cultivation patterns"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Broad-leaved forest"", ""Transitional woodland/shrub""]",2017-12-21T11:25:01+0000 239 | 238,"[""Non-irrigated arable land"", ""Coniferous forest"", ""Mixed forest"", ""Transitional woodland/shrub""]",2017-08-17T10:10:19+0000 240 | 239,"[""Sea and ocean""]",2017-07-20T10:00:31+0000 241 | 240,"[""Non-irrigated arable land"", ""Pastures"", ""Land principally occupied by agriculture, with significant areas of natural vegetation""]",2017-08-16T09:50:31+0000 242 | 241,"[""Sea and ocean""]",2017-07-17T11:33:21+0000 243 | 242,"[""Coniferous forest"", ""Transitional woodland/shrub""]",2017-07-01T09:30:31+0000 244 | 243,"[""Non-irrigated arable land"", ""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Coniferous forest"", ""Mixed forest""]",2018-05-25T09:40:29+0000 245 | 244,"[""Non-irrigated arable land"", ""Pastures""]",2018-04-21T11:43:49+0000 246 | 245,"[""Discontinuous urban fabric"", ""Non-irrigated arable 
land""]",2017-09-27T09:40:19+0000 247 | 246,"[""Mineral extraction sites"", ""Non-irrigated arable land"", ""Coniferous forest"", ""Mixed forest""]",2018-05-27T09:30:41+0000 248 | 247,"[""Pastures"", ""Broad-leaved forest"", ""Transitional woodland/shrub""]",2017-10-02T11:21:12+0000 249 | 248,"[""Mixed forest"", ""Transitional woodland/shrub""]",2017-09-05T09:50:31+0000 250 | 249,"[""Land principally occupied by agriculture, with significant areas of natural vegetation"", ""Coniferous forest"", ""Mixed forest"", ""Water bodies""]",2017-11-01T09:41:32+0000 251 | 250,"[""Non-irrigated arable land"", ""Broad-leaved forest""]",2017-08-31T09:50:29+0000 -------------------------------------------------------------------------------- /intro.md: -------------------------------------------------------------------------------- 1 | # GEOG 213/313: Advanced Geospatial Analytics with Python 2 | 3 | ## __Overview__ 4 | Geospatial analytics is being revolutionized by the increasing availability of multi-modal observations from various sensors, new analytical methods including machine learning techniques, and the shift to using cloud infrastructures. Along with these drivers, the software landscape for geospatial analytics has changed during the last decade. While there are still several commercial software providers, there is a growing ecosystem of open-source software and toolboxes for geospatial analysis. Python is one of main programming languages in this landscape. Python is a general-purpose language which can be used for a wide range of tasks including web development and data manipulation in addition to data analytics. These features along with the large developer community who maintains and expands various Python packages, have increased popularity and usability of Python for geospatial applications. 5 | 6 | This course is a follow-on to Intro Python Programming (IDCE 302) offered as part of the Geographic Information Science, MS program (MSGIS) at Clark University. The course is designed to fill the gap for an advanced Python programming course (200/300) at Clark with a focus on geospatial data analytics. Students who take this course will be introduced to the principles of open-source software for science, and how to develop reproducible workflows in Python. They will also learn to access geospatial data on various portals (with an emphasis on cloud data stores) using Python. The key focus of the course will be on geospatial data analytics and data visualization for the rest of the semester. 7 | 8 | The intended audiences for the course are PhD students in Geography, MSGIS students, and majors in Geography, GES, ESS, and Data Science. 9 | 10 | ## __Learning Goals__ 11 | - Develop reproducible scientific code; 12 | - Gain comprehensive understanding of geospatial Python packages; 13 | - Access and work with geospatial data in Python; 14 | - Transform, merge, and manipulate geospatial data in Python; 15 | - Visualize geospatial data in Python; 16 | - Scalable and parallel computations in Python; 17 | -------------------------------------------------------------------------------- /lectures/01_unix.md: -------------------------------------------------------------------------------- 1 | # Introduction to Shell and Bash 2 | 3 | This lectures covers an introduction to Unix file system, and Shell/Bash commands. 
4 | 5 | **Attribution** 6 | *The content of this lecture is modified from three excellent sources: [Introduction to Bash (Shell)](https://www.earthdatascience.org/courses/intro-to-earth-data-science/open-reproducible-science/bash/) from Earth Lab CU Boulder; [Intro to Unix](https://earth-env-data-science.github.io/lectures/environment/intro_to_unix.html) from Columbia University; and [The Unix Shell](https://swcarpentry.github.io/shell-novice/) from Software Carpentry.* 7 | 8 | --- 9 | 10 | ## Background 11 | 12 | We interact with computers in many different ways, such as through a keyboard and mouse, touch screen interfaces, or using speech recognition systems. The most widely used way to interact with personal computers is called a graphical user interface (GUI). With a GUI, we give instructions by clicking a mouse and using menu-driven interactions. 13 | 14 | While the visual aid of a GUI makes it intuitive to learn, this way of delivering instructions to a computer scales very poorly and is not reproducible. Imagine the following task: for a literature search, you have to copy the third line of one thousand text files in one thousand different directories and paste it into a single file. Using a GUI, you would not only be clicking at your desk for several hours, but you could potentially also commit an error in the process of completing this repetitive task. This is where we take advantage of the Unix shell. The Unix shell is both a command-line interface (CLI) and a scripting language, allowing such repetitive tasks to be done automatically and quickly. With the proper commands, the shell can repeat tasks with or without some modification as many times as we want. Using the shell, the task in the literature example can be accomplished in seconds. 15 | 16 | ## The Shell 17 | 18 | The shell is the primary program that computers use to receive code (i.e. commands) and return information produced by executing these commands (i.e. output). These commands can be entered via a Terminal, which you will work with in this course. 19 | 20 | Using a Shell helps you: 21 | 22 | - Navigate your computer to access and manage files and folders (i.e. directories). 23 | - Efficiently work with many files and directories at once. 24 | - Run programs that provide more functionality at the command line, such as git for version control. 25 | - Launch programs from specific directories on your computer, such as Jupyter Notebook for interactive programming. 26 | - Use repeatable commands for these tasks across many different operating systems (Windows, Mac, Linux). 27 | 28 | Shell is also important if you need to work on remote machines such as a high-performance computing (HPC) cluster or the cloud. 29 | 30 | The most popular Unix shell is Bash (the Bourne Again SHell, so called because it’s derived from a shell written by Stephen Bourne). Bash is the default shell on most modern implementations of Unix and in most packages that provide Unix-like tools for Windows. Note that ‘Git Bash’ is a piece of software that enables Windows users to use a Bash-like interface when interacting with Git. 31 | 32 | 33 | ## Terminal Sessions On Your Computer 34 | 35 | The terminal program that you use to run `Bash` commands will vary depending upon your computer’s operating system. 36 | 37 | **Mac (OS X)** 38 | 39 | You can use the program called Terminal, which is installed natively on macOS and runs a Bash-compatible shell (note that newer versions of macOS use Zsh as the default shell).
40 | 41 | You can open Terminal by finding and launching it from Spotlight (or from /Applications/Utilities). 42 | 43 | 44 | **Linux** 45 | 46 | Many Linux computers use the Bash implementation of Shell, which you will learn to test for in the section below. 47 | 48 | You can open the program called Terminal (or Terminal Emulator) by finding and launching it from your list of programs. 49 | 50 | **Windows** 51 | 52 | There are many options for running Bash on Windows. In this course, we recommend using Windows Subsystem for Linux (WSL). WSL is a feature of the Windows operating system that enables you to run a Linux file system, along with Linux command-line tools and GUI apps, directly on Windows, alongside your traditional Windows desktop and apps. You can read more about it [here](https://learn.microsoft.com/en-us/windows/wsl/faq). 53 | 54 | To enable WSL on Windows, follow the steps outlined [here](https://learn.microsoft.com/en-us/windows/wsl/install). The default Ubuntu distribution works well with the content of this course, and you do not need to change it. 55 | 56 | 57 | ## Home Directory 58 | To understand what a “home directory” is, let’s have a look at how the file system as a whole is organized. For the sake of this example, check the following figure that illustrates the file system on Hamed's computer. After this illustration, you’ll be learning commands to explore your own filesystem, which will be organized in a similar way, though not exactly identical. 59 | 60 | On most Unix computers, the filesystem looks something like this: 61 | 62 | ```{figure} ../lectures/figures/unix_files.png 63 | --- 64 | name: unix 65 | class: bg-primary mb-1 66 | width: 500px 67 | align: center 68 | --- 69 | Hamed's file system tree 70 | ``` 71 | 72 | The filesystem looks like an upside-down tree. The topmost directory is the **root directory** that holds everything else. We refer to it using a slash character, `/`, on its own; this character is the leading slash in `/Users/hamed`. 73 | 74 | Inside that directory are several other directories: `bin` (which is where some built-in programs are stored), `lib` (for the software “libraries” used by different programs), `Users` (where users’ personal directories are located), `tmp` (for temporary files that don’t need to be stored long-term), and so on. 75 | 76 | By default, when you open your terminal, it starts in your home directory, which in our example is `/Users/hamed` (in WSL the home directory would be `/home/hamed`). We know that this directory is stored inside `/Users` because `/Users` is the first part of its name. Similarly, we know that `/Users` is stored inside the root directory `/` because its name begins with `/`. 77 | 78 | ``` {tip} 79 | There are two meanings for the `/` character. When it appears at the front of a file or directory name, it refers to the root directory. When it appears inside a path, it’s just a separator. 80 | ``` 81 | 82 | ## Bash Commands to Navigate Files and Directories 83 | 84 | Now, let's review some of the useful bash commands to navigate and manipulate files and directories in your filesystem. 85 | 86 | ### Print a List of Files and Subdirectories (`ls`) 87 | 88 | `ls` prints the names of the files and directories in the current directory in alphabetical order, arranged neatly into columns.
Here is the output of running `ls` inside the directory of this book on a local computer: 89 | ``` 90 | $ ls 91 | ``` 92 | ``` 93 | CONDUCT.md _config.yml lectures 94 | CONTRIBUTING.md _toc.yml logo.png 95 | LICENSE assignments references.bib 96 | README.md docs requirements.txt 97 | _build intro.md 98 | ``` 99 | We can make its output more comprehensible by using the flag `-F`, which tells `ls` to add a trailing `/` to the names of directories: 100 | 101 | ``` 102 | $ ls -F 103 | ``` 104 | ``` 105 | CONDUCT.md _config.yml lectures/ 106 | CONTRIBUTING.md _toc.yml logo.png 107 | LICENSE* assignments/ references.bib 108 | README.md docs/ requirements.txt 109 | _build/ intro.md 110 | ``` 111 | To print all files and directories, including hidden ones, you can add the `-a` flag: 112 | ``` 113 | $ ls -aF 114 | ``` 115 | ``` 116 | ./ LICENSE intro.md 117 | ../ README.md lectures/ 118 | .DS_Store _build/ logo.png 119 | .git/ _config.yml references.bib 120 | .gitignore _toc.yml requirements.txt 121 | CONDUCT.md assignments/ 122 | CONTRIBUTING.md docs/ 123 | ``` 124 | 125 | The `../` directory in the output of `ls -aF` is a special directory name meaning "the directory containing this one" or simply the **parent** of the current directory. You will learn below how to navigate to the parent directory. 126 | 127 | ### Print Current Working Directory (`pwd`) 128 | 129 | To print the name of the current working directory, use the command `pwd`. This command prints the full path to the directory (meaning that you can see all of its parent directories). 130 | 131 | ### Change Current Working Directory (`cd`) 132 | The command to change our working directory is `cd`, followed by the name of the directory to move into. `cd` stands for “change directory”. 133 | 134 | Let's say we are inside the directory of this book, and we want to move to the `lectures` directory we saw above. We can use the following command: 135 | ``` 136 | $ cd lectures 137 | ``` 138 | There is no output from this command. But if you run `pwd` you can confirm that you are now in the `lectures` directory. 139 | 140 | You can use the same `cd` command to also go to the parent directory of the current directory: 141 | ``` 142 | $ cd .. 143 | ``` 144 | 145 | ``` {tip} 146 | If you run `cd` without any arguments it will return you to your home directory. This is equivalent to running `cd ~`. 147 | ``` 148 | 149 | So far, when specifying directory names, or even a directory path (as above), we have been using **relative paths**. When you use a relative path with a command like `ls` or `cd`, it tries to find that location starting from where we are, rather than from the root of the file system. 150 | 151 | However, it is possible to specify the **absolute path** to a directory by including its entire path from the root directory, which is indicated by a leading slash. The leading `/` tells the computer to follow the path from the root of the file system, so it always refers to exactly one directory, no matter where we are when we run the command. 152 | 153 | `````{admonition} A useful shortcut 154 | :class: tip 155 | You can use the `-` (dash) character with `cd` to move into the previous directory you were in. This is very helpful, as it saves you from having to remember the full path. 156 | ````` 157 | 158 | ### Create a New Directory (`mkdir`) 159 | 160 | To create a new directory you can use `mkdir` followed by the name you would like to give the new directory.
If you only give a name, the new directory will be created in the current directory. You can also give the **absolute path** to `mkdir` to create a new directory anywhere on your file system. The following command will create a new directory in Hamed's home directory named `new-directory`: 168 | 169 | ``` 170 | $ mkdir /Users/hamed/new-directory/ 171 | ``` 172 | ### Create a New File Using a Single Command (`touch`) 173 | 174 | You can create a new empty file using the single command `touch`. This command was originally created to manage the timestamps of files. However, if a file does not already exist, then the command will make the file. 175 | 176 | This is an incredibly useful way to quickly and programmatically create a new empty file that can be populated at a later time. Here is an example: 177 | ``` 178 | $ touch samples.txt 179 | ``` 180 | 181 | ### Edit a File Using Vim 182 | 183 | There are various editors that you can use to edit a file in Bash. These are very useful if you need to edit text in a plain text file, or in HTML, LaTeX or other markup languages. While these editors might not initially seem as easy to use as standard GUI-based editors, they can help you become very productive over the long run. 184 | 185 | In this course, we are going to use Vim as the editor. Vim can have a steep learning curve, and you do not need to learn all the commands in the beginning. Start with navigating between the *command mode* and the *insert mode*, editing the text, and saving it to the file. We will practice these commands in the class. For a complete introduction to Vim commands check out [A Beginner's Guide to Vim](https://www.linuxfoundation.org/blog/blog/classic-sysadmin-vim-101-a-beginners-guide-to-vim) on the Linux Foundation blog. 186 | 187 | 188 | ### Copy a File (`cp`) 189 | 190 | You can copy a specific file to a new directory using the command `cp` followed by the name of the file you want to copy and the name of the directory where you want to copy the file. The names can be relative or absolute paths. 191 | 192 | For example, to copy the file `samples.txt` from the current directory to `/Users/hamed/documents/`: 193 | 194 | ``` 195 | $ cp samples.txt /Users/hamed/documents/ 196 | ``` 197 | 198 | 199 | ### Copy a Directory and Its Contents (`cp -r`) 200 | To copy a directory and all its content to a new directory, you need to use the flag `-r` (meaning recursive) with `cp`. 201 | 202 | For example, to copy the directory `documents` (and all its content) from Hamed's home directory to `/Users/hamed/projects/` you can run: 203 | ``` 204 | $ cp -r /Users/hamed/documents/ /Users/hamed/projects/ 205 | ``` 206 | 207 | ### Delete a File (`rm`) 208 | To delete a specific file, you can use the command `rm` (short for remove) followed by the name of the file you want to delete. 209 | 210 | For example, you can delete the `samples.txt` file under the current directory: 211 | 212 | ``` 213 | $ rm samples.txt 214 | ``` 215 | 216 | ### Delete a Directory (`rm -r`) 217 | To delete a directory and all its content (be careful when using this command as Unix doesn't have a trash bin), you can use the `-r` flag with `rm`. 218 | 219 | For example, the following command will delete the `projects` directory and all its content from the current directory: 220 | ``` 221 | $ rm -r projects/ 222 | ``` 223 | 224 | ## Getting help 225 | Every command in bash has multiple options that you can pass to change its behavior. There are two ways to find out what options are available: 226 | 227 | 1.
Pass `--help` to any command:
228 | ```
229 | $ ls --help
230 | ```
231 | 2. Read the manual using `man`:
232 | ```
233 | $ man ls
234 | ```
235 | 
236 | *Note that on macOS, the built-in BSD versions of commands such as `ls` do not support `--help`; use `man` instead.*
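Putting it all together, here is a minimal recap session using the commands from this chapter (a sketch you can reproduce on your own machine; the paths assume Hamed's home directory from the earlier examples):
```
$ cd ~                                     # go to the home directory
$ mkdir scratch                            # relative path: created in the current directory
$ cd scratch
$ touch samples.txt                        # create an empty file
$ cp samples.txt /Users/hamed/documents/   # an absolute path works from anywhere
$ cd ..                                    # move up to the parent directory
$ rm -r scratch/                           # careful: Unix has no trash bin
```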

 

-------------------------------------------------------------------------------- /lectures/02_git.md: --------------------------------------------------------------------------------
1 | # Introduction to Version Control and Git
2 | 
3 | In this lecture, you will learn what version control is and how to use git and GitHub for version control.
4 | 
5 | **Attribution**
6 | *The content of this lecture is adapted from three excellent sources: [Git and GitHub](https://www.earthdatascience.org/courses/intro-to-earth-data-science/git-github/) from Earth Lab CU Boulder; [Intro to Git](https://earth-env-data-science.github.io/lectures/environment/intro_to_git.html) from Columbia University; and [Version Control with Git](https://swcarpentry.github.io/git-novice/) from Software Carpentry.*
7 | 
8 | ---
9 | 
10 | ## What is Version Control?
11 | A version control system maintains a record of changes to code and other content. It also allows us to revert files to a previous point in time.
12 | 
13 | ## Why Version Control?
14 | The following figure illustrates one reason you need version control:
15 | 
16 | ```{figure} ../lectures/figures/final-doc-phd-comics.gif
17 | ---
18 | name: date-version-control
19 | class: bg-primary mb-1
20 | width: 500px
21 | align: center
22 | ---
23 | Appending dates and numbers to file names as a makeshift form of version control. Source: "Piled Higher and Deeper" by Jorge Cham on [www.phdcomics.com](http://phdcomics.com/comics/archive/phd101212s.gif).
24 | ```
25 | 
26 | Version control is a powerful way to organize, back up, and share your research computing code with collaborators. A version control system keeps track of a set of files and saves snapshots (i.e. versions, commits) of the files at any point in time. Using version control allows you to confidently make changes to your code (and any other files), with the ability to roll back to any previous state.
27 | 
28 | Version control also allows you to share code with collaborators, make simultaneous edits, and merge your changes in a systematic, controlled way. There are different ways that a version control system can help you. Check out the section on ["How Version Control Systems Work"](https://www.earthdatascience.org/courses/intro-to-earth-data-science/git-github/version-control/) in Earth Lab's course.
29 | 
30 | ## Git and GitHub
31 | 
32 | In this course, we will be using [git](https://git-scm.com/) for version control. Git is a distributed version control system that you can use locally on your computer, or through hosting services such as [GitHub](https://github.com/) and GitLab. In this course, we will also use GitHub for assignments and projects.
33 | 
34 | If you do not have a GitHub account, you can create one [here](https://github.com/) (it's free).
35 | 
36 | ``` {note}
37 | Git is very powerful and has a lot of utility, but it comes with a steep learning curve. In this course, we will only use a subset of its functionality. You are encouraged to practice with git for your day-to-day research.
38 | ```
39 | The first step to get started with git is to make sure you have git installed on your computer. If you are not sure, open a terminal and run the following command:
40 | ```
41 | $ git --version
42 | ```
43 | If git is installed, you will get an output with the installed version (e.g. `git version 2.39.2`). Otherwise, you need to install git following the instructions [here](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git).
44 | 
45 | ## Configure git on Your Computer
46 | The first time that you use git on a computer, you will need to configure your GitHub.com username and email address. This information will be used to document who made changes to files in git. It is important to use the same email address and username that you set up on GitHub.com.
47 | 
48 | You can set your GitHub.com username in the terminal by typing:
49 | ```
50 | $ git config --global user.name "username"
51 | ```
52 | Next, you can set the email for your GitHub.com account by typing:
53 | 
54 | ```
55 | $ git config --global user.email "email@email.com"
56 | ```
57 | 
58 | Using the `--global` configuration option, you are telling git to use these settings for all git repositories that you work with on your computer. Note that you only have to configure these settings one time on your computer.
59 | 
60 | ## Authentication for GitHub
61 | GitHub requires authentication for any changes to a repo. GitHub's preferred method is an SSH key. Setting up SSH involves two steps:
62 | 
63 | 1. Creating the key itself locally on your computer: Follow [these steps](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent) to generate a new SSH key on your computer and add it to ssh-agent.
64 | 2. Adding the key to your GitHub account: Follow [this step-by-step guide](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account) to add your SSH key to your GitHub account.
65 | 
66 | 
67 | ## Get Started with a git repository
68 | Create a new directory and use `git init` to initialize a git repository in it:
69 | ```
70 | $ mkdir my_project
71 | $ cd my_project
72 | $ git init
73 | ```
74 | At any time, you can run `git status` to see what files have been modified/added/deleted compared to the previous commit, and what files are staged.
75 | 
76 | To stage a file for addition to the repository, use the following:
77 | ```
78 | $ git add <file>
79 | ```
80 | Then you need to *commit* the staged files to add them to the history of your repository:
81 | ```
82 | $ git commit -m "a brief and informative commit message"
83 | ```
84 | The commit message should be brief but indicative of what changes are being committed. It's important to stage and commit your changes in a way that a single commit does not bundle several unrelated changes to your code. For example, if you are making changes to different modules of your code, make sure to commit them separately, so if you need to revert one of the changes it's easy to do so.
85 | 
86 | The following figure gives you a nice perspective on how staging and committing work:
87 | 
88 | ```{figure} ../lectures/figures/git-add-commit.png
89 | ---
90 | name: git-add-commit
91 | class: bg-primary mb-1
92 | width: 500px
93 | align: center
94 | ---
95 | Modified files are staged using git add. Then, following git commit, all files in the staging area are included in the snapshot and become part of the repository's history, receiving a unique SHA-1 hash identifier. Source: Max Joseph, adapted from Pro Git by Chacon and Straub (2014).
96 | ```
97 | 
98 | ## Push Changed Files to GitHub
99 | To push your changes to your GitHub repository, you need to create a new repository on GitHub. You can do this using the following steps:
100 | 1. In the upper-right corner of any page, use the drop-down menu, and select New repository.
101 | 2. Type a short, informative name for your repository.
102 | 3. Click Create repository.
103 | 
104 | ``` {note}
105 | You can choose to initialize your repository with a README. While this is generally recommended, do not do it for this exercise; otherwise the changes you made locally will be out of sync with your GitHub repository.
106 | ```
107 | 
108 | Now that you have created your repository on GitHub, you need to copy its URL and add it to your local git repository. Open your repository on GitHub, click on the green `Code` button, and copy the SSH URL (see the following figure).
109 | 
110 | ```{figure} ../lectures/figures/git-url.png
111 | ---
112 | name: git-url
113 | class: bg-primary mb-1
114 | width: 500px
115 | align: center
116 | ---
117 | Accessing the URL of a GitHub repository.
118 | ```
119 | Lastly, you need to add the URL to your local repository and push the changes to GitHub:
120 | 
121 | ```
122 | $ git remote add origin <remote-url>
123 | $ git push origin main
124 | ```
125 | ## Working with an Existing GitHub Repository
126 | So far, we have talked about initializing a git repository and pushing it to GitHub. But you may need to start working on a project that already has a GitHub repository with content contributed by others. In this case, you can use the `git clone` and `git pull` commands as follows.
127 | 
128 | Use `git clone` to create a *cloned* version of the GitHub repository on your local computer. You don't need to create a directory for it. When you run the following command it will automatically create a directory in your current path, and *pull* the files, branches, and all the git history to your computer:
129 | 
130 | ```
131 | $ git clone git@github.com:<username>/<repository>.git
132 | ```
133 | You can copy the URL for the repo from GitHub as shown in {numref}`git-url`.
134 | 
135 | You can also use `git pull` to sync changes that might have been committed to the GitHub repository to your local computer. Note that to use this command, you should not have made any local changes that conflict with the changes made on GitHub. Otherwise, the pull will result in conflicts that need to be resolved.
136 | 
137 | ```
138 | $ git pull origin <branch>
139 | ```
140 | 
141 | ## Reverting to a Previous Commit
142 | Let's say you have made some changes to an existing repository, and committed those to your target branch. Now, you notice a fault or mistake and you would like to undo that change. You can use `git revert` for this purpose. One of the easiest ways to do this is to find the hash of the commit you would like to undo and use it in your `git revert` command.
143 | 
144 | To get the list of all commits in your repository, you can use the `git log` command. The output of `git log` will be a list of all changes, their commit hashes, and any messages associated with those commits.
145 | 
146 | Then you can use the following command:
147 | 
148 | ```
149 | $ git revert <commit-hash>
150 | ```
151 | 
152 | This will, by default, open a file (using *Vim*) to enter a new message for your revert commit. **Note:** `git revert` will create a new commit in the history of your git repository, and it is best practice to include a message indicating the reason for the revert. After entering your message, you can save it using `:wq`.
153 | 
154 | ``` {tip}
155 | To be able to use `git revert`, your working branch should be clean (no modifications). You should also check out `git reset` and `git restore` as alternatives to `git revert`.
`git reset` can clean all uncommitted changes in your working directory, and `git restore` can extract specific files from a previous commit. 156 | ``` 157 | 158 |
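Putting `git log` and `git revert` together, here is a minimal sketch of the revert workflow (the commit hashes and messages below are hypothetical):
```
$ git log --oneline            # list commits; the first column is the abbreviated hash
a1b2c3d update analysis module
9f8e7d6 add data loader
$ git revert a1b2c3d           # create a new commit that undoes the changes of a1b2c3d
```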

 

-------------------------------------------------------------------------------- /lectures/03_python-env.md: --------------------------------------------------------------------------------
1 | # Python Environments
2 | 
3 | In this lecture, you will learn about Python environments and how best to use them to create reproducible pipelines.
4 | 
5 | **Attribution**
6 | *The content of this lecture is adapted from four excellent sources: [Python Packages for Earth Data Science](https://www.earthdatascience.org/courses/intro-to-earth-data-science/python-code-fundamentals/use-python-packages/) from Earth Lab CU Boulder; [Managing Python Environments](https://earth-env-data-science.github.io/lectures/environment/python_environments.html) from Columbia University; [Introduction to Conda for (Data) Scientists](https://carpentries-incubator.github.io/introduction-to-conda-for-data-scientists/) from Software Carpentry; and [Managing a Development Environment](https://docs.descarteslabs.com/installation-conda.html) by Descartes Labs.*
7 | 
8 | ## Background
9 | Python and nearly all of the software packages in the scientific Python ecosystem are open-source. They are maintained and developed by a community of scientists and programmers, some of whose work is supported by universities, non-profits, and for-profit corporations. This work mostly happens in the open, via GitHub and other online collaboration platforms.
10 | 
11 | When working with a programming language, such as Python, that can do almost anything, one has to wonder how this is possible. The Python download is only about 25 MB; how can everything be included in such a small package? The answer is: it is not. Python, like many other programming languages, relies on external libraries or packages to do almost anything. You can see this already when you start programming. After learning the very basics, you often learn how to *import* something into your script or session.
12 | 
13 | ```{admonition} Definitions
14 | **Module**: a collection of functions and variables, as in a script
15 | 
16 | **Package**: a collection of modules with an `__init__.py` file (can be empty), as in a directory with scripts
17 | 
18 | **Library**: a collection of packages with related functionality
19 | 
20 | Library/Package are often used interchangeably.
21 | ```
22 | 
23 | ## Dependencies
24 | A bit further into your programming career you may notice (or have noticed) that many packages do not do everything on their own. Instead, they depend on other packages for their functionality. For example, the `SciPy` package is used for numerical routines. To not reinvent the wheel, the package makes use of other packages, such as `NumPy` (numerical Python) and `matplotlib` (plotting), and many more. So we say that `NumPy` and `matplotlib` are dependencies of `SciPy`.
25 | 
26 | Many packages are being further developed all the time, generating different versions of each package. During development it may happen that a function call changes and/or functionalities are added or removed. Since one package can depend on particular versions of another, this may create issues. Therefore it is not only important to know that e.g. `SciPy` depends on `NumPy` and `matplotlib`, but also that it depends on `NumPy` version >= 1.6 and `matplotlib` version >= 1.1. `NumPy` version 1.5 in this case would not be sufficient.
27 | 
28 | This emphasizes the need for creating and recording (virtual!) environments in which to run your Python code.
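If you are curious which packages (and which versions) another package depends on, you can inspect its metadata from Python itself. Here is a minimal sketch using the standard library; the exact version numbers and bounds you see will depend on your installation:
```python
from importlib.metadata import requires, version

# Installed version of SciPy, e.g. '1.11.2'
print(version("scipy"))

# Declared dependencies, e.g. ['numpy>=1.21.6,<1.28.0', ...]
print(requires("scipy"))
```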
29 | 
30 | ## Environments
31 | 
32 | When starting with programming we may not use many packages yet, and the installation may be straightforward. But for most people, there comes a time when a single version of a package, or of the programming language itself, is not enough anymore. You may find an older tool that depends on an older version of your programming language (e.g. Python 2.7), but many of your other tools depend on a newer version (e.g. Python 3.6). You could start up another computer or virtual machine to run the other version of the programming language, but this is not very handy, since you may want to use the tools together in a workflow later on. Here, environments are one solution to the problem. Nowadays there are several environment management systems following a similar idea: instead of having to use multiple computers or virtual machines to run different versions of the same package, you can install packages in isolated environments.
33 | 
34 | However, managing Python development environments can be tricky, especially if you are new to the language or less familiar with computer science concepts. In this course, we introduce tools to make it easy to manage and reproduce your Python environments on any machine (local computer/server/cloud).
35 | 
36 | ```{figure} https://imgs.xkcd.com/comics/python_environment.png
37 | ---
38 | name: xkcd-python-env
39 | class: bg-primary mb-1
40 | width: 500px
41 | align: center
42 | ---
43 | The complexity of Python environments as illustrated by [xkcd](https://xkcd.com/1987/).
44 | ```
45 | 
46 | ## Environment management
47 | 
48 | An environment management system solves a number of problems commonly encountered by (data) scientists:
49 | 
50 | - An application you need for a research project requires different versions of your base programming language or different versions of various third-party packages from the versions that you are currently using.
51 | - An application you developed as part of a previous research project that worked fine on your system six months ago now no longer works.
52 | - Code that was written for a joint research project works on your machine but not on your collaborators’ machines.
53 | - An application that you are developing on your local machine doesn’t provide the same results when run on your remote cluster.
54 | 
55 | An environment management system enables you to set up a new, project-specific software environment containing specific Python versions as well as the versions of additional packages and required dependencies that are all mutually compatible. Such systems:
56 | 
57 | - help resolve dependency issues by allowing you to use different versions of a package for different projects;
58 | - make your projects self-contained and reproducible by capturing all package dependencies in a single requirements file;
59 | - allow you to install packages on a host on which you do not have admin privileges.
60 | 
61 | 
62 | ## Conda
63 | 
64 | [Conda](https://docs.conda.io/en/latest/) is an open-source package management system and environment management system that runs on Windows, macOS, and Linux. When you install one package using conda, such as JupyterLab, all of the required dependencies will also be downloaded and installed with compatible versions. Conda was created for Python programs, but it can package and distribute software for any language (Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN).
65 | 
66 | Conda as a package manager helps you find and install packages.
If you need a package that requires a different version of Python, you do not need to switch to a different environment manager, because conda is also an environment manager. With just a few commands, you can set up a totally separate environment to run that different version of Python, while continuing to run your usual version of Python in your normal environment.
67 | 
68 | There are multiple ways to install conda for Python, including the [Anaconda Distribution](https://www.anaconda.com/download/) and the [Miniconda Distribution](https://docs.conda.io/projects/miniconda/en/latest/). {numref}`miniconda-anaconda` shows the difference between Conda, Miniconda and Anaconda. Miniconda combines Conda with Python and a small number of core packages; Anaconda includes Miniconda as well as a large number of the most widely used Python packages (> 150).
69 | 
70 | We recommend Miniconda, the most lightweight and bare-minimum approach to using conda. While the Anaconda distribution has a lot of packages pre-installed, most projects use only a fraction of them. Therefore, it is optimal to install Miniconda and benefit from the Conda package and environment management systems while avoiding the installation of many unnecessary packages.
71 | 
72 | 
73 | ```{figure} ../lectures/figures/miniconda-vs-anaconda.png
74 | ---
75 | name: miniconda-anaconda
76 | class: bg-primary mb-1
77 | width: 500px
78 | align: center
79 | ---
80 | Conda vs. Miniconda vs. Anaconda [Source: [Planemo Documentation](https://planemo.readthedocs.io/en/latest/writing_advanced_cwl.html)]
81 | ```
82 | ## Installing Miniconda
83 | You can download the installer for Miniconda from [this page](https://docs.conda.io/projects/miniconda/en/latest/). Make sure to download the correct version for your OS and hardware. Follow the steps in *[Quick command line install](https://docs.anaconda.com/miniconda/#quick-command-line-install)* to install Miniconda.
84 | 
85 | After finishing the installation and opening a new terminal (or reloading it), you should see `(base)` at the start of your prompt in the terminal. This indicates that you are in the "base" Conda environment:
86 | ```
87 | (base) $
88 | ```
89 | *For simplicity, we avoid including the (base) environment name for any code block in this book.*
90 | 
91 | ```{warning}
92 | Conda has a default environment called base that includes a Python installation and some core system libraries and dependencies of Conda. It is a “best practice” to avoid installing additional packages into your base software environment. Additional packages needed for a new project should always be installed into a newly created Conda environment.
93 | ```
94 | 
95 | ## Creating Environments
96 | A Conda environment is an isolated workspace where you can install specific versions of Python and packages without affecting your system-wide Python installation. Each environment has its own set of packages, making it possible to have different versions of packages in different environments. This isolation is essential for avoiding conflicts and ensuring the reproducibility of your projects.
97 | 
98 | To create a new Conda environment, use the `conda create` command. Here's the basic syntax:
99 | 
100 | ```
101 | $ conda create --name myenv
102 | ```
103 | Replace *myenv* with the name you want to give to your environment, and use a name that reflects the project or assignment you will use this environment for.
104 | 
105 | If you wish, you can specify a particular version of Python for Conda to install when creating the environment. For example, to create an environment named "g313-a1" with Python 3.8, use the following command:
106 | 
107 | ```
108 | $ conda create --name g313-a1 python=3.8
109 | ```
110 | 
111 | ## Activating and Deactivating Environments
112 | Once you've created an environment, you can activate it to start working within that isolated space. To activate an environment, use the `conda activate` command:
113 | 
114 | ```
115 | $ conda activate g313-a1
116 | ```
117 | 
118 | When the environment is activated, your command prompt will display the environment name to indicate that you are working within it (e.g. `(g313-a1)username@hostname ~ $`).
119 | 
120 | To deactivate the current environment and return to the base (system-wide) environment, use the `conda deactivate` command:
121 | 
122 | ```
123 | (g313-a1) $ conda deactivate
124 | ```
125 | 
126 | ```{tip}
127 | To see all the environments on your system:
128 | 
129 |     $ conda info --envs
130 | 
131 | ```
132 | 
133 | If you want to permanently remove an environment and delete all the data associated with it:
134 | ```
135 | $ conda remove --name my_environment --all
136 | ```
137 | 
138 | For extensive documentation on using environments, please see the [Conda documentation](https://conda.io/projects/conda/en/latest/user-guide/concepts/environments.html).
139 | 
140 | ```{tip}
141 | 
142 | You can avoid spelling out the full option for `conda` commands and use their first letter with a single `-`.
143 | For example, the two following commands are the same:
144 | 
145 |     $ conda create --name myenv
146 | 
147 |     $ conda create -n myenv
148 | 
149 | ```
150 | 
151 | ## Installing Packages in an Environment
152 | After activating an environment, you can use `conda` to install packages specific to that environment. For example, to install the `NumPy` package into your "g313-a1" environment, use the following command:
153 | 
154 | ```
155 | $ conda install numpy
156 | ```
157 | Conda will ensure that the package and its dependencies are installed in the active environment.
158 | 
159 | If you list more than one package to be installed, Conda will download the most current, mutually compatible versions of the requested packages.
160 | 
161 | To make your results more reproducible, and to make it easier for research colleagues to recreate your Conda environments on their machines, it is a “best practice” to always explicitly specify the version number for each package that you install into an environment. If you are not sure exactly which version of a package you want to use, you can use `conda search` to see what versions are available. For example, if you wanted to see which versions of `NumPy` were available, you would run the following:
162 | ```
163 | $ conda search numpy
164 | ```
165 | You can then update your `conda install` command as follows to install `NumPy` version 1.25.2:
166 | ```
167 | $ conda install numpy=1.25.2
168 | ```
169 | 
170 | Finally, you can specify multiple packages and their versions in the `conda create` command if you wish to install them when creating a new environment.
For example, the following command will create a new environment called `scipy-env` and install four packages:
171 | ```
172 | $ conda create --name scipy-env ipython=7.13 matplotlib=3.1 numpy=1.18 scipy=1.4
173 | ```
174 | 
175 | Another benefit of using Conda for package and environment management is that it allows you to install packages using `pip` too. Outside of the scientific Python community, the most common way to install packages is to search for them on the official [PyPI](https://pypi.org/) index. Once you’ve found the package you want to install (you may have also just found it on GitHub or elsewhere), you use the pip command from the command line:
176 | 
177 | ```
178 | $ pip install <package-name>
179 | ```
180 | This will fetch the source code, build it, and install it to wherever your `$PYTHONPATH` is set. This works in the vast majority of cases, particularly when the code you’re installing doesn’t have any compiled dependencies.
181 | 
182 | If you can’t find a package on either PyPI or `conda-forge`, you can always install it directly from the source code. If the package is on GitHub, pip can install it straight from the repository URL:
183 | ```
184 | $ pip install git+https://github.com/<username>/<repository>.git
185 | ```
186 | 
187 | ## Channels and Conda-Forge
188 | The packages that you install using the `conda` command are hosted on conda *[channels](https://conda.io/projects/conda/en/latest/user-guide/concepts/channels.html)*. From the conda docs:
189 | 
190 | > Conda channels are the locations where packages are stored. They serve as the base for hosting and managing packages. Conda packages are downloaded from remote channels, which are URLs to directories containing conda packages. The `conda` command searches a set of channels. By default, packages are automatically downloaded and updated from the [default channel](https://repo.anaconda.com/pkgs/) which may require a paid license, as described in the [repository terms of service](https://www.anaconda.com/terms-of-service). The `conda-forge` channel is free for all to use. You can modify what remote channels are automatically searched. You might want to do this to maintain a private or internal channel. For details, see how to [modify your channel lists](https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html#config-channels).
191 | >
192 | > Conda-forge is a community channel made up of thousands of contributors. Conda-forge itself is analogous to PyPI but with a unified, automated build infrastructure and more peer review of recipes.
193 | 
194 | The Anaconda channel terms of service clearly exclude all educational activities and all research activities at non-profit institutions from their definition of commercial usage. Even companies with fewer than 200 employees are excluded. The aim of the commercial paid license for Anaconda is to require large corporations which use the repository heavily to contribute financially to its maintenance and development. Without such contributions, Anaconda might not be able to sustain itself.
195 | 
196 | The simplest way to specify your preferred channel is to pass the `-c` option when you run `conda install`:
197 | 
198 | ```
199 | $ conda install -c conda-forge <package-name>
200 | ```
201 | 
202 | 
203 | ## Managing Environment Dependencies with `environment.yml`
204 | To ensure that your project can be easily reproduced on different systems, you can create an `environment.yml` file that lists all the dependencies for your project, including the Python version and packages.
Here's an example of an `environment.yml` file:
205 | 
206 | ```
207 | name: g313-a1
208 | channels:
209 |   - defaults
210 | dependencies:
211 |   - python=3.10  # Specify Python version
212 |   - numpy=1.25.2
213 |   - matplotlib=3.7.1
214 |   - pandas=2.0.3
215 | ```
216 | 
217 | You can create a Conda environment from this file using the following command:
218 | ```
219 | $ conda env create -f environment.yml
220 | ```
221 | 
222 | This command will create an environment named "g313-a1" with the specified dependencies.
223 | 
224 | **Note**: `conda env create` is different from `conda create`. When you create an environment from a file by passing `-f`, you need to use `conda env create`.
225 | 
226 | ## What about `pip`?
227 | `pip` is a package manager for Python that simplifies the process of installing, upgrading, and managing Python packages and dependencies. Its name is a recursive acronym for "Pip Installs Packages," emphasizing its primary function: installing packages.
228 | 
229 | `pip` works by connecting to the Python Package Index ([PyPI](https://pypi.org/)), a repository that hosts a large collection of Python packages contributed by the open-source community. PyPI serves as the central hub where developers publish their Python packages. In most cases, pip comes pre-installed with Python.
230 | 
231 | You can install packages using the `pip install` command as follows:
232 | 
233 | ```
234 | $ pip install numpy==1.25.2
235 | ```
236 | 
237 | *Note the syntax difference for specifying a package version in `pip install` (==) and `conda install` (=).*
238 | 
239 | ## Using `pip` with Conda
240 | To benefit from the strengths of both pip and Conda, you can use them together. Conda is a powerful environment management system that also serves as a package manager, but there might be cases where a package, or a specific version of a package, is not available on Conda. In that case, you can use `pip` together with conda and install any required packages. By combining Conda and pip, you can create reproducible and isolated Python environments while benefiting from pip's extensive package ecosystem.
241 | 
242 | The simplest way to use `pip` with conda is to create a new conda environment and activate it, then run `pip install` in the new environment. In this case, `pip` will install the package(s) in the conda environment only, and not in your base environment.
243 | 
244 | You can also specify `pip` packages in your `environment.yml` file. Here is an example of such a file:
245 | 
246 | ```
247 | name: my_env
248 | channels:
249 |   - defaults
250 | dependencies:
251 |   - python=3.10
252 | 
253 |   # Packages from Anaconda defaults channel
254 |   - numpy
255 |   - pandas
256 |   - scikit-learn
257 | 
258 |   # Packages from pip
259 |   - pip:
260 |     - requests
261 |     - matplotlib
262 | ```
263 | 
264 | ```{tip}
265 | Check out this Conda [cheatsheet](https://docs.conda.io/projects/conda/en/latest/_downloads/843d9e0198f2a193a3484886fa28163c/conda-cheatsheet.pdf) to look for quick answers to your Conda-related questions.
266 | ```
267 | 
268 | 
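The reverse direction is also useful: once an environment works, Conda can write the environment file for you. A minimal sketch, using the example environment name from above:
```
$ conda activate g313-a1
$ conda env export --from-history > environment.yml   # only the packages you explicitly requested
$ conda env export > environment-full.yml             # everything, pinned to exact versions
```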

 

269 | -------------------------------------------------------------------------------- /lectures/04_jupyter.md: --------------------------------------------------------------------------------
1 | # JupyterLab for Interactive Computing
2 | 
3 | ## Introduction
4 | 
5 | [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) is an open-source, web-based integrated development environment ([IDE](https://www.codecademy.com/article/what-is-an-ide)) for notebooks, code, and data. It has a flexible and powerful interface for interactive computing, data analysis, and scientific computing. JupyterLab is part of the [Jupyter Project](https://jupyter.org/), which aims to provide a platform-agnostic, interactive computing experience. JupyterLab builds upon the functionality of traditional Jupyter notebooks by offering a more feature-rich environment for data scientists, researchers, and students.
6 | 
7 | 
8 | 
9 | ## Installing JupyterLab
10 | 
11 | To install JupyterLab, follow the steps below:
12 | 
13 | **Using `pip`**
14 | ```
15 | $ pip install jupyterlab
16 | ```
17 | 
18 | **Using Conda**
19 | 
20 | ```
21 | $ conda install jupyterlab
22 | ```
23 | 
24 | **Note:** You need to install JupyterLab in each environment in which you want to use it. In most scientific projects, JupyterLab is typically included in the `environment.yml`. At the end of this chapter you will learn how to install one instance of JupyterLab and use different conda environments from that instance. Just make sure you use the latest version to take advantage of all the new features.
25 | 
26 | ## Launching JupyterLab
27 | 
28 | You can launch JupyterLab using the following command:
29 | 
30 | ```
31 | $ jupyter lab
32 | ```
33 | 
34 | This will open a new tab in your web browser, displaying the JupyterLab interface.
35 | 
36 | Let's look into what's happening when you run JupyterLab. Behind the scenes, JupyterLab runs a Jupyter server that is hosted on your computer and accessible through port `8888` (default) of localhost. After you run `jupyter lab`, several lines will be printed in your terminal which indicate the path to the Jupyter server on your machine, and the URL to access it. The URL has a format like this:
37 | ```
38 | http://127.0.0.1:8888/lab?token=TOKEN
39 | ```
40 | 
41 | `127.0.0.1` is the IP address of localhost, `8888` is the port number, and TOKEN is a randomly generated string that is used to authenticate access to the notebook server. If JupyterLab doesn't automatically open in your browser, you can copy the link from the terminal and paste it in your browser. We will use this later on when we deploy JupyterLab inside a Docker container.
42 | 
43 | 
44 | ```{figure} ../lectures/figures/jupyterlab.png
45 | ---
46 | name: jupyterlab
47 | class: bg-primary mb-1
48 | width: 700px
49 | align: center
50 | ---
51 | JupyterLab interface (source: [JupyterLab Documentation](https://jupyterlab.readthedocs.io/en/stable/))
52 | ```
53 | 
54 | ## JupyterLab Interface Overview
55 | 
56 | {numref}`jupyterlab` shows the latest JupyterLab interface at the time of this writing (Sep 2023). The interface has multiple sections:
57 | 
58 | **Launcher**
59 | 
60 | The Launcher is a tab that contains shortcuts to launch a notebook, terminal, markdown file, Python file, etc. You can access the Launcher under the *File* menu or by clicking the large blue button on the top left of the screen.
61 | 
62 | **File Browser**
63 | 
64 | On the left-hand side of the interface, you'll find the file browser, which allows you to navigate your file system and create, open, and manage notebooks and other files. The files and directories that you can access here are the ones located in the directory you launched JupyterLab from.
65 | 
66 | **Tabs and Workspaces**
67 | 
68 | JupyterLab supports multiple tabs, enabling you to work on multiple notebooks or files simultaneously. You can also organize your workspaces by creating customized layouts to suit your workflow.
69 | 
70 | **Kernel**
71 | 
72 | The kernel is responsible for executing code within a notebook. You can choose different kernels for different programming languages (e.g., Python, R, Julia) depending on your analysis needs. You can see all running kernels on the left-hand side by clicking on the kernels icon.
73 | 
74 | The JupyterLab documentation has detailed tutorials for [The JupyterLab Interface](https://jupyterlab.readthedocs.io/en/stable/user/interface.html), [Managing Kernels and Terminals](https://jupyterlab.readthedocs.io/en/stable/user/running.html), [Working with Terminals](https://jupyterlab.readthedocs.io/en/stable/user/terminal.html), [Notebooks](https://jupyterlab.readthedocs.io/en/stable/user/notebook.html), [Text Editor](https://jupyterlab.readthedocs.io/en/stable/user/file_editor.html), and [Working with Files](https://jupyterlab.readthedocs.io/en/stable/user/files.html).
75 | 
76 | ## Accessing Conda Environments from JupyterLab
77 | 
78 | When you run the `jupyter lab` command in your terminal, the JupyterLab server will launch from the conda environment that was active in your terminal. For example, if you are in the `base` environment, the JupyterLab instance will be launched in `base` and all notebooks will use the base environment by default.
79 | 
80 | In order to use other conda environments in your JupyterLab instance, you have two choices:
81 | 
82 | 1. Launch JupyterLab in the target environment that you are interested in. In this case, you will only be able to access your target environment from your JupyterLab.
83 | 1. If you prefer to be able to change your environment inside the JupyterLab instance, you need to do two things:
84 |     - Decide which environment you want to launch JupyterLab from (this can be your `base` environment), and install `nb_conda_kernels` in that environment as follows:
85 |     ```
86 |     $ conda install nb_conda_kernels
87 |     ```
88 |     - Make sure to install `ipykernel` in any other environment that you would like to be able to access inside JupyterLab:
89 |     ```
90 |     $ conda install ipykernel
91 |     ```
92 | 
93 | ## JupyterLab Shortcuts
94 | 
95 | There are several very useful and practical shortcuts for JupyterLab that improve your experience of working with it. You can see all of the default shortcuts after launching JupyterLab by navigating to Settings > Advanced Settings Editor > Keyboard Shortcuts.
96 | 
97 | We will introduce some of the useful ones in the class.
98 | 
99 | ## Exercise
100 | 
101 | In this exercise you will be working with a CSV file that contains labels for image chips. An image chip is a small portion of a larger satellite image for which the labels are valid. A typical image chip is 256px by 256px. Each row in this CSV represents a single image chip’s metadata and contains the chip’s land cover types (as a list) and the datetime for which the label is valid.
102 | 
103 | Download the CSV file from [here](../files/landcover_chips.csv).
The CSV has the following columns:
104 | - `id` as an integer
105 | - `landcover` as a string array
106 | - `datetime` as an ISO 8601 timestamp
107 | 
108 | You are asked to do the following in a Jupyter Notebook:
109 | - Load the CSV as a dataframe
110 | - Return the number of chips with a datetime between `2017-01-01T00:00:00Z` and `2017-12-31T11:59:59Z`
111 | - Plot a bar chart visualizing the total number of chips for each hour of the day
112 | - Return a list of unique land cover types present in the CSV
113 | 
114 | 
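If you want a starting point, here is a minimal pandas sketch for this exercise. It assumes the column names listed above and that the `landcover` values are serialized as Python-style list strings (e.g. `"['trees', 'water']"`); adjust the parsing if the file differs:
```python
import ast
import pandas as pd

# Load the CSV and parse the ISO 8601 timestamps
df = pd.read_csv("landcover_chips.csv", parse_dates=["datetime"])

# Number of chips with a datetime within the 2017 range
in_range = (df["datetime"] >= "2017-01-01T00:00:00Z") & (df["datetime"] <= "2017-12-31T11:59:59Z")
print("Chips in range:", in_range.sum())

# Bar chart of the total number of chips per hour of the day
df["datetime"].dt.hour.value_counts().sort_index().plot(kind="bar")

# Unique land cover types across all chips
landcover_lists = df["landcover"].apply(ast.literal_eval)
print(sorted({lc for row in landcover_lists for lc in row}))
```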

 

-------------------------------------------------------------------------------- /lectures/05_landscape.md: --------------------------------------------------------------------------------
1 | # The Landscape of Geospatial Data and Tools
2 | 
3 | This section provides an overview of the geospatial analytics landscape, including datasets, platforms, cloud repositories, Python packages, and so on. The goal of this introductory section is to familiarize you with the rapidly evolving landscape for geospatial analytics.
4 | 
5 | ## Background
6 | 
7 | The geospatial sector has been experiencing an evolution in recent years. We can summarize the drivers of these changes in three categories:
8 | 
9 | ```{figure} ../lectures/figures/change-drivers.png
10 | ---
11 | name: change-drivers
12 | class: bg-primary mb-1
13 | width: 300px
14 | align: center
15 | ---
16 | Drivers of change in geospatial technology.
17 | ```
18 | 
19 | 1. **Data Acquisition**:
20 | 
21 | Earth Observation (EO) satellites were historically launched and operated by government agencies such as NASA, USGS, ESA, JAXA, ISRO, and others. While a few commercial companies started to operate EO satellites in the early 2000s, this was still very limited in scope. However, in the last decade, with the advancements in smallsat and cubesat technologies, a large number of commercial companies have entered the market and launched new satellite constellations. This boom in commercial EO satellites includes the launch of satellites that do not carry multi-spectral sensors, and instead carry synthetic aperture radar (SAR), light detection and ranging (lidar), or hyperspectral sensors.
22 | 
23 | {numref}`EOSDIS-archive` shows the growing archive of NASA satellite mission data as an example, and {numref}`maxar-umbra` shows collocated SAR and optical imagery.
24 | 
25 | ```{figure} ../lectures/figures/EOSDIS-archive-2023.png
26 | ---
27 | name: EOSDIS-archive
28 | class: bg-primary mb-1
29 | width: 800px
30 | align: center
31 | ---
32 | Growing archive of NASA satellite mission data [source: [NASA Earth Data](https://www.earthdata.nasa.gov/technology/open-science)]
33 | ```
34 | 
35 | ```{figure} ../lectures/figures/maxar-umbra.png
36 | ---
37 | name: maxar-umbra
38 | class: bg-primary mb-1
39 | width: 800px
40 | align: center
41 | ---
42 | Collocated SAR and optical imagery provided by Umbra and Maxar. [source: [Umbra](https://umbra.space/blog/maxar-secures-dedicated-access-to-umbras-sar-constellation)]
43 | ```
44 | 
45 | 2. **Data Access**
46 | 
47 | These new modes of data acquisition have resulted in new and innovative ways of storing, sharing, and accessing these data. The traditional way of downloading all your imagery to your local machine (desktop or server) and running analysis on it does not scale anymore. It is more efficient, and in some cases the only possible solution, to bring the computation next to your data storage. Advancements in cloud technology have been a major help in changing how we access geospatial data these days.
48 | 
49 | The new mode of access also requires new data formats that are "cloud-native" and better ways of cataloging the data so users can query and find relevant data. Read more about cloud-native geospatial data [here](https://guide.cloudnativegeo.org/). You can also follow the Cloud-Native Geospatial Forum ([CNG](https://cloudnativegeo.org/)) to stay up to date with the developments in this community.
50 | 
51 | Finally, these changes are accompanied by the development of new APIs and data catalog standards that facilitate querying and accessing the data.
52 | 
53 | 3. **AI and Advanced Analytics**
54 | 
55 | AI is revolutionizing how we consume data and what insights we can derive from it. Geospatial data is no exception. There are numerous applications that are either enhanced by the use of AI or purely enabled by it. Here is a non-exhaustive list of such applications:
56 | - Mapping non-forest trees in the West African Sahara ([link](https://www.nature.com/articles/s41586-020-2824-5))
57 | - Mapping schools from space ([link](https://developmentseed.org/blog/2021-03-18-ai-enabling-school-mapping))
58 | - Mapping land use and land cover ([link](https://dynamicworld.app/))
59 | - Mapping unmapped population ([link](https://rampml.global/))
60 | - Mapping Africa's croplands ([link](https://mappingafrica.io/))
61 | 
62 | These advancements require a new set of tools and pipelines for processing geospatial data, and hence have been one of the main drivers of change in the geospatial Python landscape.
63 | 
64 | 
65 | In the following sections, we will learn more about the geospatial technology landscape and in particular the Python packages that facilitate the usage and manipulation of geospatial data.
66 | 
67 | ## Cloud-native Data Formats
68 | 
69 | Geospatial data is broadly categorized into vectors and rasters. With the changes that we talked about in the previous section, the file formats to store and access these data have evolved too. Consider a similar example: back in the day you would buy or rent a DVD, CD, or even a cassette to watch a movie on your TV at home. Nowadays, you simply log into a website or app and "stream" the same type of content (with even higher image quality). Geospatial data is going through a similar change. You don't want to download all the satellite images for your application; rather, you access them where they are stored and only load the portions of the data needed for your analysis. Let's dive deeper into these concepts for raster and vector formats. {numref}`CNG-formats` shows geospatial data formats for various data types.
70 | 
71 | 
72 | ```{figure} https://guide.cloudnativegeo.org/images/cogeo-formats-table.png
73 | ---
74 | name: CNG-formats
75 | class: bg-primary mb-1
76 | width: 800px
77 | align: center
78 | ---
79 | Cloud optimized geospatial formats. [source: [CNG Guide](https://guide.cloudnativegeo.org/images/cogeo-formats-table.png)]
80 | ```
81 | 
82 | ### Raster Formats
83 | 
84 | There are many formats for storing raster data. The most popular ones are:
85 | 
86 | - **GeoTIFF**: This is a variant of the TIFF (Tag Image File Format) format that is enriched with geospatial metadata. TIFF itself is the format mostly used by scanners, and it has lossless compression schemes. In GeoTIFF files, the latitude and longitude at the edges of pixels are recorded in the header, in addition to other metadata such as map projections, coordinate systems, datums, etc.
87 | 
88 | - **JPEG2000**: This is a compressed format for storing raster data that allows both lossy and lossless compression.
89 | 
90 | - **netCDF**: This is more than just a data format. According to the UniData [NetCDF page](https://www.unidata.ucar.edu/software/netcdf/): *NetCDF (network Common Data Form) is an interface for array-oriented data access and a library that provides an implementation of the interface. The netCDF library also defines a machine-independent format for representing scientific data.
Together, the interface, library, and format support the creation, access, and sharing of scientific data.* NetCDF is a widely used format for storing multi-dimensional geospatial data such as the outputs of climate models.
91 | 
92 | - **Cloud Optimized GeoTIFF (COG)**: This is the cloud optimized version of GeoTIFF. Based on its [definition](https://www.cogeo.org/), a COG *is a regular GeoTIFF file, aimed at being hosted on a HTTP file server, with an internal organization that enables more efficient workflows on the cloud. It does this by leveraging the ability of clients issuing HTTP GET range requests to ask for just the parts of a file they need.*
93 | 
94 | > Cloud Optimized GeoTIFF relies on two complementary pieces of technology: The first is the ability of a GeoTIFF to not only store the raw pixels of the image, but to also organize those pixels in particular ways. The second is HTTP GET range requests, that let clients ask for just the portions of a file that they need. Together these enable fully online processing of data by COG-aware clients, as they can **stream the right parts of the GeoTIFF** as they need it, instead of having to download the whole file.
95 | 
96 | We will learn more about COGs in the next chapter.
97 | 
98 | - **Zarr**: Zarr is a format for the storage of chunked, compressed, N-dimensional arrays, inspired by HDF5 and NetCDF. From the [Zarr Spec](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html): *The primary motivation for the development of Zarr is to address this challenge by enabling the storage of large multidimensional arrays in a way that is compatible with parallel and/or distributed computing applications.*
99 | 
100 | ```{figure} ../lectures/figures/zarr.png
101 | ---
102 | name: zarr
103 | class: bg-primary mb-1
104 | width: 800px
105 | align: center
106 | ---
107 | Zarr data format used for storing n-dimensional arrays. An n-dimensional array is a great data model for dynamic phenomena. [source: [Working with Climate Data](https://kpegion.github.io/AOES-CLIM-WorkingWithData/figures/)]
108 | ```
109 | 
110 | ### Vector Formats
111 | 
112 | Similar to rasters, there are multiple formats for storing vector data. Here are a few of the most popular ones:
113 | 
114 | - **Shapefile**: Shapefile is a popular geospatial vector data format. It was defined by ESRI and is supported by almost all geospatial software. A shapefile in fact refers to a collection of files with the same filename but different extensions. Each shapefile contains four files: a .shp file storing the geometries, a .dbf file storing the attribute data, a .shx file storing the shape index, and a .prj file storing the coordinate reference system. Such a way of storing the data makes it hard to use on cloud object storage, as the metadata about the data is spread across separate files and one has to access all of them to be able to query the dataset.
115 | 
116 | - **GeoJSON**: GeoJSON is a format for encoding a variety of geographic data structures. From its [spec](https://geojson.org/): *GeoJSON supports the following geometry types: `Point`, `LineString`, `Polygon`, `MultiPoint`, `MultiLineString`, and `MultiPolygon`. Geometric objects with additional properties are `Feature` objects. Sets of features are contained by `FeatureCollection` objects.*
117 | 
118 | Check out [this tutorial](https://tyson-swetnam.github.io/agic-2022/geojson/) on creating and using GeoJSON.
119 | 
120 | 
121 | - **GeoParquet**: GeoParquet is a new, evolving data format building on the powerful Apache Parquet format to add interoperable geospatial types (Point, Line and Polygon) to Parquet. You can read more about it [here](https://geoparquet.org/#intro).
122 | 
123 | ## Open-Access Data
124 | 
125 | Geospatial data has gone through major changes in terms of access policy in the last two decades. These changes, which have mostly resulted in more open-access and free data, have been a push for technology development.
126 | 
127 | **Note**: Open-access does not mean free by definition. Open-access refers to the license of the data, which allows anyone to access it (sometimes this might be limited to certain usages like non-commercial). Open-access is an attribute of the data and does not determine whether the data is free or paid.
128 | 
129 | One of the best examples of the free data policy is Landsat data. Landsat satellite data was not free until 2008, the year the US Geological Survey (USGS) made Landsat data accessible for free. This resulted in substantial downloads of Landsat imagery and an expansion of applications and geospatial tools (check out [this paper](https://doi.org/10.1016/j.rse.2019.02.016) for more details). By some accounts, putting Landsat data on the AWS cloud later in 2014 spurred the development of COG.
130 | 
131 | In recent years, there have been numerous developments to support the publication and sharing of free datasets. These include investments by government agencies in deploying data portals or providing grants to private institutions to do so, as well as efforts by commercial companies that have allocated some of their resources to the development and publication of open and free data.
132 | 
133 | ## SpatioTemporal Asset Catalog (STAC)
134 | 
135 | The STAC specification is a common language to describe geospatial information, so it can more easily be worked with, indexed, and discovered. At its core, the SpatioTemporal Asset Catalog (STAC) specification provides a common structure for describing and cataloging spatiotemporal assets.
136 | 
137 | A *spatiotemporal asset* is any file that represents information about the earth captured in a certain space and time.
138 | 
139 | STAC is intentionally designed to be simple, flexible, and extensible. STAC is a network of JSON files that reference other JSON files, with each JSON file adhering to a specific core specification depending on which STAC component it is describing. This core JSON format can also be customized to fit differing needs, making the STAC specification highly flexible and adaptable. Check out [this](https://stacspec.org/en/tutorials/intro-to-stac/) Intro to STAC guide to learn more about it.
140 | 
141 | In this course, we will interact with STAC data catalogs to search for geospatial data and retrieve them; the sketch below gives a first taste.
142 | 
143 | Another simple-to-use tool for STAC is [STAC Browser](https://radiantearth.github.io/stac-browser/#/). This browser retrieves a static catalog and lets you browse it on the web.
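As a preview of what this looks like in code, here is a minimal sketch using the `pystac-client` package to query a public STAC API. The endpoint, collection name, bounding box, and date range below are example values; we will work through a full tutorial later in the course:
```python
from pystac_client import Client

# Open a public STAC API endpoint (Earth Search by Element 84)
catalog = Client.open("https://earth-search.aws.element84.com/v1")

# Search for Sentinel-2 items over an example bounding box and date range
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[-122.5, 47.4, -122.2, 47.7],  # min lon, min lat, max lon, max lat
    datetime="2023-06-01/2023-06-30",
)

for item in search.items():
    print(item.id, item.datetime)
```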
144 | 145 | ## Cloud Data Repositories 146 | 147 | - Microsoft Planetary Computer ([link](https://planetarycomputer.microsoft.com/)) 148 | 149 | - Earth on AWS ([link](https://aws.amazon.com/earth/)) 150 | 151 | - Google Earth Engine ([link](https://earthengine.google.com/)) 152 | 153 | - Sentinel-Hub ([link](https://www.sentinel-hub.com/)) 154 | 155 | ## Python Landscape 156 | 157 | - Shapely 158 | - GDAL 159 | - Rasterio 160 | - GeoPandas 161 | - Xarray and rioxarray 162 | - sarpy 163 | - leafmap 164 | - Fiona 165 | - PyProj 166 | - Cartopy 167 | - SentinelHub-Py 168 | - satpy 169 | - geemap 170 | 171 |

 

-------------------------------------------------------------------------------- /lectures/06_crs.md: --------------------------------------------------------------------------------
1 | # Review of Coordinate Reference Systems
2 | 
3 | In this chapter, we will review Coordinate Reference Systems and how to work with them in Python.
4 | 
5 | **Attribution**
6 | *The content of this lecture is adapted from three excellent sources: [Introduction to Geospatial Raster and Vector Data with Python](https://carpentries-incubator.github.io/geospatial-python/03-crs.html) from Software Carpentry; [Intro to Coordinate Reference Systems in Python](https://www.earthdatascience.org/courses/use-data-open-source-python/intro-vector-data-python/spatial-data-vector-shapefiles/intro-to-coordinate-reference-systems-python/) from Earth Lab CU Boulder; [Why do Coordinate Systems Matter](https://anythingmapping.com/2021/10/21/why-do-coordinate-systems-matter/) from Anything Mapping.*
7 | 
8 | ## Coordinate Reference Systems (CRS)
9 | 
10 | We use coordinate systems to define the location of objects in space. In a 2-dimensional space, such a system consists of two values (typically named X and Y). In higher-dimensional spaces, we define extra coordinates to represent each dimension.
11 | 
12 | To define the location of objects on the Earth's surface, we also need to define a coordinate system. But in this case, the Earth's surface is round, and we need a coordinate system that adapts to its shape. When we make maps on paper or on a flat computer screen, we move from a 3-dimensional space (the globe) to a 2-dimensional space.
13 | 
14 | A Coordinate Reference System (CRS) defines the “flattening” of data that exists in a 3-D globe space to a 2-D flat space. The CRS also defines the coordinate system itself. For the purpose of this lecture, we will focus on the components of a CRS, and how to define them in your dataset/code.
15 | 
16 | There are numerous great resources to learn more about CRS and projections. You can check [this one](https://docs.qgis.org/3.28/en/docs/gentle_gis_introduction/coordinate_reference_systems.html) from the QGIS documentation as an example.
17 | 
18 | ```{figure} ../lectures/figures/what-is-a-crs.png
19 | ---
20 | name: crs
21 | class: bg-primary mb-1
22 | width: 700px
23 | align: center
24 | ---
25 | CRS defines the translation between a point on the Earth's surface and the same location on a flattened 2-D space (source: [Intro to Coordinate Reference Systems in Python](https://www.earthdatascience.org/courses/use-data-open-source-python/intro-vector-data-python/spatial-data-vector-shapefiles/intro-to-coordinate-reference-systems-python/))
26 | ```
27 | 
28 | ## Components of a CRS
29 | 
30 | ### Datum
31 | 
32 | A model of the shape of the earth. It has angular units (i.e. degrees) and defines the starting point (i.e. where is [0,0]?) so the angles reference a meaningful spot on the earth. Common global datums are WGS84 and NAD83. The WGS84 system uses an ellipsoid rather than a geoid. This is a generalized model of the earth, rather than a model of how the earth actually is. This is important to understand when working with Z elevation values in WGS84, because the ellipsoid is only a representation of the world, not an actual model of its surface.
33 | 
34 | ```{figure} ../lectures/figures/geoid-ellipsoid.png
35 | ---
36 | name: geoid-ellipsoid
37 | class: bg-primary mb-1
38 | width: 400px
39 | align: center
40 | ---
41 | Difference between the geoid and the modeled ellipsoid of the Earth's surface (source: [Why do Coordinate Systems matter](https://anythingmapping.com/2021/10/21/why-do-coordinate-systems-matter/))
42 | ```
43 | 
44 | Datums can also be local - fit to a particular area of the globe, but ill-fitting outside the area of intended use. In this course, we will use the WGS84 datum, which is the common datum for many datasets and applications around the world.
45 | 
46 | 
47 | ```{figure} ../lectures/figures/local-datum.png
48 | ---
49 | name: local-datum
50 | class: bg-primary mb-1
51 | width: 400px
52 | align: center
53 | ---
54 | Difference between a global and local ellipsoid (source: [Why do Coordinate Systems matter](https://anythingmapping.com/2021/10/21/why-do-coordinate-systems-matter/))
55 | ```
56 | 
57 | 
58 | ### Projection
59 | 
60 | A projection is a mathematical transformation of the angular measurements on a round earth to a flat surface (i.e. paper or a computer screen). A common analogy employed to teach projections is the orange peel analogy. If you imagine that the Earth is an orange, how you peel it and then flatten the peel is similar to how projections get made.
61 | 
62 | ```{figure} ../lectures/figures/orange-peel-earth.jpg
63 | ---
64 | name: orange-peel-crs
65 | class: bg-primary mb-1
66 | width: 500px
67 | align: center
68 | ---
69 | Projection example using an orange peel (credit: Prof. Drika Geografia, [Projeções Cartográficas](http://profdrikageografia.blogspot.com/2010_12_01_archive.html))
70 | ```
71 | 
72 | ### Coordinate System
73 | 
74 | This is the X, Y grid upon which the data is overlaid, and how you define where a point is located in space.
75 | 
76 | ### Horizontal and vertical units
77 | 
78 | The units used to define the grid along the x, y (and z) axes. These will be in the units defined by the coordinate system of the CRS.
79 | 
80 | ### Additional Parameters
81 | 
82 | Additional parameters are often necessary to create the full coordinate reference system. One common additional parameter is a definition of the center of the map. The number of required additional parameters depends on what is needed by each specific projection.
83 | 
84 | ```{admonition} Further Reading
85 | If you like to read more about CRS, and a *problem-based guide* of common CRS issues, check out the [**I Hate Coordinate Systems!**](https://ihatecoordinatesystems.com/) blog.
86 | ```
87 | 
88 | ## Defining a CRS
89 | 
90 | There are several common systems in use for storing and transmitting CRS information, as well as translating among different CRSs. These systems generally comply with ISO 19111. Common systems for describing CRSs include EPSG, OGC WKT, and PROJ strings.
91 | 
92 | 
93 | ### EPSG
94 | 
95 | The [EPSG system](https://epsg.org/home.html) is a database of CRS information maintained by the International Association of Oil and Gas Producers. The dataset contains both CRS definitions and information on how to safely convert data from one CRS to another. Using EPSG is easy, as every CRS has an integer identifier, e.g. WGS84 is EPSG:4326. The downside is that you can only use the CRSs defined by EPSG and cannot customize them (some datasets do not have EPSG codes). [epsg.io](https://epsg.io/) is an excellent website for finding suitable projections by location or for finding information about a particular EPSG code.
96 | 97 | ### Well-Known Text (WKT) 98 | 99 | The [Open Geospatial Consortium](https://www.ogc.org/) WKT standard is used by a number of important geospatial apps and software libraries. WKT is a nested list of geodetic parameters. The structure of the information is defined on their [website](https://www.opengeospatial.org/standards/wkt-crs). WKT is valuable in that the CRS information is more transparent than in EPSG, but it can be more difficult to read and compare than PROJ, since it necessarily represents more complex CRS information. Additionally, the WKT standard is implemented inconsistently across various software platforms, and the spec itself has some [known issues](https://gdal.org/tutorials/wktproblems.html). 100 | 101 | ### PROJ 102 | 103 | [PROJ](https://proj4.org/) is an open-source library for storing, representing and transforming CRS information. PROJ strings continue to be used, but the format [is deprecated by the PROJ C maintainers](https://proj.org/faq.html#what-is-the-best-format-for-describing-coordinate-reference-systems) due to inaccuracies when converting to the WKT format. The data and Python libraries we will be working with in this class use different underlying representations of CRSs under the hood for reprojecting. CRS information can still be represented with EPSG, WKT, or PROJ strings without consequence, but **it is best to only use PROJ strings as a format for viewing CRS information, not for reprojecting data**. 104 | 105 | PROJ represents CRS information as a text string of key-value pairs, which makes it easy to read and interpret. 106 | 107 | A PROJ4 string includes the following information: 108 | 109 | - **proj**: the projection of the data 110 | - **zone**: the zone of the data (this is specific to the UTM projection) 111 | - **datum**: the datum used 112 | - **units**: the units for the coordinates of the data 113 | - **ellps**: the ellipsoid (how the earth's roundness is calculated) for the data 114 | 115 | *Note that the zone is unique to the UTM projection. Not all CRSs will have a zone.* 116 | 117 | ## Examples 118 | 119 | Here is the WGS84 CRS defined in the three formats we discussed: 120 | 121 | **EPSG** 122 | ``` 123 | EPSG:4326 124 | ``` 125 | 126 | **WKT** 127 | ``` 128 | GEOGCS["WGS 84", 129 | DATUM["WGS_1984", 130 | SPHEROID["WGS 84",6378137,298.257223563, 131 | AUTHORITY["EPSG","7030"]], 132 | AUTHORITY["EPSG","6326"]], 133 | PRIMEM["Greenwich",0, 134 | AUTHORITY["EPSG","8901"]], 135 | UNIT["degree",0.0174532925199433, 136 | AUTHORITY["EPSG","9122"]], 137 | AUTHORITY["EPSG","4326"]] 138 | ``` 139 | 140 | **PROJ** 141 | ``` 142 | +proj=longlat +datum=WGS84 +no_defs +type=crs 143 | ``` 144 | 145 | ## Tissot's indicatrix 146 | 147 | Developed by the French mathematician Nicolas Auguste Tissot, Tissot's indicatrix characterizes local distortions due to map projection. A single indicatrix describes the distortion at a single point. Because distortion varies across a map, Tissot's indicatrices are generally placed across a map to illustrate the spatial change in distortion. 148 | 149 | 150 | ```{figure} ../lectures/figures/Tissot_mercator.png 151 | --- 152 | name: tissot_mercator 153 | class: bg-primary mb-1 154 | width: 500px 155 | align: center 156 | --- 157 | The Mercator projection with Tissot's indicatrices (source: [Stefan Kühn](https://en.wikipedia.org/wiki/File:Tissot_mercator.png)) 158 | ``` 159 | 160 |
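
To tie these formats together, you can also generate them and convert coordinates between CRSs programmatically. Here is a minimal sketch, again assuming `pyproj` is installed; note that the PROJ string is produced for viewing only, following the advice above:

```python
# Convert between CRS representations and reproject a point (a minimal sketch).
from pyproj import CRS, Transformer

wgs84 = CRS.from_epsg(4326)
print(wgs84.to_wkt(pretty=True))  # the WKT representation shown above
print(wgs84.to_proj4())           # the PROJ string - for viewing only!

# Reproject a longitude/latitude point into Web Mercator (EPSG:3857).
# always_xy=True enforces (longitude, latitude) axis order on input.
transformer = Transformer.from_crs(4326, 3857, always_xy=True)
x, y = transformer.transform(-112.4471, 34.5510)
print(x, y)  # easting/northing in meters
```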

 

-------------------------------------------------------------------------------- /lectures/08_docker.md: -------------------------------------------------------------------------------- 1 | # Introduction to Docker 2 | 3 | In this chapter, you will learn about containerization and Docker as one of the platforms for containerization. 4 | 5 | **Attribution** 6 | *The content of this lecture is adapted from three excellent sources: [Docker for beginners](https://docker-curriculum.com/); [Introduction to Docker Containers](https://learn.microsoft.com/en-us/training/modules/intro-to-docker-containers/2-what-is-docker) by Microsoft; and [Reproducible Computational Environments Using Containers: Introduction to Docker](https://carpentries-incubator.github.io/docker-introduction/) from Software Carpentry* 7 | 8 | ## Containerization 9 | 10 | In software development and deployment teams, containerization has become a regular practice that allows developers to package applications and their dependencies into lightweight, portable units known as containers. Docker is one of the most popular containerization platforms. 11 | 12 | Containers are self-sufficient units that package an application and all its required dependencies, including libraries and configuration files, into a single image. These containers can run consistently across different environments, ensuring that an application behaves the same way whether it's on a developer's laptop or a production server. Containers do not have the high overhead of virtual machines (VMs), and hence enable more efficient usage of the underlying system and resources. 13 | 14 | ## Why Containers? 15 | 16 | Containerization has several benefits: 17 | 18 | 1. **Portability:** Containers encapsulate everything an application needs to run, making it easy to move between different environments without modification. 19 | 1. **Consistency:** Containers ensure that the application runs the same way everywhere, eliminating the "it works on my machine" problem. 20 | 1. **Isolation:** Containers provide process and resource isolation, ensuring that one container cannot interfere with another, improving security and stability. 21 | 1. **Scalability and Speed:** Containers can be easily scaled up or down to meet changing workloads, and they are much faster to deploy than VMs. 22 | 1. **Resource Efficiency:** Containers are lightweight and share the host OS kernel, making them more resource-efficient than traditional virtualization. While containers live on top of a host machine and use its resources, they virtualize the host OS, unlike VMs, which virtualize the underlying hardware. This means containers don't need their own OS, making them much more lightweight than VMs, and consequently quicker to spin up. 23 | 24 | ## What is Docker? 25 | 26 | Docker is a leading containerization platform that has played a major role in popularizing containers. Docker is open source, and it provides a set of tools and services for creating, deploying, and managing containers. 27 | 28 | ## Docker Components 29 | 1. **Docker Engine:** The core component of Docker that acts as a client-server application. It includes: 30 | - the Docker daemon (`dockerd`), which acts as the server and responds to requests from the client; 31 | - the Docker REST API for communication; and 32 | - the Docker client, which has two alternatives: the command line interface (CLI) named `docker` and the graphical user interface (GUI) application named Docker Desktop. 33 | 2.
**Images:** A snapshot of a file system with the application code and all dependencies needed to run it. Images are used to create containers. 34 | 3. **Containers:** An instance of a Docker image that can run a specific application. Containers are isolated from each other and share the host OS kernel. 35 | 4. **Docker Hub:** A public registry of available Docker images ([link](https://hub.docker.com/)). 36 | 37 | 38 | ```{figure} ../lectures/figures/docker-engine.svg 39 | --- 40 | name: docker-engine 41 | class: bg-primary mb-1 42 | width: 700px 43 | align: center 44 | --- 45 | Different components of Docker. [source: [Introduction to Docker Containers](https://learn.microsoft.com/en-us/training/modules/intro-to-docker-containers/2-what-is-docker)] 46 | ``` 47 | 48 | ## Run Your First Docker Container 49 | 50 | There is a simple Docker image that you can run as a container to verify that your Docker installation is correct. To do this, execute the following command: 51 | ``` 52 | $ docker run hello-world 53 | ``` 54 | 55 | ## What is a Dockerfile? 56 | 57 | A Dockerfile is a text file that contains the instructions we use to build and run a Docker image. The following aspects of the image are defined: 58 | 59 | - The base or parent image we use to create the new image 60 | - Commands to update the base OS and install additional software 61 | - Build artifacts to include, such as a developed application 62 | - Services to expose, such as storage and network configuration 63 | - The command to run when the container is launched 64 | 65 | **Note**: A *base image* is an image that uses the Docker `scratch` image. The `scratch` image is an empty image that doesn't create a filesystem layer. This image assumes that the application you're going to run can directly use the host OS kernel. A *parent image* is an image from which you create your images. For example, instead of creating an image from `scratch` and then installing Ubuntu, we'll rather use an image already based on Ubuntu. 66 | 67 | ## Useful Docker CLI Commands 68 | 69 | 1. **Build an Image** 70 | 71 | You can use the `docker build` command to build a Docker image from a Dockerfile as follows: 72 | ``` 73 | $ docker build -t <image_name> . 74 | ``` 75 | This command assumes you have a Dockerfile in the current directory. If your Dockerfile is not in the current directory, or if the file name is not *Dockerfile*, you need to replace `.` with `-f <path/to/Dockerfile>`. 76 | 77 | 1. **List Images** 78 | 79 | You can list all images on your machine using the `docker images` command: 80 | ``` 81 | $ docker images 82 | ``` 83 | The output will be a table similar to the following (this is from Hamed's computer!): 84 | ``` 85 | REPOSITORY TAG IMAGE ID CREATED SIZE 86 | giswqs/segment-geospatial latest cd7db75a587c 3 weeks ago 5.34GB 87 | gfm-gap latest 93ca820ec782 2 months ago 2.73GB 88 | cdl latest 9f1e0f6b1273 2 months ago 2.73GB 89 | hls latest 1d3452e331df 2 months ago 9.73GB 90 | lc-td latest 221b2866cb63 3 months ago 1.82GB 91 | ``` 92 | 93 | 1. **Remove an Image** 94 | 95 | You can remove an image using the `docker rmi` command. This can be used to free some space on your computer. You can specify the name or ID of an image as follows (including the tag is optional): 96 | ``` 97 | $ docker rmi <image_name>:<tag> 98 | ``` 99 | 100 | 1. **Run a Container** 101 | 102 | You can run a container using the `docker run` command. You only need to specify the Docker image name or ID to launch a container from the image.
103 | ``` 104 | $ docker run <image_name> 105 | ``` 106 | 107 | 1. **List Available Containers** 108 | 109 | You can use `docker ps` to list containers in the *running* state. To see all containers in all states, pass the `-a` flag to the command: 110 | ``` 111 | $ docker ps -a 112 | ``` 113 | 114 | *To learn more about Docker container states, check out this [page](https://learn.microsoft.com/en-us/training/modules/intro-to-docker-containers/4-how-docker-containers-work).* 115 | 116 | 117 | 1. **Interrupt a Container** 118 | 119 | You can stop or restart a container using one of the following commands: 120 | ``` 121 | $ docker stop <container_id> 122 | $ docker restart <container_id> 123 | ``` 124 | 125 | 1. **Remove a Container** 126 | 127 | You can remove a container using the following command. Note that this will result in all the data in the container being deleted. 128 | ``` 129 | $ docker rm <container_id> 130 | ``` 131 | 132 | 133 | ## Create Your Own Dockerfile 134 | You can create your own Dockerfile with specific software and packages installed. This is a nice way to create a reproducible and portable runtime environment for your projects. 135 | Here is an example of a Dockerfile: 136 | ``` 137 | FROM continuumio/miniconda3:24.7.1-0 138 | 139 | # Set the working directory to /home/workdir 140 | RUN mkdir /home/workdir 141 | WORKDIR /home/workdir 142 | 143 | # Create a Conda env named 'myenv' with numpy installed in it 144 | RUN conda create -n myenv numpy=2.0.1 145 | 146 | CMD ["/bin/bash"] 147 | ``` 148 | 149 | So let's look at what each of these commands means: 150 | 151 | **FROM** 152 | 153 | Use the FROM command to specify the parent image that you want your image to derive from. Here, we're using the `continuumio/miniconda3:24.7.1-0` image. 154 | 155 | **RUN** 156 | 157 | Use RUN to execute any shell command when the image is being built. Note that this is different from a command you want to execute when running the container. 158 | 159 | **WORKDIR** 160 | 161 | Sets the current working directory inside the container (like a `cd` command in a shell). All subsequent commands in the Dockerfile will happen inside this directory. 162 | 163 | **CMD** 164 | 165 | This is the command instruction; it specifies what to run when the container is started. Here we're simply setting the container to run `bash`. 166 | 167 | Other useful commands inside a Dockerfile are: 168 | 169 | **COPY** 170 | 171 | The COPY instruction has the following format: `COPY <src> <dest>`. It copies files from `<src>` (on the host) into `<dest>` (in the container). This is run at the time that the Docker image is being built, and the copied files are stored in the image (which means the files do not need to be available when running the container). 172 | 173 | 174 | ## Managing Storage in Docker 175 | 176 | All files created inside a container are stored on a writable container layer by default. This means that: 177 | 178 | - The data won't exist after the container is removed. 179 | - It would be difficult to access the data outside the container. 180 | - You can't easily move the data to the host machine. 181 | 182 | To address these challenges, Docker has a mechanism for containers to store files on the host machine. This means the files can be easily accessed by other processes outside the container, and they will persist after the container is removed.
183 | 184 | The easiest way to do this is to mount a directory on the host machine to the container using the following command: 185 | ``` 186 | $ docker run -v $(pwd):/home/workdir <image_name> 187 | ``` 188 | In this example, the current directory on the host machine (`pwd`) is mounted to `/home/workdir/` inside the container. This means that any file or directory inside the current directory on the host will be accessible at `/home/workdir/` inside the container. If you make changes to these files or directories on either the host or the container, the changes will be reflected on the other side (these are practically the same files, stored on the host but accessible from two separate places). It's best to change these files only from inside the container, to make sure your changes don't conflict with each other. 189 | 190 | You can also mount a directory that doesn't exist on the host to a directory inside the container. For example, running the following: 191 | ``` 192 | $ docker run -v /doesnt/exist:/home/workdir <image_name> 193 | ``` 194 | will automatically create `/doesnt/exist` on the host before starting the container. 195 | 196 | 197 | ## Activating Conda Environments in Docker 198 | 199 | You can use conda inside Docker to manage packages and environments. To do that, you can use conda in the Dockerfile to create a new environment and install packages. However, if you want the environment to be activated when the container starts, you need to take some extra steps. 200 | 201 | Try building an image using the following Dockerfile: 202 | 203 | ``` 204 | FROM continuumio/miniconda3:24.7.1-0 205 | 206 | # Set the working directory to /home/workdir 207 | RUN mkdir /home/workdir 208 | WORKDIR /home/workdir 209 | 210 | # Create a Conda env named 'myenv' with numpy installed in it 211 | RUN conda create -n myenv numpy=2.0.1 212 | 213 | # Activate the conda environment 214 | RUN conda activate myenv 215 | 216 | CMD ["/bin/bash"] 217 | ``` 218 | 219 | As you will notice, the Docker build in this case fails. This is because Docker runs each command in a new shell, and the `conda activate` command needs to be run in the same shell where the environment was created. 220 | 221 | There are multiple ways to resolve this issue. One of them, which we recommend, is to add the `conda activate` command to your `.bashrc` file. `.bashrc` is a script file that is executed when a user logs in. In this case, any command included in the `.bashrc` will be executed when the container runs. Try building an image from the following Dockerfile and then run it as a container: 222 | 223 | ``` 224 | FROM continuumio/miniconda3:24.7.1-0 225 | 226 | # Set the working directory to /home/workdir 227 | RUN mkdir /home/workdir 228 | WORKDIR /home/workdir 229 | 230 | # Create a Conda env named 'myenv' with numpy installed in it 231 | RUN conda create -n myenv numpy=2.0.1 232 | 233 | # Activate the conda environment 234 | RUN echo "conda activate myenv" >> ~/.bashrc 235 | ENV PATH="$PATH:/opt/conda/envs/myenv/bin" 236 | 237 | CMD ["/bin/bash"] 238 | ``` 239 | 240 | ## Running Jupyter Notebooks Inside a Container 241 | 242 | You can install and run a Jupyter server inside the container the same way you would do inside a conda environment on your machine. There are a couple of steps you need to follow to make it accessible outside of the container though. 243 | 244 | **First**, you need to create a new user inside the container, and switch to that user. This is generally a good practice, as you don't want to run the container as the root user.
245 | 246 | **Second**, you need to expose port `8888`, which is used by the Jupyter server, from the container. This allows any process outside of the container to communicate with processes inside the container through port `8888`. 247 | 248 | **Third**, you need to include the JupyterLab command at the end of your Dockerfile. For this, you need to pass an extra argument to set the IP of the server to `0.0.0.0`. 249 | 250 | The following sample Dockerfile implements these three changes, and runs JupyterLab when the container is launched. 251 | ``` 252 | FROM continuumio/miniconda3:24.7.1-0 253 | 254 | # Create a Conda environment with JupyterLab installed 255 | RUN conda create -n myenv numpy=1.25.0 jupyterlab=3.6.3 256 | 257 | # Activate the Conda environment 258 | RUN echo "conda activate myenv" >> ~/.bashrc 259 | ENV PATH="$PATH:/opt/conda/envs/myenv/bin" 260 | 261 | # Create a non-root user and switch to that user 262 | RUN useradd -m jupyteruser 263 | USER jupyteruser 264 | 265 | # Set the working directory to /home/jupyteruser 266 | WORKDIR /home/jupyteruser 267 | 268 | # Expose the JupyterLab port 269 | EXPOSE 8888 270 | 271 | # Start JupyterLab 272 | CMD ["jupyter", "lab", "--ip=0.0.0.0"] 273 | ``` 274 | 275 | **Note**: The new user you create inside the container, in this case `jupyteruser`, can only access their home directory `/home/jupyteruser`. Therefore, you need to launch Jupyter from this working directory, and when you mount directories at launch time you should mount them to `jupyteruser`'s home directory or a sub-directory of it. 276 | 277 | Finally, to run the container you should publish the container's port `8888` to a port on the host (it can be the same `8888` if it's not otherwise in use): 278 | ``` 279 | $ docker run -it -p 8888:8888 <image_name> 280 | ``` 281 | 282 | Lastly, you can copy the URL of the Jupyter server and paste it into your browser to access JupyterLab. 283 | 284 | ## Working with Docker Hub 285 | 286 | So far you have learned how to build Docker images, run Docker containers, and use these to create reproducible work environments. Let's say you now want to share your Docker image with a colleague, or share it publicly for others to access. For this purpose, you can use a registry. Docker Hub is one of the main registries for sharing Docker images. In this section, you will learn to push images to your Docker Hub account. 287 | 288 | Here are the steps to follow: 289 | 290 | 1. Create a Docker account. You can do this by selecting **Sign In** at the top-right corner of the Docker Desktop Dashboard. 291 | 1. Create a new repository on your Docker Hub account. Open [Docker Hub](https://hub.docker.com) and select **Create repository**. Enter a Name and Description, and set the visibility to Public. 292 | 1. Now that you have a repository, you can build and push an image to this repository as follows: 293 | - Build your image using the following command, swapping out `DOCKER_USERNAME` with your username and `IMAGE_NAME` with the name of the image/repository: 294 | ``` 295 | $ docker build -t DOCKER_USERNAME/IMAGE_NAME . 296 | ``` 297 | - Verify that the image has been built locally by running the `docker images` or `docker image ls` command.
- To push the image, use the `docker push` command (similarly, replace `DOCKER_USERNAME` with your username and `IMAGE_NAME` with the name of the image/repository): 299 | ``` 300 | $ docker push DOCKER_USERNAME/IMAGE_NAME 301 | ``` 302 | 303 | Now that you have published a Docker image on Docker Hub, you can use the `docker pull` command to download that image onto any machine that doesn't have the image. For this purpose, you need to run: 304 | 305 | ``` 306 | $ docker pull DOCKER_USERNAME/IMAGE_NAME 307 | ``` 308 | 309 | 310 | 311 | ```{admonition} Cleanup Commands 312 | It's good practice to remove unwanted Docker images and containers to free up disk space. You can use `prune` to do this as follows: 313 | 314 | - `docker container prune` removes all stopped containers. 315 | - `docker image prune` removes all unused or dangling images (images that do not have a tag). 316 | - `docker system prune` removes all stopped containers, dangling images, and dangling build caches. 317 | ``` 318 | 319 | ```{Tip} 320 | You can consult this Docker CLI [cheatsheet](https://docs.docker.com/get-started/docker_cheatsheet.pdf) for a quick reference of its most used commands. 321 | ``` -------------------------------------------------------------------------------- /lectures/13_raster.md: -------------------------------------------------------------------------------- 1 | # Review of Raster Data 2 | 3 | In this lecture, we will review geospatial raster data structures. 4 | 5 | 6 | **Attribution** 7 | *The content of this lecture is adapted from two excellent sources: [Introduction to Raster Data in Python](https://www.earthdatascience.org/courses/intro-to-earth-data-science/file-formats/use-spatial-data/use-vector-data/) from Earth Lab CU Boulder; and [Introduction to Raster Data](https://carpentries-incubator.github.io/geospatial-python/02-intro-vector-data.html) from Software Carpentry.* 8 | 9 | --- 10 | ## Geospatial Raster Data 11 | 12 | Raster data are a primary type of data for geospatial assets. Rasters are stored as a grid of values which are rendered on a map as pixels. Each pixel value represents an area on the Earth's surface. 13 | 14 | ## What is a Raster? 15 | 16 | Raster data is any pixelated (or gridded) data where each pixel is associated with a specific geographic location. The value of a pixel can be continuous (e.g. elevation) or categorical (e.g. land use). If this sounds familiar, it is because this data structure is very common: it's how we represent any digital image. A geospatial raster is only different from a digital photo in that it is accompanied by spatial information that connects the data to a particular location. This includes the raster's extent and cell size, the number of rows and columns, and its coordinate reference system (or CRS). 17 | 18 | ```{figure} ../lectures/figures/rasters.png 19 | --- 20 | name: rasters 21 | width: 600px 22 | align: center 23 | --- 24 | Raster data concept (Source: National Ecological Observatory Network ([NEON](https://www.neonscience.org/resources/learning-hub/tutorials/introduction-working-raster-data-r#toggle-0))) 25 | ``` 26 | 27 | Some examples of continuous rasters include: 28 | 29 | - Precipitation maps. 30 | - Maps of tree height derived from LiDAR data. 31 | - Elevation values for a region. 32 | 33 | Some rasters contain categorical data where each pixel represents a discrete class such as a land cover type (e.g., "forest" or "grassland") rather than a continuous value such as elevation or temperature.
Some examples of classified maps include: 34 | 35 | - Land cover / land use maps. 36 | - Tree height maps classified as short, medium, and tall trees. 37 | - Elevation maps classified as low, medium, and high elevation. 38 | 39 | ## Resolution 40 | 41 | The resolution of a raster represents the area on the ground that each pixel of the raster covers. So a 1 meter resolution raster means that each pixel represents a 1 m by 1 m area on the ground. The image below illustrates the effect of changes in resolution. 42 | 43 | ```{figure} ../lectures/figures/raster_resolution.png 44 | --- 45 | name: raster_resolution 46 | width: 600px 47 | align: center 48 | --- 49 | Rasters can be stored at different resolutions. The resolution simply represents the size of each pixel cell. (Source: National Ecological Observatory Network ([NEON](https://www.neonscience.org/resources/learning-hub/tutorials/introduction-working-raster-data-r#toggle-0))) 50 | ``` 51 | 52 | ## Raster Bands 53 | 54 | A raster can contain one or more bands. One type of multi-band raster dataset that is familiar to many of us is a color image. A basic color image consists of three bands: red, green, and blue. Each band represents light reflected from the red, green or blue portions of the electromagnetic spectrum. The pixel brightness for each band, when composited, creates the colors that we see in an image. 55 | 56 | Another type of multi-band raster data is a time series, in which observations of the same variable (single band) over the same area are stacked together. 57 | 58 | We can plot each band of a multi-band image individually. Or we can composite all three bands together to make a color image. In a multi-band dataset, the rasters will always have the same extent, resolution, and CRS. 59 | 60 | ## NoData Values in Rasters 61 | Raster data often has a `NoDataValue` associated with it. This is a value assigned to pixels where data are missing or no data were collected. 62 | 63 | By default, the shape of a raster is always square or rectangular. So if we have a dataset whose shape isn't square or rectangular, some pixels at the edge of the raster will have `NoDataValue`s. This often happens when the data were collected by an airplane or satellite that only flew over some part of a defined region. 64 | -------------------------------------------------------------------------------- /lectures/16_vector.md: -------------------------------------------------------------------------------- 1 | # Review of Geospatial Vector Data 2 | 3 | In this lecture, we will review geospatial vector data structures. 4 | 5 | 6 | 7 | **Attribution** 8 | *The content of this lecture is adapted from three excellent sources: [Introduction to Spatial Vector Data File Formats in Open Source Python](https://www.earthdatascience.org/courses/intro-to-earth-data-science/file-formats/use-spatial-data/use-vector-data/) from Earth Lab CU Boulder; [Introduction to Vector Data](https://carpentries-incubator.github.io/geospatial-python/02-intro-vector-data.html) from Software Carpentry; and [Overview of GeoJSON](https://tyson-swetnam.github.io/agic-2022/geojson/) from Cloud Native Data Workshop.* 9 | 10 | --- 11 | ## Geospatial Vector Data 12 | 13 | Vector data structures represent specific features on the Earth's surface, and assign attributes to those features. Vectors are composed of discrete geometric locations (x, y values) known as vertices that define the shape of the spatial object.
The organization of the vertices determines the type of vector that we are working with: point, line or polygon. 14 | 15 | - **Point**: Each point is defined by a single x, y coordinate and has a dimension of 0. Examples of point data include: sampling locations, the location of individual trees, or the location of survey plots. 16 | 17 | - **LineString**: A LineString is composed of many (at least 2) points that are connected, and has a dimension of 1. For instance, a road or a stream may be represented by a line. This line is composed of a series of segments; each "bend" in the road or stream represents a vertex that has a defined x, y location. 18 | 19 | - **Polygon**: A polygon consists of 3 or more vertices that are connected and closed, and has a dimension of 2. The outlines of survey plot boundaries, lakes, oceans, and states or countries are often represented by polygons. 20 | 21 | 22 | ```{figure} ../lectures/figures/vectors.png 23 | --- 24 | name: vectors 25 | width: 500px 26 | align: center 27 | --- 28 | Types of vector objects (Source: National Ecological Observatory Network ([NEON](https://www.neonscience.org/resources/learning-hub/tutorials/intro-vector-data-r#toggle-0))) 29 | ``` 30 | 31 | If a feature is composed of more than one shape of the same type, you can use a `Multi` geometry type. There are three of these types: 32 | 33 | - **MultiPoint**: A `MultiPoint` geometry is represented by multiple coordinate point pairs. 34 | 35 | - **MultiLineString**: A `MultiLineString` geometry is represented by multiple LineStrings. 36 | 37 | - **MultiPolygon**: A `MultiPolygon` geometry is represented by multiple Polygons. 38 | 39 | 40 | ## GeoJSON Schemas 41 | 42 | In this class, we will mostly use GeoJSON as the vector data format (refer to the lecture on [The Landscape of Geospatial Data and Tools](../lectures/05_landscape.md) for more information about different formats). You can represent any vector data type in GeoJSON format as described in the schemas below. Based on the latest [GeoJSON specification](https://datatracker.ietf.org/doc/html/rfc7946), all coordinates should be recorded using a geographic coordinate reference system, using the World Geodetic System 1984 (WGS 84) [WGS84] datum, with longitude and latitude units of decimal degrees.
43 | 44 | ```{dropdown} GeoJSON Point Schema 45 | ``` json 46 | { 47 | "type": "Point", 48 | "coordinates": [-112.4471, 34.5510] 49 | } 50 | ``` 51 | 52 | 53 | ```{dropdown} GeoJSON LineString Schema 54 | ``` json 55 | { 56 | "type": "LineString", 57 | "coordinates": [ 58 | [-112.4470, 34.5510], [-112.4695, 34.541] 59 | ] 60 | } 61 | ``` 62 | 63 | ```{dropdown} GeoJSON Polygon Schema 64 | ``` json 65 | { 66 | "type": "Polygon", 67 | "coordinates": [ 68 | [[-112.485, 34.529], [-112.445, 34.529], [-112.445, 34.559], [-112.485, 34.559], [-112.485, 34.529]] 69 | ] 70 | } 71 | ``` 72 | 73 | ```{dropdown} GeoJSON MultiPoint Schema 74 | ``` json 75 | { 76 | "type": "MultiPoint", 77 | "coordinates": [ 78 | [-112.4470, 34.5510], 79 | [-112.4695, 34.541] 80 | ] 81 | } 82 | ``` 83 | 84 | ```{dropdown} GeoJSON MultiLineString Schema 85 | ``` json 86 | { 87 | "type": "MultiLineString", 88 | "coordinates": [ 89 | [[-112.44708, 34.5510], [-112.46953, 34.540924]], 90 | [[-112.4471, 34.5510], [-112.4541,34.54447], [-112.46953, 34.540924]] 91 | ] 92 | } 93 | ``` 94 | 95 | ```{dropdown} GeoJSON MultiPolygon Schema 96 | ``` json 97 | { 98 | "type": "MultiPolygon", 99 | "coordinates": [ 100 | [ 101 | [[-112.0, 35.0], [-112.0, 34.0], [-113.0, 34.0], [-113.0, 35.0], [-112.0, 35.0]] 102 | ], 103 | [ 104 | [[-112.50, 35.50], [-112.50, 34.50], [-113.50, 34.50], [-113.50, 35.50], [-112.50, 35.50]] 105 | ], 106 | [ 107 | [[-111.50, 34.50], [-111.50, 33.50], [-112.50, 33.50], [-112.50, 34.50], [-111.50, 34.50]] 108 | ] 109 | ] 110 | } 111 | ``` 112 | 113 | ```{dropdown} GeoJSON GeometryCollection 114 | ``` json 115 | { 116 | "type": "GeometryCollection", 117 | "geometries": [{ 118 | "type": "Point", 119 | "coordinates": [-112.4471, 34.5510] 120 | }, { 121 | "type": "LineString", 122 | "coordinates": [ 123 | [-112.4470, 34.5510], [-112.4695, 34.541] 124 | ] 125 | }] 126 | } 127 | ``` 128 | 129 | ```{dropdown} GeoJSON Feature 130 | ``` json 131 | { 132 | "type": "Feature", 133 | "geometry": { 134 | "type": "Point", 135 | "coordinates": [-112.4470, 34.5510] 136 | }, 137 | "properties": { 138 | "name": "AGIC Venue" 139 | } 140 | } 141 | ``` 142 | 143 | ```{dropdown} GeoJSON FeatureCollection Point 144 | ``` json 145 | { 146 | "type": "FeatureCollection", 147 | "features": [ 148 | { 149 | "type": "Feature", 150 | "properties": { 151 | "Location Name" : "Prescott Resort and Conference Center" 152 | }, 153 | "geometry": { 154 | "type": "Point", 155 | "coordinates": [ 156 | -112.44677424430846, 157 | 34.55109119815299 158 | ] 159 | } 160 | } 161 | ] 162 | } 163 | ``` 164 | 165 | ```{dropdown} GeoJSON FeatureCollection with multiple attributes 166 | ``` json 167 | { 168 | "type": "FeatureCollection", 169 | "features": [ 170 | { 171 | "type": "Feature", 172 | "properties": {}, 173 | "geometry": { 174 | "type": "LineString", 175 | "coordinates": [ 176 | [ 177 | -112.44717121124268, 178 | 34.551069106918945 179 | ], 180 | [ 181 | -112.45414495468138, 182 | 34.54447682068866 183 | ], 184 | [ 185 | -112.46953010559082, 186 | 34.540924192549795 187 | ] 188 | ] 189 | } 190 | }, 191 | { 192 | "type": "Feature", 193 | "properties": {}, 194 | "geometry": { 195 | "type": "Point", 196 | "coordinates": [ 197 | -112.44691371917723, 198 | 34.551210490715455 199 | ] 200 | } 201 | } 202 | ] 203 | } 204 | ``` 205 | 206 | ## Properties 207 | 208 | Vector data has some important advantages: 209 | 210 | - The geometry itself contains information about what the dataset creator thought was important 211 | - The geometry structures hold information 
in themselves - why choose point over polygon, for instance? 212 | - Each geometry feature can carry multiple attributes instead of just one, e.g. a database of cities can have attributes for name, country, population, etc. 213 | - Data storage can be very efficient compared to rasters 214 | 215 | The downsides of vector data include: 216 | 217 | - Potential loss of detail compared to raster 218 | - Potential bias in datasets - what didn't get recorded? 219 | - Calculations involving multiple vector layers need to do math on the geometry as well as the attributes, so they can be slow compared to raster math. 220 | 221 | 222 | ## Tools to Inspect and Manipulate GeoJSON Data 223 | 224 | JSON is a lightweight, text-based, language-independent data interchange format. As a result, you can create and edit GeoJSON files in any text editor software. 225 | 226 | You can open these files in VS Code, and there are multiple VS Code extensions (such as *Geo Data Viewer*) that you can use to visualize GeoJSON. You can certainly open them in QGIS as well. 227 | 228 | In the next lecture, you will learn how to work with geospatial vector data in Python. 229 | 230 | 231 |
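
As a quick preview of working with this format programmatically, here is a minimal sketch that parses the GeoJSON Feature shown above using Python's built-in `json` module and converts its geometry into a geometry object (this assumes the `shapely` package is installed):

```python
# Parse a GeoJSON Feature and inspect its geometry (a minimal sketch).
import json
from shapely.geometry import shape

feature = json.loads("""
{
  "type": "Feature",
  "geometry": {"type": "Point", "coordinates": [-112.4470, 34.5510]},
  "properties": {"name": "AGIC Venue"}
}
""")

geom = shape(feature["geometry"])  # build a shapely geometry from the GeoJSON dict
print(geom.geom_type)              # "Point"
print(geom.x, geom.y)              # longitude, latitude
print(feature["properties"]["name"])
```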

 

-------------------------------------------------------------------------------- /lectures/22_workflows_intro.md: -------------------------------------------------------------------------------- 1 | # Designing and Automating Data Workflows in Python 2 | 3 | **Attribution** 4 | *This lecture is designed based on the great resources available in [Earth Data Science Workflows](https://www.earthdatascience.org/courses/use-data-open-source-python/earth-data-science-workflows/) from Earth Lab CU Boulder.* 5 | 6 | Designing workflows to process and create outputs for many large files and datasets is a key skill in Earth Data Analytics. Yet it is something that most scientists and Earth analysts learn on the fly at some point in their careers. Now that you have learned various Python packages for geospatial data processing, we are going to discuss the ways you would go about designing and automating your data workflows. 7 | 8 | In this chapter, you will learn how to identify key steps in designing a data workflow and how to write pseudocode to outline a data workflow. You will also learn ways you can modularize your code and test it. We will use examples of geospatial data workflows to walk through different steps of designing these workflows. 9 | 10 | The outline below identifies several key steps associated with designing a workflow that can help you structure your thinking and develop an effective design. 11 | 12 | ## Identify the Problem, Challenge or Question 13 | 14 | To begin, you need to clarify the question(s) or challenge(s) that you need to address. Knowing the specific problem that you need to solve will help to set bounds on what your workflow does and does not need to do. 15 | 16 | In this chapter, we will use the example of calculating a time series of normalized difference vegetation index (NDVI) to investigate ecological processes and changes. 17 | 18 | NDVI is usually used to study changes due to a disturbance such as a flood or a fire. But NDVI can also be used to understand seasonality - when an area begins to "green up" (grow after the winter cold period) and when it begins to "brown down" (as vegetation dies back, usually in the fall and winter). You can calculate NDVI for each month of the year (or even for several years) to understand seasonality in a particular study area. 19 | 20 | Different areas in different parts of the world have different seasonal patterns. For example, the NEON San Joaquin Experimental Range (SJER) in California has an early green-up date on average, a short growing season due to hot temperatures and lack of precipitation, and an early brown down. In contrast, the NEON Harvard Forest (HARV) site in Massachusetts has a longer growing season and a later green-up period. 21 | 22 | These two sites are ideal for comparisons of NDVI to identify differences in seasonality across sites and will be used as the study areas for the data workflow challenge. 23 | 24 | Changes in seasonality can be important indicators of ecological change. For example, if green-up begins earlier, such that fruit or seed resources become available sooner than average, animals that forage on the fruits in the spring must either migrate earlier to use these resources, or miss out on them if they do not adjust their migration behavior. This phenomenon is referred to as a phenological mismatch. 25 | 26 | ## Identify the Data Needed to Address the Question or Challenge 27 | 28 | Once you have a question in mind, it is time to figure out your data requirements.
Requirements are the qualities that your data need to have in order to address a particular question well. 29 | 30 | For our challenge of calculating an NDVI time series, the goal is to create a time series of NDVI for a study area across a year. 31 | 32 | As you want to explore seasonal patterns of "green-up" and "brown down", you need data that is collected at least monthly. 33 | 34 | What data do you select for your analysis? Consider the temporal frequency of data collection and the spatial resolution. 35 | 36 | For example, NAIP is collected every other year, so it is unlikely to provide good information on seasonality. 37 | 38 | In contrast, MODIS and Landsat may be more useful, because they are collected on daily and roughly 16-day frequencies, respectively. 39 | 40 | In terms of resolution, MODIS pixels could be too large depending on the size of your study site. Perhaps Landsat has an ideal combination of temporal frequency, resolution and spatial coverage. 41 | 42 | 43 | ## Design Your Workflow 44 | 45 | Next it is time to design the workflow to create the output(s) that will address your question or challenge. 46 | 47 | To use the NDVI time series to understand seasonality, it can be helpful to create a plot that shows the NDVI values for each month for a site. In addition, you may want to produce some output files (e.g. CSV files) that you can share with others who may be interested in creating their own visualizations. You need to understand your users' needs to better design the requirements for your output. 48 | 49 | For the NDVI time series challenge, you will design a workflow that has two outputs: 50 | 51 | 1. A plot of average NDVI (for each Landsat image) for 2 sites over a year. 52 | 2. A CSV file containing the average NDVI values (for each Landsat image) for both sites. 53 | 54 | Once you know your outputs, you can work backwards to determine the data and steps needed to create your final output(s). 55 | 56 | The design process can feel difficult, so it can be helpful to begin the design of the workflow using words (not code), a process known as pseudocoding. With pseudocode, you list out the inputs and outputs of the workflow and then identify the analytical steps needed to create the outputs from the inputs. 57 | 58 | The next lecture walks you through the pseudocoding process, so that you can use this method to easily design data workflows and move toward automating them. 59 | 60 | 61 | ## Implement Your Workflow 62 | 63 | Once you have designed your workflow (using pseudocode first!), it is time to implement it. This is where you can put all of your programming skills to the test. 64 | 65 | Try to write code that is clear, efficient, well documented and expressive. Remember that you never know when you may have to re-run or re-use an analysis, and you never know when someone else might need to use your code, too. 66 | 67 | Thus, your aim should be to write code that is readable, reproducible, and efficient. Be sure to: 68 | 69 | 1. Use clear, expressive names for objects, files, etc. Ask yourself if someone reading your code could guess what is contained in that object or file based on the name. 70 | 2. Use reproducible paths for input data and writing outputs (e.g. use `os.path.join()` to define paths, include code to create needed directories, etc). 71 | 3. Write custom functions for repetitive tasks in your workflows. 72 | 4. Include checks (i.e. tests) in your code to ensure that it is doing what you think it is! The sketch after this list gives a small example of items 2-4.
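
Here is a minimal sketch of those last three practices in action. The function, paths, and values are all hypothetical placeholders:

```python
# A minimal sketch of reproducible paths, a reusable function, and a simple
# check. All names and values here are hypothetical placeholders.
import os

def normalized_difference(nir, red):
    """Compute NDVI from near-infrared and red reflectance values."""
    return (nir - red) / (nir + red)

# Reproducible output paths: built with os.path.join, directories created as needed.
output_dir = os.path.join("outputs", "ndvi")
os.makedirs(output_dir, exist_ok=True)
csv_path = os.path.join(output_dir, "mean_ndvi.csv")

# A simple check: NDVI should always fall between -1 and 1.
ndvi = normalized_difference(nir=0.5, red=0.1)
assert -1 <= ndvi <= 1, "NDVI out of expected range"
```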
73 | 74 | We reviewed some of these topics in our lecture on reproducible science, and more programming-specific best practices are introduced in [Data Workflow Best Practices](../lectures/24_workflows_best_practices.ipynb). 75 | 76 |

 

-------------------------------------------------------------------------------- /lectures/23_workflows_pseudocode.md: -------------------------------------------------------------------------------- 1 | # Learn to Write Pseudocode for Python Programming 2 | **Attribution** 3 | *This lecture is designed based on the great resources available in [Earth Data Science Workflows](https://www.earthdatascience.org/courses/use-data-open-source-python/earth-data-science-workflows/) from Earth Lab CU Boulder.* 4 | 5 | Pseudocode can help you design data workflows by listing out the individual steps of a workflow in plain language, so the focus is on the overall data process, rather than on the specific code needed. 6 | 7 | 8 | ## Design an Efficient Data Workflow Using Pseudocode 9 | 10 | You have now identified your challenge - to calculate and plot average normalized difference vegetation index (NDVI) for every Landsat scene (individually) across one year for two study locations: 11 | 12 | 1. San Joaquin Experimental Range (SJER) in Southern California, United States 13 | 2. Harvard Forest (HARV) in the Eastern United States 14 | 15 | You have been given a study area boundary for each site. Your next step is to write out the steps that you need to follow to get to your end-goal plot and CSV file. 16 | 17 | You know you will need to do the following: 18 | 19 | 1. Find Landsat scenes. 20 | 2. Access the scenes on the cloud or download them. 21 | 3. Read the data. 22 | 4. Calculate NDVI. 23 | 5. Save NDVI values to a pandas dataframe with the associated date and site name. 24 | 6. Plot the NDVI time series. 25 | 7. Export the NDVI values to a CSV file. 26 | 27 | Begin your process by pseudocoding - or writing out steps in plain English - everything that you will need to do in order to get to your end goal. The steps below will walk you through beginning this process of writing pseudocode. 28 | 29 | 30 | ## Begin With the Workflow for One Landsat Scene 31 | 32 | 1. Search for one Landsat scene across your area of interest (aoi). 33 | 34 | 2. Get a list of relevant files for the resulting Landsat scene. 35 | 36 | Recall that Landsat data is provided as a series of GeoTIFF files - one for each band (e.g. red, near-infrared) and one with the quality assurance information (e.g. cloud cover and shadows). 37 | 38 | ```{admonition} Steps required to get data for one Landsat scene for a site 39 | 1. Query an API to find the target Landsat scene 40 | 2. Get a list of files/assets for that scene. 41 | 3. Subset the list to just the files/assets required to calculate NDVI. 42 | ``` 43 | 44 | 3. Open and crop the data for one Landsat scene. 45 | 46 | Now that you have the steps to get the data for one Landsat scene, you can expand your pseudocode steps to include opening and cropping the data to the site boundary. 47 | 48 | ```{admonition} Steps required to process one Landsat scene for a site 49 | 1. Query an API to find the target Landsat scene 50 | 2. Get a list of files/assets for that scene. 51 | 3. Subset the list to just the files/assets required to calculate NDVI. 52 | 4. Open and crop the bands needed to calculate NDVI. 53 | ``` 54 | 55 | 4. Calculate NDVI for one Landsat scene. 56 | 57 | Last, you can expand your pseudocode to include using the bands that you opened and cropped to calculate NDVI. 58 | 59 | ```{admonition} Steps required to process one Landsat scene for a site 60 | 1. Query an API to find the target Landsat scene 61 | 2. Get a list of files/assets for that scene. 62 | 3.
Subset the list to just the files/assets required to calculate NDVI. 63 | 4. Open and crop the bands needed to calculate NDVI. 64 | 5. Calculate average NDVI for that scene. 65 | ``` 66 | 67 | ## Expand Workflow to Include All Landsat Scenes for a Site 68 | 69 | You have now identified the steps required to process a single Landsat scene. Those steps now need to be repeated across all of the scenes for each site for a year. 70 | 71 | ```{admonition} Steps required to process all Landsat scenes for a site 72 | 1. Query an API to find all Landsat scenes for your aoi 73 | 2. For each scene, use the steps outlined previously for one scene to calculate NDVI for the data in that scene. 74 | 3. Save the NDVI value and date for that scene (there are some steps here that you need to flesh out as well) to a list or dataframe that contains average NDVI for each scene at this site. 75 | 76 | ``` 77 | 78 | OK - now you are ready to put the workflow for a single site together. Of course, remember there are some sub-steps that have not been fleshed out just yet, but start with the basics and build from there. 79 | 80 | 81 | ```{admonition} Steps required to process all Landsat scenes for a site 82 | 1. Query an API to find all Landsat scenes for your site 83 | 2. For each scene, use the steps outlined previously for one scene to calculate NDVI: 84 | - Get a list of files/assets for that scene. 85 | - Subset the list to just the files/assets required to calculate NDVI. 86 | - Open and crop the bands needed to calculate NDVI. 87 | - Calculate average NDVI for that scene. 88 | 3. Save the NDVI values and the date for that scene (there are some steps here that you need to flesh out as well) to a list or dataframe that contains average NDVI for each scene at this site. 89 | 4. Export the dataframe with mean NDVI values to a CSV file (i.e. a "data product" output that you can share with others). 90 | 5. Plot the NDVI for each Landsat scene. 91 | ``` 92 | 93 | ## Add Multiple Sites Worth of Data to Your Workflow 94 | 95 | In the previous section, you began to think about the steps associated with creating a workflow for: 96 | 1. A single Landsat scene. 97 | 2. A set of Landsat scenes for a particular site. 98 | 99 | But you want to do this for two or more sites, so you need to design a workflow that allows for two or more sites as input. Add an additional layer to your pseudocode: 100 | 101 | ```{admonition} Modular workflow for many sites 102 | - Get a list of all the sites. 103 | - For each site, use the steps outlined previously for one site to calculate NDVI: 104 | 1. Query an API to find all Landsat scenes for that site 105 | 2. For each scene, use the steps outlined previously for one scene to calculate NDVI for the data in that scene: 106 | - Get a list of files/assets for that scene. 107 | - Subset the list to just the files/assets required to calculate NDVI. 108 | - Open and crop the bands needed to calculate NDVI. 109 | - Calculate average NDVI for that scene. 110 | 3. Save the NDVI values and the date for that scene (there are some steps here that you need to flesh out as well) to a list or dataframe that contains average NDVI for each scene at this site. 111 | 4. Export the dataframe with mean NDVI values to a CSV file (i.e. a "data product" output that you can share with others). 112 | 5. Plot the NDVI for each Landsat scene. 113 | ``` 114 | 115 | You have now designed a workflow using pseudocode to process several sites' worth of Landsat data. 116 | 117 | Of course, the pseudocode above is just the beginning.
For each of the steps above, you need to flesh out how you can accomplish each task. 118 | 119 | The next lesson in this chapter focuses on data workflow best practices that can help you implement your workflow efficiently and effectively. 120 | 121 |
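
To see how this pseudocode maps onto actual code structure, here is a minimal Python sketch of the modular, multi-site workflow above. Every helper function (and the `sites` list) is a hypothetical placeholder for a step that still needs to be fleshed out:

```python
# A minimal sketch of the multi-site NDVI workflow outlined above. All of the
# helper functions (find_landsat_scenes, get_ndvi_assets, open_and_crop,
# mean_ndvi, export_to_csv, plot_time_series) and the `sites` list are
# hypothetical placeholders for the steps you still need to flesh out.

def process_scene(scene, boundary):
    """Calculate the average NDVI for one Landsat scene."""
    assets = get_ndvi_assets(scene)             # subset to the red + NIR assets
    red, nir = open_and_crop(assets, boundary)  # open and crop to the site boundary
    return {"date": scene.date, "mean_ndvi": mean_ndvi(nir, red)}

def process_site(site):
    """Calculate average NDVI for every scene at one site across a year."""
    scenes = find_landsat_scenes(site.boundary, year=2021)
    return [process_scene(scene, site.boundary) for scene in scenes]

# The outermost layer of the pseudocode: loop over all sites.
results = {site.name: process_site(site) for site in sites}
export_to_csv(results)      # the shareable "data product"
plot_time_series(results)   # the plot of average NDVI per scene
```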

 

-------------------------------------------------------------------------------- /lectures/figures/EOSDIS-archive-2023.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HamedAlemo/advanced-geo-python/adf939d2cc1e722329ee64ff0eb701c8b3919837/lectures/figures/EOSDIS-archive-2023.png -------------------------------------------------------------------------------- /lectures/figures/Tissot_mercator.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HamedAlemo/advanced-geo-python/adf939d2cc1e722329ee64ff0eb701c8b3919837/lectures/figures/Tissot_mercator.png -------------------------------------------------------------------------------- /lectures/figures/change-drivers.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HamedAlemo/advanced-geo-python/adf939d2cc1e722329ee64ff0eb701c8b3919837/lectures/figures/change-drivers.png -------------------------------------------------------------------------------- /lectures/figures/final-doc-phd-comics.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HamedAlemo/advanced-geo-python/adf939d2cc1e722329ee64ff0eb701c8b3919837/lectures/figures/final-doc-phd-comics.gif -------------------------------------------------------------------------------- /lectures/figures/geoid-ellipsoid.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HamedAlemo/advanced-geo-python/adf939d2cc1e722329ee64ff0eb701c8b3919837/lectures/figures/geoid-ellipsoid.png -------------------------------------------------------------------------------- /lectures/figures/git-add-commit.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HamedAlemo/advanced-geo-python/adf939d2cc1e722329ee64ff0eb701c8b3919837/lectures/figures/git-add-commit.png -------------------------------------------------------------------------------- /lectures/figures/git-url.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HamedAlemo/advanced-geo-python/adf939d2cc1e722329ee64ff0eb701c8b3919837/lectures/figures/git-url.png -------------------------------------------------------------------------------- /lectures/figures/jupyterlab.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HamedAlemo/advanced-geo-python/adf939d2cc1e722329ee64ff0eb701c8b3919837/lectures/figures/jupyterlab.png -------------------------------------------------------------------------------- /lectures/figures/local-datum.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HamedAlemo/advanced-geo-python/adf939d2cc1e722329ee64ff0eb701c8b3919837/lectures/figures/local-datum.png -------------------------------------------------------------------------------- /lectures/figures/maxar-umbra.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HamedAlemo/advanced-geo-python/adf939d2cc1e722329ee64ff0eb701c8b3919837/lectures/figures/maxar-umbra.png -------------------------------------------------------------------------------- /lectures/figures/miniconda-vs-anaconda.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/HamedAlemo/advanced-geo-python/adf939d2cc1e722329ee64ff0eb701c8b3919837/lectures/figures/miniconda-vs-anaconda.png -------------------------------------------------------------------------------- /lectures/figures/orange-peel-earth.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HamedAlemo/advanced-geo-python/adf939d2cc1e722329ee64ff0eb701c8b3919837/lectures/figures/orange-peel-earth.jpg -------------------------------------------------------------------------------- /lectures/figures/raster_resolution.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HamedAlemo/advanced-geo-python/adf939d2cc1e722329ee64ff0eb701c8b3919837/lectures/figures/raster_resolution.png -------------------------------------------------------------------------------- /lectures/figures/rasters.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HamedAlemo/advanced-geo-python/adf939d2cc1e722329ee64ff0eb701c8b3919837/lectures/figures/rasters.png -------------------------------------------------------------------------------- /lectures/figures/unix_files.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HamedAlemo/advanced-geo-python/adf939d2cc1e722329ee64ff0eb701c8b3919837/lectures/figures/unix_files.png -------------------------------------------------------------------------------- /lectures/figures/vectors.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HamedAlemo/advanced-geo-python/adf939d2cc1e722329ee64ff0eb701c8b3919837/lectures/figures/vectors.png -------------------------------------------------------------------------------- /lectures/figures/what-is-a-crs.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HamedAlemo/advanced-geo-python/adf939d2cc1e722329ee64ff0eb701c8b3919837/lectures/figures/what-is-a-crs.png -------------------------------------------------------------------------------- /lectures/figures/zarr.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HamedAlemo/advanced-geo-python/adf939d2cc1e722329ee64ff0eb701c8b3919837/lectures/figures/zarr.png -------------------------------------------------------------------------------- /logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HamedAlemo/advanced-geo-python/adf939d2cc1e722329ee64ff0eb701c8b3919837/logo.png -------------------------------------------------------------------------------- /references.bib: -------------------------------------------------------------------------------- 1 | --- 2 | --- 3 | 4 | @inproceedings{holdgraf_evidence_2014, 5 | address = {Brisbane, Australia, Australia}, 6 | title = {Evidence for {Predictive} {Coding} in {Human} {Auditory} {Cortex}}, 7 | booktitle = {International {Conference} on {Cognitive} {Neuroscience}}, 8 | publisher = {Frontiers in Neuroscience}, 9 | author = {Holdgraf, Christopher Ramsay and de Heer, Wendy and Pasley, Brian N. 
and Knight, Robert T.}, 10 | year = {2014} 11 | } 12 | 13 | @article{holdgraf_rapid_2016, 14 | title = {Rapid tuning shifts in human auditory cortex enhance speech intelligibility}, 15 | volume = {7}, 16 | issn = {2041-1723}, 17 | url = {http://www.nature.com/doifinder/10.1038/ncomms13654}, 18 | doi = {10.1038/ncomms13654}, 19 | number = {May}, 20 | journal = {Nature Communications}, 21 | author = {Holdgraf, Christopher Ramsay and de Heer, Wendy and Pasley, Brian N. and Rieger, Jochem W. and Crone, Nathan and Lin, Jack J. and Knight, Robert T. and Theunissen, Frédéric E.}, 22 | year = {2016}, 23 | pages = {13654}, 24 | file = {Holdgraf et al. - 2016 - Rapid tuning shifts in human auditory cortex enhance speech intelligibility.pdf:C\:\\Users\\chold\\Zotero\\storage\\MDQP3JWE\\Holdgraf et al. - 2016 - Rapid tuning shifts in human auditory cortex enhance speech intelligibility.pdf:application/pdf} 25 | } 26 | 27 | @inproceedings{holdgraf_portable_2017, 28 | title = {Portable learning environments for hands-on computational instruction using container-and cloud-based technology to teach data science}, 29 | volume = {Part F1287}, 30 | isbn = {978-1-4503-5272-7}, 31 | doi = {10.1145/3093338.3093370}, 32 | abstract = {© 2017 ACM. There is an increasing interest in learning outside of the traditional classroom setting. This is especially true for topics covering computational tools and data science, as both are challenging to incorporate in the standard curriculum. These atypical learning environments offer new opportunities for teaching, particularly when it comes to combining conceptual knowledge with hands-on experience/expertise with methods and skills. Advances in cloud computing and containerized environments provide an attractive opportunity to improve the effciency and ease with which students can learn. This manuscript details recent advances towards using commonly-Available cloud computing services and advanced cyberinfrastructure support for improving the learning experience in bootcamp-style events. We cover the benets (and challenges) of using a server hosted remotely instead of relying on student laptops, discuss the technology that was used in order to make this possible, and give suggestions for how others could implement and improve upon this model for pedagogy and reproducibility.}, 33 | author = {Holdgraf, Christopher Ramsay and Culich, A. and Rokem, A. and Deniz, F. and Alegro, M. and Ushizima, D.}, 34 | year = {2017}, 35 | keywords = {Teaching, Bootcamps, Cloud computing, Data science, Docker, Pedagogy} 36 | } 37 | 38 | @article{holdgraf_encoding_2017, 39 | title = {Encoding and decoding models in cognitive electrophysiology}, 40 | volume = {11}, 41 | issn = {16625137}, 42 | doi = {10.3389/fnsys.2017.00061}, 43 | abstract = {© 2017 Holdgraf, Rieger, Micheli, Martin, Knight and Theunissen. Cognitive neuroscience has seen rapid growth in the size and complexity of data recorded from the human brain as well as in the computational tools available to analyze this data. This data explosion has resulted in an increased use of multivariate, model-based methods for asking neuroscience questions, allowing scientists to investigate multiple hypotheses with a single dataset, to use complex, time-varying stimuli, and to study the human brain under more naturalistic conditions. These tools come in the form of “Encoding” models, in which stimulus features are used to model brain activity, and “Decoding” models, in which neural features are used to generated a stimulus output. 
Here we review the current state of encoding and decoding models in cognitive electrophysiology and provide a practical guide toward conducting experiments and analyses in this emerging field. Our examples focus on using linear models in the study of human language and audition. We show how to calculate auditory receptive fields from natural sounds as well as how to decode neural recordings to predict speech. The paper aims to be a useful tutorial to these approaches, and a practical introduction to using machine learning and applied statistics to build models of neural activity. The data analytic approaches we discuss may also be applied to other sensory modalities, motor systems, and cognitive systems, and we cover some examples in these areas. In addition, a collection of Jupyter notebooks is publicly available as a complement to the material covered in this paper, providing code examples and tutorials for predictive modeling in python. The aimis to provide a practical understanding of predictivemodeling of human brain data and to propose best-practices in conducting these analyses.}, 44 | journal = {Frontiers in Systems Neuroscience}, 45 | author = {Holdgraf, Christopher Ramsay and Rieger, J.W. and Micheli, C. and Martin, S. and Knight, R.T. and Theunissen, F.E.}, 46 | year = {2017}, 47 | keywords = {Decoding models, Encoding models, Electrocorticography (ECoG), Electrophysiology/evoked potentials, Machine learning applied to neuroscience, Natural stimuli, Predictive modeling, Tutorials} 48 | } 49 | 50 | @book{ruby, 51 | title = {The Ruby Programming Language}, 52 | author = {Flanagan, David and Matsumoto, Yukihiro}, 53 | year = {2008}, 54 | publisher = {O'Reilly Media} 55 | } -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | jupyter-book 2 | matplotlib 3 | numpy 4 | ghp-import --------------------------------------------------------------------------------