├── .DS_Store ├── .ipynb_checkpoints ├── Ch1_Python_for_EarthSciences-checkpoint.ipynb ├── Ch2_Intro_JypiterNotebook-checkpoint.ipynb ├── Ch3_Python_Basics-checkpoint.ipynb ├── Ch4a_Python_Tools-checkpoint.ipynb ├── Ch4b_Plotting_Tools-checkpoint.ipynb ├── Ch5_Satellite_Cloud-checkpoint.ipynb ├── Ch6_Ocean_Example-checkpoint.ipynb ├── Ch7_Atmosphere-checkpoint.ipynb ├── Ch7_Atmosphere_Example-checkpoint.ipynb ├── Ch8_Land_Example-checkpoint.ipynb ├── Python_Installation-checkpoint.md ├── README-checkpoint.md └── environment-checkpoint.yml ├── Ch1_Python_for_EarthSciences.ipynb ├── Ch2_Intro_JypiterNotebook.ipynb ├── Ch3_Python_Basics.ipynb ├── Ch4a_Python_Tools.ipynb ├── Ch4b_Plotting_Tools.ipynb ├── Ch5_Satellite_Cloud.ipynb ├── Ch6_Ocean_Example.ipynb ├── Ch7_Atmosphere_Example.ipynb ├── Ch8_Land_Example.ipynb ├── LICENSE ├── Notebooks_Results ├── .DS_Store ├── Ch6_Ocean_Example_Results.ipynb ├── Ch7_Atmosphere_Example_Results.ipynb └── Ch8_Land_Example_Results.ipynb ├── Python_Installation.md ├── README.md ├── data ├── .DS_Store ├── ERA5_wind10m_mon05.nc ├── HadISST_sst_2000-2020.nc ├── ndvi_feb2022.nc ├── sst_example.nc └── tmp.nc ├── environment.yml └── figures ├── .DS_Store ├── JupyterNotebook_example.png ├── Jupyter_Notebook_Dashboard.png ├── Jupyter_Notebook_Menus.png ├── data_structures.png ├── globe_data.png ├── jupyter_logo.png ├── map_base_January.png ├── map_base_July.png ├── python_logo.png ├── python_scientific_ecosystem.png └── xarray_logo.png /.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/marisolgr/python_sat_tutorials/2ec5fe982ac03285dde32fd031a9c40219f25f4f/.DS_Store -------------------------------------------------------------------------------- /.ipynb_checkpoints/Ch1_Python_for_EarthSciences-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 
| "source": [ 7 | "# Chapter 1 - Python for Earth Sciences\n", 8 | "![](./figures/globe_data.png)\n", 9 | "\n", 10 | "***\n", 11 | "__Note:__ This in __not__ a tutorial on `Python` - there are many resources for that. The purpose of this tutorial is to learn, through examples, the necessary `Python` code and tools required to work with satellite data. Please see the __Resources__ section in each chapter to learn more about `Python` and libraries used.\n", 12 | "\n", 13 | "***\n", 14 | "\n", 15 | "\n", 16 | "\n", 17 | "`Python` is a well developed, easy to learn programming language that provides many advantages for a wide range of applications, including Earth Sciences':\n", 18 | "\n", 19 | "- It is __Open Source__: it is free, everybody can use it, and everybody can contribute to it\n", 20 | "- It is used by an enormous community of developers\n", 21 | "- It is __Modular__: it has all the libraries *(collection of programs or functions for a specific purpose)* you could possibly need; you do not need to install them all\n", 22 | "- __This means many libraries have been developed by and for (Earth) scientists__\n", 23 | "\n", 24 | "***\n", 25 | "## Python Scientific Ecosystem\n", 26 | "There is a number of libraries and data structures in `Python` that make it ideal for Earth Sciences. Because of Python's modular structure, new and specific libraries are built on, and take advantage of, more basic but well developed ones. For example, the __xarray__ library is not only developed on top of `Python`, but also uses the __SciPy__, __pandas__ and __matplotlib__ libraries, which are built on top of __NumPy__.\n", 27 | "\n", 28 | "\"Python\n", 29 | "\n", 30 | "***\n", 31 | "## How to use these tutorials\n", 32 | "\n", 33 | "There are different ways to program in `Python`. In this tutorial we use the web interface `Jupyter Notebook` (__Chapter 2__ gives a quick overview on how to use it). 
In __Chapter 3__ we will learn basic and necessary `Python` commands and data structures, and in __Chapters 4a & 4b__ we will learn the tools (libraries) we need (and that make `Python` ideal) for satellite data analysis.\n", 34 | "\n", 35 | "We only cover, through examples, the necessary and basic knowledge you need to be able to navigate the application chapters, where we use project-like examples to illustrate how to acquire, analyze and visualize satellite data. You'll learn the capabilities of `Python`, and can use these examples to build your own.\n", 36 | "\n", 37 | "## Where to get help\n", 38 | "Beyond the __Resources__ links provided, like with everything these days: when in doubt, __google it!__ You'll find many useful pages and videos. One of the best Q&A resources, and many times the first link in a Google search, is [Stackoverflow](https://stackoverflow.com/). This site is an always-evolving community in which people ask and answer coding questions. \n" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "***\n", 46 | "## Resources\n", 47 | "The Official page: [https://www.python.org/](https://www.python.org/)\n", 48 | "\n", 49 | "Some basics about Python: https://www.tutorialspoint.com/python/index.htm\n", 50 | "\n", 51 | "Python tutorial resources are listed in Chapter 3\n" 52 | ] 53 | }, 54 | { 55 | "cell_type": "code", 56 | "execution_count": null, 57 | "metadata": {}, 58 | "outputs": [], 59 | "source": [] 60 | } 61 | ], 62 | "metadata": { 63 | "kernelspec": { 64 | "display_name": "Python 3 (ipykernel)", 65 | "language": "python", 66 | "name": "python3" 67 | }, 68 | "language_info": { 69 | "codemirror_mode": { 70 | "name": "ipython", 71 | "version": 3 72 | }, 73 | "file_extension": ".py", 74 | "mimetype": "text/x-python", 75 | "name": "python", 76 | "nbconvert_exporter": "python", 77 | "pygments_lexer": "ipython3", 78 | "version": "3.7.6" 79 | } 80 | }, 81 | "nbformat": 4, 82 | "nbformat_minor": 4 83 | } 84 | 
-------------------------------------------------------------------------------- /.ipynb_checkpoints/Ch2_Intro_JypiterNotebook-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Chapter 2 - Introduction to Jupyter Notebook\n", 8 | "In this tutorial we will use Jupyter Notebook as an interface for `Python`, so a quick intro is given here. If you are already familiar with Jupyter Notebook or Jupyter Lab you can skip this tutorial. If you want to learn more, there are some useful __Resources__ at the bottom of this page.\n", 9 | "\n", 10 | "***\n", 11 | "\n", 12 | "## What is Jupyter Notebook?\n", 13 | "\n", 14 | "\n", 15 | "\n", 16 | "`Jupyter` is a project to develop open-source software, and `Jupyter` __Notebook__ is its first web-based, user-friendly interface between `Python` and the user. The new version is JupyterLab, but we will use __Notebook__ as the `mybinder` project uses it.\n", 17 | "\n", 18 | "`Jupyter` __Notebook__ is more than interactive coding - meaning you write and execute part of your code and receive an immediate response. `Jupyter` __Notebook__ also allows you to add formatted text, equations and figures - this lets you keep code, info, and results, including figures, neatly organized, and visualize them on the same page. It also allows others to reproduce your results. \n", 19 | "\n", 20 | "\n", 21 | "\n", 22 | "***\n", 23 | "\n", 24 | "## Jupyter Notebook Elements\n", 25 | "\n", 26 | "__The dashboard:__ The main page you see in your browser when you open `Jupyter` __Notebook__; it lists your files and directories. Clicking on a file will open it in a new tab. 
(__Note__ that `Jupyter` __Notebook__ `Python` files have the extension .ipynb)\n", 27 | "\n", 28 | "\n", 29 | "\n", 30 | "__Top menu__ (See below): Contains all the commands you need to work on `Jupyter` __Notebook__ (we'll talk about the relevant ones below)\n", 31 | "\n", 32 | "__Icon menu:__ It is on top of your script. It provides quick access to commands.\n", 33 | "\n", 34 | "\n", 35 | "\n", 36 | "***\n", 37 | "\n", 38 | "## The Basics\n", 39 | "\n", 40 | "### Cells\n", 41 | "Scripts are divided into cells - code lines that are run (executed) as a unit and return the output or results from your code immediately. Cells are not independent of each other: if you assign or modify variables in one cell, you have access to them in the following ones.\n", 42 | "\n", 43 | "To run a cell you have a few options:\n", 44 | "- Press __Shift-Return__ (run cell and move to the next one) or __Command-Return__ (run cell and stay in the current one) \n", 45 | "- Click the Run __[>|]__ icon on the Icon menu\n", 46 | "- Go to the Top menu -> Cells -> Run Cells\n", 47 | "\n", 48 | "The selected cell is highlighted by a thick color side bar. To edit the cell, you'll have to double-click on it, and a color outline will appear around the cell. \n", 49 | "\n", 50 | "\n", 51 | "### Types of cells: Code and Markdown\n", 52 | "__Markdown__ cells contain formatted text, like this cell, while __Code__ cells contain code to be executed and in many cases return an output. 
The text within a __Code__ cell shows different colors, depending on the function (operation, variable, etc.).\n", 53 | "\n", 54 | "To change the type of cell:\n", 55 | "- On the __Icon menu__: in the middle (or the next-to-last element if running locally on your own computer) of this menu there is a drop-down menu that shows the type and allows you to change it\n", 56 | "- Press __Esc__ and then __m__ (to change to Markdown type) or __y__ (to change to Code type)\n", 57 | "\n", 58 | "Pressing __Esc__ gives you access to keyboard commands (you can see them all from the Top Menu -> Help -> Keyboard Shortcuts).\n", 59 | "\n", 60 | "### Copy-pasting, adding and deleting cells\n", 61 | "\n", 62 | "When you run the last cell using __Shift-Return__, a new cell is added automatically below, but sometimes we need to add a cell in between cells, delete one, or make a copy of the current one. For this, you have two options:\n", 63 | "\n", 64 | "- Click on the __Icon menu__: __+__ for adding a cell, __scissors__ to cut, __( )__ to copy, and __clipboard__ to paste\n", 65 | "- Keyboard commands: press __Esc__ and then: __a__ or __b__ for inserting a cell (above or below), __d d__ to delete a cell, __c__ to copy, __x__ to cut, and __p__ to paste a cell below. \n", 66 | "\n", 67 | "***\n", 68 | "__Test this:__ Double-click on the next cell to enter edit mode and then run the cell using one of the options above. Also try changing between __Markdown__ and __Code__ modes, re-running the cell to see the differences."
69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": null, 74 | "metadata": {}, 75 | "outputs": [], 76 | "source": [ 77 | "## __Test this:__\n", 78 | "### change the type of this cell between __Markdown__ and __Code__, and then run it to see the difference\n", 79 | "myvar = 5+6\n", 80 | "print(myvar)" 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "***\n", 88 | "## Housekeeping\n", 89 | "\n", 90 | "### Saving your work\n", 91 | "If you're working on a `binder` in the cloud, you might want to save any file you modify locally for further reference. To do so, you can:\n", 92 | "\n", 93 | "- On the Icon Menu, somewhere in the middle, click the Download button\n", 94 | "- Go to the Top Menu -> File -> Download as to make a local copy.\n", 95 | "\n", 96 | "### The Kernel\n", 97 | "\n", 98 | "Sometimes there is a need to interrupt running or frozen code. In this case you'll need to restart the __kernel__ (Top menu -> Kernel -> Restart) and then run all the cells again, as restarting deletes all variables in memory.\n", 99 | "\n", 100 | "The __kernel__ is the process, individual to each notebook, that runs your code, interacting directly with `Python`.\n", 101 | "***" 102 | ] 103 | }, 104 | { 105 | "cell_type": "markdown", 106 | "metadata": {}, 107 | "source": [ 108 | "## Resources\n", 109 | "[Jupyter Notebook Documentation](https://jupyter-notebook.readthedocs.io/en/stable/)\n", 110 | "\n", 111 | "[The Official Jupyter and Jupyter Notebook/Lab page](https://jupyter.org/)\n", 112 | "\n", 113 | "_Two useful tutorials:_\n", 114 | "\n", 115 | "[One, in text form, from Real Python](https://realpython.com/jupyter-notebook-introduction/)\n", 116 | "\n", 117 | "[Another, in video form](https://www.youtube.com/watch?v=HW29067qVWk)\n", 118 | "\n", 119 | "A cheat sheet is always a good reference: [Cheat Sheet](https://datacamp-community-prod.s3.amazonaws.com/48093c40-5303-45f4-bbf9-0c96c0133c40)" 120 | ] 121 | }, 122 | { 123 
| "cell_type": "code", 124 | "execution_count": null, 125 | "metadata": {}, 126 | "outputs": [], 127 | "source": [] 128 | } 129 | ], 130 | "metadata": { 131 | "kernelspec": { 132 | "display_name": "Python 3 (ipykernel)", 133 | "language": "python", 134 | "name": "python3" 135 | }, 136 | "language_info": { 137 | "codemirror_mode": { 138 | "name": "ipython", 139 | "version": 3 140 | }, 141 | "file_extension": ".py", 142 | "mimetype": "text/x-python", 143 | "name": "python", 144 | "nbconvert_exporter": "python", 145 | "pygments_lexer": "ipython3", 146 | "version": "3.7.6" 147 | } 148 | }, 149 | "nbformat": 4, 150 | "nbformat_minor": 4 151 | } 152 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/Ch4a_Python_Tools-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Chapter 4a - Python Tools: xarray\n", 8 | "\n", 9 | "Chapter 4, divided into two parts, will cover two libraries that are essential to satellite data analysis and visualization: __xarray__ and __matplotlib__. In Chapter 4a we will cover the basics of __xarray__ with examples, and in Chapter 4b we will make customized visualizations of data using __matplotlib__.\n", 10 | "\n", 11 | "Although we show complete examples here, we invite you to edit and rerun them to better grasp their functionality.\n", 12 | " \n", 13 | "***\n", 14 | "\n", 15 | "\n", 16 | "## xarray \n", 17 | " \n", 18 | "__xarray__ is an open source `Python` library designed to handle (read, write, analyze, visualize, etc.) sets of labeled multi-dimensional arrays and metadata common in _(Earth)_ sciences. Its data structure, the __Dataset__, is built to reflect a netcdf file. 
__xarray__ was built on top of the __pandas__ library, which processes labeled tabular data, inheriting several of its methods and functionality.\n", 19 | "\n", 20 | "For this reason, when importing __xarray__, we will also import __numpy__ and __pandas__, so we can use all their methods. \n", 21 | "\n", 22 | "__Test this:__ Run the next cell to import these libraries. We are importing them using their conventional nicknames - although feel free to choose yours. Note that when you run an importing cell, no output is displayed other than a number between [ ] on the left side of the cell.\n" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": null, 28 | "metadata": {}, 29 | "outputs": [], 30 | "source": [ 31 | "import numpy as np\n", 32 | "import pandas as pd\n", 33 | "import xarray as xr\n", 34 | "\n", 35 | "# this library helps to make your code execution less messy\n", 36 | "import warnings\n", 37 | "warnings.simplefilter('ignore') # filter some warning messages" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "### Reading and exploring Data Sets\n", 45 | " \n", 46 | "__Run the next cell:__ Let's start by reading and exploring the content of a `netcdf` file located locally. __It is so easy!__\n", 47 | "\n", 48 | "Once the content is displayed, you can click on the file and disk icons on the right to get more details on each parameter.\n", 49 | "\n", 50 | "Also note that the __data array__ or __variable__ _(SST)_ has 3 __dimensions__ _(latitude, longitude and time)_, and that each dimension has a data variable (__coordinate__) associated with it. Each variable, as well as the file as a whole, has metadata known as __attributes__." 
51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": null, 56 | "metadata": {}, 57 | "outputs": [], 58 | "source": [ 59 | "ds = xr.open_dataset('./data/HadISST_sst_2000-2020.nc') # read a local netcdf file\n", 60 | "ds.close() # close the file so it can be used by you or others; it is good practice\n", 61 | "ds # display the content of the dataset object" 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": {}, 67 | "source": [ 68 | "__xarray__ can also read data online. We are going to learn how to read data from the cloud in the application chapters, but for now, we will demonstrate __xarray__'s and `Python`'s capability of reading from an online file. __Run the next cell__ to do so." 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": null, 74 | "metadata": {}, 75 | "outputs": [], 76 | "source": [ 77 | "# assign a string variable with the url address of the datafile\n", 78 | "url = 'https://podaac-opendap.jpl.nasa.gov/opendap/allData/ghrsst/data/GDS2/L4/GLOB/CMC/CMC0.2deg/v2/2011/305/20111101120000-CMC-L4_GHRSST-SSTfnd-CMC0.2deg-GLOB-v02.0-fv02.0.nc'\n", 79 | "ds_sst = xr.open_dataset(url) # reads the same way as local files!\n", 80 | "ds_sst" 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "### Visualizing data\n", 88 | " \n", 89 | "An image is worth a thousand _attributes_! Sometimes what we need is a quick visualization of our data, and __xarray__ is there to help. In __the next cells__, visualizations of both opened datasets are shown. 
" 90 | ] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "execution_count": null, 95 | "metadata": {}, 96 | "outputs": [], 97 | "source": [ 98 | "ds_sst.analysed_sst.plot() # note that we needed to choose one of the variable in the Dataset to be displayed\n" 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": null, 104 | "metadata": {}, 105 | "outputs": [], 106 | "source": [ 107 | "ds.sst[0,:,:].plot() # we choose a time to visualize the spatial data (lat, lon) at that time (zero or the first time entry)\n" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "#### Yes! it is that easy! \n", 115 | "Although we'll get more sophisticated in the Chapter 4b." 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "### Some basic methods of Dataset\n", 123 | " \n", 124 | "__xarray__ also lets you operate over the dataset in a simple way. Many operations are built as methods of the Dataset class that can be accessed by adding a `.` after the Dataset name. __Test this:__ In the next cell, we access the _averaging_ method to make a time series of sea surface temperature over the entire globe and display it. __All in one line!__" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": null, 130 | "metadata": {}, 131 | "outputs": [], 132 | "source": [ 133 | "ds.sst.mean(dim=['latitude','longitude']).plot() # select a variable and average it\n", 134 | "# over spatial dimensions, and plot the final result\n" 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": {}, 140 | "source": [ 141 | "### Selecting data\n", 142 | "\n", 143 | "Sometimes we want to visualize or operate only on a portion of the data. __In the next cell__ we demonstrate the method `.sel`, which selects data along dimensions, in this case specified as a range of the coordinates using the function _slice_." 
144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": null, 149 | "metadata": {}, 150 | "outputs": [], 151 | "source": [ 152 | "ds.sst.sel(time=slice('2012-01-01','2013-12-31')).mean(dim=['time']).plot() # select a period of time" 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": null, 158 | "metadata": {}, 159 | "outputs": [], 160 | "source": [ 161 | "ds.sst.sel(latitude=slice(50,-50)).mean(dim=['time']).plot() # select a range of latitudes. \n", 162 | "# note that we need to go from 50 to -50 as the latitude coordinate data goes from 90 to -90" 163 | ] 164 | }, 165 | { 166 | "cell_type": "markdown", 167 | "metadata": {}, 168 | "source": [ 169 | "Another useful way to select data is the method __.where__, which instead of selecting by a coordinate, selects using a condition over the data or the coordinates. __Test this:__ In the next cell we extract the _ocean mask_ contained in the NASA surface temperature dataset." 170 | ] 171 | }, 172 | { 173 | "cell_type": "code", 174 | "execution_count": null, 175 | "metadata": {}, 176 | "outputs": [], 177 | "source": [ 178 | "ds_sst.analysed_sst.where(ds_sst.mask==1).plot() # we select, using .where, the data in the variable 'mask' that is equal to 1, \n", 179 | "# apply it to the variable 'analysed_sst', and plot the data. \n", 180 | "# Try changing the value for mask - for example 2 is land, 8 is ice." 181 | ] 182 | }, 183 | { 184 | "cell_type": "markdown", 185 | "metadata": {}, 186 | "source": [ 187 | "### Operating between two Data Arrays\n", 188 | " \n", 189 | "__In the next__ example we compare two years of temperature. We operate over the same Data Array, averaging over 2015 in the first line and over 2012 in the second line. Each `.sel` operation returns a new Data Array. We can subtract them using a simple `-`, since they have the same dimensions and coordinates. At the end, we just plot the result. 
__It is that simple!__" 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": null, 195 | "metadata": {}, 196 | "outputs": [], 197 | "source": [ 198 | "# comparing 2015 and 2012 sea surface temperatures\n", 199 | "(ds.sst.sel(time=slice('2015-01-01','2015-12-31')).mean(dim=['time'])\n", 200 | "-ds.sst.sel(time=slice('2012-01-01','2012-12-31')).mean(dim=['time'])).plot() # note that in this case we could split the line in two\n", 201 | "# making it easier to read" 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "metadata": {}, 207 | "source": [ 208 | "We will cover more examples of methods and operations over datasets in the following chapters. But if you want to learn more, and we recommend it, given the many awesome capabilities of xarray, please look at the __Resources__ section below. \n", 209 | "\n", 210 | "***\n", 211 | "\n", 212 | "### Saving your Datasets and DataArrays\n", 213 | "There is one more thing you should learn here. In the application chapters we go from obtaining the data to analyzing and producing a visualization. But sometimes, we want to save the data we acquire to process later, in a different script, or in the same one, without having to download it every time. 
\n", 214 | "\n", 215 | "__The next cell__ shows you how to do so in two simple steps:\n", 216 | "\n", 217 | "- Assign the outcome of an operation to a variable, which will be a new dataset or data array object\n", 218 | "- Save it to a new `netcdf` file" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": null, 224 | "metadata": {}, 225 | "outputs": [], 226 | "source": [ 227 | "# same operation as before, minus the plotting method\n", 228 | "my_ds = (ds.sst.sel(time=slice('2015-01-01','2015-12-31')).mean(dim=['time'])-ds.sst.sel(time=slice('2012-01-01','2012-12-31')).mean(dim=['time']))\n", 229 | "# save the new dataset `my_ds` to a file in the directory data\n", 230 | "my_ds.to_netcdf('./data/Global_SST_2015-2012.nc')\n", 231 | "# explore the content of `my_ds`. note that the time dimension does not exist anymore\n", 232 | "my_ds" 233 | ] 234 | }, 235 | { 236 | "cell_type": "markdown", 237 | "metadata": {}, 238 | "source": [ 239 | "*** \n", 240 | "\n", 241 | "## Resources\n", 242 | "\n", 243 | "[The __xarray__ official site](http://xarray.pydata.org/en/stable/).\n", 244 | "\n", 245 | "Great [introduction](https://www.youtube.com/watch?v=Dgr_d8iEWk4&t=908s) to __xarray__ capabilities.\n", 246 | "\n", 247 | "If you really want to dig deep watch this [video](https://www.youtube.com/watch?v=ww4EYv20Ucw).\n", 248 | "\n", 249 | "A step-by-step [guide](https://rabernat.github.io/research_computing_2018/xarray.html) to __xarray__ handling of netcdf files, and many of the methods seeing here, like `.sel` and `.where`.\n", 250 | "\n", 251 | "### More on:\n", 252 | "\n", 253 | "Sometimes, the best way to learn how to do something is go directly to the reference page for a function or method. There you can see what arguments, types of data, and outputs to expect. 
Most of the time, they have useful examples:\n", 254 | "\n", 255 | "- Method [__.where( )__](http://xarray.pydata.org/en/stable/generated/xarray.DataArray.where.html)\n", 256 | "\n", 257 | "- Method [__.sel( )__](http://xarray.pydata.org/en/stable/generated/xarray.DataArray.sel.html)\n", 258 | "\n", 259 | "- Method [__.mean( )__](http://xarray.pydata.org/en/stable/generated/xarray.DataArray.mean.html)\n" 260 | ] 261 | }, 262 | { 263 | "cell_type": "code", 264 | "execution_count": null, 265 | "metadata": {}, 266 | "outputs": [], 267 | "source": [] 268 | } 269 | ], 270 | "metadata": { 271 | "kernelspec": { 272 | "display_name": "Python 3 (ipykernel)", 273 | "language": "python", 274 | "name": "python3" 275 | }, 276 | "language_info": { 277 | "codemirror_mode": { 278 | "name": "ipython", 279 | "version": 3 280 | }, 281 | "file_extension": ".py", 282 | "mimetype": "text/x-python", 283 | "name": "python", 284 | "nbconvert_exporter": "python", 285 | "pygments_lexer": "ipython3", 286 | "version": "3.7.6" 287 | } 288 | }, 289 | "nbformat": 4, 290 | "nbformat_minor": 4 291 | } 292 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/Ch4b_Plotting_Tools-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Chapter 4b - Plotting Tools\n", 8 | "\n", 9 | "In this chapter we will learn to visualize data beyond a quick plot as in the previous chapter. We will present two examples (a time series plot and a map) using the libraries __matplotlib__ and __cartopy__. \n", 10 | "\n", 11 | "***\n", 12 | "\n", 13 | "Let's start by importing the pertinent libraries." 
14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": null, 19 | "metadata": {}, 20 | "outputs": [], 21 | "source": [ 22 | "# basic libraries\n", 23 | "import numpy as np\n", 24 | "import pandas as pd\n", 25 | "import xarray as xr\n", 26 | "\n", 27 | "# necessary libraries for plotting\n", 28 | "import matplotlib.pyplot as plt # note that in both cases we import one object within the library\n", 29 | "import cartopy.crs as ccrs\n", 30 | "\n", 31 | "import warnings\n", 32 | "warnings.simplefilter('ignore') # filter some warning messages" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": {}, 38 | "source": [ 39 | "## Plot SST anomaly timeseries\n", 40 | "\n", 41 | "We will use the same data from the previous chapter to calculate and plot global sea surface temperature anomalies from the Hadley dataset. We will also calculate the climatology and anomalies of monthly data to show a slightly more complicated plot, illustrating some of the __xarray__ methods. 
__Run the next cell.__" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": null, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "# open the dataset\n", 51 | "ds = xr.open_dataset('./data/HadISST_sst_2000-2020.nc') # read the netcdf file\n", 52 | "ds.close() \n", 53 | "\n", 54 | "# select the northern and southern hemispheres, and average spatially to obtain a time series\n", 55 | "nh_sst = ds.sst.sel(latitude=slice(90,0)).mean(dim=['latitude','longitude'])\n", 56 | "sh_sst = ds.sst.sel(latitude=slice(0,-90)).mean(dim=['latitude','longitude'])\n", 57 | "\n", 58 | "# calculate climatology\n", 59 | "nh_clim = nh_sst.groupby('time.month').mean('time') # application of two methods:\n", 60 | "# first groupby, and then the operation to perform over the group\n", 61 | "\n", 62 | "# calculate and explore the anomalies\n", 63 | "nh_ssta = nh_sst.groupby('time.month') - nh_clim # groupby 'aligns' the data with the climatology, \n", 64 | "                    # but only subtracts the appropriate climatology data point\n", 65 | "nh_ssta # the new dataarray (one variable) has a new coordinate, but not a new dimension" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": {}, 71 | "source": [ 72 | "### The actual plotting of the data\n", 73 | " \n", 74 | "Making a simple plot using __matplotlib__ might seem like too much code, since there are many parameters to customize. However, it comes in handy for more detailed plots. 
__In the next cell__ we introduce some of the basic methods in a plot of the hemispheric averages calculated:\n", 75 | "- Defining a figure and its size\n", 76 | "- The function __plot__\n", 77 | "- How to add labels and a legend\n", 78 | "- And how to display and 'finalize' a plot" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": null, 84 | "metadata": {}, 85 | "outputs": [], 86 | "source": [ 87 | "plt.figure(figsize=(10,4))\n", 88 | "plt.plot(nh_sst.time, nh_sst, '.-',label='NH') # the basic method plot() is used for line plots.\n", 89 | "plt.plot(sh_sst.time, sh_sst, '+-', c='tab:orange', label='SH')\n", 90 | "plt.grid(True)\n", 91 | "plt.legend(loc=0)\n", 92 | "plt.ylabel('SST (C)', fontsize=14)\n", 93 | "plt.title('Monthly Hemispheric SST', fontsize=16)\n", 94 | "plt.show() # necessary line to finalize and properly display a figure" 95 | ] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "metadata": {}, 100 | "source": [ 101 | "__In the next cell__ we plot the anomalies calculated, separating with color the positive and negative values. This is a more complicated plot that requires operating over the data first (using the method `.where`), but the plotting part is straightforward." 
102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": null, 107 | "metadata": {}, 108 | "outputs": [], 109 | "source": [ 110 | "plt.figure(figsize=(12,4))\n", 111 | "pos = nh_ssta.where(nh_ssta>=0) # select only positive values \n", 112 | "neg = nh_ssta.where(nh_ssta<0) # select only negative values\n", 113 | "dates = nh_ssta.time.dt.year + (nh_ssta.time.dt.month-1)/12 # make a list of time steps using year and month\n", 114 | "plt.bar(dates, pos.values, width=1/13, color='tab:red', edgecolor=None) # plot positive values\n", 115 | "plt.bar(dates, neg.values, width=1/13, color='tab:blue') # plot negative values\n", 116 | "plt.axhline(color='grey') # plot a grey horizontal line at y=0\n", 117 | "plt.grid(True, zorder=0)\n", 118 | "plt.ylabel('SST anomalies (C)')\n", 119 | "plt.title('Northern Hemisphere SST anomalies')\n", 120 | "plt.xticks([*range(2000,2021,1)], rotation=40)\n", 121 | "plt.autoscale(enable=True, axis='x', tight=True)\n", 122 | "plt.show()\n" 123 | ] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "metadata": {}, 128 | "source": [ 129 | "***\n", 130 | "## Map plotting\n", 131 | "\n", 132 | "Now we will customize our maps. While the quick plot method from __xarray__ is all we need in many cases, sometimes we require a more customized or nicer image for a presentation or publication. It might seem like complicated code, but really there are many elements that could be left at their default values; we wanted to show how to customize some of them in case you need to.\n", 133 | "\n", 134 | "For global plots, the extent and the coordinate labels are sometimes not necessary to define, but we choose a regional plot for the next example to show how to customize these parameters. \n", 135 | "\n", 136 | "__Note__ that in the next-to-last line, we will also save our figure. We still use the _.show( )_ method in the last line to display it." 
137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": null, 142 | "metadata": {}, 143 | "outputs": [], 144 | "source": [ 145 | "# import functions to label coordinates and add color to the land mass\n", 146 | "from cartopy.mpl.ticker import LongitudeFormatter, LatitudeFormatter\n", 147 | "import cartopy.feature as cfeature\n", 148 | "import calendar # quick access to names and numbers related to dates\n", 149 | "\n", 150 | "# select a region of our data\n", 151 | "region = np.array([[30,-40],[25,120]]) # numpy array that specifies the lat/lon boundaries of our selected region\n", 152 | "io_sst = ds.sst.sel(latitude=slice(region[0,0],region[0,1]),longitude=slice(region[1,0],region[1,1])) # select region\n", 153 | "\n", 154 | "for mon in [1,7]: # select two months of data to plot: month 1 and month 7\n", 155 | "    moname = calendar.month_name[mon] # get the name of the month\n", 156 | "    tmp = io_sst.sel(time=ds.time.dt.month==mon).mean('time') # select only one month at a time in a temporary variable\n", 157 | "\n", 158 | "    # create and set the figure context\n", 159 | "    fig = plt.figure(figsize=(8,5)) # create a figure object, and assign it a variable name fig\n", 160 | "    ax = plt.axes(projection=ccrs.PlateCarree()) # projection type - this one is easy to use\n", 161 | "    ax.coastlines(resolution='50m',linewidth=2,color='black') \n", 162 | "    ax.add_feature(cfeature.LAND, color='black')\n", 163 | "    ax.set_extent([region[1,0],region[1,1],region[0,0],region[0,1]],crs=ccrs.PlateCarree()) \n", 164 | "    ax.set_xticks([*range(region[1,0],region[1,1]+1,20)], crs=ccrs.PlateCarree()) # customize ticks and labels to longitude\n", 165 | "    ax.set_yticks([*range(region[0,1],region[0,0]+1,10)], crs=ccrs.PlateCarree()) # customize ticks and labels to latitude\n", 166 | "    ax.xaxis.set_major_formatter(LongitudeFormatter(zero_direction_label=True))\n", 167 | "    ax.yaxis.set_major_formatter(LatitudeFormatter())\n", 168 | "    plt.grid(True, alpha=0.5) # add a grid.
the alpha argument specifies the level of transparency of a plot element\n", 169 | "\n", 170 | "    # the core: the data to plot\n", 171 | "    plt.contourf(tmp.longitude,tmp.latitude, tmp,15, cmap='RdYlBu_r') # contourf (filled contour plot) takes the 1D lat and lon coordinates for the 2D data. cmap specifies the colormap to use.\n", 172 | "    cbar=plt.colorbar()\n", 173 | "    cbar.set_label('SST (C)') # color bar label\n", 174 | "    plt.title(moname+' SST (2000-2020)')\n", 175 | "    fig.savefig('./figures/map_base_'+moname+'.png') # save your figure by using the method .savefig. Python recognizes the format from the filename extension. \n", 176 | "    plt.show()" 177 | ] 178 | }, 179 | { 180 | "cell_type": "markdown", 181 | "metadata": {}, 182 | "source": [ 183 | "***\n", 184 | "### And that's it. Now you're ready to go over the application chapters and follow the code. Also you should be able to edit it and get your own results!\n", 185 | "\n", 186 | "***\n", 187 | "## Resources\n", 188 | "\n", 189 | "[The Official __Matplotlib__ site](https://matplotlib.org/) \n", 190 | "\n", 191 | "Make sure to look at their [gallery](https://matplotlib.org/stable/gallery/index.html), which contains the code for each plot\n", 192 | "\n", 193 | "A very simple, step-by-step [tutorial](https://github.com/rougier/matplotlib-tutorial) for matplotlib\n", 194 | "\n", 195 | "[The Official __Cartopy__ - site](https://scitools.org.uk/cartopy/docs/latest/), and [gallery](https://scitools.org.uk/cartopy/docs/latest/gallery/index.html)\n", 196 | "\n", 197 | "R. Abernathey's [tutorial](https://rabernat.github.io/research_computing_2018/maps-with-cartopy.html) to Cartopy - Step by Step and very accessible\n", 198 | "\n", 199 | "[__Seaborn__](https://seaborn.pydata.org/index.html) - We didn't talk about Seaborn, but it is a very nice library with beautiful and well-designed functions for statistical data visualization.
Make sure you take a look at their gallery\n", 200 | "\n", 201 | "[The official __Groupby__ reference](http://xarray.pydata.org/en/stable/groupby.html)\n", 202 | "\n", 203 | "The __xarray__ page also has some useful examples for [weather](http://xarray.pydata.org/en/stable/examples/weather-data.html) and [climate](http://xarray.pydata.org/en/stable/examples/monthly-means.html) data that apply the methods (and more) used here.\n" 204 | ] 205 | }, 206 | { 207 | "cell_type": "code", 208 | "execution_count": null, 209 | "metadata": {}, 210 | "outputs": [], 211 | "source": [] 212 | } 213 | ], 214 | "metadata": { 215 | "kernelspec": { 216 | "display_name": "Python 3 (ipykernel)", 217 | "language": "python", 218 | "name": "python3" 219 | }, 220 | "language_info": { 221 | "codemirror_mode": { 222 | "name": "ipython", 223 | "version": 3 224 | }, 225 | "file_extension": ".py", 226 | "mimetype": "text/x-python", 227 | "name": "python", 228 | "nbconvert_exporter": "python", 229 | "pygments_lexer": "ipython3", 230 | "version": "3.7.6" 231 | } 232 | }, 233 | "nbformat": 4, 234 | "nbformat_minor": 4 235 | } 236 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/Ch5_Satellite_Cloud-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Chapter 5 - Satellite Data in the Cloud\n", 8 | "\n", 9 | "This chapter gives a short introduction to accessing satellite data in the Cloud. It is more informative than practical, so you could jump straight to the examples in the next chapters. However, you might want some background if you want to modify the examples to get other data that interests you."
10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "***\n", 17 | "Satellite data and satellite-based data are available from many sources, commercial and public. Not all data are free, especially data products from private companies. \n", 18 | "\n", 19 | "### Many data providers, including NASA, NOAA, and Copernicus, share their data publicly and for free, including in the Cloud.\n", 20 | "\n", 21 | "## Today, an increasing portion of these data is stored in two accessible Cloud providers: [Amazon (AWS)](https://registry.opendata.aws/) and [Google (Earth Engine)](https://developers.google.com/earth-engine/datasets)\n", 22 | " \n", 23 | "Also, data comes in different formats.\n", 24 | "\n", 25 | "We will use data from AWS because its data is stored in a format that is easily analyzed using `Python` and `xarray`, while Google uses its own interface (worth checking out, though)." 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "***\n", 33 | "***\n", 34 | "Historically, satellite data has been stored as snapshot images. With time, as the satellite era has grown in length, the number of available images has grown to amazing levels. \n", 35 | "\n", 36 | "And while these data have become an incredible tool for interannual and longer analyses, temporal analysis requires accessing each time-step file, which is cumbersome and time- and resource-consuming.\n", 37 | "\n", 38 | "### New formats, like `zarr`, have been developed to address this issue and provide faster access to the data, not only along the temporal axis. \n", 39 | "\n", 40 | "Not all data is in this format, but the number of data sets is increasing and we will take advantage of those that are available."
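As a sketch of what this looks like in practice, the snippet below builds the public HTTPS location of a zarr store in an S3 bucket; the bucket layout follows the MUR SST store used in Chapter 6, and the `xr.open_zarr` call is left commented out because it needs network access and the zarr dependencies:

```python
def s3_zarr_url(bucket, region, folder):
    """Public HTTPS location of a zarr store hosted in an S3 bucket."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{folder}"

# MUR SST, the store used in the Ocean example (Chapter 6)
file_location = s3_zarr_url("mur-sst", "us-west-2", "zarr-v1")
print(file_location)  # https://mur-sst.s3.us-west-2.amazonaws.com/zarr-v1

# Lazy open: only the metadata is read; data chunks are fetched on demand,
# e.g. when slicing out a time series at one grid point.
# import xarray as xr
# ds = xr.open_zarr(file_location, consolidated=True)
```

This lazy-open pattern is what makes `zarr` attractive for time-series work: a point extraction touches only the chunks that contain that point.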
41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "***\n", 48 | "***\n", 49 | "### *In these tutorials we aim to facilitate acquisition of satellite-based data and their temporal analysis*\n", 50 | "***" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "***\n", 58 | "## In the Ocean Example (Chapter 6), we use a NASA high resolution sea surface temperature ([MUR SST](https://registry.opendata.aws/mur/)) data product, which is stored in `zarr` format\n", 59 | "Acquiring the data is simple, requiring only a few lines of code. It is a long process, however, given the high resolution - in space and time - of the data. " 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "***\n", 67 | "## In the Atmosphere Example (Chapter 7), we use a Copernicus satellite-based reanalysis product that most earth scientists are familiar with: [ECMWF/ERA-5](https://registry.opendata.aws/ecmwf-era5/)\n", 68 | "These data are stored in the cloud, in `zarr` format, but due to their volume, they are saved as monthly files. So, each file has to be accessed individually for a region and time to be selected. Note that because the files are in the cloud, we are not downloading them - only the selected portion is.\n", 69 | "\n", 70 | "#### In the example, we'll analyze wind vectors, but this notebook is easily modified to analyze another variable contained in this data set. " 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "***\n", 78 | "## Finally, in the Land Example (Chapter 8), we use NASA's MODIS satellite data to examine changes in vegetation through time. \n", 79 | "\n", 80 | "This data is available through different names and providers, and it is available in the AWS cloud (not free to date), and Google Earth Engine.
It is also available online directly through NASA or NOAA, and other private providers.\n", 81 | "\n", 82 | "*Therefore, in this chapter we exemplify how to use `python` and `xarray` to process files from an online server - in this case a thredds server. Not the cloud!*\n", 83 | "\n", 84 | "You'll notice that this is a straightforward method, although a bit lengthy. However, this method needs good and stable bandwidth because it downloads every file and then selects the area (instead of selecting it in the cloud before downloading). \n", 85 | "\n", 86 | "We expect this, and other data, to be available in the cloud in zarr format soon - for free - so its acquisition would be similar to data in previous chapters." 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "***\n", 94 | "# Resources\n", 95 | "\n", 96 | "### Data:\n", 97 | "- [AWS Registry for Open Data](https://registry.opendata.aws/)\n", 98 | "- [Google Earth Engine Data](https://developers.google.com/earth-engine/datasets)\n", 99 | "- [NOAA Big Data Program](https://www.noaa.gov/organization/information-technology/list-of-big-data-program-datasets) Data list\n", 100 | "\n", 101 | "### Cloud resources:\n", 102 | "- [Pangeo](https://pangeo.io/cloud.html)\n", 103 | "- [Chameleon Cloud](https://www.chameleoncloud.org/)\n", 104 | "- [Datalore](https://datalore.jetbrains.com/)\n", 105 | "- [mybinder](https://mybinder.org/)\n", 106 | "- [pangeo binder](https://binder.pangeo.io/)\n", 107 | "\n", 108 | "### Libraries related to cloud access and formats used here:\n", 109 | "- [fsspec](https://filesystem-spec.readthedocs.io/en/latest/) filesystem interfaces for python\n", 110 | "- [zarr](https://zarr.readthedocs.io/en/stable/) Big data storage format\n", 111 | "- [dask](https://dask.org/) library to enable parallel processing for python\n", 112 | "\n", 113 | "### If you want to dig deeper:\n", 114 | "- [Pangeo tutorial for AGU
OSM2020](https://github.com/pangeo-gallery/osm2020tutorial)\n", 115 | "- [Methods for accessing an AWS bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-bucket-intro.html). Bucket is the name of the cloud storage object. S3 stands for Amazon's Simple Storage Service.\n", 116 | "- [earthengine-api](https://github.com/google/earthengine-api/blob/master/python/examples/ipynb/Earth_Engine_REST_API_compute_table.ipynb) Use Python to access cloud data in the Google Earth Engine.\n", 117 | "- [satpy](https://github.com/pytroll/satpy) Python library to analyze satellite data\n", 118 | "- [pysat](https://github.com/pysat/pysat) A Python satellite data analysis toolkit" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": null, 124 | "metadata": {}, 125 | "outputs": [], 126 | "source": [] 127 | } 128 | ], 129 | "metadata": { 130 | "kernelspec": { 131 | "display_name": "Python 3 (ipykernel)", 132 | "language": "python", 133 | "name": "python3" 134 | }, 135 | "language_info": { 136 | "codemirror_mode": { 137 | "name": "ipython", 138 | "version": 3 139 | }, 140 | "file_extension": ".py", 141 | "mimetype": "text/x-python", 142 | "name": "python", 143 | "nbconvert_exporter": "python", 144 | "pygments_lexer": "ipython3", 145 | "version": "3.7.6" 146 | } 147 | }, 148 | "nbformat": 4, 149 | "nbformat_minor": 4 150 | } 151 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/Ch6_Ocean_Example-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Chapter 6 - Example: Ocean Data \n", 8 | "### Days with sea surface temperature above a threshold\n", 9 | "\n", 10 | "In this chapter we exemplify the use of Sea Surface Temperature (SST) data in the cloud. \n", 11 | "\n", 12 | "This example analyzes a time series from an area of the ocean or a point.
If an area, it averages SST values into a single value. Then it analyzes the time series to assess when SST is above a given threshold. This could be used to study marine heatwaves, or to apply an SST threshold relevant to a marine species of interest." 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": null, 18 | "metadata": {}, 19 | "outputs": [], 20 | "source": [ 21 | "import warnings \n", 22 | "warnings.simplefilter('ignore') \n", 23 | "import numpy as np\n", 24 | "import pandas as pd\n", 25 | "import xarray as xr\n", 26 | "import matplotlib.pyplot as plt \n", 27 | "import hvplot.pandas # this library helps to make interactive plots\n", 28 | "import hvplot.xarray\n", 29 | "import fsspec # these libraries help reading cloud data\n", 30 | "import s3fs\n", 31 | "import dask\n", 32 | "from dask.distributed import performance_report, Client, progress" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": null, 38 | "metadata": {}, 39 | "outputs": [], 40 | "source": [ 41 | "# input parameters\n", 42 | "\n", 43 | "# select either a range of lat/lon or a point. \n", 44 | "# If a point, set both entries to the same value\n", 45 | "latr = [19, 20] # make sure lat1 < lat2 since no test is done below to simplify the code\n", 46 | "lonr = [-158, -157] # lon1 < lon2, range -180:180. resolution daily 1km!\n", 47 | "\n", 48 | "# time range. data range available: 2002-06-01 to 2020-01-20. [start with a short period]\n", 49 | "dater = ['2012-01-01','2016-12-31'] # dates in the format 'YYYY-MM-DD' as string" 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | "***\n", 57 | "## We are going to use the Multi-Scale Ultra High Resolution (MUR) Sea Surface Temperature (SST) data set\n", 58 | "### This dataset is stored in the Amazon (AWS) Cloud.
For more info and links to the data details and examples, see: https://registry.opendata.aws/mur/\n", 59 | "\n", 60 | "This dataset is stored in `zarr` format, which is an optimized format for large datasets and the cloud. It is not stored as one 'image' at a time or a gigantic netcdf file, but in 'chunks', so it is perfect for extracting time series.\n", 61 | "\n", 62 | "First, we open the dataset and explore it, but we are not downloading anything yet." 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": null, 68 | "metadata": {}, 69 | "outputs": [], 70 | "source": [ 71 | "# first, determine the file location, in the format:\n", 72 | "# the s3 bucket [mur-sst], and the region [us-west-2], and the folder if applicable [zarr-v1] \n", 73 | "file_location = 'https://mur-sst.s3.us-west-2.amazonaws.com/zarr-v1'\n", 74 | "\n", 75 | "ds_sst = xr.open_zarr(file_location,consolidated=True) # open a zarr file using xarray\n", 76 | "# it is similar to open_dataset but it only reads the metadata\n", 77 | "\n", 78 | "ds_sst # we can treat it as a dataset!\n" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "## Now that we know what the file contains, we select our data (region and time), operate on it if needed (if a region, average), and download only the selected data \n", 86 | "It takes a while given the high resolution of the data. So, be patient.... and if you're only testing, you might want to choose a small region and a short time period first. 
" 87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": null, 92 | "metadata": {}, 93 | "outputs": [], 94 | "source": [ 95 | "# decide if a point or a region was given.\n", 96 | "if (latr[0]==latr[1]) | (lonr[0]==lonr[1]): # if we give it only one point\n", 97 | "    sst = ds_sst['analysed_sst'].sel(time = slice(dater[0],dater[1]),\n", 98 | "                                     lat  = latr[0], \n", 99 | "                                     lon  = lonr[0]\n", 100 | "                                    ).load()\n", 101 | "else: # if we give it an area, it extracts the area, averages SST over it, and returns a time series of SST\n", 102 | "    sst = ds_sst['analysed_sst'].sel(time = slice(dater[0],dater[1]),\n", 103 | "                                     lat  = slice(latr[0], latr[1]), \n", 104 | "                                     lon  = slice(lonr[0], lonr[1])\n", 105 | "                                    ).mean(dim={'lat','lon'}, skipna=True, keep_attrs=True).load() # skip 'not a number' (NaN) values and keep attributes\n", 106 | "\n", 107 | "sst = sst-273.15 # transform units from Kelvin to Celsius\n", 108 | "sst.attrs['units']='deg C' # update units in metadata\n", 109 | "sst.to_netcdf('data/sst_example.nc') # saving the data, in case we want to come back to analyze the same data, but don't want to acquire it again from the cloud.\n", 110 | "sst # take a peek" 111 | ] 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "metadata": {}, 116 | "source": [ 117 | "***\n", 118 | "### *Execute the next cell only if you're reading the data from a file - either because you have no cloud access, or you don't want to keep reading from it. Skip otherwise.
(No problem if you executed it by mistake).*" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": null, 124 | "metadata": {}, 125 | "outputs": [], 126 | "source": [ 127 | "sst = xr.open_dataset('data/sst_example.nc') \n", 128 | "sst.close()\n", 129 | "sst = sst.analysed_sst # select only one variable\n", 130 | "sst" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "***\n", 138 | "## Let's plot the data using two different libraries.\n", 139 | "#### - `matplotlib` that we already learned.\n", 140 | "#### - `hvplot` is a more interactive library for web display. It provides you with the data details when you hover your cursor over the figure. Very nice for inspecting the data." 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": null, 146 | "metadata": {}, 147 | "outputs": [], 148 | "source": [ 149 | "# matplotlib method #\n", 150 | "print('matplotlib') \n", 151 | "sst.plot() # this is all you need\n", 152 | "\n", 153 | "# all the stuff here to make it look nice. \n", 154 | "plt.ylabel('SST ($^\circ$C)')\n", 155 | "plt.xlabel('Year')\n", 156 | "plt.title('Location: '+str(latr)+'$^\circ$N, '+str(lonr)+'$^\circ$W')\n", 157 | "plt.grid(True, alpha=0.3)\n", 158 | "plt.show()\n", 159 | "\n", 160 | "# hvplot method #\n", 161 | "print('hvplot')\n", 162 | "df = pd.DataFrame(data=sst.data, index=sst.time.data,columns=['SST (C)'])\n", 163 | "df.index.name = 'Date'\n", 164 | "df.hvplot(grid=True)" 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": {}, 170 | "source": [ 171 | "***\n", 172 | "## Now, let's analyze our data.\n", 173 | "#### First, the basics: climatology and anomalies. Also plotting using `hvplot`."
174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": null, 179 | "metadata": {}, 180 | "outputs": [], 181 | "source": [ 182 | "# Calculate the climatology\n", 183 | "sst_climatology = sst.groupby('time.dayofyear').mean('time',keep_attrs=True,skipna=False) # Group by day, all years. skipna ignores missing (NaN) values \n", 184 | "sst_climstd = sst.groupby('time.dayofyear').std('time',keep_attrs=True,skipna=False) # Calculate standard deviation. Keep data attributes.\n", 185 | "\n", 186 | "# creates a dataset with climatology and standard deviation for easy plotting with hvplot\n", 187 | "ds = xr.Dataset({'clim':sst_climatology,'+Std':sst_climatology+sst_climstd,'-Std':sst_climatology-sst_climstd}) # add standard deviation time series +/-\n", 188 | "ds.hvplot(color=['k','grey','grey'], grid=True, title='SST Climatology') # plot the climatology (black) and the standard deviation (grey)" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": null, 194 | "metadata": {}, 195 | "outputs": [], 196 | "source": [ 197 | "# calculate the anomalies\n", 198 | "sst_anomaly = sst.groupby('time.dayofyear')-sst_climatology \n", 199 | "sst_anomaly_monthly = sst_anomaly.resample(time='1MS', loffset='15D').mean(keep_attrs=True,skipna=False) # calculate monthly anomalies/smoothing\n", 200 | "\n", 201 | "# make a plot \n", 202 | "plt.plot(sst_anomaly.time,sst_anomaly)\n", 203 | "plt.plot(sst_anomaly_monthly.time,sst_anomaly_monthly, 'r')\n", 204 | "\n", 205 | "plt.grid()\n", 206 | "plt.ylabel('SSTa (C)')\n", 207 | "plt.title('SST Anomalies')\n", 208 | "plt.show()" 209 | ] 210 | }, 211 | { 212 | "cell_type": "markdown", 213 | "metadata": {}, 214 | "source": [ 215 | "***\n", 216 | "## We analyze the data further by applying a threshold.\n", 217 | "\n", 218 | "- One way is to set a threshold that has some relevance. For example, a thermal threshold for a marine species we are studying. 
\n", 219 | "\n", 220 | "- Another way is choosing the maximum value in the climatology (mean value + 1 standard deviation), which we can calculate or read by hovering our cursor over the climatology plot above.\n", 221 | "\n", 222 | "### Once the threshold is chosen, we identify when SST is over that threshold, and count how many days that occurred each year" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": null, 228 | "metadata": {}, 229 | "outputs": [], 230 | "source": [ 231 | "# we define a function that takes a threshold value, and analyzes and plots our data\n", 232 | "def SST_above(thr):\n", 233 | "    \n", 234 | "    fig, axs = plt.subplots(1,2,figsize=(16,4)) # creates a figure with two panels\n", 235 | "    \n", 236 | "    # first part - values above threshold - timeseries\n", 237 | "    plt.subplot(1,2,1) # plot on the first panel (last number)\n", 238 | "    plt.plot(sst.time,sst.data, lw=1)\n", 239 | "    a=sst>=thr # test when data is equal or greater than the threshold. a is a logical array (True/False values)\n", 240 | "    plt.plot(sst.time[a], sst.data[a],'.r', markersize=3) # plot only the values equal or above threshold\n", 241 | "    # all stuff here to make it look good\n", 242 | "    plt.ylabel('SST ($^\circ$C)')\n", 243 | "    plt.xlabel('Year')\n", 244 | "    plt.title('Location: '+str(latr)+'$^\circ$N, '+str(lonr)+'$^\circ$W')\n", 245 | "    plt.grid(True, alpha=0.3)\n", 246 | "    \n", 247 | "\n", 248 | "    # second part - days per year above threshold\n", 249 | "    plt.subplot(1,2,2) # plot on the second panel\n", 250 | "    dts = sst[sst>=thr].time # select dates when SST is equal or greater than the threshold. note that this time is not a logical array, but the time values\n", 251 | "    hot_days = dts.groupby('time.year').count() # aggregate by year, by counting \n", 252 | "    plt.bar(hot_days.year, hot_days) # bar plot of days per year\n", 253 | "    plt.xlim(int(dater[0][:4]), int(dater[1][:4])+1) # make it nice\n", 254 | "    plt.ylabel('No.
days above '+str(np.round(thr,1))+'C')\n", 255 | "    plt.grid(True, alpha=0.3)\n", 256 | "    plt.show() # display and finalize this figure, so the next is not overwritten\n", 257 | "\n", 258 | "## the actual analysis: two examples ##\n", 259 | "\n", 260 | "### Maximum climatology threshold\n", 261 | "thr = ds['+Std'].max() # setting threshold as maximum climatological value: mean + 1 standard deviation\n", 262 | "print('Max climatological SST = ',np.round(thr,1),'C')\n", 263 | "SST_above(thr) # Call function we defined\n", 264 | "\n", 265 | "### A relevant threshold. \n", 266 | "# For example, for Hawaii (the selected region), 28C is a relevant threshold for coral bleaching (https://coralreefwatch.noaa.gov/product/5km/tutorial/crw08a_bleaching_threshold.php)\n", 267 | "thr = 28\n", 268 | "print('\\n\\nBiologically relevant SST = ',thr,'C')\n", 269 | "SST_above(thr) # Call function" 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": {}, 275 | "source": [ 276 | "***\n", 277 | "#### Now, a different analysis of anomalously warm SST days. \n", 278 | "## Marine heatwaves\n", 279 | "Defined as any period with SST anomalies above the threshold determined by the 90th percentile value of a given period - in this case our data time period."
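The percentile-threshold idea can be illustrated with synthetic data and the standard library only; the nearest-rank percentile and the Gaussian stand-in values below are assumptions of this sketch, not the notebook's method:

```python
import random
from datetime import date, timedelta
from collections import Counter

random.seed(0)
start = date(2010, 1, 1)
days = [start + timedelta(days=i) for i in range(3 * 365)]
anoms = [random.gauss(0.0, 1.0) for _ in days]   # stand-in for daily SST anomalies

# 90th percentile threshold (nearest-rank method)
ranked = sorted(anoms)
thr = ranked[int(0.9 * len(ranked))]

# count the days per year at or above the threshold
mhw_days = Counter(d.year for d, v in zip(days, anoms) if v >= thr)
for year in sorted(mhw_days):
    print(year, mhw_days[year])
```

By construction, roughly 10% of all days exceed the threshold; how they cluster into individual years is what the analysis reveals.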
280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": null, 285 | "metadata": {}, 286 | "outputs": [], 287 | "source": [ 288 | "# first, calculate the threshold: 90th percentile\n", 289 | "thr = np.percentile(sst_anomaly, 90)\n", 290 | "\n", 291 | "fig, axs = plt.subplots(3,1,figsize=(16,16)) # make a figure of 3 vertical panels\n", 292 | "\n", 293 | "# same plot as in our function above, but this time we are plotting the anomalies.\n", 294 | "plt.subplot(3,1,1) \n", 295 | "plt.plot(sst_anomaly.time,sst_anomaly.data, lw=1)\n", 296 | "plt.axhline(y=0, c='k', zorder=0, alpha=0.5) # add a line to highlight the x axis \n", 297 | "a=sst_anomaly>=thr # select data above the threshold\n", 298 | "plt.plot(sst_anomaly.time[a], sst_anomaly.data[a],'.r', markersize=3)\n", 299 | "# all stuff here to make it look good\n", 300 | "plt.ylabel('SST anomalies ($^\\circ$C)')\n", 301 | "plt.xlabel('Year')\n", 302 | "plt.title('Location: '+str(latr)+'$^\\circ$N, '+str(lonr)+'$^\\circ$W')\n", 303 | "plt.grid(True, alpha=0.3)\n", 304 | "\n", 305 | "# Now plot on the original data (not anomalies)\n", 306 | "plt.subplot(3,1,2) # second panel\n", 307 | "plt.plot(sst.time,sst.data, lw=1)\n", 308 | "plt.plot(sst.time[a], sst.data[a],'.r', markersize=3) # plot only the values equal or above threshold\n", 309 | "# all stuff here to make it look good\n", 310 | "plt.ylabel('SST ($^\\circ$C)')\n", 311 | "plt.xlabel('Year')\n", 312 | "plt.title('Location: '+str(latr)+'$^\\circ$N, '+str(lonr)+'$^\\circ$W')\n", 313 | "plt.grid(True, alpha=0.3)\n", 314 | "\n", 315 | "# plot of marine heatwave days per year\n", 316 | "dts = sst_anomaly[sst_anomaly>=thr].time\n", 317 | "mhw = dts.groupby('time.year').count()\n", 318 | "plt.subplot(3,1,3) # third panel\n", 319 | "plt.bar(mhw.year,mhw)\n", 320 | "plt.ylabel('No. 
days SSTa > '+str(np.round(thr,1))+'C')\n", 321 | "plt.grid(True, alpha=0.3)\n", 322 | "plt.show()\n", 323 | "\n", 324 | "mhw # print the number of days" 325 | ] 326 | }, 327 | { 328 | "cell_type": "markdown", 329 | "metadata": {}, 330 | "source": [ 331 | "## Resources\n", 332 | "\n", 333 | "For the cloud and data in the cloud, see resources listed in Chapter 5.\n", 334 | "\n", 335 | "### Resources specifically for this chapter:\n", 336 | "\n", 337 | "- [MUR SST Data](https://registry.opendata.aws/mur/). SST data in the cloud, with references to the official data website, examples, and other resources.\n", 338 | "\n", 339 | "- [Pangeo OSM2020 Tutorial](https://github.com/pangeo-gallery/osm2020tutorial). This is a very good tutorial for ocean applications and cloud computing. Plenty of examples. Many of the commands here are from this tutorial.\n", 340 | "\n", 341 | "### About MHW\n", 342 | "\n", 343 | "- [Marine heatwaves](http://www.marineheatwaves.org/all-about-mhws.html). A good place to begin to get info about the subject.\n", 344 | "\n", 345 | "- [Marine heatwaves code](https://github.com/ecjoliver/marineHeatWaves). Marine heatwaves code from E. Oliver.\n", 346 | "\n", 347 | "### If you want to learn more:\n", 348 | "\n", 349 | "- [Methods for accessing an AWS bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-bucket-intro.html). Bucket is the name of the cloud storage object. S3 stands for Amazon's Simple Storage Service.\n", 350 | "\n", 351 | "- [hvplot site](https://hvplot.holoviz.org/index.html). Plotting tool used here.\n", 352 | "\n", 353 | "- [zarr](https://zarr.readthedocs.io/en/stable/). Learn more about this big data storage format."
354 | ] 355 | }, 356 | { 357 | "cell_type": "code", 358 | "execution_count": null, 359 | "metadata": {}, 360 | "outputs": [], 361 | "source": [] 362 | } 363 | ], 364 | "metadata": { 365 | "kernelspec": { 366 | "display_name": "Python 3 (ipykernel)", 367 | "language": "python", 368 | "name": "python3" 369 | }, 370 | "language_info": { 371 | "codemirror_mode": { 372 | "name": "ipython", 373 | "version": 3 374 | }, 375 | "file_extension": ".py", 376 | "mimetype": "text/x-python", 377 | "name": "python", 378 | "nbconvert_exporter": "python", 379 | "pygments_lexer": "ipython3", 380 | "version": "3.7.6" 381 | } 382 | }, 383 | "nbformat": 4, 384 | "nbformat_minor": 4 385 | } 386 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/Ch7_Atmosphere-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Chapter 7 - Atmospheric Data Example\n", 8 | "### Analyze high ozone events in the atmosphere over a region\n", 9 | "\n", 10 | "In this chapter we exemplify the use of an atmospheric data set, in this case ozone (actually NO2 tropospheric column density, which proxies tropospheric ozone), characterize its variability over a given region, and identify high-concentration events and the number of days with high ozone levels per year.\n", 11 | "\n", 12 | "This data estimates ozone in the whole air column, which is not necessarily a reflection of ozone at ground level, so it is not recommended for assessing air quality at the surface. 
We selected this one because of the length of its data record" 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": null, 18 | "metadata": {}, 19 | "outputs": [], 20 | "source": [ 21 | "# libraries\n", 23 | "import numpy as np\n", 24 | "import pandas as pd\n", 25 | "import xarray as xr\n", 26 | "import matplotlib.pyplot as plt \n", 27 | "import hvplot.pandas\n", 28 | "import hvplot.xarray\n", 29 | "import fsspec\n", 30 | "import s3fs\n", 31 | "import dask\n", 32 | "from dask.distributed import performance_report, Client, progress\n", 33 | "xr.set_options(display_style=\"html\") #display dataset nicely\n", 34 | "\n", 35 | "# this library helps to make your code execution less messy\n", 36 | "import warnings\n", 37 | "warnings.simplefilter('ignore') # filter some warning messages" 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": null, 43 | "metadata": {}, 44 | "outputs": [], 45 | "source": [ 46 | "file_location = 's3://nasanex/MODIS'\n", 47 | "\n", 48 | "ds_ndvi = xr.open_zarr(fsspec.get_mapper(file_location, anon=True),consolidated=True)\n", 49 | "\n", 50 | "ds_ndvi" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": null, 56 | "metadata": {}, 57 | "outputs": [], 58 | "source": [ 59 | "# input parameters:\n", 60 | "# area to analyze: lat, lon ranges\n", 61 | "# time frame" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": null, 67 | "metadata": {}, 68 | "outputs": [], 69 | "source": [ 70 | "# read ozone data\n", 71 | "# https://registry.opendata.aws/omi-no2-nasa/\n", 72 | "# https://aura.gsfc.nasa.gov/omi.htmld\n", 73 | "# s3://omi-no2-nasa/ \n", 74 | "# look at the data, description and attributes\n", 75 | "\n", 76 | "# plot overall climatology for the region, and max/min values (or 10-90th percentiles)" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": null, 82 | "metadata": {}, 83 | "outputs": [], 84 | "source": [ 85 | "# plot time
series\n", 86 | "# identify events above the 90th percentile\n", 87 | "# count the number of days per year above it" 88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": null, 93 | "metadata": {}, 94 | "outputs": [], 95 | "source": [ 96 | "# resources" 97 | ] 98 | } 99 | ], 100 | "metadata": { 101 | "kernelspec": { 102 | "display_name": "Python 3", 103 | "language": "python", 104 | "name": "python3" 105 | }, 106 | "language_info": { 107 | "codemirror_mode": { 108 | "name": "ipython", 109 | "version": 3 110 | }, 111 | "file_extension": ".py", 112 | "mimetype": "text/x-python", 113 | "name": "python", 114 | "nbconvert_exporter": "python", 115 | "pygments_lexer": "ipython3", 116 | "version": "3.7.6" 117 | } 118 | }, 119 | "nbformat": 4, 120 | "nbformat_minor": 4 121 | } 122 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/Ch7_Atmosphere_Example-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Chapter 7 - Example: Atmospheric Data \n", 8 | "### Analyze monthly wind data for a selected region\n", 9 | "\n", 10 | "In this chapter, we exemplify the use of an atmospheric/climate data set, the reanalysis dataset ERA-5, to analyze changes in the 10 m wind vector. We characterize its variability over a given region, plot the field and calculate linear trends.\n", 11 | "\n", 12 | "[ERA-5 (ECMWF)](https://registry.opendata.aws/ecmwf-era5/) reanalysis incorporates satellite and in-situ data, and its output variables include ocean, land and atmospheric ones. Therefore, this script can be easily modified for other data.
" 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": null, 18 | "metadata": {}, 19 | "outputs": [], 20 | "source": [ 21 | "import warnings\n", 22 | "warnings.simplefilter('ignore') \n", 23 | "\n", 24 | "import numpy as np\n", 25 | "import pandas as pd\n", 26 | "import xarray as xr\n", 27 | "from calendar import month_abbr # function that gives you the abbreviated name of a month\n", 28 | "from calendar import monthrange # gives the number of days in a month\n", 29 | "import matplotlib.pyplot as plt \n", 30 | "import hvplot.pandas\n", 31 | "import hvplot.xarray\n", 32 | "import fsspec\n", 33 | "import s3fs\n", 34 | "import dask\n", 35 | "from dask.distributed import performance_report, Client, progress\n", 36 | "import os # library to interact with the operating system" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "***\n", 44 | "## For this example we select a region, a specific month, and a range of years to analyze" 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": null, 50 | "metadata": {}, 51 | "outputs": [], 52 | "source": [ 53 | "# Select region by defining latitude and longitude range. \n", 54 | "# ERA-5 data has a 1/4 degree resolution. \n", 55 | "latr = [39, 40] # Latitude range. Make sure lat1 < lat2 (no check is done below, to keep the code simple). Resolution is 0.25 degrees\n", 56 | "lonr = [-125, -123] # lon1 < lon2, and use the range -180 : 180\n", 57 | "# time selection\n", 58 | "mon = 5 # month to analyze\n", 59 | "iyr = 2000 # initial year. by default, we set it to the start year of the ERA5 dataset\n", 60 | "fyr = 2021 # final year. by default, we set it to the end year of the ERA5 dataset\n" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "***\n", 68 | "## Acquire data from the AWS cloud\n", 69 | "\n", 70 | "In this case, files are stored in a different format than the SST data.
ERA5 data is stored in monthly files (of daily data) organized in yearly folders, so monthly files have to be accessed individually." 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": null, 76 | "metadata": { 77 | "scrolled": true 78 | }, 79 | "outputs": [], 80 | "source": [ 81 | "tdt = list() # initialize a list to store the time index\n", 82 | "\n", 83 | "# v meridional component\n", 84 | "print('Acquiring meridional wind v10m')\n", 85 | "for iy, y in enumerate(range(iyr, fyr+1)): # for loop over the selected years\n", 86 | " file_location = 'https://era5-pds.s3.us-east-1.amazonaws.com/zarr/'+str(y)+'/'+str(mon).zfill(2)+'/data/northward_wind_at_10_metres.zarr'\n", 87 | " # filename includes: bucket name: era5-pds, year: y (transformed to string type), month: mon, and the name of the variable with extension zarr\n", 88 | " ds = xr.open_zarr(file_location,consolidated=True) # open access to data\n", 89 | "\n", 90 | " # generate time frame to obtain the whole month data (first to last day of selected month)\n", 91 | " dte1 = str(y)+'-'+str(mon).zfill(2)+'-01'\n", 92 | " dte2 = str(y)+'-'+str(mon).zfill(2)+'-'+str(monthrange(y, mon)[1]) # monthrange provides the length of the month\n", 93 | " # select data region and time - meridional wind\n", 94 | " vds = ds['northward_wind_at_10_metres'].sel(time0 = slice(dte1,dte2),\n", 95 | " lat = slice(latr[1],latr[0],), \n", 96 | " lon = slice(lonr[0]+360,lonr[1]+360)\n", 97 | " ).mean(axis=0).load() # calculate the mean before downloading it\n", 98 | " if iy==0: # if the first year, create an array to store data\n", 99 | " v10_dt = np.full((len(range(iyr, fyr+1)),vds.shape[0],vds.shape[1]), np.nan) # create an array of the size [years,lat,lon]\n", 100 | " v10_dt[iy,:,:] = vds.data # store selected data per year\n", 101 | " \n", 102 | "# u component\n", 103 | "print('Acquiring zonal wind u10m')\n", 104 | "for iy, y in enumerate(range(iyr, fyr+1)):\n", 105 | " file_location =
'https://era5-pds.s3.us-east-1.amazonaws.com/zarr/'+str(y)+'/'+str(mon).zfill(2)+'/data/eastward_wind_at_10_metres.zarr'\n", 106 | " # note that each variable has a distinctive file name\n", 107 | " ds = xr.open_zarr(file_location,consolidated=True)\n", 108 | "\n", 109 | " dte1 = str(y)+'-'+str(mon).zfill(2)+'-01'\n", 110 | " dte2 = str(y)+'-'+str(mon).zfill(2)+'-'+str(monthrange(y, mon)[1])\n", 111 | " uds = ds['eastward_wind_at_10_metres'].sel(time0 = slice(dte1,dte2),\n", 112 | " lat = slice(latr[1],latr[0],), \n", 113 | " lon = slice(lonr[0]+360,lonr[1]+360)\n", 114 | " ).mean(axis=0).load()\n", 115 | " if iy==0: \n", 116 | " u10_dt = np.full((len(range(iyr, fyr+1)),uds.shape[0],uds.shape[1]), np.nan)\n", 117 | " u10_dt[iy,:,:] = uds.data \n", 118 | " \n", 119 | " # append month-year time to the list\n", 120 | " tdt.append(str(y)+'-'+str(mon).zfill(2)+'-01') # add first day of month\n", 121 | " \n" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": null, 127 | "metadata": {}, 128 | "outputs": [], 129 | "source": [ 130 | "# Build a dataset from the selected data.
(a Dataset rather than a DataArray, since we have two variables for the vector)\n", 131 | "mw10 = xr.Dataset(data_vars=dict(u10m=(['time','lat','lon'],u10_dt),\n", 132 | " v10m=(['time','lat','lon'],v10_dt), ),\n", 133 | " coords=dict(time=tdt,lat=vds.lat.values, lon=vds.lon.values-360),attrs=vds.attrs) \n", 134 | "# Add a wind speed variable\n", 135 | "mw10['wsp10m'] = np.sqrt(mw10.u10m**2+mw10.v10m**2) # calculate wind speed\n", 136 | "mw10.to_netcdf('./data/ERA5_wind10m_mon'+str(mon).zfill(2)+'.nc') # save the file for future use, so we don't have to get the data again\n", 137 | "mw10 # taking a peek\n" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": null, 143 | "metadata": {}, 144 | "outputs": [], 145 | "source": [ 146 | "mw10 = xr.open_dataset('./data/ERA5_wind10m_mon05.nc')\n", 147 | "mw10.close()\n", 148 | "mw10" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | "source": [ 155 | "***\n", 156 | "## Plotting the data\n", 157 | "\n", 158 | "As before, there is a simple way to plot the data for quick inspection, and also a way to make the plot ready for sharing or publication."
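The wind-speed formula used for `wsp10m` above extends naturally to wind direction. The sketch below is not part of the original notebook: it uses synthetic values standing in for the `u10m`/`v10m` variables, and assumes the meteorological convention (direction the wind blows *from*, measured clockwise from north) — use whichever convention your application needs.

```python
import numpy as np
import xarray as xr

# Synthetic u/v components, stand-ins for the u10m/v10m variables built above
u = xr.DataArray(np.array([1.0, 0.0, -3.0]), dims="time")
v = xr.DataArray(np.array([0.0, 2.0, 4.0]), dims="time")

# Wind speed: same formula used for wsp10m above
speed = np.sqrt(u**2 + v**2)

# Meteorological wind direction: where the wind blows FROM, clockwise from north
# e.g. u=1, v=0 (a westerly wind) gives 270 degrees
direction = (270.0 - np.degrees(np.arctan2(v, u))) % 360.0

print(speed.values)      # [1. 2. 5.]
print(direction.values)  # [270. 180. ~143.1]
```

Because `speed` and `direction` are computed with NumPy ufuncs on DataArrays, they keep the `time` coordinate and could be added to `mw10` the same way `wsp10m` was.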
159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": null, 164 | "metadata": {}, 165 | "outputs": [], 166 | "source": [ 167 | "# simple plot of data, using the matplotlib function quiver to plot vectors\n", 168 | "x,y = np.meshgrid(mw10.lon,mw10.lat) # generate a lat/lon grid to plot the vectors\n", 169 | "plt.quiver(x, y, mw10.u10m[0,:,:], mw10.v10m[0,:,:]) \n", 170 | "plt.show()" 171 | ] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": null, 176 | "metadata": {}, 177 | "outputs": [], 178 | "source": [ 179 | "# Now a more detailed plot\n", 180 | "from cartopy.mpl.ticker import LongitudeFormatter, LatitudeFormatter\n", 181 | "import cartopy.feature as cfeature\n", 182 | "import cartopy.crs as ccrs\n", 183 | "from calendar import month_abbr\n", 184 | "\n", 185 | "# Select a region of our data, giving it a margin\n", 186 | "margin = 0.5 # extra space for the plot\n", 187 | "region = np.array([[latr[0]-margin,latr[1]+margin],[lonr[0]-margin,lonr[1]+margin]]) # numpy array that specifies the lat/lon boundaries of our selected region\n", 188 | "\n", 189 | "# Create and set the figure context\n", 190 | "fig = plt.figure(figsize=(8,5)) # create a figure object, and assign it the variable name fig\n", 191 | "ax = plt.axes(projection=ccrs.PlateCarree()) # projection type - this one is easy to use\n", 192 | "ax.coastlines(resolution='50m',linewidth=2,color='black') \n", 193 | "ax.add_feature(cfeature.LAND, color='grey', alpha=0.3)\n", 194 | "ax.set_extent([region[1,0],region[1,1],region[0,0],region[0,1]],crs=ccrs.PlateCarree()) \n", 195 | "ax.set_xticks([*np.arange(region[1,0],region[1,1]+1,1)], crs=ccrs.PlateCarree()) # customize ticks and labels to longitude\n", 196 | "ax.set_yticks([*np.arange(region[0,0],region[0,1]+1,1)], crs=ccrs.PlateCarree()) # customize ticks and labels to latitude\n", 197 | "ax.xaxis.set_major_formatter(LongitudeFormatter(zero_direction_label=True))\n", 198 |
"ax.yaxis.set_major_formatter(LatitudeFormatter())\n", 199 | "\n", 200 | "# Plot average wind for the selected month, color is the wind speed\n", 201 | "plt.quiver(x, y, mw10.u10m.mean(axis=0), mw10.v10m.mean(axis=0),mw10.wsp10m.mean(axis=0), cmap='jet')\n", 202 | "cbar=plt.colorbar()\n", 203 | "cbar.set_label('m/s') # color bar label\n", 204 | "plt.title('Wind for '+month_abbr[mon]+' ('+str(iyr)+'-'+str(fyr)+')')\n", 205 | "#fig.savefig('filename') # save your figure using the method .savefig. Python recognizes the format from the filename extension. \n", 206 | "plt.show()" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "*** \n", 214 | "## To analyze the data in time, we select only one point in space. \n", 215 | "But if you want to analyze the entire field, you can:\n", 216 | "- Average spatially using .mean(axis=(1,2)) on the variables\n", 217 | "- Repeat the analysis for each point (using a `for` loop)\n", 218 | "- Or even better: use `xarray` methods to apply a function to the array" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": null, 224 | "metadata": {}, 225 | "outputs": [], 226 | "source": [ 227 | "print('Latitude values: ', mw10.lat.values)\n", 228 | "print('Longitude values: ',mw10.lon.values)" 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "execution_count": null, 234 | "metadata": {}, 235 | "outputs": [], 236 | "source": [ 237 | "# select a point from the range of latitude and longitude values above\n", 238 | "slat = 39 # selected latitude\n", 239 | "slon = -124 # selected longitude" 240 | ] 241 | }, 242 | { 243 | "cell_type": "code", 244 | "execution_count": null, 245 | "metadata": {}, 246 | "outputs": [], 247 | "source": [ 248 | "# Select data for a specific location, and do a simple plot of each variable\n", 249 | "plt.figure(figsize=(12,8))\n", 250 | "\n", 251 | "# meridional wind change\n", 252 | "plt.subplot(2,2,1)\n", 253 |
"plt.plot(range(iyr,fyr+1),mw10.v10m.sel(lat=slat,lon=slon), 'bd-',zorder=2)\n", 254 | "plt.axhline(y=0,c='k', alpha=0.4)\n", 255 | "plt.ylabel('Wind speed (m/s)')\n", 256 | "plt.title('Meridional wind (v), Lat='+str(slat)+', Lon='+str(slon))\n", 257 | "plt.grid(zorder=0)\n", 258 | "\n", 259 | "# zonal wind change\n", 260 | "plt.subplot(2,2,2)\n", 261 | "plt.plot(range(iyr,fyr+1),mw10.u10m.sel(lat=slat,lon=slon), 'go-',zorder=2)\n", 262 | "plt.axhline(y=0,c='k', alpha=0.4)\n", 263 | "plt.ylabel('Wind speed (m/s)')\n", 264 | "plt.title('Zonal wind (u), Lat='+str(slat)+', Lon='+str(slon))\n", 265 | "plt.grid(zorder=0)\n", 266 | "\n", 267 | "# wind speed change\n", 268 | "plt.subplot(2,2,3)\n", 269 | "plt.plot(range(iyr,fyr+1), mw10.wsp10m.sel(lat=slat,lon=slon), 's-',c='darkorange',zorder=2)\n", 270 | "plt.axhline(y=0,c='k', alpha=0.4)\n", 271 | "plt.ylabel('Wind speed (m/s)')\n", 272 | "plt.title('Wind speed, Lat='+str(slat)+', Lon='+str(slon))\n", 273 | "plt.grid(zorder=0)\n", 274 | "\n", 275 | "plt.tight_layout()\n", 276 | "plt.show()" 277 | ] 278 | }, 279 | { 280 | "cell_type": "markdown", 281 | "metadata": {}, 282 | "source": [ 283 | "***\n", 284 | "## Now, let's calculate the temporal trend of one of the wind variables, using a first-degree (linear) regression " 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": null, 290 | "metadata": {}, 291 | "outputs": [], 292 | "source": [ 293 | "# libraries for statistics and machine learning functions\n", 294 | "from sklearn.preprocessing import PolynomialFeatures\n", 295 | "import statsmodels.api as sm\n", 296 | "\n", 297 | "var='wsp10m' # select a variable from our Dataset\n", 298 | "\n", 299 | "x = np.array([*range(iyr,fyr+1)]).reshape(-1,1) # generate an array of years, turned into a column vector using .reshape(-1,1)\n", 300 | "y = mw10[var].sel(lat=slat,lon=slon).values.reshape(-1,1) # selected variable at the selected point\n", 301 | "\n", 302 | "polf = PolynomialFeatures(1) # linear regression
(order=1)\n", 303 | "xp = polf.fit_transform(x) # generate an array with the years and a constant (intercept) term\n", 304 | "mods = sm.OLS(y,xp).fit() # calculate regression model, stored in mods\n", 305 | "\n", 306 | "print(mods.summary()) # each variable of the model can also be accessed individually\n", 307 | "\n", 308 | "# this summary shows different metrics and significance levels along with the equation variables and constants. \n", 309 | "# for more details see the resources section below" 310 | ] 311 | }, 312 | { 313 | "cell_type": "markdown", 314 | "metadata": {}, 315 | "source": [ 316 | "***\n", 317 | "# Resources\n", 318 | "**Data**\n", 319 | "- AWS [ERA-5 (ECMWF)](https://registry.opendata.aws/ecmwf-era5/) reanalysis data.\n", 320 | "This page also has links to other tutorials that use other libraries.\n", 321 | "- [List of data available](https://github.com/planet-os/notebooks/blob/master/aws/era5-pds.md) on ERA5 and details on how the files are organized.\n", 322 | "- Google Earth Engine ERA-5 data.
[[Monthly]](https://developers.google.com/earth-engine/datasets/catalog/ECMWF_ERA5_MONTHLY#bands) [[Daily]](https://developers.google.com/earth-engine/datasets/catalog/ECMWF_ERA5_DAILY).\n", 323 | "\n", 324 | "**More on the libraries:**\n", 325 | "- [xarray apply](https://www.programcreek.com/python/example/123575/xarray.apply_ufunc) Examples of how to apply a function to an xarray structure\n", 326 | "- [scikit-learn (sklearn)](https://scikit-learn.org/stable/) a library for machine learning functions\n", 327 | "- [statsmodels](https://www.statsmodels.org/stable/user-guide.html) a library to calculate statistical models.\n", 328 | "\n", 329 | "\n" 330 | ] 331 | }, 332 | { 333 | "cell_type": "code", 334 | "execution_count": null, 335 | "metadata": {}, 336 | "outputs": [], 337 | "source": [] 338 | } 339 | ], 340 | "metadata": { 341 | "kernelspec": { 342 | "display_name": "Python 3 (ipykernel)", 343 | "language": "python", 344 | "name": "python3" 345 | }, 346 | "language_info": { 347 | "codemirror_mode": { 348 | "name": "ipython", 349 | "version": 3 350 | }, 351 | "file_extension": ".py", 352 | "mimetype": "text/x-python", 353 | "name": "python", 354 | "nbconvert_exporter": "python", 355 | "pygments_lexer": "ipython3", 356 | "version": "3.7.6" 357 | } 358 | }, 359 | "nbformat": 4, 360 | "nbformat_minor": 4 361 | } 362 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/Ch8_Land_Example-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Chapter 8 - Example: Land Data\n", 8 | "### Changes in vegetation index through the years for an area.
The NDVI index indicates the percentage of vegetation for each grid point.\n", 9 | "\n", 10 | "In this chapter we don't use data from the cloud, but exemplify how to obtain time series data stored in temporally separated files on the internet and analyze it. You'll see that it is not very different from previous chapters, except that there is not a centralized repository for the data. In the future (hopefully soon), when these data are in the cloud in a similar format, accessing them will be similar to Chapters 6 and 7.\n", 11 | "\n", 12 | "## This script reads NDVI (vegetation index) files from a `thredds` server, compiles the selected region and time, and then analyzes the change in vegetation index through time." 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": null, 18 | "metadata": {}, 19 | "outputs": [], 20 | "source": [ 21 | "import warnings\n", 22 | "warnings.simplefilter('ignore') \n", 23 | "\n", 24 | "import pandas as pd\n", 25 | "import numpy as np\n", 26 | "import xarray as xr\n", 27 | "xr.set_options(display_style=\"html\") # display dataset nicely\n", 28 | "import os\n", 29 | "import re # regular expressions\n", 30 | "from datetime import date\n", 31 | "from calendar import month_abbr\n", 32 | "import urllib as ur # library to download files online \n", 33 | "import requests # library to read files online \n", 34 | "import matplotlib.pyplot as plt \n", 35 | "import hvplot.pandas\n", 36 | "import hvplot.xarray\n" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": null, 42 | "metadata": {}, 43 | "outputs": [], 44 | "source": [ 45 | "# Select a region \n", 46 | "lat1, lat2 = 16, 18 # two latitudes for a range: lat1=0.3) # create a mask for veg. index >= 30% in the first time step.
other locations set to NaN\n", 197 | "veg_area = mask0.count() # count the number of grid points above when the mask is applied - needed if you want to calculate the area later\n", 198 | "for i in range(len(ayrs)): \n", 199 | " tmp=ndvi[i,:,:]*mask0 # apply the mask for each year\n", 200 | " veg_mean.append(tmp.mean())\n", 201 | "\n", 202 | "plt.bar(ayrs,np.array(veg_mean)-np.nanmean(veg_mean)) # convert the list to an array so the anomaly subtraction works\n", 203 | "plt.title('Vegetation Index Change for '+month_abbr[mon]+' '+str(dy).zfill(2))\n", 204 | "plt.ylabel('NDVI')\n", 205 | "plt.grid(True, alpha=0.3)\n", 206 | "plt.show()" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "***\n", 214 | "# Resources\n", 215 | "\n", 216 | "\n", 217 | "### Data and data sources: \n", 218 | "- [NDVI Normalized Difference Vegetation Index (Climate Data Record)](https://www.ncei.noaa.gov/products/climate-data-records/normalized-difference-vegetation-index) data. \n", 219 | "- [NDVI data list](https://www.ncei.noaa.gov/thredds/catalog/cdr/ndvi/catalog.html) \n", 220 | "\n", 221 | "### Other locations for MODIS and NDVI data\n", 222 | "- [AWS](https://registry.opendata.aws/modis-astraea/)\n", 223 | "- [AWS NASA NEX](https://registry.opendata.aws/nasanex/)\n", 224 | "- [Earth Engine](https://developers.google.com/earth-engine/datasets/catalog/NOAA_CDR_AVHRR_NDVI_V5#description)\n", 225 | "- [USGS](https://lpdaac.usgs.gov/products/mod13q1v006/)\n", 226 | "\n", 227 | "### Other data in `thredds`\n", 228 | "- [NCEI thredds](https://www.ncei.noaa.gov/thredds/catalog.html) NOAA National Centers for Environmental Information thredds catalog.\n", 229 | "- [How to access data files in thredds](https://www.unidata.ucar.edu/software/tds/current/tutorial/CatalogPrimer.html)\n", 230 | "\n", 231 | "### More on the libraries:\n", 232 | "- [A short article on how to download files from a url in Python](https://betterprogramming.pub/3-simple-ways-to-download-files-with-python-569cb91acae6)\n", 233 | "-
[urllib/request](https://docs.python.org/3/library/urllib.request.html?highlight=retrieve) library \n", 234 | "- Regular expressions [re](https://docs.python.org/3/howto/regex.html). A useful way to manipulate strings. See this [tutorial](https://www.tutorialspoint.com/python/python_reg_expressions.htm) for a more friendly approach.\n" 235 | ] 236 | }, 237 | { 238 | "cell_type": "code", 239 | "execution_count": null, 240 | "metadata": {}, 241 | "outputs": [], 242 | "source": [] 243 | } 244 | ], 245 | "metadata": { 246 | "kernelspec": { 247 | "display_name": "Python 3 (ipykernel)", 248 | "language": "python", 249 | "name": "python3" 250 | }, 251 | "language_info": { 252 | "codemirror_mode": { 253 | "name": "ipython", 254 | "version": 3 255 | }, 256 | "file_extension": ".py", 257 | "mimetype": "text/x-python", 258 | "name": "python", 259 | "nbconvert_exporter": "python", 260 | "pygments_lexer": "ipython3", 261 | "version": "3.7.6" 262 | } 263 | }, 264 | "nbformat": 4, 265 | "nbformat_minor": 4 266 | } 267 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/Python_Installation-checkpoint.md: -------------------------------------------------------------------------------- 1 | If you decide to install `Python`, there are a number of ways and resources to do so. Here, we will explain how to install Python and JupyterLab using __Conda__, a package manager. 2 | 3 | ## Conda 4 | [Conda](https://docs.conda.io/en/latest/) is a tool that helps you install and update Python and any library you might need. Although not only for beginners, conda makes life easier and simpler when installing `Python` and its modules. 5 | 6 | We will use __Miniconda__, which is a minimal installer of conda. This means that it installs conda, `Python`, the packages they depend on, and only a small number of other packages. The user will install only the packages (libraries) needed. This allows your Python installation to be tailored to your needs.
7 | 8 | 9 | ### Install Miniconda 10 | Download miniconda from [here](https://docs.conda.io/en/latest/miniconda.html). Choose your platform, and make sure to get Python 3.8. Follow the regular installation instructions for your platform, located [here](https://conda.io/projects/conda/en/latest/user-guide/install/index.html#regular-installation). (There are many sites with easier (or harder) to follow instructions, but this site is the most up to date one.) 11 | 12 | Additional libraries can be installed one by one, or, as we will do here, using a list of libraries in a file named __environment.yml__. Download this file directly from [here](https://github.com/marisolgr/python_sat_tutorials/blob/main/environment.yml), or find it in the main folder of this tutorial if you cloned it from Github (instructions below). 13 | 14 | ### Install the necessary libraries and create a new 'environment' 15 | 16 | Start by opening an anaconda prompt: 17 | - Windows: From your start button, look for Anaconda, within that folder open 'Anaconda Prompt'. You are done, skip to the next section on installing libraries. 18 | - macOS: Open Launchpad, then open Terminal or iTerm. 19 | - Linux–CentOS: Open Applications - System Tools - Terminal. 20 | - Linux–Ubuntu: Open the Dash by clicking the upper left Ubuntu icon, then type “terminal”. 21 | 22 | In the anaconda window type `conda list`. If Anaconda is installed and working, this will display a list of installed packages and their versions. 23 | 24 | At your anaconda prompt, type `conda env create -f environment.yml`. You may need to include the directory the environment.yml file was downloaded to. 25 | 26 | Once finished, let's open __JupyterLab__ as a test. At the anaconda prompt type `conda activate tutorialenv`, then type `jupyter lab`. This will open JupyterLab in your browser. From there you can open notebooks and run them.
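The steps above can be summarized as one terminal session. This is a sketch assuming Miniconda is already installed and that environment.yml sits in the current directory; the environment name `tutorialenv` comes from the file itself:

```shell
# Verify conda works (prints installed packages and versions)
conda list

# Create the environment described in environment.yml
# (run from the directory containing the file, or give its full path)
conda env create -f environment.yml

# Activate the new environment and launch JupyterLab in your browser
conda activate tutorialenv
jupyter lab
```

If `conda env create` fails because the file cannot be found, pass the full path, e.g. `conda env create -f ~/Downloads/environment.yml`.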
27 | 28 | ## Github 29 | 30 | Downloading the content of a __GitHub__ repository to your local computer is referred to as 'cloning'. To clone this (or any) repository: 31 | 32 | - In a terminal type `git clone https://github.com/marisolgr/python_sat_tutorials` 33 | - Or you can download it by going to the main page [here](https://github.com/marisolgr/python_sat_tutorials) and clicking on the green button named `Code`, at the top right of the file list. Then, click on __Download ZIP__ from the drop-down menu. -------------------------------------------------------------------------------- /.ipynb_checkpoints/README-checkpoint.md: -------------------------------------------------------------------------------- 1 | # PythonSat Tutorials 2 | Tutorial to learn how to access and process satellite data using Python and JupyterLab in the Cloud 3 | 4 | _Modified from 'Python for Oceanographers' by: Chelle Gentemann and Marisol Garcia-Reyes, link: https://github.com/python4oceanography/ocean_python_tutorial_ 5 | 6 | ## Objective 7 | This tutorial aims to provide scientists who want to use satellite data with the necessary tools for obtaining, analyzing and visualizing these data, and to do so in the Cloud. This project, supported by the Better Scientific Software foundation and NASA, aims to increase accessibility of satellite data & cloud technologies to a broad scientific community through easy-to-follow Python tutorials. 8 | 9 | ## How it works 10 | __Note:__ This is __not__ a tutorial on Python per se - there are a myriad of resources for that. The purpose of this tutorial is to learn, through __examples__, only the necessary Python code and tools required to work with satellite data. We want you to get your toes wet, get to see and use the power of Python, and then maybe you want to learn more. For that, we encourage you to visit the links in the __Resources__ section at the end of each chapter.
11 | 12 | *** 13 | These tutorials are developed to run on the Cloud and to access satellite data on the Cloud as well (_For this first release, data used is local or available online; the second release will include cloud data access_). To launch the tutorials, click on the binder icon: 14 | 15 | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/marisolgr/python_sat_tutorials/HEAD) 16 | 17 | This tutorial is divided into chapters that provide the necessary tools as building blocks. These chapters are standalone, so they can be skipped if you are familiar with the particular tool presented. 18 | 19 | ## Chapters: 20 | 21 | 1. Introduction to Python for Earth Science: basic concepts about Python 22 | 23 | 2. Introduction to Jupyter Lab: How to use the web interface JupyterLab 24 | 25 | 3. Python Basics: Basic concepts and features of Python 26 | 27 | 4a. Python Tools: xarray, the library that makes satellite data analysis easy 28 | 29 | 4b. Plotting Tools: Python plotting libraries 30 | 31 | The tutorials can also be cloned from this repository, and run locally on your computer (you would still need internet access to reach the data in the cloud).
For instructions on how to install Python and JupyterLab, clone the tutorials from Github, and access the data on the cloud, see [here](https://github.com/marisolgr/python_sat_tutorials/blob/main/Python_Installation.md) 32 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/environment-checkpoint.yml: -------------------------------------------------------------------------------- 1 | name: tutorialenv 2 | channels: 3 | - conda-forge 4 | - defaults 5 | dependencies: 6 | - cartopy=0.17.0 7 | - jupyterlab=2.2.8 8 | - matplotlib=3.2.2 9 | - numpy=1.18.5 10 | - pandas=1.0.5 11 | - python=3.7.6 12 | - xarray=0.15.1 13 | - netcdf4=1.5.3 -------------------------------------------------------------------------------- /Ch1_Python_for_EarthSciences.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Chapter 1 - Python for Earth Sciences\n", 8 | "![](./figures/globe_data.png)\n", 9 | "\n", 10 | "***\n", 11 | "__Note:__ This is __not__ a tutorial on `Python` - there are many resources for that. The purpose of this tutorial is to learn, through examples, the necessary `Python` code and tools required to work with satellite data.
Please see the __Resources__ section in each chapter to learn more about `Python` and the libraries used.\n", 12 | "\n", 13 | "***\n", 14 | "\n", 15 | "\n", 16 | "\n", 17 | "`Python` is a well developed, easy to learn programming language that provides many advantages for a wide range of applications, including the Earth Sciences:\n", 18 | "\n", 19 | "- It is __Open Source__: it is free, everybody can use it, and everybody can contribute to it\n", 20 | "- It is used by an enormous community of developers\n", 21 | "- It is __Modular__: it has all the libraries *(collections of programs or functions for a specific purpose)* you could possibly need; you do not need to install them all\n", 22 | "- __This means many libraries have been developed by and for (Earth) scientists__\n", 23 | "\n", 24 | "***\n", 25 | "## Python Scientific Ecosystem\n", 26 | "There are a number of libraries and data structures in `Python` that make it ideal for the Earth Sciences. Because of Python's modular structure, new and specific libraries are built on, and take advantage of, more basic but well developed ones. For example, the __xarray__ library is not only developed on top of `Python`, but also uses the __SciPy__, __pandas__ and __matplotlib__ libraries, which are built on top of __NumPy__.\n", 27 | "\n", 28 | "\"Python\n", 29 | "\n", 30 | "***\n", 31 | "## How to use these tutorials\n", 32 | "\n", 33 | "There are different ways to program in `Python`. In this tutorial we use the web interface `Jupyter Notebook` (__Chapter 2__ gives a quick overview on how to use it).
In __Chapter 3__ we will learn basic and necessary `Python` commands and data structures, and in __Chapters 4a & 4b__ we will learn the tools (libraries) we need (and that make `Python` ideal) for satellite data analysis.\n", 34 | "\n", 35 | "We only cover, through examples, the necessary and basic knowledge you need to be able to navigate the application chapters, where we use project-like examples to illustrate how to acquire, analyze and visualize satellite data. You'll learn the capabilities of `Python`, and can use these examples to build your own.\n", 36 | "\n", 37 | "## Where to get help\n", 38 | "Beyond the __Resources__ links provided, like with everything these days: when in doubt, __google it!__ You'll find many useful pages and videos. One of the best Q&A resources, and many times the first link in a Google search, is [Stack Overflow](https://stackoverflow.com/). This site is an always evolving community in which people ask and answer coding questions. \n" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "***\n", 46 | "## Resources\n", 47 | "The Official page: [https://www.python.org/](https://www.python.org/)\n", 48 | "\n", 49 | "Some basics about Python: https://www.tutorialspoint.com/python/index.htm\n", 50 | "\n", 51 | "Python tutorial resources in Chapter 3\n" 52 | ] 53 | }, 54 | { 55 | "cell_type": "code", 56 | "execution_count": null, 57 | "metadata": {}, 58 | "outputs": [], 59 | "source": [] 60 | } 61 | ], 62 | "metadata": { 63 | "kernelspec": { 64 | "display_name": "Python 3 (ipykernel)", 65 | "language": "python", 66 | "name": "python3" 67 | }, 68 | "language_info": { 69 | "codemirror_mode": { 70 | "name": "ipython", 71 | "version": 3 72 | }, 73 | "file_extension": ".py", 74 | "mimetype": "text/x-python", 75 | "name": "python", 76 | "nbconvert_exporter": "python", 77 | "pygments_lexer": "ipython3", 78 | "version": "3.7.6" 79 | } 80 | }, 81 | "nbformat": 4, 82 | "nbformat_minor": 4 83 | } 84 |
-------------------------------------------------------------------------------- /Ch2_Intro_JypiterNotebook.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Chapter 2 - Introduction to Jupyter Notebook\n", 8 | "In this tutorial we will use Jupyter Notebook as an interface for `Python`, therefore a quick intro is given. If you are familiar with Jupyter Notebook or Jupyter Lab you can skip this tutorial. If you want to learn more, there are some useful __Resources__ at the bottom of this page.\n", 9 | "\n", 10 | "***\n", 11 | "\n", 12 | "## What is Jupyter Notebook?\n", 13 | "\n", 14 | "\n", 15 | "\n", 16 | "`Jupyter` is a project to develop open-source software, and `Jupyter` __Notebook__ is its first web-based and user-friendly interface between `Python` and the user. The new version is JupyterLab, but we will use __Notebook__ as the `mybinder` project uses it.\n", 17 | "\n", 18 | "`Jupyter` __Notebook__ offers more than interactive coding - meaning you write and execute parts of your code and receive an immediate response. `Jupyter` __Notebook__ also allows you to add formatted text, equations and figures - this lets you keep code, info, and results, including figures, neatly organized and visualize them on the same page. It also allows others to reproduce your results. \n", 19 | "\n", 20 | "\n", 21 | "\n", 22 | "***\n", 23 | "\n", 24 | "## Jupyter Notebook Elements\n", 25 | "\n", 26 | "__The dashboard:__ The main page you see in your browser when you open `Jupyter` __Notebook__; it lists your files and directories. Clicking on a file will open it on a new tab. 
(__Note__ that `Jupyter` __Notebook__ `Python` files have the extension .ipynb)\n", 27 | "\n", 28 | "\n", 29 | "\n", 30 | "__Top menu__ (See below): Contains all commands you need to work on `Jupyter` __Notebook__ (we'll talk about the relevant ones below)\n", 31 | "\n", 32 | "__Icon menu:__ It is on top of your script. It provides quick access to commands.\n", 33 | "\n", 34 | "\n", 35 | "\n", 36 | "***\n", 37 | "\n", 38 | "## The Basics\n", 39 | "\n", 40 | "### Cells\n", 41 | "Scripts are divided into cells - code lines that are run (executed) as a unit and will return the output or results from your code immediately. Cells are not independent of each other: if you assign or modify variables in one cell, you have access to them in the following ones.\n", 42 | "\n", 43 | "To run a cell you have a few options:\n", 44 | "- Press __Shift-Return__ (run cell and move to the next one) or __Command-Return__ (run cell and stay in the current one) \n", 45 | "- Click the Run __[>|]__ icon on the Icon menu\n", 46 | "- Go to the Top menu -> Cells -> Run Cells\n", 47 | "\n", 48 | "The selected cell is highlighted by a thick color side bar, and to edit the cell, you'll have to double click on it and a color outline will appear around the cell. \n", 49 | "\n", 50 | "\n", 51 | "### Types of cells: Code and Markdown\n", 52 | "__Markdown__ cells contain formatted text, like this cell, while __Code__ cells contain code to be executed and in many cases return an output. 
The text within a __Code__ cell shows different colors, depending on the function (operation, variable, etc).\n", 53 | "\n", 54 | "To change the type of cell:\n", 55 | "- On the __Icon menu__: in the middle (or the next to last element if running locally on your own computer) of this menu there is a drop-down menu that shows the type and allows you to change it\n", 56 | "- Press __Esc__ and then __m__ (to change to Markdown type) or __y__ (to change to Code type)\n", 57 | "\n", 58 | "Pressing __Esc__ gives you access to keyboard commands (you can see them all from the Top Menu -> Help -> Keyboard Shortcuts).\n", 59 | "\n", 60 | "### Copy-pasting, adding and deleting cells\n", 61 | "\n", 62 | "When you run the last cell using __Shift-Return__, a new cell is added automatically below, but sometimes we need to add a cell in between cells, delete one, or make a copy of the current one. For this, you have two options:\n", 63 | "\n", 64 | "- Click on the __Icon menu__: __+__ for adding a cell, __scissors__ to cut, __( )__ to copy, and __clipboard__ to paste\n", 65 | "- Keyboard commands: press __Esc__ and then: __a__ or __b__ for inserting a cell (above or below), __d d__ to delete a cell, __c__ to copy, __x__ to cut, and __p__ to paste a cell below. \n", 66 | "\n", 67 | "***\n", 68 | "__Test this:__ Double click on the next cell to enter edit mode and then run the cell using one of the options above. Also try changing between __Markdown__ and __Code__ modes, re-running the cell to see the differences." 
69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": null, 74 | "metadata": {}, 75 | "outputs": [], 76 | "source": [ 77 | "## __Test this:__\n", 78 | "### change the type of this cell between __Markdown__ and __Code__, and then run it to see the difference\n", 79 | "myvar = 5+6\n", 80 | "print(myvar)" 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "***\n", 88 | "## Housekeeping\n", 89 | "\n", 90 | "### Saving your work\n", 91 | "If you're working on a `binder` in the cloud, you might want to save any file you modify locally for further reference. To do so you could:\n", 92 | "\n", 93 | "- On the Icon Menu, somewhere in the middle, there is a Download button\n", 94 | "- In the Top Menu -> File -> Download as to make a local copy.\n", 95 | "\n", 96 | "### The Kernel\n", 97 | "\n", 98 | "Sometimes there is a need to interrupt running or frozen code. In this case you'll need to restart the __kernel__ (Top menu -> Kernel -> Restart) and then run all the cells again, as restarting deletes all variables in memory.\n", 99 | "\n", 100 | "The __kernel__ is the process, individual to each notebook, that runs your code, interacting directly with `Python`.\n", 101 | "***" 102 | ] 103 | }, 104 | { 105 | "cell_type": "markdown", 106 | "metadata": {}, 107 | "source": [ 108 | "## Resources\n", 109 | "[Jupyter Notebook Documentation](https://jupyter-notebook.readthedocs.io/en/stable/)\n", 110 | "\n", 111 | "[The Official Jupyter and Jupyter Notebook/Lab page](https://jupyter.org/)\n", 112 | "\n", 113 | "_Two useful tutorials:_\n", 114 | "\n", 115 | "[One, in text form, from Real Python](https://realpython.com/jupyter-notebook-introduction/)\n", 116 | "\n", 117 | "[Another, in video form](https://www.youtube.com/watch?v=HW29067qVWk)\n", 118 | "\n", 119 | "A cheat sheet is always a good reference: [Cheat Sheet](https://datacamp-community-prod.s3.amazonaws.com/48093c40-5303-45f4-bbf9-0c96c0133c40)" 120 | ] 121 | }, 122 | { 123 
| "cell_type": "code", 124 | "execution_count": null, 125 | "metadata": {}, 126 | "outputs": [], 127 | "source": [] 128 | } 129 | ], 130 | "metadata": { 131 | "kernelspec": { 132 | "display_name": "Python 3 (ipykernel)", 133 | "language": "python", 134 | "name": "python3" 135 | }, 136 | "language_info": { 137 | "codemirror_mode": { 138 | "name": "ipython", 139 | "version": 3 140 | }, 141 | "file_extension": ".py", 142 | "mimetype": "text/x-python", 143 | "name": "python", 144 | "nbconvert_exporter": "python", 145 | "pygments_lexer": "ipython3", 146 | "version": "3.7.6" 147 | } 148 | }, 149 | "nbformat": 4, 150 | "nbformat_minor": 4 151 | } 152 | -------------------------------------------------------------------------------- /Ch3_Python_Basics.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Chapter 3: Python Basics\n", 8 | "\n", 9 | "In the next chapters, we will learn `Python` as we use it. However, there are basic concepts and features we will need. This chapter is about those basic concepts. You could skip this chapter if you're familiar with `Python`.\n", 10 | "\n", 11 | "This and the following chapters are fully interactive, in the sense that you have to run the code to see any result. __Test this__ prompts will precede cells to run to test a particular concept, and instructions are given if needed. We provide only the necessary introduction to a concept in a __Markdown__ cell, but any further information and instructions are given as __comments__ within the cell to run." 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "## Modules\n", 19 | "As mentioned, `Python` is __modular__: it has all the libraries *(collection of programs or functions for a specific purpose)* you could possibly need, but you do not need to install them all. 
However, the libraries that you need should be installed and imported when coding.\n", 20 | "\n", 21 | "For this tutorial you won't need to install any library, unless you're running them on your local computer. In that case, see instructions [here](https://github.com/marisolgr/python_sat_tutorials/blob/main/Python_Installation.md).\n", 22 | "\n", 23 | "__Importing__ libraries into your code is done at the beginning of the code. You can give them a _nickname_ to simplify coding, and you can also import only one function from a particular library if you do not want to import the whole thing. A cell with library import commands needs to be run first for the libraries to be loaded. __Test this__ by running the next cell." 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": null, 29 | "metadata": {}, 30 | "outputs": [], 31 | "source": [ 32 | "# import modules/libraries\n", 33 | "import numpy as np # using a nickname\n", 34 | "import math # using the original library name\n", 35 | "\n", 36 | "var = [1,2,3,4,5,np.nan] # nan is a missing value: Not a Number\n", 37 | "\n", 38 | "# the function print accepts different variables separated by a comma\n", 39 | "print('Mean value = ', np.mean(var)) \n", 40 | "print('Mean value, ignoring nan = ', np.nanmean(var)) \n", 41 | "\n", 42 | "# in this example, instead of passing 2 arguments to print, we concatenate two strings (+), needing to convert the numerical variable pi to string\n", 43 | "print('At some point, we will need the value of pi='+str(math.pi))" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "## Data Types, Collections & Structures\n", 51 | "\n", 52 | "Like other programming languages, `Python` basic data types include integers, floats, complex, strings, and boolean. Variables can be reassigned anytime. \n", 53 | "\n", 54 | "`Python` has some basic data collections. 
We will talk about three of them, mostly so you can recognize them when you encounter them:\n", 55 | "\n", 56 | "- __List__ - ordered, changeable, allows duplicates\n", 57 | "\n", 58 | "- __Tuple__ - ordered, unchangeable, allows duplicates\n", 59 | "\n", 60 | "- __Dictionary__ - changeable, no duplicate keys allowed (insertion-ordered as of Python 3.7)\n", 61 | "\n", 62 | "All collections can contain any type of data, including other collections. \n", 63 | "\n", 64 | "__Test this:__ In the next cells, we will define variables of each type of collection, modify them and access their elements. Try them by following the editing instructions and rerunning the cell. \n" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "## Lists\n", 72 | "\n", 73 | "The simplest and most used type." 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": null, 79 | "metadata": {}, 80 | "outputs": [], 81 | "source": [ 82 | "mylist=['temperature', 'wind', 'salinity'] # note the use of [ ]\n", 83 | "print(mylist, '\\n') # add extra line with '\\n'\n", 84 | "\n", 85 | "# accessing elements \n", 86 | "print(mylist[0], '\\n') # note index starts at zero\n", 87 | "\n", 88 | "# change an element using an index\n", 89 | "mylist[1] = 'wind speed'\n", 90 | "\n", 91 | "# add an element\n", 92 | "mylist.append('current')\n", 93 | "print(mylist)" 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "metadata": {}, 99 | "source": [ 100 | "## Tuples\n", 101 | "\n", 102 | "We are not going to use this collection type, but some functions return variables of this type, and trying to modify one is a common source of errors. It is good to be able to recognize them." 
103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": null, 108 | "metadata": {}, 109 | "outputs": [], 110 | "source": [ 111 | "mytuple = ('latitude', 'longitude', 'time') # note the use of ( )\n", 112 | "print(mytuple, '\n')\n", 113 | "\n", 114 | "# accessing elements \n", 115 | "print(mytuple[0], '\n') # note that to access we also use []\n", 116 | "\n", 117 | "# try changing an element using an index ...\n", 118 | "mytuple[3] = 'depth'" 119 | ] 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "metadata": {}, 124 | "source": [ 125 | "Trying to change a __tuple__ will generate a very explicit error. Errors can be scary, due to the length of the output, but they are very useful: the message will tell you exactly what the error is (bottom line) and in which line it occurred (right arrow on the left). \n", 126 | "\n", 127 | "__Try this:__ Add line numbers to your code by clicking the top menu -> View -> Show Line Numbers tab. This makes life easier when an error occurs." 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "## Dictionaries\n", 135 | "\n", 136 | "Dictionaries are indexed pairs of keys and values, and we are going to use them in the next chapters. 
" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": null, 142 | "metadata": {}, 143 | "outputs": [], 144 | "source": [ 145 | "mydict = {'instrument': 'temperature sensor', 'measurement':'SST','depth': 5} # note the use of {} and :\n", 146 | "print(mydict, '\n')\n", 147 | "\n", 148 | "# access an element\n", 149 | "print(mydict['instrument'], '\n') # note that we also use []\n", 150 | "\n", 151 | "# add an element \n", 152 | "mydict['units'] = 'C' # note the use of [], as it is accessing a (new) element\n", 153 | "print(mydict)" 154 | ] 155 | }, 156 | { 157 | "cell_type": "markdown", 158 | "metadata": {}, 159 | "source": [ 160 | "## Data Structures\n", 161 | "\n", 162 | "Although `Python` was not originally designed for scientific data, its community-driven and free nature allows for the development of libraries that handle scientific data and operations nicely & efficiently, as we will see in the next chapters. Part of the great developments in `Python` are the data structures that can efficiently handle:\n", 163 | "\n", 164 | "- large amounts of data\n", 165 | "\n", 166 | "- multiple dimensions\n", 167 | "\n", 168 | "- mathematical and statistical operations over parts or the whole data set\n", 169 | "\n", 170 | "- metadata and data attributes\n", 171 | "\n", 172 | "These structures are defined in numerical/scientific oriented libraries (that need to be loaded in our code): numpy, pandas & xarray. We will use them in the next chapters (describing __xarray__ in more detail in Chapter 4a), and important features will be noted in the comments. Descriptions, documentation and tutorials for these libraries can be found in the __Resources__ section below.\n", 173 | "\n", 174 | "Here we just describe their data structures, and even better, illustrate them in the figure below.\n", 175 | "\n", 176 | "- __numpy arrays__: Multi-dimensional numerical arrays. 
\n", 177 | "\n", 178 | "- __pandas DataFrames__: Data Frames which resemble tables in Excel - two-dimensional (tabular) arrays that take different types of data, with labeled rows and columns.\n", 179 | "\n", 180 | "- __xarray DataArrays & Datasets__: Datasets that contain one or multiple data arrays (one per variable) that can be indexed by labels or numbers, referred to as coordinates. They follow the `netcdf` file format, so they can accommodate metadata and multidimensional grids or time series.\n", 181 | "\n", 182 | "\n" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "***\n", 190 | "## Basic Python Syntax\n", 191 | "\n", 192 | "### For Loops\n", 193 | "\n", 194 | "`Python` is an indentation-sensitive language. This means that the position of the first character in a line has meaning. This is illustrated in the next cell, where the operations to execute within a __for loop__ are indented 4 spaces to the right. __Try this__ by running the next cells. More details are provided in the comments." 195 | ] 196 | }, 197 | { 198 | "cell_type": "code", 199 | "execution_count": null, 200 | "metadata": {}, 201 | "outputs": [], 202 | "source": [ 203 | "somelist = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i','j']\n", 204 | "for item in somelist: # for loops take any list as arguments, and many objects can be 'read' as a list using the function enumerate()\n", 205 | " print(item) \n", 206 | "print('\nEnd of loop\n',somelist) # note the position of this line. 
the lack of spaces at the beginning indicates we are already outside the loop\n" 207 | ] 208 | }, 209 | { 210 | "cell_type": "code", 211 | "execution_count": null, 212 | "metadata": {}, 213 | "outputs": [], 214 | "source": [ 215 | "# to iterate over the indices of a list we will use the function range\n", 216 | "for index in range(0,10,1): # by default this function makes a numerical list starting at zero \n", 217 | " # try changing the range to range(10) instead, and also the step from 1 to 2\n", 218 | " print(index, somelist[index])\n", 219 | "\n", 220 | "print('\n')\n", 221 | "\n", 222 | "# this is a more complicated way, but sometimes necessary to obtain the indices of a list\n", 223 | "for index, value in enumerate(somelist): \n", 224 | " print(index, value, somelist[index])" 225 | ] 226 | }, 227 | { 228 | "cell_type": "markdown", 229 | "metadata": {}, 230 | "source": [ 231 | "### Conditional Statements\n", 232 | "Conditional statements are similar to those in other programming languages, but pay attention to:\n", 233 | "- the `:` after the conditions\n", 234 | "- the `elif` for another condition\n", 235 | "- for multiple conditions, each one has to be enclosed by `( )`\n", 236 | "\n", 237 | "__Try this__ by running the next cell. Then change the assigned value of _lat_ to obtain different results, and try again." 238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": null, 243 | "metadata": {}, 244 | "outputs": [], 245 | "source": [ 246 | "lat = 12\n", 247 | "if (lat <= -23.5) or (lat >= 23.5):\n", 248 | " print('extra-tropical')\n", 249 | "elif lat == 0:\n", 250 | " print('equator')\n", 251 | "else:\n", 252 | " print('tropical')" 253 | ] 254 | }, 255 | { 256 | "cell_type": "markdown", 257 | "metadata": {}, 258 | "source": [ 259 | "One cool feature in `Python` is its __list comprehensions__. A list comprehension is a compact, single line of code that combines (usually) a __for loop__ with a __conditional expression__. 
They simplify code, but as a beginner you may find them difficult to construct or interpret. We present them here because they are useful, and when asking for help online, they are often proposed as the solution, so it is important to be able to understand them.\n", 260 | "\n", 261 | "__Try this:__ In the next cell you'll see two ways to write the same piece of code, the second as a list comprehension." 262 | ] 263 | }, 264 | { 265 | "cell_type": "code", 266 | "execution_count": null, 267 | "metadata": {}, 268 | "outputs": [], 269 | "source": [ 270 | "# explicit for loop and conditional\n", 271 | "for lat in range(-90,90,10):\n", 272 | " if (lat <= -23.5) or (lat >= 23.5):\n", 273 | " print('extra-tropical')\n", 274 | " else:\n", 275 | " print('tropical')\n", 276 | " \n", 277 | "# list comprehension\n", 278 | "result=['tropical' if np.abs(lat)<=23.5 else ('extra-tropical') for lat in range(-90,90,10)]\n", 279 | "\n", 280 | "result # note that if the last line is a variable by itself, it prints it out" 281 | ] 282 | }, 283 | { 284 | "cell_type": "markdown", 285 | "metadata": {}, 286 | "source": [ 287 | "### Functions\n", 288 | "Functions are pieces of code that are needed several times within a program, and therefore it is easier to isolate and call them as needed, instead of spelling out the code every time. They make the code cleaner and less prone to errors. __Run the next cell__ to see an example of a function definition and use." 289 | ] 290 | }, 291 | { 292 | "cell_type": "code", 293 | "execution_count": null, 294 | "metadata": {}, 295 | "outputs": [], 296 | "source": [ 297 | "def my_func(arg1): # note the :\n", 298 | " cel = (arg1 - 32) * 5/9\n", 299 | " print (arg1,'Fahrenheit = ', np.round(cel,1) , ' Celsius')\n", 300 | " \n", 301 | "for far in range(0,101,10): # note the range ends at 100, not at 101. 
the upper limit is an open limit.\n", 302 | " my_func(far)" 303 | ] 304 | }, 305 | { 306 | "cell_type": "markdown", 307 | "metadata": {}, 308 | "source": [ 309 | "***\n", 310 | "## Objects, Attributes & Methods\n", 311 | "\n", 312 | "`Python` is an object-oriented programming language. This means almost everything is an object or instance of a class. Variables are objects, and therefore they have attributes & methods intrinsic to the class they belong to.\n", 313 | "\n", 314 | "- __Properties or Attributes__ of an object (variable) are accessed with __.attribute__ after the object name.\n", 315 | "\n", 316 | "- __Methods__ are functions and are accessed with __.method(arguments)__ after the object name.\n", 317 | "\n", 318 | "__Try this:__ In the next cell we demonstrate the use of an attribute and a method of a very useful class: __date__, from `Python`'s incredibly useful `datetime` library." 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": null, 324 | "metadata": {}, 325 | "outputs": [], 326 | "source": [ 327 | "from datetime import date # import the date class from the datetime package\n", 328 | "\n", 329 | "today = date.today() # create an instance (object or variable) of the class date\n", 330 | "print(today, '\n')\n", 331 | "\n", 332 | "## date object attributes are accessed with a . but no ()\n", 333 | "print('year = ',today.year, '\nmonth = ', today.month, '\nday = ', today.day, '\n') \n", 334 | "\n", 335 | "## date object methods are accessed with a . and (), even without arguments. 
\n", 336 | "print(today.ctime()) # the ctime() method returns the date as a string" 337 | ] 338 | }, 339 | { 340 | "cell_type": "markdown", 341 | "metadata": {}, 342 | "source": [ 343 | "***\n", 344 | "## Resources\n", 345 | "\n", 346 | "### Python tutorials\n", 347 | "\n", 348 | "[The Official Python Site](https://docs.python.org/3/tutorial/)\n", 349 | "\n", 350 | "A great site, a simple and good tutorial, that also serves as reference: [w3schools](https://www.w3schools.com/python/)\n", 351 | "\n", 352 | "__Python for scientists:__\n", 353 | "\n", 354 | "Basic and easy [tutorial](https://scipy-lectures.org/intro/) to use as reference.\n", 355 | "\n", 356 | "Another [tutorial](http://earthpy.org/category/introduction-to-python.html). Oldish, but nice and simple.\n", 357 | "\n", 358 | "A more advanced and complete [tutorial](https://astrofrog.github.io/py4sci/)\n", 359 | "\n", 360 | "### More on:\n", 361 | "\n", 362 | "Data types and collections: [https://www.geeksforgeeks.org/python-data-types/](https://www.geeksforgeeks.org/python-data-types/)\n", 363 | "\n", 364 | "Functions - w3schools.com is a good reference with try-it-yourself examples: [https://www.w3schools.com/python/python_functions.asp](https://www.w3schools.com/python/python_functions.asp)\n", 366 | "\n", 366 | "_List comprehensions_\n", 367 | "\n", 368 | "- Why and how to use them: [https://realpython.com/list-comprehension-python/](https://realpython.com/list-comprehension-python/)\n", 369 | "- A simple and step-by-step tutorial: [https://www.w3schools.com/python/python_lists_comprehension.asp](https://www.w3schools.com/python/python_lists_comprehension.asp)\n", 370 | "\n", 371 | "Date functions - One of the best functionalities of Python is its handling of time. 
Here is a good page to learn about these functions: [https://www.guru99.com/date-time-and-datetime-classes-in-python.html](https://www.guru99.com/date-time-and-datetime-classes-in-python.html)\n", 372 | "\n", 373 | "### __More on the libraries:__\n", 374 | "\n", 375 | "__numpy__\n", 376 | "\n", 377 | "[The official site](https://numpy.org/)\n", 378 | "\n", 379 | "A simple and to the point numpy [tutorial](https://numpy.org/doc/stable/user/absolute_beginners.html)\n", 380 | "\n", 381 | "__pandas__\n", 382 | "\n", 383 | "[The official site](https://pandas.pydata.org/)\n", 384 | "\n", 385 | "A thorough [tutorial](https://bitbucket.org/hrojas/learn-pandas/src/master/)\n", 386 | "\n", 387 | "Super useful to have around [Cheat Sheet](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)\n", 388 | "\n", 389 | "__xarray__\n", 390 | "\n", 391 | "[The official site](http://xarray.pydata.org/en/stable/)\n", 392 | "\n", 393 | "Intro [video](https://www.youtube.com/watch?v=X0pAhJgySxk) to what xarray is" 394 | ] 395 | }, 396 | { 397 | "cell_type": "code", 398 | "execution_count": null, 399 | "metadata": {}, 400 | "outputs": [], 401 | "source": [] 402 | } 403 | ], 404 | "metadata": { 405 | "kernelspec": { 406 | "display_name": "Python 3 (ipykernel)", 407 | "language": "python", 408 | "name": "python3" 409 | }, 410 | "language_info": { 411 | "codemirror_mode": { 412 | "name": "ipython", 413 | "version": 3 414 | }, 415 | "file_extension": ".py", 416 | "mimetype": "text/x-python", 417 | "name": "python", 418 | "nbconvert_exporter": "python", 419 | "pygments_lexer": "ipython3", 420 | "version": "3.7.6" 421 | } 422 | }, 423 | "nbformat": 4, 424 | "nbformat_minor": 4 425 | } 426 | -------------------------------------------------------------------------------- /Ch4a_Python_Tools.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Chapter 4a - Python Tools: 
xarray\n", 8 | "\n", 9 | "Chapter 4, divided into two parts, will cover two libraries that are essential to satellite data analysis and visualization: __xarray__ and __matplotlib__. In Chapter 4a we will cover the basics of __xarray__ with examples, and in Chapter 4b we will make customized visualizations of data using __matplotlib__.\n", 10 | "\n", 11 | "Although we show complete examples here, we invite you to edit and rerun them to better grasp their functionality.\n", 12 | " \n", 13 | "***\n", 14 | "\n", 15 | "\n", 16 | "## xarray \n", 17 | " \n", 18 | "__xarray__ is an open source `Python` library designed to handle (read, write, analyze, visualize, etc.) sets of labeled multi-dimensional arrays and metadata common in _(Earth)_ sciences. Its data structure, the __Dataset__, is built to reflect a netcdf file. __xarray__ was built on top of the __pandas__ library, which processes labeled tabular data, inheriting several of its methods and functionality.\n", 19 | "\n", 20 | "For this reason, when importing __xarray__, we will also import __numpy__ and __pandas__, so we can use all their methods. \n", 21 | "\n", 22 | "__Test this:__ Run the next cell to import these libraries. We are importing them using their conventional nickname - although feel free to choose yours. 
Note that when you run an importing cell, no output is displayed other than a number between [ ] on the left side of the cell.\n" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": null, 28 | "metadata": {}, 29 | "outputs": [], 30 | "source": [ 31 | "import numpy as np\n", 32 | "import pandas as pd\n", 33 | "import xarray as xr\n", 34 | "\n", 35 | "# this library helps to make your code execution less messy\n", 36 | "import warnings\n", 37 | "warnings.simplefilter('ignore') # filter some warning messages" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "### Reading and exploring Data Sets\n", 45 | " \n", 46 | "__Run the next cell:__ Let's start by reading and exploring the content of a `netcdf` file located locally. __It is so easy!__\n", 47 | "\n", 48 | "Once the content is displayed, you can click on the file and disk icons on the right to get more details on each parameter.\n", 49 | "\n", 50 | "Also note that the __data array__ or __variable__ _(SST)_ has 3 __dimensions__ _(latitude, longitude and time)_ , and that each dimension has a data variable (__coordinate__) associated with it. Each variable, as well as the file as a whole, has metadata called __attributes__." 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": null, 56 | "metadata": {}, 57 | "outputs": [], 58 | "source": [ 59 | "ds = xr.open_dataset('./data/HadISST_sst_2000-2020.nc') # read a local netcdf file\n", 60 | "ds.close() # close the file, so it can be used by you or others. it is good practice.\n", 61 | "ds # display the content of the dataset object" 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": {}, 67 | "source": [ 68 | "__xarray__ can also read data online. We are going to learn how to read data from the cloud in the application chapters, but for now, we will demonstrate __xarray__'s and `Python`'s capability of reading from an online file. __Run the next cell__ to do so." 
69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": null, 74 | "metadata": {}, 75 | "outputs": [], 76 | "source": [ 77 | "# assign a string variable with the url address of the datafile\n", 78 | "url = 'https://podaac-opendap.jpl.nasa.gov/opendap/allData/ghrsst/data/GDS2/L4/GLOB/CMC/CMC0.2deg/v2/2011/305/20111101120000-CMC-L4_GHRSST-SSTfnd-CMC0.2deg-GLOB-v02.0-fv02.0.nc'\n", 79 | "ds_sst = xr.open_dataset(url) # reads same way as local files!\n", 80 | "ds_sst" 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "### Visualizing data\n", 88 | " \n", 89 | "An image is worth a thousand _attributes_ ! Sometimes what we need is a quick visualization of our data, and __xarray__ is there to help. In __the next cells__, visualizations for both opened datasets are shown. " 90 | ] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "execution_count": null, 95 | "metadata": {}, 96 | "outputs": [], 97 | "source": [ 98 | "ds_sst.analysed_sst.plot() # note that we needed to choose one of the variables in the Dataset to be displayed\n" 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": null, 104 | "metadata": {}, 105 | "outputs": [], 106 | "source": [ 107 | "ds.sst[0,:,:].plot() # we choose a time to visualize the spatial data (lat, lon) at that time (zero or the first time entry)\n" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "#### Yes! it is that easy! \n", 115 | "Although we'll get more sophisticated in Chapter 4b." 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "### Some basic methods of Dataset\n", 123 | " \n", 124 | "__xarray__ also lets you operate over the dataset in a simple way. Many operations are built as methods of the Dataset class that can be accessed by adding a `.` after the Dataset name. 
__Test this:__ In the next cell, we access the _averaging_ method to make a time series of sea surface temperature over the entire globe and display it. __All in one line!__" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": null, 130 | "metadata": {}, 131 | "outputs": [], 132 | "source": [ 133 | "ds.sst.mean(dim=['latitude','longitude']).plot() # select a variable and average it\n", 134 | "# over spatial dimensions, and plot the final result\n" 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": {}, 140 | "source": [ 141 | "### Selecting data\n", 142 | "\n", 143 | "Sometimes we want to visualize or operate only on a portion of the data. __In the next cell__ we demonstrate the method `.sel`, which selects data along dimensions, in this case specified as a range of the coordinates using the function _slice_." 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": null, 149 | "metadata": {}, 150 | "outputs": [], 151 | "source": [ 152 | "ds.sst.sel(time=slice('2012-01-01','2013-12-31')).mean(dim=['time']).plot() # select a period of time" 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": null, 158 | "metadata": {}, 159 | "outputs": [], 160 | "source": [ 161 | "ds.sst.sel(latitude=slice(50,-50)).mean(dim=['time']).plot() # select a range of latitudes. \n", 162 | "# note that we need to go from 50 to -50 as the latitude coordinate data goes from 90 to -90" 163 | ] 164 | }, 165 | { 166 | "cell_type": "markdown", 167 | "metadata": {}, 168 | "source": [ 169 | "Another useful way to select data is the method __.where__, which instead of selecting by a coordinate, selects using a condition over the data or the coordinates. __Test this:__ In the next cell we extract the _ocean mask_ contained in the NASA surface temperature dataset." 
170 | ] 171 | }, 172 | { 173 | "cell_type": "code", 174 | "execution_count": null, 175 | "metadata": {}, 176 | "outputs": [], 177 | "source": [ 178 | "ds_sst.analysed_sst.where(ds_sst.mask==1).plot() # we select, using .where, the data in the variable 'mask' that is equal to 1, \n", 179 | "# apply it to the variable 'analysed_sst', and plot the data. \n", 180 | "# Try changing the value for mask - for example 2 is land, 8 is ice." 181 | ] 182 | }, 183 | { 184 | "cell_type": "markdown", 185 | "metadata": {}, 186 | "source": [ 187 | "### Operating between two Data Arrays\n", 188 | " \n", 189 | "__In the next__ example we compare two years of temperature. We operate over the same Data Array, averaging over 2015 in the first line and over 2012 in the second. Each `.sel` operation returns a new Data Array. We can subtract them using a simple `-`, since they have the same dimensions and coordinates. At the end, we just plot the result. __It is that simple!__" 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": null, 195 | "metadata": {}, 196 | "outputs": [], 197 | "source": [ 198 | "# comparing 2015 and 2012 sea surface temperatures\n", 199 | "(ds.sst.sel(time=slice('2015-01-01','2015-12-31')).mean(dim=['time'])\n", 200 | "-ds.sst.sel(time=slice('2012-01-01','2012-12-31')).mean(dim=['time'])).plot() # note that in this case I could split the line in two,\n", 201 | "# which makes it easier to read" 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "metadata": {}, 207 | "source": [ 208 | "We will cover more examples of methods and operations over datasets in the following chapters. But if you want to learn more, and we recommend it, given the many awesome capabilities of xarray, please look at the __Resources__ section below. \n", 209 | "\n", 210 | "***\n", 211 | "\n", 212 | "### Saving your Datasets and DataArrays\n", 213 | "There is one more thing you should learn here. 
In the applications chapters we go from obtaining the data to analyzing and producing a visualization. But sometimes, we want to save the data we acquire to process later, in a different script, or in the same one without having to download it every time. \n", 214 | "\n", 215 | "__The next cell__ shows you how to do so in two simple steps:\n", 216 | "\n", 217 | "- Assign the outcome of an operation to a variable, which will be a new dataset or data array object\n", 218 | "- Save it to a new `netcdf` file" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": null, 224 | "metadata": {}, 225 | "outputs": [], 226 | "source": [ 227 | "# same operation as before, minus the plotting method\n", 228 | "my_ds = (ds.sst.sel(time=slice('2015-01-01','2015-12-31')).mean(dim=['time'])-ds.sst.sel(time=slice('2012-01-01','2012-12-31')).mean(dim=['time']))\n", 229 | "# save the new dataset `my_ds` to a file in the directory data\n", 230 | "my_ds.to_netcdf('./data/Global_SST_2015-2012.nc')\n", 231 | "# explore the content of `my_ds`. note that the time dimension does not exist anymore\n", 232 | "my_ds" 233 | ] 234 | }, 235 | { 236 | "cell_type": "markdown", 237 | "metadata": {}, 238 | "source": [ 239 | "*** \n", 240 | "\n", 241 | "## Resources\n", 242 | "\n", 243 | "[The __xarray__ official site](http://xarray.pydata.org/en/stable/).\n", 244 | "\n", 245 | "Great [introduction](https://www.youtube.com/watch?v=Dgr_d8iEWk4&t=908s) to __xarray__ capabilities.\n", 246 | "\n", 247 | "If you really want to dig deep watch this [video](https://www.youtube.com/watch?v=ww4EYv20Ucw).\n", 248 | "\n", 249 | "A step-by-step [guide](https://rabernat.github.io/research_computing_2018/xarray.html) to __xarray__ handling of netcdf files, and many of the methods seen here, like `.sel` and `.where`.\n", 250 | "\n", 251 | "### More on:\n", 252 | "\n", 253 | "Sometimes, the best way to learn how to do something is to go directly to the reference page for a function or method. 
There you can see what arguments, types of data, and outputs to expect. Most of the time, they have useful examples:\n", 254 | "\n", 255 | "- Method [__.where( )__](http://xarray.pydata.org/en/stable/generated/xarray.DataArray.where.html)\n", 256 | "\n", 257 | "- Method [__.sel( )__](http://xarray.pydata.org/en/stable/generated/xarray.DataArray.sel.html)\n", 258 | "\n", 259 | "- Method [__.mean( )__](http://xarray.pydata.org/en/stable/generated/xarray.DataArray.mean.html)\n" 260 | ] 261 | }, 262 | { 263 | "cell_type": "code", 264 | "execution_count": null, 265 | "metadata": {}, 266 | "outputs": [], 267 | "source": [] 268 | } 269 | ], 270 | "metadata": { 271 | "kernelspec": { 272 | "display_name": "Python 3 (ipykernel)", 273 | "language": "python", 274 | "name": "python3" 275 | }, 276 | "language_info": { 277 | "codemirror_mode": { 278 | "name": "ipython", 279 | "version": 3 280 | }, 281 | "file_extension": ".py", 282 | "mimetype": "text/x-python", 283 | "name": "python", 284 | "nbconvert_exporter": "python", 285 | "pygments_lexer": "ipython3", 286 | "version": "3.7.6" 287 | } 288 | }, 289 | "nbformat": 4, 290 | "nbformat_minor": 4 291 | } 292 | -------------------------------------------------------------------------------- /Ch4b_Plotting_Tools.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Chapter 4b - Plotting Tools\n", 8 | "\n", 9 | "In this chapter we will learn to visualize data beyond a quick plot as in the previous chapter. We will present two examples (a time series plot and a map) using the libraries __matplotlib__ and __cartopy__. \n", 10 | "\n", 11 | "***\n", 12 | "\n", 13 | "Let's start by importing the pertinent libraries." 
14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": null, 19 | "metadata": {}, 20 | "outputs": [], 21 | "source": [ 22 | "# basic libraries\n", 23 | "import numpy as np\n", 24 | "import pandas as pd\n", 25 | "import xarray as xr\n", 26 | "\n", 27 | "# necessary libraries for plotting\n", 28 | "import matplotlib.pyplot as plt # note that in both cases we import one object within the library\n", 29 | "import cartopy.crs as ccrs\n", 30 | "\n", 31 | "import warnings\n", 32 | "warnings.simplefilter('ignore') # filter some warning messages" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": {}, 38 | "source": [ 39 | "## Plot SST anomaly timeseries\n", 40 | "\n", 41 | "We will use the same data from the previous chapter to calculate and plot global sea surface temperature anomalies from the Hadley dataset. We will also calculate the climatology and anomalies of monthly data to show a slightly more complicated plot, illustrating some of the __xarray__ methods. 
__Run the next cell.__" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": null, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "# open the dataset\n", 51 | "ds = xr.open_dataset('./data/HadISST_sst_2000-2020.nc') # read the netcdf file\n", 52 | "ds.close() \n", 53 | "\n", 54 | "# select northern and southern hemispheres, and average spatially to obtain a time series\n", 55 | "nh_sst = ds.sst.sel(latitude=slice(90,0)).mean(dim=['latitude','longitude'])\n", 56 | "sh_sst = ds.sst.sel(latitude=slice(0,-90)).mean(dim=['latitude','longitude'])\n", 57 | "\n", 58 | "# calculate climatology\n", 59 | "nh_clim = nh_sst.groupby('time.month').mean('time') # application of two methods:\n", 60 | "# first groupby, and then the operation to perform over the group\n", 61 | "\n", 62 | "# calculate and explore the anomalies\n", 63 | "nh_ssta = nh_sst.groupby('time.month') - nh_clim # groupby 'aligns' the data with the climatology, \n", 64 | " # but only subtracts the appropriate climatology data point\n", 65 | "nh_ssta # the new dataarray (one variable) has a new coordinate, but not a new dimension" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": {}, 71 | "source": [ 72 | "### The actual plotting of the data\n", 73 | " \n", 74 | "Making a simple plot using __matplotlib__ might seem like too much code, since there are many parameters to customize. However, it comes in handy for more detailed plots. 
__In the next cell__ we introduce some of the basic methods in a plot of the hemispheric averages calculated:\n", 75 | "- Defining a figure and its size\n", 76 | "- The function __plot__\n", 77 | "- How to add labels and legend\n", 78 | "- And how to display and 'finalize' a plot" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": null, 84 | "metadata": {}, 85 | "outputs": [], 86 | "source": [ 87 | "plt.figure(figsize=(10,4))\n", 88 | "plt.plot(nh_sst.time, nh_sst, '.-',label='NH') # the basic method plot() is used for line plots.\n", 89 | "plt.plot(sh_sst.time, sh_sst, '+-', c='tab:orange', label='SH')\n", 90 | "plt.grid(True)\n", 91 | "plt.legend(loc=0)\n", 92 | "plt.ylabel('SST (C)', fontsize=14)\n", 93 | "plt.title('Monthly Hemispheric SST', fontsize=16)\n", 94 | "plt.show() # necessary line to finalize and properly display a figure" 95 | ] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "metadata": {}, 100 | "source": [ 101 | "__In the next cell__ we plot the anomalies calculated, separating with color the positive and negative values. This is a more complicated plot that requires operating over the data first (using the method `.where`), but the plotting part is straightforward." 
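The groupby–climatology–anomaly pattern used above can also be sketched with plain pandas on synthetic data (all numbers invented for illustration), which makes the alignment step easy to see:

```python
import numpy as np
import pandas as pd

# two years of made-up monthly values: a repeating seasonal cycle, with year 2 warmer by 1
idx = pd.date_range('2019-01-01', periods=24, freq='MS')
values = np.tile(np.arange(12, dtype=float), 2) + np.repeat([0.0, 1.0], 12)
series = pd.Series(values, index=idx)

# climatology: average each calendar month across the two years
clim = series.groupby(series.index.month).mean()

# anomaly: subtract the matching climatological month from each raw value
anom = series - series.index.month.map(clim).to_numpy()

print(anom.iloc[0], anom.iloc[12])  # year 1 sits below the 2-year mean, year 2 above
```

This is exactly what `groupby('time.month') - nh_clim` does in xarray: each value is matched to its calendar month's mean before subtracting, so only the seasonal cycle is removed.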
102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": null, 107 | "metadata": {}, 108 | "outputs": [], 109 | "source": [ 110 | "plt.figure(figsize=(12,4))\n", 111 | "pos = nh_ssta.where(nh_ssta>=0) # select only positive values \n", 112 | "neg = nh_ssta.where(nh_ssta<0) # select only negative values\n", 113 | "dates = nh_ssta.time.dt.year + (nh_ssta.time.dt.month-1)/12 # make a list of time steps using year and month\n", 114 | "plt.bar(dates, pos.values, width=1/13, color='tab:red', edgecolor=None) # plot positive values\n", 115 | "plt.bar(dates, neg.values, width=1/13, color='tab:blue') # plot negative values\n", 116 | "plt.axhline(color='grey') # plot a grey horizontal line at y=0\n", 117 | "plt.grid(True, zorder=0)\n", 118 | "plt.ylabel('SST anomalies (C)')\n", 119 | "plt.title('Northern Hemisphere SST anomalies')\n", 120 | "plt.xticks([*range(2000,2021,1)], rotation=40)\n", 121 | "plt.autoscale(enable=True, axis='x', tight=True)\n", 122 | "plt.show()\n" 123 | ] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "metadata": {}, 128 | "source": [ 129 | "***\n", 130 | "## Map plotting\n", 131 | "\n", 132 | "Now we will customize our maps. While the quick plot method from __xarray__ is all we need in many cases, sometimes we require a more customized or nicer image for a presentation or publication. It might seem like complicated code, but really there are many elements that could be left to the default values, and we wanted to show how to customize some of them if you need them.\n", 133 | "\n", 134 | "For global plots, the extent and the coordinate labels are sometimes not necessary to define, but we choose a regional plot for the next example to show how to customize these parameters. \n", 135 | "\n", 136 | "__Note__ that in the next-to-last line, we will also save our figure. We still use the _.show( )_ method in the last line to display it." 
137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": null, 142 | "metadata": {}, 143 | "outputs": [], 144 | "source": [ 145 | "# import functions to label coordinates and add color to the land mass\n", 146 | "from cartopy.mpl.ticker import LongitudeFormatter, LatitudeFormatter\n", 147 | "import cartopy.feature as cfeature\n", 148 | "import calendar # quick access to names and numbers related to dates\n", 149 | "\n", 150 | "# select a region of our data\n", 151 | "region = np.array([[30,-40],[25,120]]) # numpy array that specifies the lat/lon boundaries of our selected region\n", 152 | "io_sst = ds.sst.sel(latitude=slice(region[0,0],region[0,1]),longitude=slice(region[1,0],region[1,1])) # select region\n", 153 | "\n", 154 | "for mon in [1,7]: # select two months of data to plot: month 1 and month 7\n", 155 | " moname = calendar.month_name[mon] # get the name of the month\n", 156 | " tmp = io_sst.sel(time=ds.time.dt.month==mon).mean('time') # select only one month at a time in a temporal object\n", 157 | "\n", 158 | " # create and set the figure context\n", 159 | " fig = plt.figure(figsize=(8,5)) # create a figure object, and assign it a variable name fig\n", 160 | " ax = plt.axes(projection=ccrs.PlateCarree()) # projection type - this one is easy to use\n", 161 | " ax.coastlines(resolution='50m',linewidth=2,color='black') \n", 162 | " ax.add_feature(cfeature.LAND, color='black')\n", 163 | " ax.set_extent([region[1,0],region[1,1],region[0,0],region[0,1]],crs=ccrs.PlateCarree()) \n", 164 | " ax.set_xticks([*range(region[1,0],region[1,1]+1,20)], crs=ccrs.PlateCarree()) # customize ticks and labels to longitude\n", 165 | " ax.set_yticks([*range(region[0,1],region[0,0]+1,10)], crs=ccrs.PlateCarree()) # customize ticks and labels to latitude\n", 166 | " ax.xaxis.set_major_formatter(LongitudeFormatter(zero_direction_label=True))\n", 167 | " ax.yaxis.set_major_formatter(LatitudeFormatter())\n", 168 | " plt.grid(True, alpha=0.5) # add a grid. 
the alpha argument specifies the level of transparency of a plot figure\n", 169 | "\n", 170 | " # the core: the data to plot\n", 171 | " plt.contourf(tmp.longitude,tmp.latitude, tmp,15, cmap='RdYlBu_r') # contourf (filled contour plot) takes the 1D lat and lon coordinates for the 2D data. cmap specifies the colormap to use.\n", 172 | " cbar=plt.colorbar()\n", 173 | " cbar.set_label('SST (C)') # color bar label\n", 174 | " plt.title(moname+' SST (2000-2020)')\n", 175 | " fig.savefig('./figures/map_base_'+moname+'.png') # save your figure by using the method .savefig. Python recognizes the format from the filename extension. \n", 176 | " plt.show()" 177 | ] 178 | }, 179 | { 180 | "cell_type": "markdown", 181 | "metadata": {}, 182 | "source": [ 183 | "***\n", 184 | "### And that's it. Now you're ready to go over the application chapters and follow the code. Also you should be able to edit it and get your own results!\n", 185 | "\n", 186 | "***\n", 187 | "## Resources\n", 188 | "\n", 189 | "[The Official __Matplotlib__ site](https://matplotlib.org/) \n", 190 | "\n", 191 | "Make sure to look at their [gallery](https://matplotlib.org/stable/gallery/index.html), which contains the code for each plot\n", 192 | "\n", 193 | "A very simple, step by step [tutorial](https://github.com/rougier/matplotlib-tutorial) to matplotlib\n", 194 | "\n", 195 | "[The Official __Cartopy__ - site](https://scitools.org.uk/cartopy/docs/latest/), and [gallery](https://scitools.org.uk/cartopy/docs/latest/gallery/index.html)\n", 196 | "\n", 197 | "R. Abernathey's [tutorial](https://rabernat.github.io/research_computing_2018/maps-with-cartopy.html) to Cartopy - Step by Step and very accessible\n", 198 | "\n", 199 | "[__Seaborn__](https://seaborn.pydata.org/index.html) - We didn't talk about Seaborn, but it is a very nice library with beautiful and well-designed functions for statistical data visualization. 
Make sure you take a look at their gallery\n", 200 | "\n", 201 | "[The official __Groupby__ reference](http://xarray.pydata.org/en/stable/groupby.html)\n", 202 | "\n", 203 | "The __xarray__ page also has some useful examples for [weather](http://xarray.pydata.org/en/stable/examples/weather-data.html) and [climate](http://xarray.pydata.org/en/stable/examples/monthly-means.html) data that apply the methods (and more) used here.\n" 204 | ] 205 | }, 206 | { 207 | "cell_type": "code", 208 | "execution_count": null, 209 | "metadata": {}, 210 | "outputs": [], 211 | "source": [] 212 | } 213 | ], 214 | "metadata": { 215 | "kernelspec": { 216 | "display_name": "Python 3 (ipykernel)", 217 | "language": "python", 218 | "name": "python3" 219 | }, 220 | "language_info": { 221 | "codemirror_mode": { 222 | "name": "ipython", 223 | "version": 3 224 | }, 225 | "file_extension": ".py", 226 | "mimetype": "text/x-python", 227 | "name": "python", 228 | "nbconvert_exporter": "python", 229 | "pygments_lexer": "ipython3", 230 | "version": "3.7.6" 231 | } 232 | }, 233 | "nbformat": 4, 234 | "nbformat_minor": 4 235 | } 236 | -------------------------------------------------------------------------------- /Ch5_Satellite_Cloud.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Chapter 5 - Satellite Data in the Cloud\n", 8 | "\n", 9 | "This chapter gives a short introduction to accessing satellite data in the Cloud. It is more informative than practical, so you could jump straight to the examples in the next chapters. However, you might want some background if you want to modify the examples to get other data that interests you." 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "***\n", 17 | "Satellite data and satellite-based data are available from many sources, commercial and public. 
Not all data are free, especially data products from private companies. \n", 18 | "\n", 19 | "### Many data providers, including NASA, NOAA, and Copernicus share their data publicly and for free, including in the Cloud.\n", 20 | "\n", 21 | "## Today, an increasing portion of these data is stored in two accessible Cloud providers: [Amazon (AWS)](https://registry.opendata.aws/) and [Google (Earth Engine)](https://developers.google.com/earth-engine/datasets)\n", 22 | " \n", 23 | "Also, data comes in different formats.\n", 24 | "\n", 25 | "We will use data from AWS because its data is stored in a format that is easily analyzed using `Python` and `xarray`, while Google uses its own interface (worth checking out, though)." 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "***\n", 33 | "***\n", 34 | "Historically, satellite data is stored as snapshot images. With time, as the satellite era has grown in length, the number of images available has grown to amazing levels. \n", 35 | "\n", 36 | "And while these data have become an incredible tool for interannual and longer analysis, accessing the data for temporal analysis requires accessing each time step file, which is cumbersome and time- and resource-consuming.\n", 37 | "\n", 38 | "### New formats, like `zarr`, have been developed to address this issue and provide faster access to the data, not only in the temporal axis. \n", 39 | "\n", 40 | "Not all data is in this format, but the number of data sets is increasing and we will take advantage of those that are available." 
41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "***\n", 48 | "***\n", 49 | "### *In these tutorials we aim to facilitate acquisition of satellite-based data and their temporal analysis*\n", 50 | "***" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "***\n", 58 | "## In the Ocean Example (Chapter 6), we use a NASA high resolution sea surface temperature ([MUR SST](https://registry.opendata.aws/mur/)) data product, which is stored in `zarr` format\n", 59 | "Acquiring the data is simple, requiring only a few lines of code. It is a long process, however, given the high resolution - in space and time - of the data. " 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "***\n", 67 | "## In the Atmosphere Example (Chapter 7), we use a Copernicus satellite-based reanalysis product that most earth scientists are familiar with: [ECMWF/ERA-5](https://registry.opendata.aws/ecmwf-era5/)\n", 68 | "These data are stored in the cloud in `zarr` format, but due to their volume, they are saved as monthly files. So, each file has to be accessed individually for a region and time to be selected. Note that because the files are in the cloud, we are not downloading them - only the selected portion is.\n", 69 | "\n", 70 | "#### In the example, we'll analyze wind vectors, but this notebook is easily modified to analyze another variable contained in this data set. " 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "***\n", 78 | "## Finally, in the Land Example (Chapter 8), we use NASA's MODIS satellite data to examine changes in vegetation through time. \n", 79 | "\n", 80 | "This data is available through different names and providers, and it is available in the AWS cloud (not free to date), and Google Earth Engine. 
It is also available online directly through NASA or NOAA, and other private providers.\n", 81 | "\n", 82 | "*Therefore, in this chapter we exemplify how to use `python` and `xarray` to process files from an online server - in this case a thredds server. Not the cloud!*\n", 83 | "\n", 84 | "You'll notice that this is a straightforward method, although a bit lengthy. However, this method needs good and stable bandwidth because it downloads every file and then selects the area (instead of selecting it in the cloud before downloading). \n", 85 | "\n", 86 | "We expect this, and other data, to be available in the cloud in zarr format soon - for free - so its acquisition would be similar to data in previous chapters." 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "***\n", 94 | "# Resources\n", 95 | "\n", 96 | "### Data:\n", 97 | "- [AWS Registry for Open Data](https://registry.opendata.aws/)\n", 98 | "- [Google Earth Engine Data](https://developers.google.com/earth-engine/datasets)\n", 99 | "- [NOAA Big Data Program](https://www.noaa.gov/organization/information-technology/list-of-big-data-program-datasets) Data list\n", 100 | "\n", 101 | "### Cloud resources:\n", 102 | "- [Pangeo](https://pangeo.io/cloud.html)\n", 103 | "- [Chameleon Cloud](https://www.chameleoncloud.org/)\n", 104 | "- [Datalore](https://datalore.jetbrains.com/)\n", 105 | "- [mybinder](https://mybinder.org/)\n", 106 | "- [pangeo binder](https://binder.pangeo.io/)\n", 107 | "\n", 108 | "### Libraries related to cloud access and formats used here:\n", 109 | "- [fsspec](https://filesystem-spec.readthedocs.io/en/latest/) filesystem interfaces for python\n", 110 | "- [zarr](https://zarr.readthedocs.io/en/stable/) Big data storage format\n", 111 | "- [dask](https://dask.org/) library to enable parallel processing for python\n", 112 | "\n", 113 | "### If you want to dig deeper:\n", 114 | "- [Pangeo tutorial for AGU 
OSM2020](https://github.com/pangeo-gallery/osm2020tutorial)\n", 115 | "- [Methods for accessing an AWS bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-bucket-intro.html). Bucket is the name of the cloud storage object. S3 stands for Amazon's Simple Storage Service.\n", 116 | "- [earthengine-api](https://github.com/google/earthengine-api/blob/master/python/examples/ipynb/Earth_Engine_REST_API_compute_table.ipynb) Use Python to access cloud data in the Google Earth Engine.\n", 117 | "- [satpy](https://github.com/pytroll/satpy) Python library to analyze satellite data\n", 118 | "- [pysat](https://github.com/pysat/pysat) A Python satellite data analysis toolkit" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": null, 124 | "metadata": {}, 125 | "outputs": [], 126 | "source": [] 127 | } 128 | ], 129 | "metadata": { 130 | "kernelspec": { 131 | "display_name": "Python 3 (ipykernel)", 132 | "language": "python", 133 | "name": "python3" 134 | }, 135 | "language_info": { 136 | "codemirror_mode": { 137 | "name": "ipython", 138 | "version": 3 139 | }, 140 | "file_extension": ".py", 141 | "mimetype": "text/x-python", 142 | "name": "python", 143 | "nbconvert_exporter": "python", 144 | "pygments_lexer": "ipython3", 145 | "version": "3.7.6" 146 | } 147 | }, 148 | "nbformat": 4, 149 | "nbformat_minor": 4 150 | } 151 | -------------------------------------------------------------------------------- /Ch6_Ocean_Example.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Chapter 6 - Example: Ocean Data \n", 8 | "### Days with sea surface temperature above a threshold\n", 9 | "\n", 10 | "In this chapter we exemplify the use of Sea Surface Temperature (SST) data in the cloud. \n", 11 | "\n", 12 | "This example analyzes a time series from an area of the ocean or a point. 
If an area, it averages SST values into a single value. Then it analyzes the time series to assess when SST is above a given threshold. This could be used to study marine heatwaves, or use an SST threshold relevant to a marine species of interest." 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": null, 18 | "metadata": {}, 19 | "outputs": [], 20 | "source": [ 21 | "import warnings \n", 22 | "warnings.simplefilter('ignore') \n", 23 | "import numpy as np\n", 24 | "import pandas as pd\n", 25 | "import xarray as xr\n", 26 | "import matplotlib.pyplot as plt \n", 27 | "import hvplot.pandas # this library helps to make interactive plots\n", 28 | "import hvplot.xarray\n", 29 | "import fsspec # these libraries help reading cloud data\n", 30 | "import s3fs\n", 31 | "import dask\n", 32 | "from dask.distributed import performance_report, Client, progress" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": null, 38 | "metadata": {}, 39 | "outputs": [], 40 | "source": [ 41 | "# input parameters\n", 42 | "\n", 43 | "# select either a range of lat/lon or a point. \n", 44 | "# If a point, set both entries to the same value\n", 45 | "latr = [19, 20] # make sure lat1 < lat2 since no test is done below to simplify the code\n", 46 | "lonr = [-158, -157] # lon1 < lon2, range -180:180. resolution daily 1km!\n", 47 | "\n", 48 | "# time range. data range available: 2002-06-01 to 2020-01-20. [start with a short period]\n", 49 | "dater = ['2012-01-01','2016-12-31'] # dates in the format 'YYYY-MM-DD' as string" 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | "***\n", 57 | "## We are going to use the Multi-Scale Ultra High Resolution (MUR) Sea Surface Temperature (SST) data set\n", 58 | "### This dataset is stored in the Amazon (AWS) Cloud. 
For more info and links to the data detail and examples, see: https://registry.opendata.aws/mur/\n", 59 | "\n", 60 | "This dataset is stored in `zarr` format, which is an optimized format for the large datasets and the cloud. It is not stored as one 'image' at a time or a gigantic netcdf file, but in 'chunks', so it is perfect for extracting time series.\n", 61 | "\n", 62 | "First, we open the dataset and explore it, but we are not downloading anything yet." 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": null, 68 | "metadata": {}, 69 | "outputs": [], 70 | "source": [ 71 | "# first determine the file name using, in the format:\n", 72 | "# the s3 bucket [mur-sst], and the region [us-west-2], and the folder if applicable [zarr-v1] \n", 73 | "file_location = 'https://mur-sst.s3.us-west-2.amazonaws.com/zarr-v1'\n", 74 | "\n", 75 | "ds_sst = xr.open_zarr(file_location,consolidated=True) # open a zarr file using xarray\n", 76 | "# it is similar to open_dataset but it only reads the metadata\n", 77 | "\n", 78 | "ds_sst # we can treat it as a dataset!\n" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "## Now that we know what the file contains, we select our data (region and time), operate on it if needed (if a region, average), and download only the selected data \n", 86 | "It takes a while given the high resolution of the data. So, be patient.... and if you're only testing, might want to choose a small region and a short time period first. 
" 87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": null, 92 | "metadata": {}, 93 | "outputs": [], 94 | "source": [ 95 | "# decide if a point or a region was given.\n", 96 | "if (latr[0]==latr[1]) | (lonr[0]==lonr[1]): # if we give it only one point\n", 97 | " sst = ds_sst['analysed_sst'].sel(time = slice(dater[0],dater[1]),\n", 98 | " lat = latr[0], \n", 99 | " lon = lonr[0]\n", 100 | " ).load()\n", 101 | "else: # if we give it an area, it extracts the area, averages SST over it, and returns a time series of SST\n", 102 | " sst = ds_sst['analysed_sst'].sel(time = slice(dater[0],dater[1]),\n", 103 | " lat = slice(latr[0], latr[1]), \n", 104 | " lon = slice(lonr[0], lonr[1])\n", 105 | " ).mean(dim={'lat','lon'}, skipna=True, keep_attrs=True).load() # skip 'not a number' (NaN) values and keep attributes\n", 106 | "\n", 107 | "sst = sst-273.15 # transform units from Kelvin to Celsius\n", 108 | "sst.attrs['units']='deg C' # update units in metadata\n", 109 | "sst.to_netcdf('data/sst_example.nc') # saving the data, in case we want to come back to analyze the same data, but don't want to acquire it again from the cloud.\n", 110 | "sst # take a peek" 111 | ] 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "metadata": {}, 116 | "source": [ 117 | "***\n", 118 | "### *Execute the next cell only if you're reading the data from a file - either you have no access to the cloud, or you don't want to keep reading from it. Skip otherwise. 
(No problem if you executed it by mistake).*" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": null, 124 | "metadata": {}, 125 | "outputs": [], 126 | "source": [ 127 | "sst = xr.open_dataset('data/sst_example.nc') \n", 128 | "sst.close()\n", 129 | "sst = sst.analysed_sst # select only one variable\n", 130 | "sst" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "***\n", 138 | "## Let's plot the data using two different libraries.\n", 139 | "#### - `matplotlib`, which we already learned.\n", 140 | "#### - `hvplot` is a more interactive library for web display. It provides you with the data details when you hover your cursor over the figure. Very nice for inspecting the data." 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": null, 146 | "metadata": {}, 147 | "outputs": [], 148 | "source": [ 149 | "# matplotlib method #\n", 150 | "print('matplotlib') \n", 151 | "sst.plot() # this is all you need\n", 152 | "\n", 153 | "# all the stuff here to make it look nice. \n", 154 | "plt.ylabel('SST ($^\circ$C)')\n", 155 | "plt.xlabel('Year')\n", 156 | "plt.title('Location: '+str(latr)+'$^\circ$N, '+str(lonr)+'$^\circ$W')\n", 157 | "plt.grid(True, alpha=0.3)\n", 158 | "plt.show()\n", 159 | "\n", 160 | "# hvplot method #\n", 161 | "print('hvplot')\n", 162 | "df = pd.DataFrame(data=sst.data, index=sst.time.data,columns=['SST (C)'])\n", 163 | "df.index.name = 'Date'\n", 164 | "df.hvplot(grid=True)" 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": {}, 170 | "source": [ 171 | "***\n", 172 | "## Now, let's analyze our data.\n", 173 | "#### First, the basics: climatology and anomalies. Also plotting using `hvplot`." 
174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": null, 179 | "metadata": {}, 180 | "outputs": [], 181 | "source": [ 182 | "# Calculate the climatology\n", 183 | "sst_climatology = sst.groupby('time.dayofyear').mean('time',keep_attrs=True,skipna=False) # Group by day, all years. skipna ignores missing (NaN) values \n", 184 | "sst_climstd = sst.groupby('time.dayofyear').std('time',keep_attrs=True,skipna=False) # Calculate standard deviation. Keep data attributes.\n", 185 | "\n", 186 | "# creates a dataset with climatology and standard deviation for easy plotting with hvplot\n", 187 | "ds = xr.Dataset({'clim':sst_climatology,'+Std':sst_climatology+sst_climstd,'-Std':sst_climatology-sst_climstd}) # add standard deviation time series +/-\n", 188 | "ds.hvplot(color=['k','grey','grey'], grid=True, title='SST Climatology') # plot the climatology (black, and the standard deviation in grey)" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": null, 194 | "metadata": {}, 195 | "outputs": [], 196 | "source": [ 197 | "# calculate the anomalies\n", 198 | "sst_anomaly = sst.groupby('time.dayofyear')-sst_climatology \n", 199 | "sst_anomaly_monthly = sst_anomaly.resample(time='1MS', loffset='15D').mean(keep_attrs=True,skipna=False) # calculate monthly anomalies/smoothing\n", 200 | "\n", 201 | "# make a plot \n", 202 | "plt.plot(sst_anomaly.time,sst_anomaly)\n", 203 | "plt.plot(sst_anomaly_monthly.time,sst_anomaly_monthly, 'r')\n", 204 | "\n", 205 | "plt.grid()\n", 206 | "plt.ylabel('SSTa (C)')\n", 207 | "plt.title('SST Anomalies')\n", 208 | "plt.show()" 209 | ] 210 | }, 211 | { 212 | "cell_type": "markdown", 213 | "metadata": {}, 214 | "source": [ 215 | "***\n", 216 | "## We analyze the data further by dividing it by a threshold.\n", 217 | "\n", 218 | "- One way is to set a threshold that has some relevance. For example, a thermal threshold for a marine species we are studying. 
\n", 219 | "\n", 220 | "- Another way is choosing the maximum value in the climatology (mean value + 1 standard deviation), which we can calculate or read by hovering our cursor over the climatology plot above.\n", 221 | "\n", 222 | "### Once the threshold is chosen, we identify when SST is over that threshold, and count how many days that occurred each year" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": null, 228 | "metadata": {}, 229 | "outputs": [], 230 | "source": [ 231 | "# we define a function that takes a threshold value, and analyzes and plots our data\n", 232 | "def SST_above(thr):\n", 233 | " \n", 234 | " fig, axs = plt.subplots(1,2,figsize=(16,4)) # creates a figure with two panels\n", 235 | " \n", 236 | " # first part - values above threshold - timeseries\n", 237 | " plt.subplot(1,2,1) # plot on the first panel (last number)\n", 238 | " plt.plot(sst.time,sst.data, lw=1)\n", 239 | " a=sst>=thr # test when data is equal to or greater than the threshold. a is a logical array (True/False values)\n", 240 | " plt.plot(sst.time[a], sst.data[a],'.r', markersize=3) # plot only the values equal to or above the threshold\n", 241 | " # all stuff here to make it look good\n", 242 | " plt.ylabel('SST ($^\circ$C)')\n", 243 | " plt.xlabel('Year')\n", 244 | " plt.title('Location: '+str(latr)+'$^\circ$N, '+str(lonr)+'$^\circ$W')\n", 245 | " plt.grid(True, alpha=0.3)\n", 246 | " \n", 247 | "\n", 248 | " # second part - days per year above threshold\n", 249 | " plt.subplot(1,2,2) # plot on the second panel\n", 250 | " dts = sst[sst>=thr].time # select dates when SST is equal to or greater than the threshold. note that this is not a logical array, but the time values\n", 251 | " hot_days = dts.groupby('time.year').count() # aggregate by year, by counting \n", 252 | " plt.bar(hot_days.year, hot_days) # bar plot of days per year\n", 253 | " plt.xlim(int(dater[0][:4]), int(dater[1][:4])+1) # make it nice\n", 254 | " "plt.ylabel('No. 
days above '+str(np.round(thr,1))+'C')\n", 255 | " plt.grid(True, alpha=0.3)\n", 256 | " plt.show() # display and finalize this figure, so the next one is not overwritten\n", 257 | "\n", 258 | "## the actual analysis: two examples ##\n", 259 | "\n", 260 | "### Maximum climatology threshold\n", 261 | "thr = ds['+Std'].max() # setting threshold as maximum climatological value: mean + 1 standard deviation\n", 262 | "print('Max climatological SST = ',np.round(thr,1),'C')\n", 263 | "SST_above(thr) # Call the function we defined\n", 264 | "\n", 265 | "### A relevant threshold. \n", 266 | "# For example, for Hawaii (the selected region), 28C is a relevant threshold for coral bleaching (https://coralreefwatch.noaa.gov/product/5km/tutorial/crw08a_bleaching_threshold.php)\n", 267 | "thr = 28\n", 268 | "print('\\n\\nBiologically relevant SST = ',thr,'C')\n", 269 | "SST_above(thr) # Call function" 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": {}, 275 | "source": [ 276 | "***\n", 277 | "#### Now, a different analysis of anomalously warm SST days. \n", 278 | "## Marine heatwaves\n", 279 | "Defined as any period with SST anomalies above the threshold determined by the 90th percentile value of a given period - in this case, our data time period." 
280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": null, 285 | "metadata": {}, 286 | "outputs": [], 287 | "source": [ 288 | "# first, calculate the threshold: 90th percentile\n", 289 | "thr = np.percentile(sst_anomaly, 90)\n", 290 | "\n", 291 | "fig, axs = plt.subplots(3,1,figsize=(16,16)) # make a figure of 3 vertical panels\n", 292 | "\n", 293 | "# same plot as in our function above, but this time we are plotting the anomalies.\n", 294 | "plt.subplot(3,1,1) \n", 295 | "plt.plot(sst_anomaly.time,sst_anomaly.data, lw=1)\n", 296 | "plt.axhline(y=0, c='k', zorder=0, alpha=0.5) # add a line to highlight the x axis \n", 297 | "a=sst_anomaly>=thr # select data above the threshold\n", 298 | "plt.plot(sst_anomaly.time[a], sst_anomaly.data[a],'.r', markersize=3)\n", 299 | "# all stuff here to make it look good\n", 300 | "plt.ylabel('SST anomalies ($^\\circ$C)')\n", 301 | "plt.xlabel('Year')\n", 302 | "plt.title('Location: '+str(latr)+'$^\\circ$N, '+str(lonr)+'$^\\circ$W')\n", 303 | "plt.grid(True, alpha=0.3)\n", 304 | "\n", 305 | "# Now plot on the original data (not anomalies)\n", 306 | "plt.subplot(3,1,2) # second panel\n", 307 | "plt.plot(sst.time,sst.data, lw=1)\n", 308 | "plt.plot(sst.time[a], sst.data[a],'.r', markersize=3) # plot only the values equal or above threshold\n", 309 | "# all stuff here to make it look good\n", 310 | "plt.ylabel('SST ($^\\circ$C)')\n", 311 | "plt.xlabel('Year')\n", 312 | "plt.title('Location: '+str(latr)+'$^\\circ$N, '+str(lonr)+'$^\\circ$W')\n", 313 | "plt.grid(True, alpha=0.3)\n", 314 | "\n", 315 | "# plot of marine heatwave days per year\n", 316 | "dts = sst_anomaly[sst_anomaly>=thr].time\n", 317 | "mhw = dts.groupby('time.year').count()\n", 318 | "plt.subplot(3,1,3) # third panel\n", 319 | "plt.bar(mhw.year,mhw)\n", 320 | "plt.ylabel('No. 
days SSTa > '+str(np.round(thr,1))+'C')\n", 321 | "plt.grid(True, alpha=0.3)\n", 322 | "plt.show()\n", 323 | "\n", 324 | "mhw # print the number of days" 325 | ] 326 | }, 327 | { 328 | "cell_type": "markdown", 329 | "metadata": {}, 330 | "source": [ 331 | "## Resources\n", 332 | "\n", 333 | "For the cloud and data in the cloud, see resources listed in Chapter 5.\n", 334 | "\n", 335 | "### Resources specifically for this chapter:\n", 336 | "\n", 337 | "- [MUR SST Data](https://registry.opendata.aws/mur/). SST data in the cloud, with references to the official data website, examples, and other resources.\n", 338 | "\n", 339 | "- [Pangeo OSM2020 Tutorial](https://github.com/pangeo-gallery/osm2020tutorial). This is a very good tutorial for ocean applications and cloud computing. Plenty of examples. Many of the commands here are from this tutorial.\n", 340 | "\n", 341 | "### About MHW\n", 342 | "\n", 343 | "- [Marine heatwaves](http://www.marineheatwaves.org/all-about-mhws.html). A good place to begin to get info about the subject.\n", 344 | "\n", 345 | "- [Marine heatwaves code](https://github.com/ecjoliver/marineHeatWaves). Marine heatwaves code from E. Oliver.\n", 346 | "\n", 347 | "### If you want to learn more:\n", 348 | "\n", 349 | "- [Methods for accessing an AWS bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-bucket-intro.html). Bucket is the name of the cloud storage object. S3 stands for Amazon's Simple Storage Service.\n", 350 | "\n", 351 | "- [hvplot site](https://hvplot.holoviz.org/index.html). Plotting tool used here.\n", 352 | "\n", 353 | "- [zarr](https://zarr.readthedocs.io/en/stable/). Learn more about this big data storage format." 
354 | ] 355 | }, 356 | { 357 | "cell_type": "code", 358 | "execution_count": null, 359 | "metadata": {}, 360 | "outputs": [], 361 | "source": [] 362 | } 363 | ], 364 | "metadata": { 365 | "kernelspec": { 366 | "display_name": "Python 3 (ipykernel)", 367 | "language": "python", 368 | "name": "python3" 369 | }, 370 | "language_info": { 371 | "codemirror_mode": { 372 | "name": "ipython", 373 | "version": 3 374 | }, 375 | "file_extension": ".py", 376 | "mimetype": "text/x-python", 377 | "name": "python", 378 | "nbconvert_exporter": "python", 379 | "pygments_lexer": "ipython3", 380 | "version": "3.7.6" 381 | } 382 | }, 383 | "nbformat": 4, 384 | "nbformat_minor": 4 385 | } 386 | -------------------------------------------------------------------------------- /Ch7_Atmosphere_Example.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Chapter 7 - Example: Atmospheric Data \n", 8 | "### Analyze monthly wind data for a selected region\n", 9 | "\n", 10 | "In this chapter, we exemplify the use of an atmospheric/climate data set, the ERA-5 reanalysis dataset, to analyze changes in wind vectors at 10 m. We characterize its variability over a given region, plot the field, and calculate linear trends.\n", 11 | "\n", 12 | "[ERA-5 (ECMWF)](https://registry.opendata.aws/ecmwf-era5/) reanalysis incorporates satellite and in-situ data, and its output variables include ocean, land and atmospheric ones. Therefore, this script can be easily modified for other data. 
" 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": null, 18 | "metadata": {}, 19 | "outputs": [], 20 | "source": [ 21 | "import warnings\n", 22 | "warnings.simplefilter('ignore') \n", 23 | "\n", 24 | "import numpy as np\n", 25 | "import pandas as pd\n", 26 | "import xarray as xr\n", 27 | "from calendar import month_abbr # function that gives you the abbreviated name of a month\n", 28 | "from calendar import monthrange # gives the number of day in a month\n", 29 | "import matplotlib.pyplot as plt \n", 30 | "import hvplot.pandas\n", 31 | "import hvplot.xarray\n", 32 | "import fsspec\n", 33 | "import s3fs\n", 34 | "import dask\n", 35 | "from dask.distributed import performance_report, Client, progress\n", 36 | "import os # library to interact with the operating system" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "***\n", 44 | "## For this example we select a region, and also a specific month and a range of years to analyze" 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": null, 50 | "metadata": {}, 51 | "outputs": [], 52 | "source": [ 53 | "# Select region by defining latitude and longitude range. \n", 54 | "# ERA-5 data has a 1/4 degree resolution. \n", 55 | "latr = [39, 40] # Latitude range. Make sure lat1 < lat2 since no test is done below to simplify the code. resolution 0.25 degrees\n", 56 | "lonr = [-125, -123] # lon1 < lon2. and use the range -180 : 180\n", 57 | "# time selection\n", 58 | "mon = 5 # month to analyze\n", 59 | "iyr = 2000 # initial year. by default, we set it to the start year of ERA5 dataset\n", 60 | "fyr = 2021 # final year. by default, we set it to the end year of ERA5 dataset\n" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "***\n", 68 | "## Acquire data from the AWS cloud\n", 69 | "\n", 70 | "In this case, files are stored in a different format than SST. 
ERA5 data is stored in monthly files (of daily data) organized in yearly folders. Then, monhtly files have to be accessed individually." 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": null, 76 | "metadata": { 77 | "scrolled": true 78 | }, 79 | "outputs": [], 80 | "source": [ 81 | "tdt = list() # initialize a list to store the time index\n", 82 | "\n", 83 | "# v meridional component\n", 84 | "print('Acquiring meridional wind v10m')\n", 85 | "for iy, y in enumerate(range(iyr, fyr+1)): # for loop over the selected years\n", 86 | " file_location = 'https://era5-pds.s3.us-east-1.amazonaws.com/zarr/'+str(y)+'/'+str(mon).zfill(2)+'/data/northward_wind_at_10_metres.zarr'\n", 87 | " # filename includes: bucket name: era5-pds, year: y (transformed to string type), month: mon, and the name of the variable with extenssion zarr\n", 88 | " ds = xr.open_zarr(file_location,consolidated=True) # open access to data\n", 89 | "\n", 90 | " # generate time frame to obtain the whole month data (first to last day of selected month)\n", 91 | " dte1 = str(y)+'-'+str(mon).zfill(2)+'-01'\n", 92 | " dte2 = str(y)+'-'+str(mon).zfill(2)+'-'+str(monthrange(y, mon)[1]) #monthrange provides the lenght of the month\n", 93 | " # select data region and time - meridional wind\n", 94 | " vds = ds['northward_wind_at_10_metres'].sel(time0 = slice(dte1,dte2),\n", 95 | " lat = slice(latr[1],latr[0],), \n", 96 | " lon = slice(lonr[0]+360,lonr[1]+360)\n", 97 | " ).mean(axis=0).load() # calculae mean before downloading it\n", 98 | " if iy==0: # if the first year, create an array to store data\n", 99 | " v10_dt = np.full((len(range(iyr, fyr+1)),vds.shape[0],vds.shape[1]), np.nan) # create an array of the size [years,lat,lon]\n", 100 | " v10_dt[iy,:,:] = vds.data # store selected data per year\n", 101 | " \n", 102 | "# u component\n", 103 | "print('Acquiring zonal wind u10m')\n", 104 | "for iy, y in enumerate(range(iyr, fyr+1)):\n", 105 | " file_location = 
'https://era5-pds.s3.us-east-1.amazonaws.com/zarr/'+str(y)+'/'+str(mon).zfill(2)+'/data/eastward_wind_at_10_metres.zarr'\n", 106 | " # note that each variable has a distintive file name\n", 107 | " ds = xr.open_zarr(file_location,consolidated=True)\n", 108 | "\n", 109 | " dte1 = str(y)+'-'+str(mon).zfill(2)+'-01'\n", 110 | " dte2 = str(y)+'-'+str(mon).zfill(2)+'-'+str(monthrange(y, mon)[1])\n", 111 | " uds = ds['eastward_wind_at_10_metres'].sel(time0 = slice(dte1,dte2),\n", 112 | " lat = slice(latr[1],latr[0],), \n", 113 | " lon = slice(lonr[0]+360,lonr[1]+360)\n", 114 | " ).mean(axis=0).load()\n", 115 | " if iy==0: \n", 116 | " u10_dt = np.full((len(range(iyr, fyr+1)),uds.shape[0],uds.shape[1]), np.nan)\n", 117 | " u10_dt[iy,:,:] = uds.data \n", 118 | " \n", 119 | " # append month-year time to the list\n", 120 | " tdt.append(str(y)+'-'+str(mon).zfill(2)+'-01') # add first day of month\n", 121 | " \n" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": null, 127 | "metadata": {}, 128 | "outputs": [], 129 | "source": [ 130 | "# Build a dataset from the selected data. 
not only a dataarray since we have 2 variables for the vector\n", 131 | "mw10 = xr.Dataset(data_vars=dict(u10m=(['time','lat','lon'],u10_dt),\n", 132 | " v10m=(['time','lat','lon'],v10_dt), ),\n", 133 | " coords=dict(time=tdt,lat=vds.lat.values, lon=vds.lon.values-360),attrs=vds.attrs) \n", 134 | "# Add a wind speed variable\n", 135 | "mw10['wsp10m'] = np.sqrt(mw10.u10m**2+mw10.v10m**2) # calculate wind speed\n", 136 | "mw10.to_netcdf('./data/ERA5_wind10m_mon'+str(mon).zfill(2)+'.nc') # saving the file for a future use, so we don't have to get data again\n", 137 | "mw10 # taking a peek\n" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": null, 143 | "metadata": {}, 144 | "outputs": [], 145 | "source": [ 146 | "mw10 = xr.open_dataset('./data/ERA5_wind10m_mon05.nc')\n", 147 | "mw10.close()\n", 148 | "mw10" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | "source": [ 155 | "***\n", 156 | "## Plotting the data\n", 157 | "\n", 158 | "As before, there is a simple way to plot the data for quick inspection, and also a way to make the plot ready for sharing or publication." 
159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": null, 164 | "metadata": {}, 165 | "outputs": [], 166 | "source": [ 167 | "# simple plot of data, using the matplotlib function quiver to plot vectors\n", 168 | "x,y = np.meshgrid(mw10.lon,mw10.lat) # generate an lat/lon grid to plot the vectors\n", 169 | "plt.quiver(x, y, mw10.u10m[0,:,:], mw10.v10m[0,:,:]) \n", 170 | "plt.show()" 171 | ] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": null, 176 | "metadata": {}, 177 | "outputs": [], 178 | "source": [ 179 | "# Now a more detailed plot\n", 180 | "from cartopy.mpl.ticker import LongitudeFormatter, LatitudeFormatter\n", 181 | "import cartopy.feature as cfeature\n", 182 | "import cartopy.crs as ccrs\n", 183 | "from calendar import month_abbr\n", 184 | "\n", 185 | "# Select a region of our data, giving it a margin\n", 186 | "margin = 0.5 # extra space for the plot\n", 187 | "region = np.array([[latr[0]-margin,latr[1]+margin],[lonr[0]-margin,lonr[1]+margin]]) # numpy array that specifies the lat/lon boundaries of our selected region\n", 188 | "\n", 189 | "# Create and set the figure context\n", 190 | "fig = plt.figure(figsize=(8,5)) # create a figure object, and assign it a variable name fig\n", 191 | "ax = plt.axes(projection=ccrs.PlateCarree()) # projection type - this one is easy to use\n", 192 | "ax.coastlines(resolution='50m',linewidth=2,color='black') \n", 193 | "ax.add_feature(cfeature.LAND, color='grey', alpha=0.3)\n", 194 | "ax.set_extent([region[1,0],region[1,1],region[0,0],region[0,1]],crs=ccrs.PlateCarree()) \n", 195 | "ax.set_xticks([*np.arange(region[1,0],region[1,1]+1,1)], crs=ccrs.PlateCarree()) # customize ticks and labels to longitude\n", 196 | "ax.set_yticks([*np.arange(region[0,0],region[0,1]+1,1)], crs=ccrs.PlateCarree()) # customize ticks and labels to latitude\n", 197 | "ax.xaxis.set_major_formatter(LongitudeFormatter(zero_direction_label=True))\n", 198 | 
"ax.yaxis.set_major_formatter(LatitudeFormatter())\n", 199 | "\n", 200 | "# Plot average wind for the selected month, color is the wind speed\n", 201 | "plt.quiver(x, y, mw10.u10m.mean(axis=0), mw10.v10m.mean(axis=0),mw10.wsp10m.mean(axis=0), cmap='jet')\n", 202 | "cbar=plt.colorbar()\n", 203 | "cbar.set_label('m/s') # color bar label\n", 204 | "plt.title('Wind for '+month_abbr[mon]+' ('+str(iyr)+'-'+str(fyr)+')')\n", 205 | "#fig.savefig('filename') # save your figure by usinig the method .savefig. python recognized the format from the filename extension. \n", 206 | "plt.show()" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "*** \n", 214 | "## To analyze the data in time, we select only one point in space. \n", 215 | "But if you want to analyze the entire field, you can:\n", 216 | "- Average spatially using .mean(axis=(1,2)) on the variables\n", 217 | "- Repeat the analysis for each point (using a `for` loop)\n", 218 | "- Or even better: use `xarray` methods to apply a function to the array" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": null, 224 | "metadata": {}, 225 | "outputs": [], 226 | "source": [ 227 | "print('Latitude values: ', mw10.lat.values)\n", 228 | "print('Longitude values: ',mw10.lon.values)" 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "execution_count": null, 234 | "metadata": {}, 235 | "outputs": [], 236 | "source": [ 237 | "# select a point from the range of latitude and longitude values above\n", 238 | "slat = 39 # selected latitude\n", 239 | "slon = -124 # selected longitude" 240 | ] 241 | }, 242 | { 243 | "cell_type": "code", 244 | "execution_count": null, 245 | "metadata": {}, 246 | "outputs": [], 247 | "source": [ 248 | "# Select data for an specific location, and do a simple plot of each variable\n", 249 | "plt.figure(figsize=(12,8))\n", 250 | "\n", 251 | "# meridional wind change\n", 252 | "plt.subplot(2,2,1)\n", 253 | 
"plt.plot(range(iyr,fyr+1),mw10.v10m.sel(lat=slat,lon=slon), 'bd-',zorder=2)\n", 254 | "plt.axhline(y=0,c='k', alpha=0.4)\n", 255 | "plt.ylabel('Wind speed (m/s)')\n", 256 | "plt.title('Meridional wind (v), Lat='+str(slat)+', Lon='+str(slon))\n", 257 | "plt.grid(zorder=0)\n", 258 | "\n", 259 | "# zonal wind change\n", 260 | "plt.subplot(2,2,2)\n", 261 | "plt.plot(range(iyr,fyr+1),mw10.u10m.sel(lat=slat,lon=slon), 'go-',zorder=2)\n", 262 | "plt.axhline(y=0,c='k', alpha=0.4)\n", 263 | "plt.ylabel('Wind speed (m/s)')\n", 264 | "plt.title('Zonal wind (u), Lat='+str(slat)+', Lon='+str(slon))\n", 265 | "plt.grid(zorder=0)\n", 266 | "\n", 267 | "# wind speed change\n", 268 | "plt.subplot(2,2,3)\n", 269 | "plt.plot(range(iyr,fyr+1), mw10.wsp10m.sel(lat=slat,lon=slon), 's-',c='darkorange',zorder=2)\n", 270 | "plt.axhline(y=0,c='k', alpha=0.4)\n", 271 | "plt.ylabel('Wind speed (m/s)')\n", 272 | "plt.title('Wind speed, Lat='+str(slat)+', Lon='+str(slon))\n", 273 | "plt.grid(zorder=0)\n", 274 | "\n", 275 | "plt.tight_layout()\n", 276 | "plt.show()" 277 | ] 278 | }, 279 | { 280 | "cell_type": "markdown", 281 | "metadata": {}, 282 | "source": [ 283 | "***\n", 284 | "## Now, let's calculate the temporal trend on one of the wind variables, using a first degree linear regression " 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": null, 290 | "metadata": {}, 291 | "outputs": [], 292 | "source": [ 293 | "# libraries for statistics and machine learning functions\n", 294 | "from sklearn.preprocessing import PolynomialFeatures\n", 295 | "import statsmodels.api as sm\n", 296 | "\n", 297 | "var='wsp10m' # select a variable from our Dataset\n", 298 | "\n", 299 | "x = np.array([*range(iyr,fyr+1)]).reshape(-1,1) # we generate an array of years, and transpose it by using .reshape(-1,1)\n", 300 | "y = mw10[var].sel(lat=slat,lon=slon).values.reshape(-1,1) # selected variable at the selected point\n", 301 | "\n", 302 | "polf = PolynomialFeatures(1) # linear regression 
(order=1)\n", 303 | "xp = polf.fit_transform(x) # generate a array with the years and a dummy / constant variable\n", 304 | "mods = sm.OLS(y,xp).fit() # calculate regression model, stored in mods\n", 305 | "\n", 306 | "print(mods.summary()) # each variable of the modell can also be accessed individually\n", 307 | "\n", 308 | "# this summary shows different metrics and significance levels along with the equation variables and constants. \n", 309 | "# for more details see the resources section below" 310 | ] 311 | }, 312 | { 313 | "cell_type": "markdown", 314 | "metadata": {}, 315 | "source": [ 316 | "***\n", 317 | "# Resources\n", 318 | "**Data**\n", 319 | "- AWS [ERA-5 (ECMWF)](https://registry.opendata.aws/ecmwf-era5/) reanalysis data.\n", 320 | "This page also has links to other tutorials that use other libraries.\n", 321 | "- [List of data available](https://github.com/planet-os/notebooks/blob/master/aws/era5-pds.md) on ERA5 and details on how the files are organized.\n", 322 | "- Google Earth Engine ERA-5 data. 
[[Monthly]](https://developers.google.com/earth-engine/datasets/catalog/ECMWF_ERA5_MONTHLY#bands) [[Daily]](https://developers.google.com/earth-engine/datasets/catalog/ECMWF_ERA5_DAILY).\n", 323 | "\n", 324 | "**More on the libraries:**\n", 325 | "- [xarray apply](https://www.programcreek.com/python/example/123575/xarray.apply_ufunc) Examples of how to apply a function to an xarray structure\n", 326 | "- [scikit-learn (sklearn)](https://scikit-learn.org/stable/) a library for machine learning functions\n", 327 | "- [statsmodels](https://www.statsmodels.org/stable/user-guide.html) a library to calculate statistical models.\n", 328 | "\n", 329 | "\n" 330 | ] 331 | }, 332 | { 333 | "cell_type": "code", 334 | "execution_count": null, 335 | "metadata": {}, 336 | "outputs": [], 337 | "source": [] 338 | } 339 | ], 340 | "metadata": { 341 | "kernelspec": { 342 | "display_name": "Python 3 (ipykernel)", 343 | "language": "python", 344 | "name": "python3" 345 | }, 346 | "language_info": { 347 | "codemirror_mode": { 348 | "name": "ipython", 349 | "version": 3 350 | }, 351 | "file_extension": ".py", 352 | "mimetype": "text/x-python", 353 | "name": "python", 354 | "nbconvert_exporter": "python", 355 | "pygments_lexer": "ipython3", 356 | "version": "3.7.6" 357 | } 358 | }, 359 | "nbformat": 4, 360 | "nbformat_minor": 4 361 | } 362 | -------------------------------------------------------------------------------- /Ch8_Land_Example.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Chapter 8 - Example: Land Data\n", 8 | "### Changes in vegetation index through the years for an area. The NDVI index indicates the percentage of vegetation for each grid point.\n", 9 | "\n", 10 | "In this chapter we don't use data from the cloud, but exemplify how to obtain and analyze timeseries data stored in temporally separated files on the internet. 
You'll see that it is not very different from previous chapters, except that there is not a centralized repository for data. In the future (hopefully soon), when data is in the cloud on a similar data format, accessing from these data would be similar to chapters 6 and 7.\n", 11 | "\n", 12 | "## This script reads NDVI (vegetation index) files from a `thredds` server, compile the region and time selected, and then analyze the change in vegetation index through time." 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": null, 18 | "metadata": {}, 19 | "outputs": [], 20 | "source": [ 21 | "import warnings\n", 22 | "warnings.simplefilter('ignore') \n", 23 | "\n", 24 | "import pandas as pd\n", 25 | "import numpy as np\n", 26 | "import xarray as xr\n", 27 | "xr.set_options(display_style=\"html\") # display dataset nicely\n", 28 | "import os\n", 29 | "import re # regular expressions\n", 30 | "from datetime import date\n", 31 | "from calendar import month_abbr\n", 32 | "import urllib as ur # library to download files online \n", 33 | "import requests # library to read files online \n", 34 | "import matplotlib.pyplot as plt \n", 35 | "import hvplot.pandas\n", 36 | "import hvplot.xarray\n" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": null, 42 | "metadata": {}, 43 | "outputs": [], 44 | "source": [ 45 | "# Select a region \n", 46 | "lat1, lat2 = 16, 18 # two latitudes for a range: lat1=0.3) # create a mask for veg. index >= 30% in the first time step. 
other locations set to NaN\n", 197 | "veg_area = mask0.count() # count the number of grid points above when the mask is applied - need it if you want to calculate area later\n", 198 | "for i in range(len(ayrs)): \n", 199 | " tmp=ndvi[i,:,:]*mask0 # apply the mask for each year\n", 200 | " veg_mean.append(tmp.mean())\n", 201 | "\n", 202 | "plt.bar(ayrs,veg_mean-np.nanmean(veg_mean))\n", 203 | "plt.title('Vegetation Index Change for '+month_abbr[mon]+' '+str(dy).zfill(2))\n", 204 | "plt.ylabel('NDVI')\n", 205 | "plt.grid(True, alpha=0.3)\n", 206 | "plt.show()" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "***\n", 214 | "# Resources\n", 215 | "\n", 216 | "\n", 217 | "### Data and data sources: \n", 218 | "- [NDVI Normalized Difference Vegetation Index (Climate Data Record)](https://www.ncei.noaa.gov/products/climate-data-records/normalized-difference-vegetation-index) data. \n", 219 | "- [NDVI data list](https://www.ncei.noaa.gov/thredds/catalog/cdr/ndvi/catalog.html) \n", 220 | "\n", 221 | "### Other locations for MODIS and NDVI data\n", 222 | "- [AWS](https://registry.opendata.aws/modis-astraea/)\n", 223 | "- [AWS NASA NEX](https://registry.opendata.aws/nasanex/)\n", 224 | "- [Earth Engine](https://developers.google.com/earth-engine/datasets/catalog/NOAA_CDR_AVHRR_NDVI_V5#description)\n", 225 | "- [USGS](https://lpdaac.usgs.gov/products/mod13q1v006/)\n", 226 | "\n", 227 | "### Other data in `thredds`\n", 228 | "- [NCEI thredds](https://www.ncei.noaa.gov/thredds/catalog.html) NOAA National Centers for Environmental Information thredds catalog.\n", 229 | "- [How to access data file ini thredds](https://www.unidata.ucar.edu/software/tds/current/tutorial/CatalogPrimer.html)\n", 230 | "\n", 231 | "### More on the libraries:\n", 232 | "- [A short article on how to download files from url in Python](https://betterprogramming.pub/3-simple-ways-to-download-files-with-python-569cb91acae6)\n", 233 | "- 
[urllib/request](https://docs.python.org/3/library/urllib.request.html?highlight=retrieve) library \n", 234 | "- Regular expressions [re](https://docs.python.org/3/howto/regex.html). A useful method for manipulating strings. See this [tutorial](https://www.tutorialspoint.com/python/python_reg_expressions.htm) for a friendlier approach.\n" 235 | ] 236 | }, 237 | { 238 | "cell_type": "code", 239 | "execution_count": null, 240 | "metadata": {}, 241 | "outputs": [], 242 | "source": [] 243 | } 244 | ], 245 | "metadata": { 246 | "kernelspec": { 247 | "display_name": "Python 3 (ipykernel)", 248 | "language": "python", 249 | "name": "python3" 250 | }, 251 | "language_info": { 252 | "codemirror_mode": { 253 | "name": "ipython", 254 | "version": 3 255 | }, 256 | "file_extension": ".py", 257 | "mimetype": "text/x-python", 258 | "name": "python", 259 | "nbconvert_exporter": "python", 260 | "pygments_lexer": "ipython3", 261 | "version": "3.7.6" 262 | } 263 | }, 264 | "nbformat": 4, 265 | "nbformat_minor": 4 266 | } 267 | -------------------------------------------------------------------------------- /Notebooks_Results/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/marisolgr/python_sat_tutorials/2ec5fe982ac03285dde32fd031a9c40219f25f4f/Notebooks_Results/.DS_Store -------------------------------------------------------------------------------- /Python_Installation.md: -------------------------------------------------------------------------------- 1 | # Installing Python, Jupyter Notebook, and cloning the tutorial from GitHub 2 | 3 | If you decide to install `Python`, there are a number of ways and resources to do so. Here, we will explain how to install Python and Jupyter Notebook using __Conda__. 4 | 5 | ## Conda 6 | [Conda](https://docs.conda.io/en/latest/) is a package manager tool that helps you install and update `Python` and any library you might need, keeping track of versions and conflicts. 
Not only for beginners, `conda` makes it easier and simpler to install and work with `Python` and other, always in development, libraries. 7 | 8 | We will use __Miniconda__, which is a minimal installer of conda. This means that it will install `conda`, `Python`, the packages that they depend on, and only a small number of other packages. Nothing extra. The user will only install the packages (libraries) needed. This allows your `Python` installation to be tailored to your needs. 9 | 10 | 11 | ### Install Miniconda 12 | - Download `Miniconda` [here](https://docs.conda.io/en/latest/miniconda.html). Choose your platform, and make sure to get at least Python 3.8. 13 | - Follow the Regular installation instructions for each platform, located [here](https://conda.io/projects/conda/en/latest/user-guide/install/index.html#regular-installation). (There are many sites with easier (or harder) to follow instructions, but this site is the most up-to-date one.) 14 | 15 | 16 | ### Install the necessary libraries and create a new 'environment' 17 | 18 | - Additional libraries can be installed one by one, or as we will do here, using a list of libraries in a file named __environment.yml__. Download this file directly from [here](https://github.com/marisolgr/python_sat_tutorials/blob/main/environment.yml), or find it in the main folder of this tutorial if you cloned it from `Github` (instructions below). 19 | - Open an anaconda prompt: 20 | - Windows: From your start button, look for Anaconda, within that folder open 'Anaconda Prompt'. 21 | - macOS: Open Launchpad, then open Terminal or iTerm. 22 | - Linux–CentOS: Open Applications - System Tools - Terminal. 23 | - Linux–Ubuntu: Open the Dash by clicking the upper left Ubuntu icon, then type “terminal”. 24 | - In the anaconda window, type `conda list`. If Anaconda is installed and working, this will display a list of installed packages and their versions. 25 | - In your anaconda prompt, type `conda env create -f environment.yml`. 
You may need to include the directory to where the environment.yml file was downloaded. 26 | 27 | ### Run the tutorial 28 | 29 | - In the anaconda prompt, type `conda activate tutorialenv` to load the created environment. 30 | - Open a __Jupyter Notebook__ by typing `jupyter notebook` in the anaconda prompt. This will open a __Jupyter Notebook__ `Dashboard` in your browser. From there you can open the Chapter Notebooks by clicking on the file links. 31 | 32 | ## Github 33 | 34 | Downloading the content of a __GitHub__ repository to your local computer is referred to as `cloning`. To clone this (or any) repository: 35 | 36 | - In a terminal, type `git clone https://github.com/marisolgr/python_sat_tutorials` 37 | - Or you can download it by going to the main page [here](https://github.com/marisolgr/python_sat_tutorials) and clicking on the green button named `Code`, at the top right of the file list. Then, click on __Download ZIP__ from the drop-down menu. Unzip it. 38 | - You might want to move this directory to a convenient place on your computer. 39 | - From your __Jupyter Notebook__ `Dashboard`, navigate to the directory of the tutorial, and you'll see the Chapter files. 40 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Tutorial: Timeseries of Satellite Data using Python [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/marisolgr/python_sat_tutorials/main) 2 | 3 | 4 | Tutorial to learn how to access and analyze time series of Satellite and Satellite-based Data using Python and JupyterLab in the Cloud 5 | 6 | ## Objective 7 | This tutorial aims to provide scientists (or anybody) who want to use satellite data with the necessary tools for obtaining, temporally analyzing, and visualizing these data using the Cloud. __Note:__ This is __not__ a tutorial on Python per se - there are a myriad of resources for that. 
The purpose of this tutorial is to learn, through __examples__, only the Python code and tools required to extract and perform simple temporal analyses of satellite data. 8 | 9 | We want you to get your toes wet, see and use the power of Python, and then maybe you will want to learn more. For that, we encourage you to visit the links in the __Resources__ section at the end of each chapter. 10 | 11 | This project is supported by the [Better Scientific Software Foundation](https://bssw.io/); the oceanography tutorial that we use as a basis here was originally funded by [NASA](https://www.nasa.gov/). 12 | 13 | *** 14 | 15 | ## How it works 16 | This tutorial is developed to run on the Cloud __and__ to access satellite data there. 17 | 18 | __To launch the tutorial__: 19 | 20 | - Click on the binder icon below. It will redirect you to an online environment with the tutorial. 21 | - It might take some time to load the first time, but eventually you'll be presented with a _Jupyter_ environment in your web browser, listing the Chapters of this tutorial (see Chapter 2 for a brief guide to Jupyter Notebook). 22 | - __Double click__ on the Chapter you want to work on. It will open in a new tab. 23 | - At the end, quit the session (top right of the page). 24 | - You can access the tutorial (repeating this same procedure) as many times as you want. 25 | 26 | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/marisolgr/python_sat_tutorials/main) 27 | 28 | *** 29 | 30 | This tutorial is divided into Chapters that provide the necessary tools as building blocks. These chapters are stand-alone, so they can be skipped if you are familiar with the particular tool presented. 31 | 32 | ## Chapters: 33 | 34 | 1. Introduction to Python for Earth Science: Basic concepts about Python 35 | 36 | 2. Introduction to Jupyter Lab: How to use the web interface JupyterLab 37 | 38 | 3. Python Basics: The basics of Python 39 | 40 | 4a.
Python Tools: xarray, the library that makes satellite data analysis easy 41 | 42 | 4b. Plotting Tools: Python plotting libraries 43 | 44 | 5. Satellite Cloud Data: Background information on Cloud access and data 45 | 46 | 6. Ocean Data Example: First cloud data acquisition and analysis, using ocean surface temperature data 47 | 48 | 7. Atmospheric Data Example: Acquisition and analysis of satellite-based wind data from the cloud 49 | 50 | 8. Land Data Example: Acquisition and analysis of vegetation data from online data sources 51 | 52 | 53 | *** 54 | 55 | ## If you want to run it on your computer 56 | The tutorials can also be cloned from this repository and run locally on your computer (you will need good internet access, and you must make sure your libraries are up to date and compatible). For instructions on how to install Python and Jupyter Notebook, clone the tutorials from GitHub, and access the data on the cloud, see [here](https://github.com/marisolgr/python_sat_tutorials/blob/main/Python_Installation.md). 57 | 58 | *** 59 | 60 | Developed by: Marisol García-Reyes (marisolgr@faralloninstitute.org) 61 | 62 | Funded by: [Better Scientific Software (BSSw) Fellowship Program](https://bssw.io/pages/bssw-fellowship-program) 63 | 64 | _Modified from 'Python for Oceanographers' by: Chelle Gentemann and Marisol García-Reyes.
Access [here](https://github.com/python4oceanography/ocean_python_tutorial), and 'Pangeo Tutorial for AGU Oceans Sciences 2020' [here](https://github.com/pangeo-gallery/osm2020tutorial)._ 65 | 66 | -------------------------------------------------------------------------------- /data/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/marisolgr/python_sat_tutorials/2ec5fe982ac03285dde32fd031a9c40219f25f4f/data/.DS_Store -------------------------------------------------------------------------------- /data/ERA5_wind10m_mon05.nc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/marisolgr/python_sat_tutorials/2ec5fe982ac03285dde32fd031a9c40219f25f4f/data/ERA5_wind10m_mon05.nc -------------------------------------------------------------------------------- /data/HadISST_sst_2000-2020.nc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/marisolgr/python_sat_tutorials/2ec5fe982ac03285dde32fd031a9c40219f25f4f/data/HadISST_sst_2000-2020.nc -------------------------------------------------------------------------------- /data/ndvi_feb2022.nc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/marisolgr/python_sat_tutorials/2ec5fe982ac03285dde32fd031a9c40219f25f4f/data/ndvi_feb2022.nc -------------------------------------------------------------------------------- /data/sst_example.nc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/marisolgr/python_sat_tutorials/2ec5fe982ac03285dde32fd031a9c40219f25f4f/data/sst_example.nc -------------------------------------------------------------------------------- /data/tmp.nc: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/marisolgr/python_sat_tutorials/2ec5fe982ac03285dde32fd031a9c40219f25f4f/data/tmp.nc -------------------------------------------------------------------------------- /environment.yml: -------------------------------------------------------------------------------- 1 | name: tutorialenv 2 | channels: 3 | - conda-forge 4 | - defaults 5 | dependencies: 6 | - cartopy=0.18.0 7 | - jupyterlab=2.2.8 8 | - matplotlib=3.2.2 9 | - numpy=1.18.5 10 | - pandas=1.0.5 11 | - python=3.7.6 12 | - xarray=0.15.1 13 | - netcdf4=1.5.3 14 | - scikit-learn=1.0 15 | - statsmodels=0.13.0 16 | - dask=2.19.0 17 | - zarr=2.8.3 18 | - s3fs=0.2.0 19 | - fsspec=2021.8.1 20 | - hvplot=0.7.3 21 | - wget=1.20.3 22 | - aiohttp=3.7.4.post0 23 | - geoviews=1.9.2 24 | -------------------------------------------------------------------------------- /figures/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/marisolgr/python_sat_tutorials/2ec5fe982ac03285dde32fd031a9c40219f25f4f/figures/.DS_Store -------------------------------------------------------------------------------- /figures/JupyterNotebook_example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/marisolgr/python_sat_tutorials/2ec5fe982ac03285dde32fd031a9c40219f25f4f/figures/JupyterNotebook_example.png -------------------------------------------------------------------------------- /figures/Jupyter_Notebook_Dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/marisolgr/python_sat_tutorials/2ec5fe982ac03285dde32fd031a9c40219f25f4f/figures/Jupyter_Notebook_Dashboard.png -------------------------------------------------------------------------------- /figures/Jupyter_Notebook_Menus.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/marisolgr/python_sat_tutorials/2ec5fe982ac03285dde32fd031a9c40219f25f4f/figures/Jupyter_Notebook_Menus.png -------------------------------------------------------------------------------- /figures/data_structures.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/marisolgr/python_sat_tutorials/2ec5fe982ac03285dde32fd031a9c40219f25f4f/figures/data_structures.png -------------------------------------------------------------------------------- /figures/globe_data.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/marisolgr/python_sat_tutorials/2ec5fe982ac03285dde32fd031a9c40219f25f4f/figures/globe_data.png -------------------------------------------------------------------------------- /figures/jupyter_logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/marisolgr/python_sat_tutorials/2ec5fe982ac03285dde32fd031a9c40219f25f4f/figures/jupyter_logo.png -------------------------------------------------------------------------------- /figures/map_base_January.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/marisolgr/python_sat_tutorials/2ec5fe982ac03285dde32fd031a9c40219f25f4f/figures/map_base_January.png -------------------------------------------------------------------------------- /figures/map_base_July.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/marisolgr/python_sat_tutorials/2ec5fe982ac03285dde32fd031a9c40219f25f4f/figures/map_base_July.png -------------------------------------------------------------------------------- /figures/python_logo.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/marisolgr/python_sat_tutorials/2ec5fe982ac03285dde32fd031a9c40219f25f4f/figures/python_logo.png -------------------------------------------------------------------------------- /figures/python_scientific_ecosystem.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/marisolgr/python_sat_tutorials/2ec5fe982ac03285dde32fd031a9c40219f25f4f/figures/python_scientific_ecosystem.png -------------------------------------------------------------------------------- /figures/xarray_logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/marisolgr/python_sat_tutorials/2ec5fe982ac03285dde32fd031a9c40219f25f4f/figures/xarray_logo.png --------------------------------------------------------------------------------
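The environment.yml listed above pins the library versions the notebooks were built against. As a quick sanity check after `conda activate tutorialenv` (complementing the `conda list` step in Python_Installation.md), a short Python snippet can confirm that the key libraries resolve. This is a minimal sketch for illustration, not part of the tutorial itself; the package names are taken from environment.yml (only those whose import name matches the conda package name):

```python
# Minimal sketch: verify that the main tutorial libraries are importable
# in the active environment (package names taken from environment.yml).
from importlib.util import find_spec

def missing(packages):
    """Return the subset of `packages` that cannot be imported."""
    return [pkg for pkg in packages if find_spec(pkg) is None]

if __name__ == "__main__":
    needed = ["numpy", "pandas", "xarray", "matplotlib",
              "cartopy", "dask", "zarr", "s3fs"]
    absent = missing(needed)
    if absent:
        print("Missing from this environment:", ", ".join(absent))
    else:
        print("All tutorial libraries found.")
```

Run inside the activated `tutorialenv`, this should report no missing libraries; run under a different interpreter, it lists the packages that environment lacks.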