├── .gitignore
├── 01_Basic_Data_Analysis_and_Visualization.ipynb
├── 02_Science_Data_Formats_and_Advanced_Plotting.ipynb
├── 03 Image Combination and Gridding.ipynb
├── README.md
├── Solutions_to_Exercises.ipynb
├── data
│   ├── 20200901_20200930_Monterey.lev15.csv
│   ├── 3B-HHR.MS.MRG.3IMERG.20160811-S233000-E235959.1410.V06B.HDF5
│   ├── 3B-HHR.MS.MRG.3IMERG.20160811-S233000-E235959.1410.V06B_thinned.nc
│   ├── JRR-AOD_v2r3_j01_s202009152044026_e202009152045271_c202009152113150_thinned.nc
│   ├── MOP03JM-201811-L3V95.6.3_thinned.nc
│   ├── VIIRSNDE_global2020258.v1.0.txt
│   ├── gfs_3_20200915_0000_000.grb2
│   ├── meso
│   │   ├── OR_ABI-L1b-RadM1-M3C02_G16_s20182822019282_e20182822019339_c20182822019374.nc
│   │   ├── OR_ABI-L1b-RadM1-M3C13_G16_s20182822019282_e20182822019350_c20182822019384.nc
│   │   ├── OR_ABI-L1b-RadM1-M6C02_G16_s20192091147504_e20192091147562_c20192091147599.nc
│   │   └── OR_ABI-L1b-RadM1-M6C03_G16_s20192091147504_e20192091147562_c20192091148025.nc
│   └── sst.mon.ltm.1981-2010.nc
├── environment.yml
├── installation
│   └── install_python_run_notebook.pdf
└── sample_script.py
/.gitignore:
--------------------------------------------------------------------------------
1 |
2 | .ipynb_checkpoints/*
3 | satnames.npz
4 | satellites.csv
5 | precip.png
6 |
--------------------------------------------------------------------------------
/01_Basic_Data_Analysis_and_Visualization.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Lesson 1: Basic Data Analysis and Visualization \n",
8 | "\n",
9 | "Rebekah Esmaili (bekah@umd.edu)\n",
10 | "Research Scientist, STC/JPSS\n",
11 | " \n",
12 | "---\n",
13 | "\n",
14 | "\n",
15 | "## Why Python?\n",
16 | "\n",
17 | "Pros\n",
18 | "\n",
19 | "* General-purpose, cross-platform\n",
20 | "* Free and open source\n",
21 | "* Reasonably easy to learn\n",
22 | "* Expressive and succinct code, forces good style\n",
23 | "* Being interpreted and dynamically typed makes it great for data analysis\n",
24 | "* Robust ecosystem of scientific libraries, including powerful statistical and visualization packages\n",
25 | "* Large community of scientific users and large existing codebases\n",
26 | "* Major investment into Python ecosystem by Earth science research agencies, including NASA, NCAR, UK Met Office, and Lamont-Doherty Earth Observatory. See Pangeo.\n",
27 | "* Reads Earth science data formats like HDF, NetCDF, GRIB\n",
28 | "\n",
29 | "Cons\n",
30 | "\n",
31 | "* Performance penalties for interpreted languages, although many libraries are wrappers for compiled languages. Avoid large loops in favor of matrix/vector operations when possible.\n",
32 | "* Multithreading is limited due to the Global Interpreter Lock, but other parallelism is available\n",
33 | "* See Julia for a modern scientific language which is trying to overcome these challenges\n",
34 | "\n",
35 | "Why we use Python 3?\n",
36 | "\n",
37 | "* Python 2 reached it's \"end of life\" as of January 2020\n",
38 | "* No more updates or bugfixes\n",
39 | "* No further official support\n",
40 | "* Subtle differences: https://www.geeksforgeeks.org/important-differences-between-python-2-x-and-python-3-x-with-examples/\n",
41 | "\n",
42 | "---\n",
43 | "\n",
44 | "## Lesson Objectives\n",
45 | "\n",
46 | "* You will learn to:\n",
47 | " * Import relevant packages for scientific programming\n",
48 | " * Read ascii data\n",
49 | " * Basic plotting and visualization\n",
50 | " \n",
51 | "---\n",
52 | "\n",
53 | "## What do I need?\n",
54 | "* If you are really new to Python, I recommend using the binder links to run these notebooks remotely.\n",
55 | "* If you have some experience, you can either install Anaconda locally on your laptop or on a remote server. I _do not recommend_ using system or shared Python installations unless you are advanced!\n",
56 | "\n",
57 | "## What is Anaconda?\n",
58 | "* Anaconda is a package manager\n",
59 | "* Comes bundled with Python, a lot of useful scientific/mathematical packages, and development environments.\n",
60 | "* Easiest place to start if you are new!\n",
61 | "\n",
62 | "## Launching Jupyter Notebook\n",
63 | "\n",
64 | "Linux/Mac\n",
65 | "* Open terminal, cd to the directory where you have your notebooks and data, and type:\n",
66 | "```\n",
67 | "jupyter notebook\n",
68 | "```\n",
69 | "\n",
70 | "Windows\n",
71 | "* Start → Anaconda3 → Jupyter Notebook\n",
72 | "\n",
73 | "Jupyter Home Screen\n",
74 | "\n",
75 | "* This will launch your default web browser with a local webserver that displays the contents of the directory that you're working in.\n",
76 | "* Note: in all the examples, the path assumed that jupyter is launched from the notebook directory. You will need to change the path to point to your data if this is different.\n",
77 | "* Click on New on the top right.\n",
78 | "\n",
79 | "---\n",
80 | "\n",
81 | "## Basic Python Syntax"
82 | ]
83 | },
84 | {
85 | "cell_type": "markdown",
86 | "metadata": {},
87 | "source": [
88 | "The most basic Python command is to write words to the screen. In jupyter notebooks, the result will appear below the line of code. To run the above command in Jupyter notebook, highlight the cell and either chick the run button (►) or press the **Shift** and **Enter** keys"
89 | ]
90 | },
91 | {
92 | "cell_type": "code",
93 | "execution_count": null,
94 | "metadata": {},
95 | "outputs": [],
96 | "source": [
97 | "# This is a comment, python will not run this!\n",
98 | "print(\"Hello Earth\")"
99 | ]
100 | },
101 | {
102 | "cell_type": "markdown",
103 | "metadata": {},
104 | "source": [
105 | "In Python, variables are dynamically allocated, which means that you do not need to declare the type or size prior to storing data in them. Instead, Python will automatically guess the variable type based on the content of what you are assigning:"
106 | ]
107 | },
108 | {
109 | "cell_type": "code",
110 | "execution_count": null,
111 | "metadata": {},
112 | "outputs": [],
113 | "source": [
114 | "var_int = 8\n",
115 | "var_float = 15.0\n",
116 | "var_scifloat = 4e8\n",
117 | "var_complex = complex(4, 2)\n",
118 | "var_greetings = 'Hello Earth'"
119 | ]
120 | },
121 | {
122 | "cell_type": "markdown",
123 | "metadata": {},
124 | "source": [
125 | "Python has many built in functions, the syntax is usually:\n",
126 | "\n",
127 | "```\n",
128 | "function_name(inputs)\n",
129 | "```\n",
130 | "You have already used two functions: *print()* and *complex()*. Another useful function is *type()*, will tell us if the variable is an integer, a float, a complex number, or a string. "
131 | ]
132 | },
133 | {
134 | "cell_type": "code",
135 | "execution_count": null,
136 | "metadata": {},
137 | "outputs": [],
138 | "source": [
139 | " type(var_int), type(var_float), type(var_scifloat), type(var_complex), type(var_greetings)"
140 | ]
141 | },
142 | {
143 | "cell_type": "markdown",
144 | "metadata": {},
145 | "source": [
146 | "Python has the following built-in operators:\n",
147 | "\n",
148 | "* Addition, subtraction, multiplication, division: +, -, *, /\n",
149 | "* Exponential, integer division, modulus: \\**, //, %\n",
150 | "\n"
151 | ]
152 | },
153 | {
154 | "cell_type": "code",
155 | "execution_count": null,
156 | "metadata": {},
157 | "outputs": [],
158 | "source": [
159 | "2+2.0, var_int**2, var_float//var_int, var_float%var_int"
160 | ]
161 | },
162 | {
163 | "cell_type": "markdown",
164 | "metadata": {},
165 | "source": [
166 | "---\n",
167 | "\n",
168 | "**Exercise 1:** Learning to use notebooks\n",
169 | "\n",
170 | "1. Launch Jupyter Notebook and create a new notebook\n",
171 | "2. Rename the notebook\n",
172 | "3. Create a new cell and use *type()* to see if the following are floats and integers:\n",
173 | " * 2+2\n",
174 | " * 2\\*2.0\n",
175 | " * var_float/var_int\n",
176 | "---\n",
177 | "**Solution:**"
178 | ]
179 | },
180 | {
181 | "cell_type": "code",
182 | "execution_count": null,
183 | "metadata": {},
184 | "outputs": [],
185 | "source": []
186 | },
187 | {
188 | "cell_type": "markdown",
189 | "metadata": {},
190 | "source": [
191 | "## Working with lists\n",
192 | "\n",
193 | "Lists are useful for storing scientific data. Lists are made using square brackets. They can hold any data type (integers, floats, and strings) and even mixtures of the two."
194 | ]
195 | },
196 | {
197 | "cell_type": "code",
198 | "execution_count": null,
199 | "metadata": {},
200 | "outputs": [],
201 | "source": [
202 | "numbers_list = [4, 8, 15, 16, 23]"
203 | ]
204 | },
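{
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "As a minimal illustration (the variable name here is hypothetical), a single list can mix types:\n",
  "\n",
  "```python\n",
  "mixed_list = [8, 15.0, 'Hello Earth']  # an int, a float, and a string in one list\n",
  "type(mixed_list[0]), type(mixed_list[1]), type(mixed_list[2])\n",
  "```"
 ]
},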
205 | {
206 | "cell_type": "markdown",
207 | "metadata": {},
208 | "source": [
209 | "You can access elements of the list using the index. Python is zero based, so index 0 retrieves the first element."
210 | ]
211 | },
212 | {
213 | "cell_type": "code",
214 | "execution_count": null,
215 | "metadata": {},
216 | "outputs": [],
217 | "source": [
218 | "numbers_list[3]"
219 | ]
220 | },
221 | {
222 | "cell_type": "markdown",
223 | "metadata": {},
224 | "source": [
225 | "New items can also be appended to the list using the append function, which has the syntax:\n",
226 | "\n",
227 | "```\n",
228 | "variable.function(element(s))\n",
229 | "```\n",
230 | "The list will be updated *in-place*."
231 | ]
232 | },
233 | {
234 | "cell_type": "code",
235 | "execution_count": null,
236 | "metadata": {},
237 | "outputs": [],
238 | "source": [
239 | "numbers_list.append(42)\n",
240 | "print(numbers_list)"
241 | ]
242 | },
243 | {
244 | "cell_type": "markdown",
245 | "metadata": {},
246 | "source": [
247 | "Perhaps we want to calculate the sum of the values in two lists. However, we cannot use the *+* like we did with single values. For list objects, the + will *combine* lists."
248 | ]
249 | },
250 | {
251 | "cell_type": "code",
252 | "execution_count": null,
253 | "metadata": {},
254 | "outputs": [],
255 | "source": [
256 | "numbers_list"
257 | ]
258 | },
259 | {
260 | "cell_type": "markdown",
261 | "metadata": {},
262 | "source": [
263 | "To perform mathematical operations, you can convert the above list to an array using the NumPy package."
264 | ]
265 | },
266 | {
267 | "cell_type": "markdown",
268 | "metadata": {},
269 | "source": [
270 | "### Importing Packages\n",
271 | "\n",
272 | "Packages are collection of modules, which help simplify common tasks. [NumPy](https://numpy.org/) is useful for mathematical operations and array manipulation.\n",
273 | "\n",
274 | "\n",
275 | "* Provides a high-performance multidimensional array object and tools for working with these arrays.\n",
276 | "* Fundamental package for scientific computing with Python.\n",
277 | "* Included with with the Anaconda package manager.\n",
278 | "* For more examples than presented below, please refer [the NumPy Quick Start](https://numpy.org/devdocs/user/quickstart.html)\n",
279 | "\n",
280 | "The basic syntax for calling packages is to type the import \\[package name\\]. However, some packages have long names, so you can use import \\[package name\\] as \\[alias\\]."
281 | ]
282 | },
283 | {
284 | "cell_type": "code",
285 | "execution_count": null,
286 | "metadata": {},
287 | "outputs": [],
288 | "source": [
289 | "import numpy as np"
290 | ]
291 | },
292 | {
293 | "cell_type": "markdown",
294 | "metadata": {},
295 | "source": [
296 | "If you do not see any error after running the line above, then the package was successfully imported.\n",
297 | "### Working with arrays\n",
298 | "\n",
299 | "I can use NumPy’s array constructor *np.array()* to convert our list to a NumPy array and perform the matrix multiplication. For example, I can double each element of the array:"
300 | ]
301 | },
302 | {
303 | "cell_type": "code",
304 | "execution_count": null,
305 | "metadata": {},
306 | "outputs": [],
307 | "source": [
308 | "numbers_array = np.array(numbers_list)\n",
309 | "numbers_array*2"
310 | ]
311 | },
312 | {
313 | "cell_type": "markdown",
314 | "metadata": {},
315 | "source": [
316 | "Another difference between arrays and lists is that lists are only one-dimensional. NumPy can be any number of dimensions. For example, I can change the dimensions of the data using the *reshape()* function:"
317 | ]
318 | },
319 | {
320 | "cell_type": "code",
321 | "execution_count": null,
322 | "metadata": {},
323 | "outputs": [],
324 | "source": [
325 | "numbers_array_2d = numbers_array.reshape(3,2)\n",
326 | "numbers_array_2d"
327 | ]
328 | },
329 | {
330 | "cell_type": "code",
331 | "execution_count": null,
332 | "metadata": {},
333 | "outputs": [],
334 | "source": [
335 | "numbers_array_2d.shape"
336 | ]
337 | },
338 | {
339 | "cell_type": "markdown",
340 | "metadata": {},
341 | "source": [
342 | "The original numbers_array has a length of 6, the new array has 2 rows and 3 columns."
343 | ]
344 | },
345 | {
346 | "cell_type": "markdown",
347 | "metadata": {},
348 | "source": [
349 | "### Reading ASCII data\n",
350 | "\n",
351 | "The Pandas package has a useful function for reading text/ascii data called *read_csv()*. The function name is somewhat a misnomer, as *read_csv* will read any delimited data using the *delim=* keyword argument. Below, you will import the [Pandas](https://pandas.pydata.org/) package and we will read in a dataset. Note that the path below is relative to the current notebook and you may have to change the code if you are running locally on your computer:\n",
352 | "\n",
353 | "```\n",
354 | "data/VIIRSNDE_global2020258.v1.0.txt\n",
355 | "```\n",
356 | "\n",
357 | "We will look at the Visible Infrared Imaging Radiometer Suite (VIIRS) Active Fire product, a product that classifies if a pixel contains fire with various confidence levels. More information can be found at https://www.ospo.noaa.gov/Products/land/fire.html. We will examine the data on Sept 15, 2020 (day of year 258)."
358 | ]
359 | },
360 | {
361 | "cell_type": "code",
362 | "execution_count": null,
363 | "metadata": {},
364 | "outputs": [],
365 | "source": [
366 | "import pandas as pd"
367 | ]
368 | },
369 | {
370 | "cell_type": "markdown",
371 | "metadata": {},
372 | "source": [
373 | "The default seperator is a comma (,), however my data also contains space. I use the \"\\s*\" to indicate space following the comma should be ignored. The engine=\"python\" keyword ensures that this will work across different operating systems."
374 | ]
375 | },
376 | {
377 | "cell_type": "code",
378 | "execution_count": null,
379 | "metadata": {},
380 | "outputs": [],
381 | "source": [
382 | "fname = \"data/VIIRSNDE_global2020258.v1.0.txt\"\n",
383 | "fires = pd.read_csv(fname, sep=',\\s*', engine='python')"
384 | ]
385 | },
386 | {
387 | "cell_type": "markdown",
388 | "metadata": {},
389 | "source": [
390 | "You can inspect the contents within the notebook using the *head()* function, which will return the first five rows of the dataset. Pandas automatically stores data in structures called *DataFrames*. DataFrames are two dimensional (rows and columns) and resemble a spreadsheet. The leftmost column is the row index and is not part of the *fires* dataset. "
391 | ]
392 | },
393 | {
394 | "cell_type": "code",
395 | "execution_count": null,
396 | "metadata": {},
397 | "outputs": [],
398 | "source": [
399 | "fires.head()"
400 | ]
401 | },
402 | {
403 | "cell_type": "markdown",
404 | "metadata": {},
405 | "source": [
406 | "You can access individual columns of data using the column name. For example, below you can extract the pixel brightness temperature (brt):"
407 | ]
408 | },
409 | {
410 | "cell_type": "code",
411 | "execution_count": null,
412 | "metadata": {},
413 | "outputs": [],
414 | "source": [
415 | "fires[\"brt_t13(K)\"]"
416 | ]
417 | },
418 | {
419 | "cell_type": "markdown",
420 | "metadata": {},
421 | "source": [
422 | "---\n",
423 | "\n",
424 | "**Exercise 2:** Import an ascii file\n",
425 | "\n",
426 | "1. Import the dataset \"20200901_20200930_Monterey.lev15.csv\" and save it to a variable called *aeronet*.\n",
427 | "2. Print the first few lines using *.head()*\n",
428 | "3. Find a column that doesn't have only missing values (-999) and (challenge!) calculate the mean using the following syntax *variable\\[\"column\"\\].mean()*\n",
429 | "---\n",
430 | "**Solution:**"
431 | ]
432 | },
433 | {
434 | "cell_type": "code",
435 | "execution_count": null,
436 | "metadata": {},
437 | "outputs": [],
438 | "source": []
439 | },
440 | {
441 | "cell_type": "markdown",
442 | "metadata": {},
443 | "source": [
444 | "### Working with masks and masked arrays\n",
445 | "\n",
446 | "When working with data, sometimes there are numbers I want to remove. For instance, I may want to work with data below a certain threshold. You can subset the data using identity operations:\n",
447 | "\n",
448 | "* less than: <\n",
449 | "* less than or equal to: <=\n",
450 | "* greater than: >\n",
451 | "* greater than or equal to: >=\n",
452 | "* equals: ==\n",
453 | "* not equals: !=\n",
454 | "\n",
455 | "Their use will return either a True or False statement. For the *fires* dataset, you can find which elements of the array that meet some condition, such as only examining larger fires that have a Fire Radiative Power (FRP) above 50 MW:"
456 | ]
457 | },
458 | {
459 | "cell_type": "code",
460 | "execution_count": null,
461 | "metadata": {},
462 | "outputs": [],
463 | "source": [
464 | "masked_nums = (fires['frp(MW)'] > 50)\n",
465 | "print(masked_nums)"
466 | ]
467 | },
468 | {
469 | "cell_type": "markdown",
470 | "metadata": {},
471 | "source": [
472 | "Sometimes you may want to filter by two conditions. For example, insteading of filtering the FRP data, you may only want to examine values within a latitude and longitude domain. In Python, I can combine multiple conditions using and (&) and or (|) statements. Below, I extract the data in 5°x5° box arond Monterey, California:"
473 | ]
474 | },
475 | {
476 | "cell_type": "code",
477 | "execution_count": null,
478 | "metadata": {},
479 | "outputs": [],
480 | "source": [
481 | "masked_nums = (fires['Lat'] > 35.0) & (fires['Lat'] < 40.0) & (fires['Lon'] > -125.0) & (fires['Lon'] < -120.0)\n",
482 | "print(masked_nums)"
483 | ]
484 | },
485 | {
486 | "cell_type": "markdown",
487 | "metadata": {},
488 | "source": [
489 | "The above mask can be used in place of an index. Below, you can create a new variable that takes the FRP using the *fires\\['frp(MW)'\\]* variable and subsets it with the array of *masked_nums*:"
490 | ]
491 | },
492 | {
493 | "cell_type": "code",
494 | "execution_count": null,
495 | "metadata": {},
496 | "outputs": [],
497 | "source": [
498 | "monterey_fires = fires['frp(MW)'][masked_nums]\n",
499 | "print(monterey_fires)"
500 | ]
501 | },
502 | {
503 | "cell_type": "markdown",
504 | "metadata": {},
505 | "source": [
506 | "From this new variable, you can compute the average in this region and compare them to the global average for that day:"
507 | ]
508 | },
509 | {
510 | "cell_type": "code",
511 | "execution_count": null,
512 | "metadata": {},
513 | "outputs": [],
514 | "source": [
515 | "monterey_fires.mean(), fires['frp(MW)'].mean()"
516 | ]
517 | },
518 | {
519 | "cell_type": "markdown",
520 | "metadata": {},
521 | "source": [
522 | "You can use the size command to compare the dimensions of original array and the one that filtered out values that were outside of our latitude and longitude bounds. You will notice that these two arrays have different sizes."
523 | ]
524 | },
525 | {
526 | "cell_type": "code",
527 | "execution_count": null,
528 | "metadata": {},
529 | "outputs": [],
530 | "source": [
531 | "fires['frp(MW)'].size, monterey_fires.size"
532 | ]
533 | },
534 | {
535 | "cell_type": "markdown",
536 | "metadata": {},
537 | "source": [
538 | "There are cases where you will want to preserve the size and shape of the original array. For these situations, you can utilize the NumPy *masked array* module. The syntax is *np.ma.array()*, and you will add a keyword argument *mask=*, which is set to the inverse (~) of the *mask_nums*."
539 | ]
540 | },
541 | {
542 | "cell_type": "code",
543 | "execution_count": null,
544 | "metadata": {},
545 | "outputs": [],
546 | "source": [
547 | "monterey_fires_ma = np.ma.array(fires['frp(MW)'], mask=~masked_nums, fill_value=-999)\n",
548 | "monterey_fires_ma"
549 | ]
550 | },
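{
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "As an aside, a minimal sketch of what the inverse (~) operator does to a Boolean mask (the array here is hypothetical):\n",
  "\n",
  "```python\n",
  "keep = np.array([True, False, True])\n",
  "~keep  # -> array([False,  True, False]); np.ma treats True as \"masked out\"\n",
  "```"
 ]
},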
551 | {
552 | "cell_type": "markdown",
553 | "metadata": {},
554 | "source": [
555 | "Then, you can calculate the mean values and confirm that they are the same as the previous example:"
556 | ]
557 | },
558 | {
559 | "cell_type": "code",
560 | "execution_count": null,
561 | "metadata": {},
562 | "outputs": [],
563 | "source": [
564 | "monterey_fires_ma.mean()"
565 | ]
566 | },
567 | {
568 | "cell_type": "markdown",
569 | "metadata": {},
570 | "source": [
571 | "However, the key difference will be the size, which retains the shape of the unmasked data:"
572 | ]
573 | },
574 | {
575 | "cell_type": "code",
576 | "execution_count": null,
577 | "metadata": {},
578 | "outputs": [],
579 | "source": [
580 | "monterey_fires_ma.size"
581 | ]
582 | },
583 | {
584 | "cell_type": "markdown",
585 | "metadata": {},
586 | "source": [
587 | "---\n",
588 | "**Exercise 3:** Filtering data\n",
589 | "\n",
590 | "Using the dataset imported in the previous example (*aeronet*):\n",
591 | " \n",
592 | "1. Create a mask that filters the \"AOD_870nm\" column to only include values that are above 0.\n",
593 | "2. Create a new variables, *day_of_year*, with the mask applied to aeronet\\[\"Day_of_Year(Fraction)\"\\].\n",
594 | "3. Create a new variables, *aod_870*, with the mask applied to aeronet\\[\"AOD_870nm\"\\].\n",
595 | "4. Compare the mean value of *aeronet\\[\"AOD_870nm\"\\]* to *aod_870*.\n",
596 | " \n",
597 | "---\n",
598 | "**Solution**"
599 | ]
600 | },
601 | {
602 | "cell_type": "code",
603 | "execution_count": null,
604 | "metadata": {},
605 | "outputs": [],
606 | "source": []
607 | },
608 | {
609 | "cell_type": "markdown",
610 | "metadata": {},
611 | "source": [
612 | "### Basic figures and plots\n",
613 | "\n",
614 | "Python has several packages to create visuals for remote sensing data, either in the form of imagery or plots of relevant analysis. Of these, the most widely used and oldest packages is [Matplotlib](https://matplotlib.org/). Matplotlib plots are highly customizable and has additional toolkits that can enhance functionality, such as creating maps using the [Cartopy](https://scitools.org.uk/cartopy/docs/latest/) package, which I will describe more in the next session.\n"
615 | ]
616 | },
617 | {
618 | "cell_type": "code",
619 | "execution_count": null,
620 | "metadata": {},
621 | "outputs": [],
622 | "source": [
623 | "import matplotlib.pyplot as plt"
624 | ]
625 | },
626 | {
627 | "cell_type": "markdown",
628 | "metadata": {},
629 | "source": [
630 | "Suppose you want to learn what the global distribution of fire radiative power is. From inspecting the frp(MW) column earlier, these values extend to many decimal places. Rather than use a continuous scale, I can instead group in the data into 10 MW bins, from 0 to 500 MW:"
631 | ]
632 | },
633 | {
634 | "cell_type": "code",
635 | "execution_count": null,
636 | "metadata": {},
637 | "outputs": [],
638 | "source": [
639 | "bins10MW = np.arange(0, 500, 10)"
640 | ]
641 | },
642 | {
643 | "cell_type": "markdown",
644 | "metadata": {},
645 | "source": [
646 | "I can use these bins to create a histogram. Line by line, the code below will do as follows. Each additional line is layering elements on this empty graphic. The entire block of code must be run at once and not split into multiple cells. \n",
647 | "\n",
648 | "1. *plt.figure()* creates a blank canvas.\n",
649 | "2. I add the histogram to the figure using *plt.hist()*, which automatically will count the number of rows with fire radiative power in the bins that I defined above in the bins10W variable. I must then pass in the data (fires['frp(MW)']) and the bins (bins10MW) into plt.hist. \n",
650 | "3. *plt.show()* tells matplotlib the plot is now complete and to render it:"
651 | ]
652 | },
653 | {
654 | "cell_type": "code",
655 | "execution_count": null,
656 | "metadata": {},
657 | "outputs": [],
658 | "source": [
659 | "plt.figure(figsize=[5,5])\n",
660 | "plt.hist(fires['frp(MW)'], bins=bins10MW)\n",
661 | "plt.show()"
662 | ]
663 | },
664 | {
665 | "cell_type": "markdown",
666 | "metadata": {},
667 | "source": [
668 | "Below, you will remake this plot but add some aesthetic additions, such as labels to the x and y axis using *set_xlabel()* and *set_ylabel()*. Since there are thousands more fires with fire radiative power less than 100 MW than fires with higher values the data are likely lognormal. The plot will be easier to interpret of I rescale the y-axis to a log scale while leaving the x-axis linear.\n",
669 | "\n",
670 | "The command *plt.subplot()* will return an axis object to a variable (*ax*). There are three numbers passed in (111), which correspond to rows, columns, and index. In this example, there is one row and one column, and therefore, only one index."
671 | ]
672 | },
673 | {
674 | "cell_type": "code",
675 | "execution_count": null,
676 | "metadata": {},
677 | "outputs": [],
678 | "source": [
679 | "plt.figure()\n",
680 | "\n",
681 | "ax = plt.subplot(111)\n",
682 | "\n",
683 | "ax.hist(fires['frp(MW)'], bins=bins10MW)\n",
684 | "\n",
685 | "ax.set_yscale('log')\n",
686 | "\n",
687 | "ax.set_xlabel(\"Fire Radiative Power (MW)\")\n",
688 | "ax.set_ylabel(\"Counts\")\n",
689 | "\n",
690 | "plt.show()"
691 | ]
692 | },
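{
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "As a minimal sketch of the row/column/index convention, a hypothetical 1x2 grid of panels would look like this:\n",
  "\n",
  "```python\n",
  "plt.figure(figsize=[8, 4])\n",
  "ax1 = plt.subplot(121)  # 1 row, 2 columns, first panel\n",
  "ax2 = plt.subplot(122)  # 1 row, 2 columns, second panel\n",
  "ax1.hist(fires['frp(MW)'], bins=bins10MW)\n",
  "ax2.hist(fires['frp(MW)'], bins=bins10MW)\n",
  "ax2.set_yscale('log')   # same data, log-scaled counts for comparison\n",
  "plt.show()\n",
  "```"
 ]
},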
693 | {
694 | "cell_type": "markdown",
695 | "metadata": {},
696 | "source": [
697 | "You can also plot the data in 2-dimensions. For example, each row in *fires* has a latitude and longitude coordinates pair. I will take these two coordinates and plot using *plt.scatter()*. The first argument is the x-coordinate and the second is the y-coordinate (the order matters). \n",
698 | "\n",
699 | "There are some command line options *plt.scatter()*:\n",
700 | "\n",
701 | "* s: size with respect to the default\n",
702 | "* c: color, which can be either from a predefined name list or a hexadecimal value\n",
703 | "* alpha: opacity, where smaller values are transparent.\n",
704 | "\n",
705 | "Like in the previous example, I have chosen to label the latitude and longitude axes:"
706 | ]
707 | },
708 | {
709 | "cell_type": "code",
710 | "execution_count": null,
711 | "metadata": {},
712 | "outputs": [],
713 | "source": [
714 | "fig = plt.figure()\n",
715 | "ax = plt.subplot(111)\n",
716 | "\n",
717 | "ax.scatter(fires['Lon'], fires['Lat'], s=0.5, c='black', alpha=0.1)\n",
718 | "\n",
719 | "ax.set_xlabel('Longitude')\n",
720 | "ax.set_ylabel('Latitude')\n",
721 | "\n",
722 | "plt.show()"
723 | ]
724 | },
725 | {
726 | "cell_type": "markdown",
727 | "metadata": {},
728 | "source": [
729 | "You can almost see the outline of the continents from the data above. In the next session, you will learn how to overlay maps onto your plots."
730 | ]
731 | },
732 | {
733 | "cell_type": "markdown",
734 | "metadata": {},
735 | "source": [
736 | "---\n",
737 | "**Exercise 4:** Create a scatterplot\n",
738 | "\n",
739 | "Use the variables *aod_870* and *day_of_year* that you made in Exercise 3 to:\n",
740 | "\n",
741 | "1. Create a scatter plot showing the *day_of_year* (x-axis) and *aod_870* (y-axis)\n",
742 | "2. Add y-axis and x-axis labels using *.set_xlabel()* and *.set_ylabel()*\n",
743 | "3. Adjust the color and size of the scatterplot\n",
744 | "---\n",
745 | "**Solution**"
746 | ]
747 | },
748 | {
749 | "cell_type": "code",
750 | "execution_count": null,
751 | "metadata": {},
752 | "outputs": [],
753 | "source": []
754 | },
755 | {
756 | "cell_type": "markdown",
757 | "metadata": {},
758 | "source": [
759 | "## Summary:\n",
760 | "\n",
761 | "You learned:\n",
762 | "* Very basic built-in Python functions and operations\n",
763 | "* How to import three packages: numpy, pandas, and matplotlib\n",
764 | "* Worked with arrays and lists\n",
765 | "* How to create a simple plot\n",
766 | "\n",
767 | "Next lesson:\n",
768 | "* More advanced plots, such as using maps\n",
769 | "* Importing scientific datasets, such as netcdf and grib"
770 | ]
771 | },
772 | {
773 | "cell_type": "code",
774 | "execution_count": null,
775 | "metadata": {},
776 | "outputs": [],
777 | "source": []
778 | }
779 | ],
780 | "metadata": {
781 | "kernelspec": {
782 | "display_name": "Python 3 (ipykernel)",
783 | "language": "python",
784 | "name": "python3"
785 | },
786 | "language_info": {
787 | "codemirror_mode": {
788 | "name": "ipython",
789 | "version": 3
790 | },
791 | "file_extension": ".py",
792 | "mimetype": "text/x-python",
793 | "name": "python",
794 | "nbconvert_exporter": "python",
795 | "pygments_lexer": "ipython3",
796 | "version": "3.7.12"
797 | }
798 | },
799 | "nbformat": 4,
800 | "nbformat_minor": 4
801 | }
802 |
--------------------------------------------------------------------------------
/02_Science_Data_Formats_and_Advanced_Plotting.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Lesson 2: Scientific Data Formats and Advanced Plotting\n",
8 | "\n",
9 | "Rebekah Esmaili (bekah@umd.edu) Research Scientist, STC/JPSS\n",
10 | " \n",
11 | "---\n",
12 | "\n",
13 | "\n",
14 | "## Lesson Objectives\n",
15 | "* You will learn to:\n",
16 | " * Import relevant packages for scientific programming\n",
17 | " * Read netCDF and GRIB2 data\n",
18 | " * Creating plots and maps\n",
19 | " \n",
20 | "---\n",
21 | "\n",
22 | "## What do I need?\n",
23 | "* If you are really new to Python, I recommend using the binder links to run these notebooks remotely.\n",
24 | "* If you have some experience, you can either install Anaconda locally on your laptop or on a remote server. There are some instructions in the main directory.\n",
25 | "* I _do not recommend_ using system or shared Python installations unless you are advanced!\n",
26 | "\n",
27 | "---\n",
28 | "\n",
29 | "## Importing NetCDF files\n",
30 | "\n",
31 | "NetCDF and HDF are self-describing formats, which are structured binary data files and useful for storing other big datasets. Computationally, it is faster to read in binary-based datasets than text, which needs to be parsed before being stored into a computer’s memory. Because the files are more compact, they are cheaper to store large, long-term satellite data. Furthermore, information about the data can be stored inside the file themselves.\n",
32 | "\n",
33 | "Datasets:\n",
34 | "* JRR-AOD_v2r3_j01_s202009152044026_e202009152045271_c202009152113150_thinned.nc: A netCDF file that contains Aerosol Optical Depth (AOD) retrieved from a Suomi NPP overpass on 2020 9 Aug. For this workshop, unused fields were removed.\n",
35 | "* gfs_3_20200915_0000_000.grb2: A GRIB2 file that contains GFS analysis\n",
36 | "* MOP03JM-201811-L3V95.6.3_thinned.nc: The Nov 2018 CO monthly mean from the Measurement of Pollution in the Troposphere (MOPITT), which is an instrument on the Terra satellite.\n",
37 | " * NOTE: For this tutorial, the file was converted to a netCDF4 file and unused variable fields were removed. The original file is HDF5 MOP03JM-201811-L3V95.6.3.he5 and can be obtained from https://earthdata.nasa.gov/.\n",
38 | "* [NOAA Extended Reconstructed SST version 5 dataset (ERSST)](https://psl.noaa.gov/data/gridded/data.noaa.ersst.v5.html). Shows the global monthly mean ocean surface temperature from 1854-present using data collected from ocean buoys, ships, and climate modeled data.\n",
39 | "\n",
40 | "Many environmental dataset names are quite long. However, the dataset name is encoded to give us information about the contents. For example:\n",
41 | "\n",
42 | "```\n",
43 | "JRR-AOD_v2r3_j01_s202009152044026_e202009152045271_c202009152113150.nc\n",
44 | "```\n",
45 | "You can learn several important features of the dataset without opening it:\n",
46 | "\n",
47 | "* Prefix indicates the mission (JRR, for JPSS Risk Reduction)\n",
48 | "* Product (Aerosol Optical Depth, or AOD), algorithm version\n",
49 | "* Revision number (v1r1)\n",
50 | "* Satellite source (j01 for JPSS-1/NOAA-20)\n",
51 | "* Start (s), end (e), and creation (c) time, which are each followed by the year, month, day, hour, minute, and seconds (to one decimal place). \n",
52 | "\n",
53 | "First, import three commonly used packages in Python:"
54 | ]
55 | },
56 | {
57 | "cell_type": "code",
58 | "execution_count": null,
59 | "metadata": {},
60 | "outputs": [],
61 | "source": [
62 | "import numpy as np\n",
63 | "import pandas as pd\n",
64 | "import matplotlib.pyplot as plt"
65 | ]
66 | },
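{
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "As a quick aside, here is a minimal sketch of pulling those fields out of a JRR filename with basic string operations (the field order assumed here follows the naming pattern described above):\n",
  "\n",
  "```python\n",
  "name = 'JRR-AOD_v2r3_j01_s202009152044026_e202009152045271_c202009152113150.nc'\n",
  "parts = name[:-3].split('_')        # drop the '.nc' suffix, then split on underscores\n",
  "product, version, satellite = parts[0], parts[1], parts[2]\n",
  "start, end, created = parts[3][1:], parts[4][1:], parts[5][1:]  # drop the s/e/c prefixes\n",
  "product, version, satellite, start\n",
  "```"
 ]
},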
67 | {
68 | "cell_type": "markdown",
69 | "metadata": {},
70 | "source": [
71 | "To begin, you need to first import [xarray](http://xarray.pydata.org/en/stable/io.html) which is tailored to open netCDF4 files and work with large arrays (like numpy and pandas). The [netCDF4 package](https://unidata.github.io/netcdf4-python/netCDF4/index.html) can also be used to import files."
72 | ]
73 | },
74 | {
75 | "cell_type": "code",
76 | "execution_count": null,
77 | "metadata": {},
78 | "outputs": [],
79 | "source": [
80 | "import xarray as xr"
81 | ]
82 | },
83 | {
84 | "cell_type": "markdown",
85 | "metadata": {},
86 | "source": [
87 | "Use the Dataset function to import the above dataset."
88 | ]
89 | },
90 | {
91 | "cell_type": "code",
92 | "execution_count": null,
93 | "metadata": {},
94 | "outputs": [],
95 | "source": [
96 | "fname='data/JRR-AOD_v2r3_j01_s202009152044026_e202009152045271_c202009152113150_thinned.nc'\n",
97 | "aod_file_id = xr.open_dataset(fname)"
98 | ]
99 | },
100 | {
101 | "cell_type": "markdown",
102 | "metadata": {},
103 | "source": [
104 | "If you print the contents of the file_id variable, you will get a long list of the global attributes, variables, dimensions, and much more."
105 | ]
106 | },
107 | {
108 | "cell_type": "code",
109 | "execution_count": null,
110 | "metadata": {},
111 | "outputs": [],
112 | "source": [
113 | "aod_file_id"
114 | ]
115 | },
116 | {
117 | "cell_type": "markdown",
118 | "metadata": {},
119 | "source": [
120 | "The output above is worth inspecting. Inside Jupyter Notebooks, xarray allows you to inspect the file contents. Clicking on the arrows will show a preview of the metadata. Note that you can also use tools like [Panoply](https://www.giss.nasa.gov/tools/panoply/) to inspect the contents of the netCDF file outside of Python.\n",
121 | "\n",
122 | "* __Dimensions__: The dimensions are named Rows and Columns, which are respectively 768 and 3200.\n",
123 | "\n",
124 | "* __Coordinates__: The coordinates are Latitude and Longitude. These are both two dimensions.\n",
125 | "\n",
126 | "* __Variables__: This file has only one variable, which is AOD550. It's dimensions are also Rows and Columns.\n",
127 | "\n",
128 | "* __Attributes__: netCDF4 [CF-1.5 conventions](https://cfconventions.org/). Some of the information that we saw in the file name is also present: this product is the *JPSS Risk Reduction Unique Aerosol Optical Depth* (title) *Level 2* product (processing_level) and the data was collected from the *NOAA-20* (satellite_name) *VIIRS* instrument (instrument_name). The *start* (time_coverage_start) and *end* times (time_coverage_end) metadata fields are consistent with the filename. I recommend that you read netCDF file header contents, especially the first time you are working with new data. "
129 | ]
130 | },
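{
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "You can also read these pieces programmatically; as a small sketch, xarray exposes them as attributes of the dataset:\n",
  "\n",
  "```python\n",
  "aod_file_id.dims                          # dimension names and sizes\n",
  "list(aod_file_id.data_vars)               # variable names\n",
  "aod_file_id.attrs['time_coverage_start']  # a global attribute from the header\n",
  "```"
 ]
},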
131 | {
132 | "cell_type": "markdown",
133 | "metadata": {},
134 | "source": [
135 | "AOD is a unitless measure of the extinction of solar radiation by particles suspended in the atmosphere. High values of AOD can indicate the presence of dust, smoke, or another air pollutant while low values indicate a cleaner atmosphere.\n",
136 | "\n",
137 | "**A quick NumPy recap!**\n",
138 | "Using NumPy, we can access individual elements using an index, with zero being the first element. For example:"
139 | ]
140 | },
141 | {
142 | "cell_type": "code",
143 | "execution_count": null,
144 | "metadata": {},
145 | "outputs": [],
146 | "source": [
147 | "num_array = np.array([4, 8, 15, 16, 23, 42])\n",
148 | "num_array[2]"
149 | ]
150 | },
151 | {
152 | "cell_type": "markdown",
153 | "metadata": {},
154 | "source": [
155 | "You can access all numbers using the colon (:) inside the square brackets:"
156 | ]
157 | },
158 | {
159 | "cell_type": "code",
160 | "execution_count": null,
161 | "metadata": {},
162 | "outputs": [],
163 | "source": [
164 | "num_array[:]"
165 | ]
166 | },
167 | {
168 | "cell_type": "markdown",
169 | "metadata": {},
170 | "source": [
171 | "I can extract AOD using the *.variable* function. It's a 2-dimensional array, so the code below has two \\[:,:\\]"
172 | ]
173 | },
174 | {
175 | "cell_type": "code",
176 | "execution_count": null,
177 | "metadata": {},
178 | "outputs": [],
179 | "source": [
180 | "AOD_550 = aod_file_id['AOD550'][:,:]\n",
181 | "AOD_lat = aod_file_id['Latitude'][:,:]\n",
182 | "AOD_lon = aod_file_id['Longitude'][:,:]"
183 | ]
184 | },
185 | {
186 | "cell_type": "code",
187 | "execution_count": null,
188 | "metadata": {},
189 | "outputs": [],
190 | "source": [
191 | "AOD_550"
192 | ]
193 | },
194 | {
195 | "cell_type": "markdown",
196 | "metadata": {},
197 | "source": [
198 | "Xarray uses NumPy as a dependency so so we can use numpy functions like *.mean()*. First we have to make sure it's in the right format. If you check the type of *AOD_550*, you can see it's a *numpy.ndarray.*"
199 | ]
200 | },
201 | {
202 | "cell_type": "code",
203 | "execution_count": null,
204 | "metadata": {},
205 | "outputs": [],
206 | "source": [
207 | "type(AOD_550.values)"
208 | ]
209 | },
210 | {
211 | "cell_type": "markdown",
212 | "metadata": {},
213 | "source": [
214 | "The missing values are masked out, so if we do statistics on the array, it will not include them."
215 | ]
216 | },
217 | {
218 | "cell_type": "code",
219 | "execution_count": null,
220 | "metadata": {},
221 | "outputs": [],
222 | "source": [
223 | "avgAOD = AOD_550.mean()\n",
224 | "print(avgAOD)"
225 | ]
226 | },
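{
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "A minimal sketch for checking this, assuming the fill values were decoded to NaN on read (xarray's default behavior):\n",
  "\n",
  "```python\n",
  "np.isnan(AOD_550.values).sum()  # number of missing pixels excluded from the mean\n",
  "```"
 ]
},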
227 | {
228 | "cell_type": "markdown",
229 | "metadata": {},
230 | "source": [
231 | "---\n",
232 | "**Exercise 1**: Importing netCDF files\n",
233 | "1. Open the file \"MOP03JM-201811-L3V95.6.3_thinned.nc\" using the netCDF4 library\n",
234 | "2. Print the variable names\n",
235 | "3. What are the dimensions?\n",
236 | "---\n",
237 | "\n",
238 | "**Solution:**"
239 | ]
240 | },
241 | {
242 | "cell_type": "code",
243 | "execution_count": null,
244 | "metadata": {},
245 | "outputs": [],
246 | "source": []
247 | },
248 | {
249 | "cell_type": "markdown",
250 | "metadata": {},
251 | "source": [
252 | "## Importing GRIB2 files\n",
253 | "\n",
254 | "GRIB2 files is a binary datasets that take on a table-driven code form. \"Table driven\" means that the files require external tables to decode the data type. Thus, they are not self-describing. These files follow a methodology of encoding binary data and not a distinct file type. Binary Universal Form for the Representation of meteorological data (BUFR) and GRIdded Binary (GRIB) are two common table-driven formats in Earth Sciences. \n",
255 | "\n",
256 | "American NWS models (e.g. GFS, NAM, and HRRR) and the European (e.g. ECMWF) models are stored in GRIB2. While they share the same format, there are some differences in how each organization stores its data. GRIB2 are stored as binary variables with a header describing the data stored followed by the variable values.\n",
257 | "\n",
258 | "Currently, some of the GRIB2 decoders have problems parsing the American datasets because the American models have multiple pressure dimensions (depending on the variable) while the European models have one. Still, there are ways the data can be inspected by using the pygrib and cfgrib packages."
259 | ]
260 | },
261 | {
262 | "cell_type": "code",
263 | "execution_count": null,
264 | "metadata": {},
265 | "outputs": [],
266 | "source": [
267 | "import pygrib"
268 | ]
269 | },
270 | {
271 | "cell_type": "markdown",
272 | "metadata": {},
273 | "source": [
274 | "The pygrib package (Unidata) has an interface between Python and the GRIB-API (ECMWF). ECMWF has since ended support for the GRIB-API as the primary GRIB2 encoded and decoder and now use ecCodes. However, the package is still maintained by the developer (https://jswhit.github.io/pygrib/) and is useful for parsing NCEP weather forecast data."
275 | ]
276 | },
277 | {
278 | "cell_type": "code",
279 | "execution_count": null,
280 | "metadata": {},
281 | "outputs": [],
282 | "source": [
283 | "filename = 'data/gfs_3_20200915_0000_000.grb2'\n",
284 | "gfs_grb2 = pygrib.open(filename)"
285 | ]
286 | },
287 | {
288 | "cell_type": "markdown",
289 | "metadata": {},
290 | "source": [
291 | "This opens the file, but does not extract the elements:"
292 | ]
293 | },
294 | {
295 | "cell_type": "code",
296 | "execution_count": null,
297 | "metadata": {},
298 | "outputs": [],
299 | "source": [
300 | "type(gfs_grb2)"
301 | ]
302 | },
303 | {
304 | "cell_type": "markdown",
305 | "metadata": {},
306 | "source": [
307 | "Below is a *for loop* in Python. The code block below will iterate over each item in the open dataset and append (using *.append*) them to a list (*records*). Note that if you run this command again, you will read to the end of the file, so there will be no result. You will have to re-open the command and re-run the block below.\n",
308 | "\n",
309 | "You can check the size of the final list using *len(messages)*:"
310 | ]
311 | },
312 | {
313 | "cell_type": "code",
314 | "execution_count": null,
315 | "metadata": {},
316 | "outputs": [],
317 | "source": [
318 | "records = []\n",
319 | "for grb in gfs_grb2:\n",
320 | " records.append(str(grb))\n",
321 | " \n",
322 | "len(records)"
323 | ]
324 | },
325 | {
326 | "cell_type": "markdown",
327 | "metadata": {},
328 | "source": [
329 | "There are 522 individual data product definition in this file, so first let’s inspect the contents of one line to start:"
330 | ]
331 | },
332 | {
333 | "cell_type": "code",
334 | "execution_count": null,
335 | "metadata": {},
336 | "outputs": [],
337 | "source": [
338 | "records[12]"
339 | ]
340 | },
341 | {
342 | "cell_type": "markdown",
343 | "metadata": {},
344 | "source": [
345 | "From the output above, you can see that the colons (:) separate the sections of the product definition in this GRIB2 message. The elements are *index* (1), *variable name* and *units* (2-3), and *spatial*, *vertical*, and *temporal* definitions (4-8). There is one record for each *pressure level* and *time*. You can then extract all variables using the *.select(name=\\[variable\\])* command. Below, you select all the Temperature records (there are 46, which you can see by using the *len(temps)* command). Since it is a long list, you are only printing some of these below:"
346 | ]
347 | },
348 | {
349 | "cell_type": "code",
350 | "execution_count": null,
351 | "metadata": {},
352 | "outputs": [],
353 | "source": [
354 | "temps = gfs_grb2.select(name='Temperature')"
355 | ]
356 | },
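{
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
  "# Confirm the count and preview the first few Temperature records\n",
  "len(temps), temps[:3]"
 ]
},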
357 | {
358 | "cell_type": "markdown",
359 | "metadata": {},
360 | "source": [
361 | "If you want to extract temperature at 85000 Pa, you can use the index (*315*) to pull that record:"
362 | ]
363 | },
364 | {
365 | "cell_type": "code",
366 | "execution_count": null,
367 | "metadata": {},
368 | "outputs": [],
369 | "source": [
370 | "temp = gfs_grb2[315]"
371 | ]
372 | },
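{
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "Index-based selection is brittle if the file layout changes. As a sketch, *select()* can also filter by key values directly (assuming a Temperature record at the 850 hPa level exists in this file):\n",
  "\n",
  "```python\n",
  "temp_850 = gfs_grb2.select(name='Temperature', level=850)[0]  # first matching record\n",
  "```"
 ]
},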
373 | {
374 | "cell_type": "markdown",
375 | "metadata": {},
376 | "source": [
377 | "Then, using *.values* you can extract the data from the record:"
378 | ]
379 | },
380 | {
381 | "cell_type": "code",
382 | "execution_count": null,
383 | "metadata": {},
384 | "outputs": [],
385 | "source": [
386 | "temp.values"
387 | ]
388 | },
389 | {
390 | "cell_type": "markdown",
391 | "metadata": {},
392 | "source": [
393 | "You can also extract the grid information and other import metadata for this record. To see all available information, use the *.keys()* command:"
394 | ]
395 | },
396 | {
397 | "cell_type": "code",
398 | "execution_count": null,
399 | "metadata": {
400 | "scrolled": true
401 | },
402 | "outputs": [],
403 | "source": [
404 | "temp.keys()"
405 | ]
406 | },
407 | {
408 | "cell_type": "markdown",
409 | "metadata": {},
410 | "source": [
411 | "The coordinates can be extracted using the *.latitude* and *.longitude*. You can additionally extract the level, units, and forecast time from the file:"
412 | ]
413 | },
414 | {
415 | "cell_type": "code",
416 | "execution_count": null,
417 | "metadata": {},
418 | "outputs": [],
419 | "source": [
420 | "gfs_lat_all = temp.latitudes\n",
421 | "gfs_lon_all = temp.longitudes\n",
422 | "\n",
423 | "level = temp.level\n",
424 | "units = temp.units\n",
425 | "\n",
426 | "analysis_date = temp.analDate\n",
427 | "fcst_time = temp.forecastTime"
428 | ]
429 | },
430 | {
431 | "cell_type": "markdown",
432 | "metadata": {},
433 | "source": [
434 | "Problem: The shape of the latitude is MUCH bigger than the temperature... why and what can we do about it?"
435 | ]
436 | },
437 | {
438 | "cell_type": "code",
439 | "execution_count": null,
440 | "metadata": {},
441 | "outputs": [],
442 | "source": [
443 | "temp.values.shape, gfs_lat_all.shape, gfs_lon_all.shape"
444 | ]
445 | },
446 | {
447 | "cell_type": "markdown",
448 | "metadata": {},
449 | "source": [
450 | "We can troubleshoot by printing the values. We can see that latitude repeats the values many times."
451 | ]
452 | },
453 | {
454 | "cell_type": "code",
455 | "execution_count": null,
456 | "metadata": {},
457 | "outputs": [],
458 | "source": [
459 | "gfs_lat_all, gfs_lon_all"
460 | ]
461 | },
462 | {
463 | "cell_type": "markdown",
464 | "metadata": {},
465 | "source": [
466 | "A simple way of fixing this is to use np.unique to remove any duplicating values:"
467 | ]
468 | },
469 | {
470 | "cell_type": "code",
471 | "execution_count": null,
472 | "metadata": {},
473 | "outputs": [],
474 | "source": [
475 | "gfs_lat = np.unique(gfs_lat_all)\n",
476 | "gfs_lon = np.unique(gfs_lon_all)\n",
477 | "gfs_lat.shape, gfs_lon.shape"
478 | ]
479 | },
480 | {
481 | "cell_type": "markdown",
482 | "metadata": {},
483 | "source": [
484 | "Now that we know how to import multidimensional data, you will make some plots in the next section."
485 | ]
486 | },
487 | {
488 | "cell_type": "markdown",
489 | "metadata": {},
490 | "source": [
491 | "## Plotting 3-dimensional Data\n",
492 | "\n",
493 | "We can access the data directly using Open-source Project for a [Network Data Access Protocol (OPeNDAP)](https://www.opendap.org/), which simplifies access. Instead of downloading and reading data into Python, we can access it directly using a URL."
494 | ]
495 | },
496 | {
497 | "cell_type": "code",
498 | "execution_count": null,
499 | "metadata": {},
500 | "outputs": [],
501 | "source": [
502 | "fname = 'http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/noaa.ersst.v5/sst.mnmean.nc'\n",
503 | "\n",
504 | "sst_file_id = xr.open_dataset(fname)"
505 | ]
506 | },
507 | {
508 | "cell_type": "markdown",
509 | "metadata": {},
510 | "source": [
511 | "You can print the dataset contents by typing the variable name:"
512 | ]
513 | },
514 | {
515 | "cell_type": "code",
516 | "execution_count": null,
517 | "metadata": {},
518 | "outputs": [],
519 | "source": [
520 | "sst_file_id"
521 | ]
522 | },
523 | {
524 | "cell_type": "markdown",
525 | "metadata": {},
526 | "source": [
527 | "Let's inspect the contents again:\n",
528 | "\n",
529 | "* __Dimensions__: The dimensions are named YDim and XDim, which are respectively 180 and 360.\n",
530 | "\n",
531 | "* __Coordinates__: None!\n",
532 | "\n",
533 | "* __Variables__: Has three variables, Latitude, Longitude, and RetrievedCOTotalColumnDay. Note that Latitude and Longitude are not coordinates in this file.\n",
534 | "\n",
535 | "* __Attributes__: None! \n",
536 | "\n",
537 | "Let's import *RetrievedCOTotalColumnDay* which is a 2-dimensional variable. You will also need latitude and longitude, which are both one dimensional:"
538 | ]
539 | },
540 | {
541 | "cell_type": "code",
542 | "execution_count": null,
543 | "metadata": {},
544 | "outputs": [],
545 | "source": [
546 | "sst = sst_file_id[\"sst\"][2012,:,:]\n",
547 | "sst_lat = sst_file_id[\"lat\"][:]\n",
548 | "sst_lon = sst_file_id[\"lon\"][:]"
549 | ]
550 | },
551 | {
552 | "cell_type": "markdown",
553 | "metadata": {},
554 | "source": [
555 | "Contour plots and mesh plots are two useful ways of looking at 3-dimensional data. Both plots require the x, y, and z coordinates to have the same 2-dimensional grid. However, lat and lon are 1-dimensional. You can use *np.meshgrid()* to project the 1-dimensional x and y coordinates into two dimensions.\n",
556 | "\n",
557 | "This function is a little confusing at first, so I'll show a simple example. Suppose you have to simple arrays:"
558 | ]
559 | },
560 | {
561 | "cell_type": "code",
562 | "execution_count": null,
563 | "metadata": {},
564 | "outputs": [],
565 | "source": [
566 | "tmp_x = [1,2]\n",
567 | "tmp_y = [3,4,5]"
568 | ]
569 | },
570 | {
571 | "cell_type": "markdown",
572 | "metadata": {},
573 | "source": [
574 | "*tmp_x* has two elements and *tmp_y* has three. If you create a mesh of the two variables, there will be two variables, both with 3 rows and 2 columns: "
575 | ]
576 | },
577 | {
578 | "cell_type": "code",
579 | "execution_count": null,
580 | "metadata": {},
581 | "outputs": [],
582 | "source": [
583 | "np.meshgrid(tmp_x, tmp_y)"
584 | ]
585 | },
586 | {
587 | "cell_type": "markdown",
588 | "metadata": {},
589 | "source": [
590 | "Returning to the example, below is the meshgrid of the 1-dimensional latitude and longitude coordinates:"
591 | ]
592 | },
593 | {
594 | "cell_type": "code",
595 | "execution_count": null,
596 | "metadata": {},
597 | "outputs": [],
598 | "source": [
599 | "X_sst, Y_sst = np.meshgrid(sst_lon, sst_lat)"
600 | ]
601 | },
602 | {
603 | "cell_type": "markdown",
604 | "metadata": {},
605 | "source": [
606 | "Before plotting, you need to check if all the dimensions match. However, after comparing the shape of co to X_co, you can see that the dimensions are flipped:"
607 | ]
608 | },
609 | {
610 | "cell_type": "code",
611 | "execution_count": null,
612 | "metadata": {},
613 | "outputs": [],
614 | "source": [
615 | "sst.shape, X_sst.shape"
616 | ]
617 | },
618 | {
619 | "cell_type": "markdown",
620 | "metadata": {},
621 | "source": [
622 | "To make the two arrays match, you can use the *.transpose()* function to switch the x and y coordinates in co."
623 | ]
624 | },
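{
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "As a minimal sketch with a hypothetical array, transposing swaps the row and column axes:\n",
  "\n",
  "```python\n",
  "demo = np.arange(6).reshape(2, 3)  # shape (2, 3)\n",
  "demo.transpose().shape             # -> (3, 2); demo.T is shorthand\n",
  "```"
 ]
},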
625 | {
626 | "cell_type": "markdown",
627 | "metadata": {},
628 | "source": [
629 | "In the last session, you learned how to use *plt.subplot()* to generate the empty figure (*fig*)and axis (*ax*). \n",
630 | "\n",
631 | "One line 2, you call *ax.contourf* and input the X_co, Y_co, and transposed co variables. co acts as a color value, which becomes the third dimension of the plot. You then store this object into a variable *co_plot* so that you can pass it into *ax.colorbar* in order to map the colors to numeric values."
632 | ]
633 | },
634 | {
635 | "cell_type": "code",
636 | "execution_count": null,
637 | "metadata": {},
638 | "outputs": [],
639 | "source": [
640 | "# contourf\n",
641 | "fig = plt.figure()\n",
642 | "ax = plt.subplot(111)\n",
643 | "sst_plot = ax.contourf(X_sst, Y_sst, sst)\n",
644 | "fig.colorbar(sst_plot, orientation='horizontal', ax=ax)\n",
645 | "plt.show()"
646 | ]
647 | },
648 | {
649 | "cell_type": "markdown",
650 | "metadata": {},
651 | "source": [
652 | "In the image above, you can see that there are regions where there are higher levels of CO (in molecules/cm2). The data are clustered together and have global coverage, so a contour plot is a relevant choice in this scenario.\n",
653 | "\n",
654 | "Like contour plots, mesh plots are also 2-dimensional plots that display 3-dimensions of information using x, y, coordinates and z for a color scale. However, mesh plots do not perform any smoothing and display data as-is on a regular grid. However, since many satellite datasets are swath-based, irregularly spaced data needs to be re-gridded in order to display it as a mesh grid. In the code block below, let’s compare how the MOPITT data looks using pcolormesh command with the previous example using contour. The code below has no other changes to the plot other than the call to the plot type."
655 | ]
656 | },
657 | {
658 | "cell_type": "code",
659 | "execution_count": null,
660 | "metadata": {},
661 | "outputs": [],
662 | "source": [
663 | "#pcolormesh\n",
664 | "fig = plt.figure()\n",
665 | "ax = plt.subplot(111)\n",
666 | "sst_plot = ax.pcolormesh(X_sst, Y_sst, sst, shading='auto')\n",
667 | "fig.colorbar(sst_plot, orientation='horizontal')\n",
668 | "plt.show()"
669 | ]
670 | },
671 | {
672 | "cell_type": "markdown",
673 | "metadata": {},
674 | "source": [
675 | "You might notice that there is more structure in the mesh plot than the filled contour. This is useful if you wish to examine fine structure and patterns.\n",
676 | "\n",
677 | "---\n",
678 | "**Exercise 2**: Plot 3-dimensional data\n",
679 | "\n",
680 | "Plot *AOD_lat*, *AOD_lon*, and *AOD_500* (which we imported from the \"JRR-AOD_v2r3_j01_...\" netCDF file as:\n",
681 | "\n",
682 | "1. Check the dimensions for all variables using *.shape*.\n",
683 | "2. Do you need to generate a meshgrid with *np.meshgrid()*?\n",
684 | "3. Create a contour plot\n",
685 | "\n",
686 | "---\n",
687 | "**Solution:**"
688 | ]
689 | },
690 | {
691 | "cell_type": "code",
692 | "execution_count": null,
693 | "metadata": {},
694 | "outputs": [],
695 | "source": []
696 | },
697 | {
698 | "cell_type": "markdown",
699 | "metadata": {},
700 | "source": [
701 | "## Adding Maps to datasets\n",
702 | "\n",
703 | "The package Cartopy add mapping functionality to Matplotlib. Cartopy provides an interface to obtain continent, country, and feature details to overlay onto your plot. Furthermore, Cartopy also enables you to convert your data from one map projection to another, which requires a cartesian coordinate system to the map coordinates. Matplotlib natively supports the six mathematical and map projections (Aitoff, Hammer, Lambert, Mollweide, polar, and rectilinear) and combined with Cartopy, data can be transformed to a total of 33 possible projections."
704 | ]
705 | },
706 | {
707 | "cell_type": "code",
708 | "execution_count": null,
709 | "metadata": {},
710 | "outputs": [],
711 | "source": [
712 | "from cartopy import crs as ccrs"
713 | ]
714 | },
715 | {
716 | "cell_type": "markdown",
717 | "metadata": {},
718 | "source": [
719 | "Just like before, we need to convert the 1D lat and lon coordinates to 2D using meshgrid. We can check the shape to ensure all variables have the same dimensions."
720 | ]
721 | },
722 | {
723 | "cell_type": "code",
724 | "execution_count": null,
725 | "metadata": {},
726 | "outputs": [],
727 | "source": [
728 | "gfs_temp = temp.values\n",
729 | "gfs_x, gfs_y = np.meshgrid(gfs_lon, gfs_lat)\n",
730 | "\n",
731 | "gfs_x.shape, gfs_y.shape, gfs_temp.shape"
732 | ]
733 | },
734 | {
735 | "cell_type": "code",
736 | "execution_count": null,
737 | "metadata": {},
738 | "outputs": [],
739 | "source": [
740 | "fig = plt.figure(figsize=[10,5])\n",
741 | "ax = plt.subplot(projection=ccrs.PlateCarree())\n",
742 | "\n",
743 | "ax.pcolormesh(gfs_x, gfs_y, gfs_temp)\n",
744 | "\n",
745 | "ax.coastlines('50m')\n",
746 | "plt.show()"
747 | ]
748 | },
749 | {
750 | "cell_type": "markdown",
751 | "metadata": {},
752 | "source": [
753 | "In the next example, you can switch from Plate Carrée to Orthographic. You must define the projection twice, once in the *projection=* keyword and again in the *transform=*. In the *plt.subplot* line, you must define the to coordinates (*ccrs.Orthographic*), which is how you want to axes to show the data. In the ax.scatter line, you use the transform keyword argument in scatter to define the from coordinates (Plate Carrée), which are the coordinates that the data formatted for."
754 | ]
755 | },
756 | {
757 | "cell_type": "code",
758 | "execution_count": null,
759 | "metadata": {},
760 | "outputs": [],
761 | "source": [
762 | "fig = plt.figure(figsize=[10,5])\n",
763 | "ax = plt.subplot(projection=ccrs.Orthographic(90, 0))\n",
764 | "\n",
765 | "ax.pcolormesh(gfs_x, gfs_y, gfs_temp, transform=ccrs.PlateCarree())\n",
766 | "\n",
767 | "ax.coastlines('50m')\n",
768 | "plt.show()"
769 | ]
770 | },
771 | {
772 | "cell_type": "markdown",
773 | "metadata": {},
774 | "source": [
775 | "---\n",
776 | "**Exercise 3** Adding maps to plots\n",
777 | "\n",
778 | "Using *sst_lat*, *AOD_lon*, and *AOD_550* (which we imported from the \"JRR-AOD_v2r3_j01_...\" netCDF file)\n",
779 | "\n",
780 | "1. Create a *pcolormesh* plot\n",
781 | "2. Add the coastlines to a standard Plate Caree plot using *projection=* option.\n",
782 | "\n",
783 | "---\n",
784 | "**Solution**:"
785 | ]
786 | },
787 | {
788 | "cell_type": "code",
789 | "execution_count": null,
790 | "metadata": {},
791 | "outputs": [],
792 | "source": []
793 | },
794 | {
795 | "cell_type": "markdown",
796 | "metadata": {},
797 | "source": [
798 | "## Summary:\n",
799 | "\n",
800 | "You learned:\n",
801 | "\n",
802 | "* How to import scientific data formats, like netCDF and GRIB2\n",
803 | "* Worked with arrays and lists\n",
804 | "* How to create a simple maps\n",
805 | "\n",
806 | "Next lesson:\n",
807 | "* Create new imagery by combining single channels\n",
808 | "* Perform basic gridding operations to regularly spaced data\n",
809 | "* Save data into text and binary files, and plots as images"
810 | ]
811 | },
812 | {
813 | "cell_type": "code",
814 | "execution_count": null,
815 | "metadata": {},
816 | "outputs": [],
817 | "source": []
818 | }
819 | ],
820 | "metadata": {
821 | "kernelspec": {
822 | "display_name": "Python 3 (ipykernel)",
823 | "language": "python",
824 | "name": "python3"
825 | },
826 | "language_info": {
827 | "codemirror_mode": {
828 | "name": "ipython",
829 | "version": 3
830 | },
831 | "file_extension": ".py",
832 | "mimetype": "text/x-python",
833 | "name": "python",
834 | "nbconvert_exporter": "python",
835 | "pygments_lexer": "ipython3",
836 | "version": "3.7.12"
837 | }
838 | },
839 | "nbformat": 4,
840 | "nbformat_minor": 2
841 | }
842 |
--------------------------------------------------------------------------------
/03 Image Combination and Gridding.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Lesson 3: Image Combination and Gridding Data\n",
8 | "\n",
9 | "Rebekah Esmaili (bekah@umd.edu) Research Scientist, STC/JPSS\n",
10 | " \n",
11 | "---\n",
12 | "\n",
13 | "\n",
14 | "## Lesson Objectives\n",
15 | "* You will learn to:\n",
16 | " * Create new imagery by combining single channels\n",
17 | " * Perform basic gridding operations to regularly spaced data\n",
18 | " * Save data into text and binary files, and plots as images\n",
19 | " \n",
20 | "---\n",
21 | "\n",
22 | "## What do I need?\n",
23 | "* If you are really new to Python, I recommend using the binder links to run these notebooks remotely.\n",
24 | "* If you have some experience, you can either install Anaconda locally on your laptop or on a remote server. There are some instructions in the main directory.\n",
25 | "* I _do not recommend_ using system or shared Python installations unless you are advanced!\n",
26 | "\n",
27 | "---\n",
28 | "\n",
29 | "Datasets:\n",
30 | "* Two GOES-16 level 1 radiance data files will be used to examine the NDVI:\n",
31 | " * Channel 2: OR_ABI-L1b-RadM1-M6C02_G16_s20192091147504_e20192091147562_c20192091147599.nc\n",
32 | " * Channel 3: OR_ABI-L1b-RadM1-M6C03_G16_s20192091147504_e20192091147562_c20192091148025.nc\n",
33 | "* Two GOES-16 level 1 radiance data files will be used to examine the hurricane convection:\n",
34 | " * Channel 2: OR_ABI-L1b-RadM1-M6C02_G16_s20192091147504_e20192091147562_c20192091147599\n",
35 | " * Channel 13: OR_ABI-L1b-RadM1-M3C13_G16_s20182822019282_e20182822019350_c20182822019384.nc\n",
36 | "* 3B-HHR.MS.MRG.3IMERG.20160811-S233000-E235959.1410.V06B_thinned.nc: IMERG is a global 30-minute precipitation dataset.\n",
37 | " * NOTE: I removed fields, reduced the domain, and converted the original file to netCDF for this lesson. The original file is in HDF5 format and called 3B-HHR.MS.MRG.3IMERG.20160811-S233000-E235959.1410.V06B.HDF5. Access the original file via [NASA's GPM website](https://gpm.nasa.gov/data/directory)\n",
38 | "* sst.mon.ltm.1981-2010.nc: NOAA Extended Reconstructed Sea Surface Temperature (SST) V5 dataset\n",
39 | "\n",
40 | "Start by importing several packages that we covered in the earlier lessons:"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": null,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "import xarray as xr\n",
50 | "import matplotlib.pyplot as plt\n",
51 | "import numpy as np, numpy.ma as ma\n",
52 | "import pandas as pd\n",
53 | "from cartopy import crs as ccrs\n",
54 | "import scipy.interpolate"
55 | ]
56 | },
57 | {
58 | "cell_type": "markdown",
59 | "metadata": {},
60 | "source": [
61 | "Scipy.interpolate has a lot of useful interpolation schemes. While not covered in this tutorial, [pyresample](https://pyresample.readthedocs.io/en/latest/) is a great package for more complex gridding operations and for irregularly spaced data."
62 | ]
63 | },
64 | {
65 | "cell_type": "markdown",
66 | "metadata": {},
67 | "source": [
68 | "## Combining multiple datasets\n",
69 | "\n",
70 | "In the next example, we will construct the Normalized Difference Vegetation Index (NDVI) for a scene by combining multiple images from spectral bands. The NDVI is useful for looking at vegetation health, surface type, and can be used to look at burn scars after fires. Plants absorb light between 0.4 and 0.7 µm (visible, VIS) and reflect in 0.7 to 1.1 µm (Near infrared, NIR). NDVI is calculated using the following equation:\n",
71 | "```\n",
72 | "NDVI = (NIR — VIS)/(NIR + VIS)\n",
73 | "```\n",
74 | "\n",
75 | "Healthy vegetation will absorb more in the visible light and reflect more in the near IR, and thus have a higher NDVI value. Drier vegetation will absorb less visible light and reflect less near IR, resulting in a lower NDVI value. A value of 0 can be found in deserts where there is little to no vegetaion while values close to 1 indicate a high vegetation density (Source: https://earthobservatory.nasa.gov/features/MeasuringVegetation/measuring_vegetation_2.php)\n",
76 | "\n",
77 | "GOES-16 is a geostationary weather satellite that takes continuous imagery of a domain centered on the Atlantic Ocean. The Advanced Baseline Imager (ABI) takes images of the Earth using 16 spectral bands, which are also referred to as channels."
78 | ]
79 | },
80 | {
81 | "cell_type": "code",
82 | "execution_count": null,
83 | "metadata": {},
84 | "outputs": [],
85 | "source": [
86 | "# Import ABI Channel 3 ()\n",
87 | "fname = 'data/meso/OR_ABI-L1b-RadM1-M6C03_G16_s20192091147504_e20192091147562_c20192091148025.nc'\n",
88 | "goesnc = xr.open_dataset(fname)\n",
89 | "nearir = goesnc['Rad'].values\n",
90 | "\n",
91 | "# Import ABI Channel 2\n",
92 | "fname = 'data/meso/OR_ABI-L1b-RadM1-M6C02_G16_s20192091147504_e20192091147562_c20192091147599.nc'\n",
93 | "goesnc = xr.open_dataset(fname)\n",
94 | "vis = goesnc['Rad'].values"
95 | ]
96 | },
97 | {
98 | "cell_type": "markdown",
99 | "metadata": {},
100 | "source": [
101 | "If you look at the size of the VIS channel and compare with the NIR channel, you'll see that the nearir is higher resolution."
102 | ]
103 | },
104 | {
105 | "cell_type": "code",
106 | "execution_count": null,
107 | "metadata": {},
108 | "outputs": [],
109 | "source": [
110 | "vis.shape, nearir.shape"
111 | ]
112 | },
113 | {
114 | "cell_type": "markdown",
115 | "metadata": {},
116 | "source": [
117 | "The arrays need to match so that we can take the difference. There are more complicated gridding schemes, but for now, we can simply sample the VIS array so that we only keep every 2nd element. We can do this by using the double colon (::) in Numpy, which looks like:"
118 | ]
119 | },
120 | {
121 | "cell_type": "code",
122 | "execution_count": null,
123 | "metadata": {},
124 | "outputs": [],
125 | "source": [
126 | "vis = vis[::2, ::2]"
127 | ]
128 | },
129 | {
130 | "cell_type": "markdown",
131 | "metadata": {},
132 | "source": [
133 | "Now, if we check the dimensions again, the shapes match:"
134 | ]
135 | },
136 | {
137 | "cell_type": "code",
138 | "execution_count": null,
139 | "metadata": {},
140 | "outputs": [],
141 | "source": [
142 | "vis.shape, nearir.shape"
143 | ]
144 | },
145 | {
146 | "cell_type": "markdown",
147 | "metadata": {},
148 | "source": [
149 | "We can now difference the two spectral channels to obtain the NDVI for the scene and afterwards, we make a plot of the resulting image."
150 | ]
151 | },
152 | {
153 | "cell_type": "code",
154 | "execution_count": null,
155 | "metadata": {},
156 | "outputs": [],
157 | "source": [
158 | "img = (nearir-vis)/(nearir+vis)\n",
159 | "\n",
160 | "plt.figure(figsize=[12,12])\n",
161 | "plt.imshow(img)\n",
162 | "plt.colorbar()\n",
163 | "plt.show()"
164 | ]
165 | },
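166 | {
167 | "cell_type": "markdown",
168 | "metadata": {},
169 | "source": [
170 | "Note that the division above can produce NaN or infinite values wherever *nearir + vis* is zero (e.g. fill or space pixels at the edge of a scene). We already imported *numpy.ma* at the top of this notebook; below is a minimal masking sketch (using the same variable names as the cell above) that suppresses the warnings and masks invalid pixels before plotting:\n",
171 | "\n",
172 | "```python\n",
173 | "with np.errstate(divide='ignore', invalid='ignore'):\n",
174 | "    img = (nearir - vis)/(nearir + vis)\n",
175 | "img = ma.masked_invalid(img)  # mask NaN/inf pixels so imshow skips them\n",
176 | "```"
177 | ]
178 | },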
166 | {
167 | "cell_type": "markdown",
168 | "metadata": {},
169 | "source": [
170 | "---\n",
171 | "**Exercise 1**: Combining images from two different channels\n",
172 | "1. Open the two files using the netCDF4 library:\n",
173 | " * Channel 13: OR_ABI-L1b-RadM1-M3C13_G16_s20182822019282_e20182822019350_c20182822019384.nc\n",
174 | " * Channel 2: OR_ABI-L1b-RadM1-M3C02_G16_s20182822019282_e20182822019339_c20182822019374.nc\n",
175 | "2. From each file, extract the 'Rad' variable (radiance) and save it to a new variable.\n",
176 | "3. Check the dimensions, are they the same? If not, use the double colon (::) to subset the array and match the two array dimensions.\n",
177 | "4. Take the difference between channel 2 and channel 13 (Ch02 - C13) \n",
178 | "5. Make a plot using plt.imshow().\n",
179 | "---\n",
180 | "\n",
181 | "**Solution:**"
182 | ]
183 | },
184 | {
185 | "cell_type": "code",
186 | "execution_count": null,
187 | "metadata": {},
188 | "outputs": [],
189 | "source": []
190 | },
191 | {
192 | "cell_type": "markdown",
193 | "metadata": {},
194 | "source": [
195 | "## Changing the grid\n",
196 | "\n",
197 | "We will often need to change grids when we compare data sources, such as models, satellite observations, and in situ data. There are many methods to do this. In the previous example, we had one channel that was 1000x1000 pixels and another that is 2000x2000. Since our goal was to display the data and not perform rigorous analysis, we dropped every other pixel. However, for a long-term trend analysis, we may want to use more sophisticated methods, such as interpolation. Below is a simple example, where we decrease the resolution (also called \"coarsening\", \"aggregating,\" or \"upscaling\" the data).\n",
198 | "\n",
199 | "Scipy.interpolate is useful sub-package gridding. In particular, the griddata function allows the user to pass a single or a list of unstructured points. The command and syntax looks like:\n",
200 | "\n",
201 | "```python\n",
202 | "scipy.interpolate.griddata(points, values, (Xnew, Ynew), method='nearest')\n",
203 | "```\n",
204 | "* points: an array pair of x and y values (e.g. longitude and latitude)\n",
205 | "* values: the corresponding \"z\" value at the x,y location. This needs to be 1-dimensional\n",
206 | "\n",
207 | "There are three methods: \n",
208 | "* 'nearest' closest data point to (Xnew, Ynew)\n",
209 | "* 'linear' a linear interpolation between the closest points to (Xnew, Ynew)\n",
210 | "* 'cubic' a spline interpolation between the closest points to (Xnew, Ynew)\n",
211 | "\n",
212 | "Below, we will import an IMERG dataset to look at a heavy rainfall episode over New Orleans, Lousiana, USA on August 11, 2016. The domain that we will study is 28N-33N, 94.0W-88.0W. IMERG is 0.1 x 0.1 degrees latitude and longitude; for our example below, we will change the grid to 0.5 x 0.5 degrees.\n",
213 | "\n",
214 | "The code block below imports three variables:\n",
215 | "\n",
216 | "* precipitationCal (3-dimensional)\n",
217 | "* lat (1-dimensional)\n",
218 | "* lon (1-dimensional)\n",
219 | "\n",
220 | "precipitationCal is three dimensional \\[time, lat, lon\\] but there is only one time element. We will read in the data using the 0 index so that we are working only in two dimensions, latitude and longitude."
221 | ]
222 | },
223 | {
224 | "cell_type": "code",
225 | "execution_count": null,
226 | "metadata": {},
227 | "outputs": [],
228 | "source": [
229 | "fname = 'data/3B-HHR.MS.MRG.3IMERG.20160811-S233000-E235959.1410.V06B_thinned.nc'\n",
230 | "imergv6 = xr.open_dataset(fname)\n",
231 | "precip = imergv6['precipitationCal'][0,:,:].values\n",
232 | "lat = imergv6['lat'].values\n",
233 | "lon = imergv6['lon'].values"
234 | ]
235 | },
236 | {
237 | "cell_type": "markdown",
238 | "metadata": {},
239 | "source": [
240 | "First, to convert precip from a 2-dimensioanl variable to 1-dimensional, we can again use the *.flatten()* command:"
241 | ]
242 | },
243 | {
244 | "cell_type": "code",
245 | "execution_count": null,
246 | "metadata": {},
247 | "outputs": [],
248 | "source": [
249 | "values = precip.flatten()"
250 | ]
251 | },
252 | {
253 | "cell_type": "markdown",
254 | "metadata": {},
255 | "source": [
256 | "Next, we need to define our original/old grid that the data is natively in. Our *precip* array is 2-dimensional, but *lat* and *lon* are 1-dimensional. We can use the *np.meshgrid* command to project the data into 2-dimensions (*Xold* and *Yold*) and match the precipitation array size. However, we then need to convert it to a 1-dimensional array to match *values*.\n",
257 | "\n",
258 | "Below, we use meshgrid to get the 2-dimensional x and y arrays. We use the *indexing='ij'* option because the latitudes start with a positive number and become negative, thus the origin is the upper left corner of the box (following matrix, or 'ij' formatting). If latitude started with negative values and became positive, then the origin would be in the bottom left (following the \"standard\" or \"xy\" formatting)."
259 | ]
260 | },
261 | {
262 | "cell_type": "code",
263 | "execution_count": null,
264 | "metadata": {},
265 | "outputs": [],
266 | "source": [
267 | "Xold, Yold = np.meshgrid(lon, lat, indexing='ij')"
268 | ]
269 | },
270 | {
271 | "cell_type": "markdown",
272 | "metadata": {},
273 | "source": [
274 | "We now need to write Xold and Yold into a 1-dimensional form. Then, these will be stored as a pair in a Nx2 dimensional array.\n",
275 | "\n",
276 | "First, we get the dimensions from value and create an array fileld with zeros using *np.zero*. We will overwrite the zeros with the flattened Xold and Yold arrays."
277 | ]
278 | },
279 | {
280 | "cell_type": "code",
281 | "execution_count": null,
282 | "metadata": {},
283 | "outputs": [],
284 | "source": [
285 | "dims = (values.shape[0], 2)\n",
286 | "points = np.zeros(dims)\n",
287 | "\n",
288 | "points[:, 0] = Xold.flatten()\n",
289 | "points[:, 1] = Yold.flatten()"
290 | ]
291 | },
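292 | {
293 | "cell_type": "markdown",
294 | "metadata": {},
295 | "source": [
296 | "As an aside, the same *points* array can be built in a single call with *np.column_stack*, which avoids pre-allocating zeros (a sketch equivalent to the cell above):\n",
297 | "\n",
298 | "```python\n",
299 | "points = np.column_stack([Xold.flatten(), Yold.flatten()])  # shape (N, 2)\n",
300 | "```"
301 | ]
302 | },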
292 | {
293 | "cell_type": "markdown",
294 | "metadata": {},
295 | "source": [
296 | "We are done formatting our dataset!\n",
297 | "\n",
298 | "Now we must define our new grid. We can use the mgrid function to do this, which will return two 2-dimensional X and Y meshes across our domain of choice:\n",
299 | "\n",
300 | "```python\n",
301 | "np.mgrid[x_start:x_stop:nx, y_start:y_stop:ny]\n",
302 | "```\n",
303 | "\n",
304 | "We could interpolate across the entire planet (180W-180E,90S-90N), but that will require more calculations than we need. To save time and the number of calculation, we will focus on our domain (28N-33N, 94.0W-88.0W, saved to the variable *coverage*). These values are the *_start, *_stop values in the code above.\n",
305 | "\n",
306 | "The nx and ny variables are the number of points spanning the start and stop values, which is tricker to calculate. To count the number of points, we use the following formula:\n",
307 | "\n",
308 | "```\n",
309 | "int((x_stop - x_start)/grid_size)\n",
310 | "```\n",
311 | "Remember, our goal is to change the 0.1 x 0.1 degree grid to 0.5 x 0.5 degrees (*grid_size*). We use the int command because the numbers cannot have trailing decimals. Confusingly, mgrid requires nx and ny to be complex values (using *complex()*).\n",
312 | "\n",
313 | "In the last step, we call mgrid to generate Xnew and Ynew!"
314 | ]
315 | },
316 | {
317 | "cell_type": "code",
318 | "execution_count": null,
319 | "metadata": {},
320 | "outputs": [],
321 | "source": [
322 | "grid_size=0.5\n",
323 | "\n",
324 | "coverage = [-94.0, 28.0, -88.0, 33.0]\n",
325 | "\n",
326 | "num_points_x = int((coverage[2] - coverage[0])/grid_size)\n",
327 | "num_points_y = int((coverage[3] - coverage[1])/grid_size)\n",
328 | "\n",
329 | "nx = complex(0, num_points_x)\n",
330 | "ny = complex(0, num_points_y)\n",
331 | "\n",
332 | "Xnew, Ynew = np.mgrid[coverage[0]:coverage[2]:nx, coverage[1]:coverage[3]:ny]"
333 | ]
334 | },
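335 | {
336 | "cell_type": "markdown",
337 | "metadata": {},
338 | "source": [
339 | "If the complex-number convention feels opaque, *np.linspace* combined with *np.meshgrid* builds an equivalent grid using ordinary integer counts (a sketch assuming the same *coverage*, *num_points_x*, and *num_points_y* as above):\n",
340 | "\n",
341 | "```python\n",
342 | "xs = np.linspace(coverage[0], coverage[2], num_points_x)  # new longitudes\n",
343 | "ys = np.linspace(coverage[1], coverage[3], num_points_y)  # new latitudes\n",
344 | "Xnew, Ynew = np.meshgrid(xs, ys, indexing='ij')\n",
345 | "```"
346 | ]
347 | },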
335 | {
336 | "cell_type": "markdown",
337 | "metadata": {},
338 | "source": [
339 | "Finally, we can perform our interpolation!"
340 | ]
341 | },
342 | {
343 | "cell_type": "code",
344 | "execution_count": null,
345 | "metadata": {},
346 | "outputs": [],
347 | "source": [
348 | "gridOut = scipy.interpolate.griddata(points, values, (Xnew, Ynew), method='nearest')"
349 | ]
350 | },
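351 | {
352 | "cell_type": "markdown",
353 | "metadata": {},
354 | "source": [
355 | "Switching interpolation schemes is a one-word change; for example, *method='linear'* produces a smoother field at a modest extra cost (a sketch with the same inputs, stored in a hypothetical *gridLinear* variable):\n",
356 | "\n",
357 | "```python\n",
358 | "gridLinear = scipy.interpolate.griddata(points, values, (Xnew, Ynew), method='linear')\n",
359 | "```"
360 | ]
361 | },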
351 | {
352 | "cell_type": "markdown",
353 | "metadata": {},
354 | "source": [
355 | "Let's compare our results. We'll compare them side-by-side to the native resolution of the data:"
356 | ]
357 | },
358 | {
359 | "cell_type": "code",
360 | "execution_count": null,
361 | "metadata": {},
362 | "outputs": [],
363 | "source": [
364 | "to_proj = ccrs.PlateCarree()\n",
365 | "from_proj = ccrs.PlateCarree()\n",
366 | "extent = [-94.0, -88.0, 28.0, 33.0]\n",
367 | "\n",
368 | "fig = plt.figure(figsize=[15,15])\n",
369 | "\n",
370 | "ax=plt.subplot(121, projection=to_proj)\n",
371 | "ax.coastlines('10m')\n",
372 | "ax.set_extent(extent)\n",
373 | "\n",
374 | "ax.set_title(\"Before regridding\")\n",
375 | "ax.pcolormesh(Xold, Yold, precip, vmin=0, vmax=20)\n",
376 | "\n",
377 | "\n",
378 | "ax=plt.subplot(122, projection=to_proj)\n",
379 | "ax.coastlines('10m')\n",
380 | "ax.set_extent(extent)\n",
381 | "\n",
382 | "ax.set_title(\"After regridding\")\n",
383 | "ax.pcolormesh(Xnew, Ynew, gridOut, vmin=0, vmax=20)\n",
384 | "\n",
385 | "plt.show()"
386 | ]
387 | },
388 | {
389 | "cell_type": "markdown",
390 | "metadata": {},
391 | "source": [
392 | "---\n",
393 | "**Exercise 2**: Regridding a regularly spaced dataset\n",
394 | "\n",
395 | "We will practice converting a 2 x 2 degree, regularly spaced grid to a 5-degree regularly spaced grid. This example has a lot of steps but closely follows the IMERG regridding example. We will use NOAA Extended Reconstructed Sea Surface Temperature (SST) V5 dataset, which has 2 degree spacing. This dataset is similar the the IMERG dataset, with the following differences:\n",
396 | "* The longitude coordinates are expressed from 0 to 360 degrees; IMERG is from -180 to 180 degrees.\n",
397 | "* The SST dataset has more than one time coordinate, we will use the first one (index 0), which corresponds to January. Feel free to look at any other ones!\n",
398 | "\n",
399 | "Here are the steps:\n",
400 | "1. Import sst.mon.ltm.1981-2010.nc, which has monthly long-term mean SST values.\n",
401 | "2. Extract the following variables:\n",
402 | " * 'sst' is a 3-dimensional variable, time, lat, and lon. We will only import the first month (index 0), so use \\[0,:,:\\]\n",
403 | " * 'lat', 'lon' are 1-dimensional variables. Latitude goes from negative to positive, and longitude spans 0 to 360.\n",
404 | "3. Convert sst to 1-dimensional using *.flatten()*. \n",
405 | "4. Define a 1-dimensional list of the original 2 degree lat, lon grid.\n",
406 | " * Create a 2-dimensional lat, lon using the np.meshgrid ( e.g. np.meshgrid(lon, lat, indexing='xy'))\n",
407 | "5. Define the new lat, lon grid that has 5 degrees of spacing\n",
408 | " * Create a new variable called gridsize\n",
409 | " * Calculate the number of longitude points and latitude \n",
410 | "6. Interpolate the original SST to the new 5-degree grid.\n",
411 | " * Flatten (using .flatten() ) the ssts so that it is 1-dimensional\n",
412 | "7. Create a plot showing the old and the new data.\n",
413 | " * The data are from 0 to 360, we need to set the central longitude to -180 on the map!\n",
414 | " * You can change the variable names in the pcolormesh lines to reflect any differences in your variable names.\n",
415 | " \n",
416 | "```python\n",
417 | "fig = plt.figure(figsize=[15,15])\n",
418 | "\n",
419 | "to_proj = ccrs.PlateCarree(central_longitude=-180)\n",
420 | "from_proj = ccrs.PlateCarree()\n",
421 | "\n",
422 | "ax = plt.subplot(projection=to_proj)\n",
423 | "ax.coastlines('10m', color='black')\n",
424 | "\n",
425 | "# For the original 2-degree grid\n",
426 | "ax.pcolormesh(Xold, Yold, sst, vmin=0, vmax=30, transform=from_proj)\n",
427 | "\n",
428 | "# For the new 5-degree grid\n",
429 | "#ax.pcolormesh(Xnew, Ynew, gridOut, vmin=0, vmax=30, transform=from_proj)\n",
430 | "\n",
431 | "\n",
432 | "plt.show()\n",
433 | "```\n",
434 | "\n",
435 | "---\n",
436 | "\n",
437 | "**Solution:**"
438 | ]
439 | },
440 | {
441 | "cell_type": "markdown",
442 | "metadata": {},
443 | "source": [
444 | "## Exporting data and Figures\n",
445 | "\n",
446 | "### Saving as csv:\n",
447 | "\n",
448 | "The Pandas *to_csv* is convenient for quickly saving files. The option *index=False* suppress the indices of the DataFrame (which are printed to the left of the DataFrame) from being printed to file."
449 | ]
450 | },
451 | {
452 | "cell_type": "code",
453 | "execution_count": null,
454 | "metadata": {},
455 | "outputs": [],
456 | "source": [
457 | "name = ['GOES-16', 'IceSat-2', 'Himawari']\n",
458 | "agency = ['NOAA', 'NASA', 'JAXA']\n",
459 | "orbit = ['GEO', 'LEO', 'GEO']\n",
460 | "\n",
461 | "df = pd.DataFrame({'name': name,\n",
462 | " 'agency': agency,\n",
463 | " 'orbit': orbit})\n",
464 | "\n",
465 | "df.to_csv('satellites.csv', index=False)"
466 | ]
467 | },
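468 | {
469 | "cell_type": "markdown",
470 | "metadata": {},
471 | "source": [
472 | "A quick way to sanity-check the output is to read the file straight back with pandas (which we imported at the top of the notebook):\n",
473 | "\n",
474 | "```python\n",
475 | "print(pd.read_csv('satellites.csv'))\n",
476 | "```"
477 | ]
478 | },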
468 | {
469 | "cell_type": "markdown",
470 | "metadata": {},
471 | "source": [
472 | "### Saving as a binary file\n",
473 | "\n",
474 | "NumPy binary files (.npz) are geared towards arrays, which can be multi-dimensional. These are useful for quickly storing large datasets."
475 | ]
476 | },
477 | {
478 | "cell_type": "code",
479 | "execution_count": null,
480 | "metadata": {},
481 | "outputs": [],
482 | "source": [
483 | "np.savez('satnames', name=name, agency=agency, orbit=orbit)"
484 | ]
485 | },
486 | {
487 | "cell_type": "code",
488 | "execution_count": null,
489 | "metadata": {},
490 | "outputs": [],
491 | "source": [
492 | "npzfile = np.load('satnames.npz')\n",
493 | "npzfile.files\n",
494 | "npzfile.close()"
495 | ]
496 | },
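497 | {
498 | "cell_type": "markdown",
499 | "metadata": {},
500 | "source": [
501 | "To pull a stored array back out, index the loaded file with the keyword name used in *np.savez* (a short sketch using the file created above):\n",
502 | "\n",
503 | "```python\n",
504 | "npzfile = np.load('satnames.npz')\n",
505 | "print(npzfile['name'])  # the array saved with the keyword name=\n",
506 | "npzfile.close()\n",
507 | "```"
508 | ]
509 | },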
497 | {
498 | "cell_type": "markdown",
499 | "metadata": {},
500 | "source": [
501 | "### Saving figures\n",
502 | "\n",
503 | "Normally, we end out plots with plt.show() to display them inline. Instead, use *plt.savefig()*. The second argument (*bbox_inches*) refers to the whitespace around the plot, it is optional. "
504 | ]
505 | },
506 | {
507 | "cell_type": "code",
508 | "execution_count": null,
509 | "metadata": {},
510 | "outputs": [],
511 | "source": [
512 | "fig = plt.figure(figsize=[8,8])\n",
513 | "\n",
514 | "ax=plt.subplot(projection=to_proj)\n",
515 | "ax.coastlines('10m')\n",
516 | "ax.set_extent(extent)\n",
517 | "\n",
518 | "ax.pcolormesh(Xold, Yold, precip, vmin=0, vmax=20)\n",
519 | "\n",
520 | "plt.savefig('precip.png', bbox_inches='tight') \n",
521 | "plt.close()\n"
522 | ]
523 | },
524 | {
525 | "cell_type": "markdown",
526 | "metadata": {},
527 | "source": [
528 | "## Scripting with Python\n",
529 | "\n",
530 | "### Creating scripts from Jupyter Notebooks\n",
531 | "One of the simplest ways to create a script is to convert an existing Jupyter notebook. As an example, we will created a notebook named script_example that only contains one line of code: print(“Hello Earth”). You can convert any Jupyter Notebook to a script by going to File → Download as → Python (.py):\n",
532 | " \n",
533 | "\n",
534 | "This will download a new file (script_example.py) to your computer. If you open the file using your text editor, you will see:\n",
535 | "\n",
536 | "```\n",
537 | "\n",
538 | "#!/usr/bin/env python\n",
539 | "# coding: utf-8\n",
540 | "\n",
541 | "# In[1]:\n",
542 | "\n",
543 | "print(\"Hello Earth\")\n",
544 | "```\n",
545 | "\n",
546 | "You will notice that the script contains the line numbers (*ln\\[1\\]*), which in my opinion is unnecessary and should be removed from your script. Beginners, you can delete this extra formatting from your file.\n",
547 | "\n",
548 | "### Running Python scripts from the command line\n",
549 | "\n",
550 | "Now you are finished editing the code and you probably want to run it. There are two ways you can run Python scripts:\n",
551 | "\n",
552 | "1. Using the command line interpreter\n",
553 | "2. Using iPython\n",
554 | "\n",
555 | "iPython is an interactive command line that allows you to run code in chunks. In fact, Jupyter Notebook is built using iPython, which explains the similarity in behavior.\n",
556 | " \n",
557 | "* Windows: I suggest using the Anaconda Prompt which you can access from the start menu or using Anaconda Navigator. \n",
558 | "* MacOs/Linux: open the Terminal app. \n",
559 | "\n",
560 | "Once the command line is open, you start in a default location. For example, if you are using Windows and launch the Anaconda Prompt you will see:\n",
561 | "\n",
562 | "```\n",
563 | "(base) C:\\Users\\rebekah>\n",
564 | "```\n",
565 | "\n",
566 | "Now, navigate to where our script is. To do this, you will change directories using the cd command. For example, if your code is stored in C:\\Documents\\Python, you can type:\n",
567 | "\n",
568 | "```\n",
569 | "\n",
570 | "cd C:\\Documents\\Python\n",
571 | "```\n",
572 | "\n",
573 | "The command line will now be updated showing:\n",
574 | "\n",
575 | "```\n",
576 | "(base) C:\\Documents\\Python>\n",
577 | "```\n",
578 | "\n",
579 | "Now that you are in the right place, you can call the Python interpreter, which to convert your code into a format that your computer can understand and executes the command. If you installed Anaconda, this includes a Python 3 interpreter (*python3*). So, to run the script, type:\n",
580 | "\n",
581 | "```\n",
582 | "python3 hello_world.py\n",
583 | "```\n",
584 | "\n",
585 | "If successful, “Hello Earth” should print to your screen.\n",
586 | "\n",
587 | "A second method is to use iPython, which allows you to open Python in interactive mode. Unlike the command line method, iPython will let you run code line-by-line. So, like Jupyter Notebook, you have the option to copy and paste you code from the text editor in chunks into the iPython window. You can also call the entire script inside iPython. This is done by starting iPython and using the command %run \\[script name\\].py. Below is a capture from my terminal:\n",
588 | "\n",
589 | "```\n",
590 | "Python 3.7.6 (default, Jan 8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)]\n",
591 | "Type 'copyright', 'credits' or 'license' for more information\n",
592 | "IPython 7.12.0 -- An enhanced Interactive Python. Type '?' for help.\n",
593 | "\n",
594 | "In [1]: %run script_example.ipynb\n",
595 | "Hello Earth\n",
596 | "```\n",
597 | "\n",
598 | "One advantage of using iPython is that after the script finishes running, variables that were generated in the script are still in memory. Then, you can print or operate on the variables to either debug or to develop your code further. \n",
599 | "\n",
600 | "You may have noted two differences in workflow for write code in scripts versus notebooks, (1) that code cannot be inline and (2) the program must run fully to the end.\n",
601 | "\n",
602 | "\n",
603 | "### Handling output when scripting\n",
604 | "\n",
605 | "In the previous example, you printed text to the screen but Python’s capable of saving figures and data. To save plots, replace *plt.show()* with the *plt.savefig()* command.\n",
606 | "\n",
607 | "It is possible to directly display your graphics using the X11 protocol (by default in Linux) with XQuartz (Mac) or PuTTy (Windows). \n",
608 | "\n",
609 | "I typically discourage this because satellite imagery tends to be very large and thus slow to display remotely. From my experience, it is usually faster to write an image to a file and then view the plot after it is fully rendered.\n",
610 | "\n",
611 | "## Summary:\n",
612 | "\n",
613 | "You learned:\n",
614 | "\n",
615 | "* How to combine satellite imagery\n",
616 | "* Change grid size from regularly-spaced data\n",
617 | "* How to save data and graphics\n",
618 | "\n",
619 | "## Conclusion\n",
620 | "\n",
621 | "I hope you feel empowered find relevant satellite data for your project are equipped with the tools to visualize it. Practice regularly (daily!) to improve your skills. Here are some ways you can continue your journey:\n",
622 | "\n",
623 | "* Downlaod data. You can access data from ESA (https://earth.esa.int/eogateway/), NOAA’s threads data server: https://www.ncei.noaa.gov/thredds/catalog.html, or NASA's [Earthdata](https://earthdata.nasa.gov/) portals.\n",
624 | "* Read. \n",
625 | " * [Python for Data Science](https://jakevdp.github.io/PythonDataScienceHandbook/) (free)\n",
626 | " * [Research Software Engineering with Python](https://merely-useful.tech/py-rse/) Free eBook to enhance your workflow\n",
627 | " * Python Programming and Visualization for Scientists by Alex DeCaria (not free)\n",
628 | " * Python Machine Learning by Wei-Meng Lee (not free)\n",
629 | " * [Earth Observation Using Python](https://www.wiley.com/en-us/Earth+Observation+using+Python%3A+A+Practical+Programming+Guide-p-9781119606888) by Rebekah Esmaili (not free)\n",
630 | "* Watch.\n",
631 | " * [CS Dojo](https://www.youtube.com/channel/UCxX9wt5FWQUAAz4UrysqK9A) on YouTube has a lot of short, fun Python tutorials.\n",
632 | " * [Coursera](https://www.coursera.org/learn/interactive-python-1?specialization=computer-fundamentals) has some fundamental interactive Python courses if you want more structure.\n",
633 | " * [Python for Climate and Meteorology](https://www.youtube.com/watch?v=uQZAEPnUZ5o) Another focused Python workshop taught at AMS, a little more advanced.\n",
634 | "* Connect with an online community, such as Pangeo (https://discourse.pangeo.io/)"
635 | ]
636 | },
637 | {
638 | "cell_type": "code",
639 | "execution_count": null,
640 | "metadata": {},
641 | "outputs": [],
642 | "source": []
643 | }
644 | ],
645 | "metadata": {
646 | "kernelspec": {
647 | "display_name": "Python 3 (ipykernel)",
648 | "language": "python",
649 | "name": "python3"
650 | },
651 | "language_info": {
652 | "codemirror_mode": {
653 | "name": "ipython",
654 | "version": 3
655 | },
656 | "file_extension": ".py",
657 | "mimetype": "text/x-python",
658 | "name": "python",
659 | "nbconvert_exporter": "python",
660 | "pygments_lexer": "ipython3",
661 | "version": "3.7.12"
662 | }
663 | },
664 | "nbformat": 4,
665 | "nbformat_minor": 4
666 | }
667 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Python for Earth Sciences
2 |
3 | ### Instructor: [Rebekah Esmaili](http://www.rebekahesmaili.com)
4 |
5 | ---
6 |
7 | A crash course in Python focusing on reading and visualizing datasets used in Earth sciences.
8 |
9 | This code is interactive! Click:
10 | [](https://mybinder.org/v2/gh/modern-tools-workshop/AGU-python-workshop-2021/HEAD)
11 |
12 | ---
13 |
14 | ## Getting Started
15 |
16 | This workshop will cover:
17 |
18 | * Launching Jupyter Notebooks
19 | * Working with arrays using the Numpy package
20 | * Importing text datasets using the Pandas package
21 | * Creating simple graphics with Matplotlib
22 | * Importing scientific data formats, such as netCDF and GRIB2
23 | * Creating maps from datasets
24 |
25 | ---
26 |
27 | ### Installation requirements
28 |
29 | "I am really new to Python!"
30 |
31 | * I recommend launching binder, which is a "cloud version" of this course. No installation required!
32 | [](https://mybinder.org/v2/gh/modern-tools-workshop/AGU-python-workshop-2021/HEAD)
33 |
34 | * Need help with Binder? Video tutorial on [YouTube](https://youtu.be/3BrfFe4HsAw).
35 |
36 | "I have used Python before!"
37 |
38 | * If you wish to run the examples locally, I recommend installing [Anaconda](https://www.anaconda.com/products/individual). If you are having trouble with your installation, contact the instructor before the course or use binder.
39 | * Need help installing Anaconda? Video tutorial on [YouTube](https://youtu.be/zxSQCXXvOIM).
40 | * Download the contents of the [GitHub repository](https://ter.ps/noaapy) to your computer.
41 | * Launch Jupyter Notebooks from the Anaconda Navigator. This will open a window in your default browser. Navigate to the folder that contains the notebooks (*.ipynb) and click on the tutorial for the day.
42 | * New to Jupyter? Here's a video tutorial on [YouTube](https://youtu.be/gmMCuR9JPpY).
43 | * Additional packages:
44 | * Launch the Anaconda Prompt (Windows) or Terminal (MacOS/Linux). Then copy/paste and hit enter:
45 | ```
46 | conda install -c conda-forge cartopy
47 | conda install -c conda-forge netCDF4
48 | conda install -c conda-forge xarray
49 | conda install -c conda-forge pygrib
50 | ```
51 | * If there are no errors, then you are set-up!
52 | * Alternatively, if you are familiar with environments, you can use the environments.yml to install the necessary packages. You can do this in the terminal using:
53 |
54 | ```
55 | conda env create -f environment.yml
56 | ```
57 | Then, switch to the new environment (conda activate python-workshop) once the installation is complete.
58 |
59 | I *do not* recommend:
60 | * Using Python on a remote server for this tutorial (I cannot help troubleshoot)
61 | * Using your operating system's Python or a shared Python installation unless you are advanced!
62 |
63 | ---
64 | ## Course Philosophy
65 |
66 | * Increase accessibility of satellite data and analysis
67 | * Teach Python using practical examples and real-world datasets
68 | * Promote reproducible and transparent scientific research
69 |
70 | ## Resources
71 |
72 | ### Packages and Tutorials
73 |
74 | Pandas
75 | * Short Introduction: https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html
76 | * Cookbook for more details: https://pandas.pydata.org/pandas-docs/stable/user_guide/cookbook.html#cookbook
77 |
78 | ---
79 | Matplotlib
80 | * Pyplot Tutorial: https://matplotlib.org/3.1.1/tutorials/introductory/pyplot.html
81 |
82 | ---
83 | Reading self-describing files
84 | * NETCDF
85 | * Detailed tutorial on the netCDF4 package: https://unidata.github.io/netcdf4-python.
86 | * Xarray tutorial: https://xarray-contrib.github.io/xarray-tutorial/
87 | * HDF files
88 | * The package [h5py](https://www.h5py.org/) is similar to netcdf4.
89 | * User manual at http://docs.h5py.org/en/stable/.
90 | * Xarray can also open HDF files!
91 | * GRIB/GRIB2 files
92 | * World Meteorological Organization standard format, commonly used with weather-related models like ECMWF and GFS.
93 | * Can be opened using [pygrib](https://github.com/jswhit/pygrib).
94 | * Example usage at https://jswhit.github.io/pygrib/docs/.
95 | * BUFR
96 | * Another common table-driven format.
97 | * Open with [python-bufr](https://github.com/pytroll/python-bufr), part of the pytroll project.
98 | ---
99 |
100 | ### General Python resources
101 |
102 | Beginner Tutorials
103 |
104 | * Youtube series for absolute beginners [CS Dojo](https://www.youtube.com/watch?v=Z1Yd7upQsXY&list=PLBZBJbE_rGRWeh5mIBhD-hhDwSEDxogDg)
105 |
106 | * [Research Software Engineering with Python](https://merely-useful.tech/py-rse/) Free eBook to enhance your workflow.
107 |
108 | Intermediate Tutorials
109 |
110 | * Last year's workshop, [Python for Earth Science with Rebekah](https://youtube.com/playlist?list=PLlcgQ3Rl-9fR4oOmfeKPKHuk2Lj57bNJy), is available online. I'll upload this one once available.
111 |
112 | * [Python for Climate and Meteorology](https://www.youtube.com/watch?v=uQZAEPnUZ5o) Another tutorial taught at AMS, a little more advanced.
113 |
114 | * Learn more about [Python for Atmosphere and Ocean Scientists](https://carpentries-lab.github.io/python-aos-lesson/) using Software Carpentry lesson plans.
115 |
116 | * [Earth Observation using Python](https://www.wiley.com/en-us/Earth+Observation+using+Python%3A+A+Practical+Programming+Guide-p-9781119606888) is a book I wrote that builds on the content of the workshop.
117 |
118 | ### Acknowledgements
119 |
120 | Special thanks to past contributors, [Kriti Bhargava](https://cisess.umd.edu/meet-our-scientists/kriti-bhargava/) and [Eviatar Bach](http://eviatarbach.com/)!
121 |
--------------------------------------------------------------------------------
/data/3B-HHR.MS.MRG.3IMERG.20160811-S233000-E235959.1410.V06B.HDF5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/modern-tools-workshop/AGU-python-workshop-2021/666cdddba187d30c4a68418cdbc6b31892c19816/data/3B-HHR.MS.MRG.3IMERG.20160811-S233000-E235959.1410.V06B.HDF5
--------------------------------------------------------------------------------
/data/3B-HHR.MS.MRG.3IMERG.20160811-S233000-E235959.1410.V06B_thinned.nc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/modern-tools-workshop/AGU-python-workshop-2021/666cdddba187d30c4a68418cdbc6b31892c19816/data/3B-HHR.MS.MRG.3IMERG.20160811-S233000-E235959.1410.V06B_thinned.nc
--------------------------------------------------------------------------------
/data/JRR-AOD_v2r3_j01_s202009152044026_e202009152045271_c202009152113150_thinned.nc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/modern-tools-workshop/AGU-python-workshop-2021/666cdddba187d30c4a68418cdbc6b31892c19816/data/JRR-AOD_v2r3_j01_s202009152044026_e202009152045271_c202009152113150_thinned.nc
--------------------------------------------------------------------------------
/data/MOP03JM-201811-L3V95.6.3_thinned.nc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/modern-tools-workshop/AGU-python-workshop-2021/666cdddba187d30c4a68418cdbc6b31892c19816/data/MOP03JM-201811-L3V95.6.3_thinned.nc
--------------------------------------------------------------------------------
/data/gfs_3_20200915_0000_000.grb2:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/modern-tools-workshop/AGU-python-workshop-2021/666cdddba187d30c4a68418cdbc6b31892c19816/data/gfs_3_20200915_0000_000.grb2
--------------------------------------------------------------------------------
/data/meso/OR_ABI-L1b-RadM1-M3C02_G16_s20182822019282_e20182822019339_c20182822019374.nc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/modern-tools-workshop/AGU-python-workshop-2021/666cdddba187d30c4a68418cdbc6b31892c19816/data/meso/OR_ABI-L1b-RadM1-M3C02_G16_s20182822019282_e20182822019339_c20182822019374.nc
--------------------------------------------------------------------------------
/data/meso/OR_ABI-L1b-RadM1-M3C13_G16_s20182822019282_e20182822019350_c20182822019384.nc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/modern-tools-workshop/AGU-python-workshop-2021/666cdddba187d30c4a68418cdbc6b31892c19816/data/meso/OR_ABI-L1b-RadM1-M3C13_G16_s20182822019282_e20182822019350_c20182822019384.nc
--------------------------------------------------------------------------------
/data/meso/OR_ABI-L1b-RadM1-M6C02_G16_s20192091147504_e20192091147562_c20192091147599.nc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/modern-tools-workshop/AGU-python-workshop-2021/666cdddba187d30c4a68418cdbc6b31892c19816/data/meso/OR_ABI-L1b-RadM1-M6C02_G16_s20192091147504_e20192091147562_c20192091147599.nc
--------------------------------------------------------------------------------
/data/meso/OR_ABI-L1b-RadM1-M6C03_G16_s20192091147504_e20192091147562_c20192091148025.nc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/modern-tools-workshop/AGU-python-workshop-2021/666cdddba187d30c4a68418cdbc6b31892c19816/data/meso/OR_ABI-L1b-RadM1-M6C03_G16_s20192091147504_e20192091147562_c20192091148025.nc
--------------------------------------------------------------------------------
/data/sst.mon.ltm.1981-2010.nc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/modern-tools-workshop/AGU-python-workshop-2021/666cdddba187d30c4a68418cdbc6b31892c19816/data/sst.mon.ltm.1981-2010.nc
--------------------------------------------------------------------------------
/environment.yml:
--------------------------------------------------------------------------------
1 | name: python-workshop
2 | channels:
3 | - conda-forge
4 | dependencies:
5 | - python=3.7
6 | - numpy
7 | - matplotlib
8 | - pandas
9 | - cartopy
10 | - netcdf4
11 | - pyproj
12 | - eccodes
13 | - cython
14 | - pygrib
15 | - notebook
16 | - scipy
17 | - xarray
18 |
--------------------------------------------------------------------------------
/installation/install_python_run_notebook.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/modern-tools-workshop/AGU-python-workshop-2021/666cdddba187d30c4a68418cdbc6b31892c19816/installation/install_python_run_notebook.pdf
--------------------------------------------------------------------------------
/sample_script.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # coding: utf-8
3 |
4 | import numpy as np
5 | import pandas as pd
6 | import matplotlib.pyplot as plt
7 | from netCDF4 import Dataset
8 | from cartopy import crs as ccrs
9 |
10 | # Open file
11 | fname='data/JRR-AOD_v2r3_j01_s202009152044026_e202009152045271_c202009152113150_thinned.nc'
12 | aod_file_id = Dataset(fname)
13 |
14 | # Import variables
15 | AOD_550 = aod_file_id.variables['AOD550'][:,:]
16 | AOD_lat = aod_file_id.variables['Latitude'][:,:]
17 | AOD_lon = aod_file_id.variables['Longitude'][:,:]
18 |
19 | # Make figure
20 | fig = plt.figure()
21 | ax = plt.subplot()
22 | co_plot = ax.contourf(AOD_lon, AOD_lat, AOD_550)
23 | fig.colorbar(co_plot, orientation='horizontal')
24 | plt.savefig("AOD_plot.png")
25 |
--------------------------------------------------------------------------------