├── .gitattributes ├── .gitignore ├── 01_Basic_Data_Analysis_and_Visualization.ipynb ├── 02_Science_Data_Formats_and_Advanced_Plotting.ipynb ├── 03_Remote_Datasets_and_Exporting.ipynb ├── LICENSE.md ├── README.md ├── Solutions_to_Exercises.ipynb ├── data ├── 20200901_20200930_Monterey.lev15.csv ├── JRR-AOD_v2r3_j01_s202009152044026_e202009152045271_c202009152113150_thinned.nc ├── MOP03JM-201811-L3V95.6.3_thinned.nc ├── VIIRSNDE_global2020258.v1.0.txt ├── gfs_3_20200915_0000_000.grb2 └── sst.mon.ltm.1981-2010.nc ├── environment.yml ├── img └── flowchart.png └── sample_script.py /.gitattributes: -------------------------------------------------------------------------------- 1 | 01_Basic_Data_Analysis_and_Visualization.ipynb filter=strip-notebook-output 2 | 02_Science_Data_Formats_and_Advanced_Plotting.ipynb filter=strip-notebook-output 3 | 03_Remote_Datasets_and_Exporting.ipynb filter=strip-notebook-output -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | 2 | .ipynb_checkpoints/* 3 | satnames.npz 4 | satellites.csv 5 | SRB.png 6 | -------------------------------------------------------------------------------- /01_Basic_Data_Analysis_and_Visualization.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Lesson 1: Basic Data Analysis and Visualization \n", 8 | "\n", 9 | "Author: Rebekah Esmaili (rebekah.esmaili@gmail.com)\n", 10 | " \n", 11 | "---\n", 12 | "\n", 13 | "\n", 14 | "## Why Python?\n", 15 | "\n", 16 | "Pros\n", 17 | "\n", 18 | "* General-purpose, cross-platform\n", 19 | "* Free and open source\n", 20 | "* Reasonably easy to learn\n", 21 | "* Expressive and succinct code, forces good style\n", 22 | "* Being interpreted and dynamically typed makes it great for data analysis\n", 23 | "* Robust 
ecosystem of scientific libraries, including powerful statistical and visualization packages\n", 23 | "* Large community of scientific users and large existing codebases\n", 24 | "* Major investment in the Python ecosystem by Earth science research agencies, including NASA, NCAR, UK Met Office, and Lamont-Doherty Earth Observatory. See Pangeo.\n", 25 | "* Reads Earth science data formats such as HDF, netCDF, and GRIB\n", 26 | "\n", 27 | "Cons\n", 28 | "\n", 29 | "* Performance penalties for interpreted languages, although many libraries are wrappers for compiled languages. Avoid large loops in favor of matrix/vector operations when possible.\n", 30 | "* Multithreading is limited by the Global Interpreter Lock, but other forms of parallelism (such as multiprocessing) are available\n", 31 | "* See Julia for a modern scientific language that is trying to overcome these challenges\n", 32 | "\n", 33 | "Why you should not use Python 2.7:\n", 34 | "\n", 35 | "* Python 2 reached its \"end of life\" in January 2020\n", 36 | "* No more updates or bugfixes\n", 37 | "* No further official support\n", 38 | "* Subtle differences: https://www.geeksforgeeks.org/important-differences-between-python-2-x-and-python-3-x-with-examples/\n", 39 | "\n", 40 | "---\n", 41 | "\n", 42 | "## Lesson Objectives\n", 43 | "\n", 44 | "* You will learn to:\n", 45 | " * Import relevant packages for scientific programming\n", 46 | " * Read ASCII data\n", 47 | " * Create basic plots and visualizations\n", 48 | " \n", 49 | "---\n", 50 | "\n", 51 | "## Basic Python Syntax" 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "metadata": {}, 57 | "source": [ 58 | "One of the most basic Python commands writes text to the screen. In Jupyter notebooks, the result appears below the line of code. 
To run the command below in a Jupyter notebook, select the cell and either click the run button (►) or press the **Shift** and **Enter** keys together." 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": null, 65 | "metadata": {}, 66 | "outputs": [], 67 | "source": [ 68 | "# This is a comment, Python will not run this!\n", 69 | "print(\"Hello Earth\")" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "In Python, variables are dynamically typed, which means that you do not need to declare their type or size before storing data in them. Instead, Python infers the variable type from the value you assign:" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": null, 82 | "metadata": {}, 83 | "outputs": [], 84 | "source": [ 85 | "var_int = 8\n", 86 | "var_float = 15.0\n", 87 | "var_scifloat = 4e8\n", 88 | "var_complex = complex(4, 2)\n", 89 | "var_greetings = 'Hello Earth'" 90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "metadata": {}, 95 | "source": [ 96 | "Python has many built-in functions; the syntax is usually:\n", 97 | "\n", 98 | "```\n", 99 | "function_name(inputs)\n", 100 | "```\n", 101 | "You have already used two functions: *print()* and *complex()*. Another useful function is *type()*, which tells you whether a variable is an integer, a float, a complex number, or a string. 
" 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": null, 107 | "metadata": {}, 108 | "outputs": [], 109 | "source": [ 110 | "type(var_int), type(var_float), type(var_scifloat), type(var_complex), type(var_greetings)" 111 | ] 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "metadata": {}, 116 | "source": [ 117 | "Python has the following built-in arithmetic operators:\n", 118 | "\n", 119 | "* Addition, subtraction, multiplication, division: +, -, *, /\n", 120 | "* Exponentiation, floor division, modulus: \\**, //, %\n", 121 | "\n" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": null, 127 | "metadata": {}, 128 | "outputs": [], 129 | "source": [ 130 | "2+2.0, var_int**2, var_float//var_int, var_float%var_int" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "---\n", 138 | "\n", 139 | "**Exercise 1:** Learning to use notebooks\n", 140 | "\n", 141 | "1. Launch Jupyter Notebook and create a new notebook\n", 142 | "2. Rename the notebook\n", 143 | "3. Create a new cell and use *type()* to see whether the following are floats or integers:\n", 144 | " * 2+2\n", 145 | " * 2\\*2.0\n", 146 | " * var_float/var_int\n", 147 | "---\n", 148 | "**Solution:**" 149 | ] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "execution_count": null, 154 | "metadata": {}, 155 | "outputs": [], 156 | "source": [] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "metadata": {}, 161 | "source": [ 162 | "## Working with lists\n", 163 | "\n", 164 | "Lists are useful for storing scientific data. Lists are created with square brackets. They can hold any data type (integers, floats, and strings) and even a mixture of types." 
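The list behavior described in this lesson can be checked in plain Python 3 with no extra packages. Below is a minimal sketch. The list *mixed* and its values are illustrative only. *type()* confirms that a single list can hold several types, and the last lines contrast *+* (joining lists) with element-by-element addition.

```python
# A single list can hold mixed types (illustrative values)
mixed = [4, 15.0, 'Hello Earth']
print([type(item).__name__ for item in mixed])  # ['int', 'float', 'str']

numbers_list = [4, 8, 15, 16, 23]
print(numbers_list[0])           # zero-based indexing: the first element is 4

numbers_list.append(42)          # modifies the list in place and returns None
print(numbers_list)              # [4, 8, 15, 16, 23, 42]

# '+' joins two lists end to end rather than adding their elements
print(numbers_list + [1, 2])     # [4, 8, 15, 16, 23, 42, 1, 2]

# Element-wise addition needs an explicit comprehension (or NumPy)
pairwise = [a + b for a, b in zip(numbers_list, numbers_list)]
print(pairwise)                  # [8, 16, 30, 32, 46, 84]
```

The comprehension is shown only for contrast. For numerical work, converting to a NumPy array is usually simpler.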
165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": null, 170 | "metadata": {}, 171 | "outputs": [], 172 | "source": [ 173 | "numbers_list = [4, 8, 15, 16, 23]" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 179 | "source": [ 180 | "You can access elements of the list using their index. Python is zero-based, so index 0 retrieves the first element." 181 | ] 182 | }, 183 | { 184 | "cell_type": "code", 185 | "execution_count": null, 186 | "metadata": {}, 187 | "outputs": [], 188 | "source": [ 189 | "numbers_list[3]" 190 | ] 191 | }, 192 | { 193 | "cell_type": "markdown", 194 | "metadata": {}, 195 | "source": [ 196 | "New items can also be added to the end of the list using the *append()* method, which has the syntax:\n", 197 | "\n", 198 | "```\n", 199 | "variable.method(element)\n", 200 | "```\n", 201 | "The list is updated *in-place*." 202 | ] 203 | }, 204 | { 205 | "cell_type": "code", 206 | "execution_count": null, 207 | "metadata": {}, 208 | "outputs": [], 209 | "source": [ 210 | "numbers_list.append(42)\n", 211 | "print(numbers_list)" 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": {}, 217 | "source": [ 218 | "Perhaps we want to add the values in two lists element by element. However, we cannot use *+* like we did with single values: for list objects, + *concatenates* (joins) the lists." 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": null, 224 | "metadata": {}, 225 | "outputs": [], 226 | "source": [ 227 | "numbers_list + numbers_list" 228 | ] 229 | }, 230 | { 231 | "cell_type": "markdown", 232 | "metadata": {}, 233 | "source": [ 234 | "To perform mathematical operations, you can convert the above list to an array using the NumPy package." 235 | ] 236 | }, 237 | { 238 | "cell_type": "markdown", 239 | "metadata": {}, 240 | "source": [ 241 | "### Importing Packages\n", 242 | "\n", 243 | "Packages are collections of modules, which help simplify common tasks. 
[NumPy](https://numpy.org/) is useful for mathematical operations and array manipulation.\n", 244 | "\n", 245 | "\n", 246 | "* Provides a high-performance multidimensional array object and tools for working with these arrays.\n", 247 | "* Fundamental package for scientific computing with Python.\n", 248 | "* Included in the Anaconda distribution.\n", 249 | "* For more examples than presented below, please refer to [the NumPy Quick Start](https://numpy.org/devdocs/user/quickstart.html)\n", 250 | "\n", 251 | "The basic syntax for importing a package is import \\[package name\\]. However, some packages have long names, so you can use import \\[package name\\] as \\[alias\\]." 252 | ] 253 | }, 254 | { 255 | "cell_type": "code", 256 | "execution_count": null, 257 | "metadata": {}, 258 | "outputs": [], 259 | "source": [ 260 | "import numpy as np" 261 | ] 262 | }, 263 | { 264 | "cell_type": "markdown", 265 | "metadata": {}, 266 | "source": [ 267 | "If you do not see an error after running the line above, then the package was successfully imported.\n", 268 | "### Working with arrays\n", 269 | "\n", 270 | "I can use NumPy’s array constructor *np.array()* to convert the list to a NumPy array and perform element-wise arithmetic. For example, I can double each element of the array:" 271 | ] 272 | }, 273 | { 274 | "cell_type": "code", 275 | "execution_count": null, 276 | "metadata": {}, 277 | "outputs": [], 278 | "source": [ 279 | "numbers_array = np.array(numbers_list)\n", 280 | "numbers_array*2" 281 | ] 282 | }, 283 | { 284 | "cell_type": "markdown", 285 | "metadata": {}, 286 | "source": [ 287 | "Another difference between arrays and lists is that a list is one-dimensional (unless you nest lists inside it), whereas a NumPy array can have any number of dimensions. 
For example, I can change the dimensions of the data using the *reshape()* method:" 288 | ] 289 | }, 290 | { 291 | "cell_type": "code", 292 | "execution_count": null, 293 | "metadata": {}, 294 | "outputs": [], 295 | "source": [ 296 | "numbers_array_2d = numbers_array.reshape(3,2)\n", 297 | "numbers_array_2d" 298 | ] 299 | }, 300 | { 301 | "cell_type": "code", 302 | "execution_count": null, 303 | "metadata": {}, 304 | "outputs": [], 305 | "source": [ 306 | "numbers_array_2d.shape" 307 | ] 308 | }, 309 | { 310 | "cell_type": "markdown", 311 | "metadata": {}, 312 | "source": [ 313 | "The original numbers_array had 6 elements in a 1D array; the new array is 2D with 3 rows and 2 columns." 314 | ] 315 | }, 316 | { 317 | "cell_type": "markdown", 318 | "metadata": {}, 319 | "source": [ 320 | "### Reading ASCII data\n", 321 | "\n", 322 | "The Pandas package has a useful function for reading text/ASCII data called *read_csv()*. The name is somewhat of a misnomer, as *read_csv()* will read data with any delimiter, which you set with the *sep=* keyword argument. Below, you will import the [Pandas](https://pandas.pydata.org/) package and read in a dataset. Note that the path below is relative to the current notebook, so you may have to change it if you are running the code locally on your computer:\n", 323 | "\n", 324 | "```\n", 325 | "data/VIIRSNDE_global2020258.v1.0.txt\n", 326 | "```\n", 327 | "\n", 328 | "We will look at the Visible Infrared Imaging Radiometer Suite (VIIRS) Active Fire product, which classifies whether a pixel contains fire, with various confidence levels. More information can be found at https://www.ospo.noaa.gov/Products/land/fire.html. We will examine the data from Sept 14, 2020 (day of year 258)." 
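The *sep=* behavior described above can be tried without the lesson's data file. Below is a minimal, self-contained sketch that assumes pandas is installed. It reads a small in-memory sample whose column names mimic the fire file. The numbers are made up for illustration. Values are separated by a comma plus a varying number of spaces, which is the same "comma followed by spaces" layout the lesson's data has.

```python
import io

import pandas as pd

# Made-up sample mimicking "comma followed by spaces" delimiters
sample = io.StringIO(
    "Lon, Lat, frp(MW)\n"
    "-121.5,   36.2, 12.7\n"
    "-122.0, 37.9,    55.1\n"
    "30.1, -5.4, 3.3\n"
)

# r',\s*' is a regular expression: one comma, then any amount of whitespace.
# Regex separators are handled by the Python parsing engine.
df = pd.read_csv(sample, sep=r',\s*', engine='python')

print(df.columns.tolist())   # ['Lon', 'Lat', 'frp(MW)']
print(df.dtypes)             # all three columns parse as floats
print(df['frp(MW)'].mean())
```

With a plain `sep=','`, the column names would keep their leading spaces (for example `' Lat'`), so `df['Lat']` would raise a `KeyError`. Passing `skipinitialspace=True` is another way to handle this layout.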
329 | ] 330 | }, 331 | { 332 | "cell_type": "code", 333 | "execution_count": null, 334 | "metadata": {}, 335 | "outputs": [], 336 | "source": [ 337 | "import pandas as pd" 338 | ] 339 | }, 340 | { 341 | "cell_type": "markdown", 342 | "metadata": {}, 343 | "source": [ 344 | "The default separator is a comma (,), but this file also has spaces after the commas. The regular expression \",\s*\" matches a comma followed by any amount of whitespace, so the spaces are not read as part of the values. Regular-expression separators are handled only by the Python parsing engine, so setting engine=\"python\" explicitly avoids a warning about Pandas falling back to it." 345 | ] 346 | }, 347 | { 348 | "cell_type": "code", 349 | "execution_count": null, 350 | "metadata": {}, 351 | "outputs": [], 352 | "source": [ 353 | "fname = \"data/VIIRSNDE_global2020258.v1.0.txt\"\n", 354 | "fires = pd.read_csv(fname, sep=r',\s*', engine='python')" 355 | ] 356 | }, 357 | { 358 | "cell_type": "markdown", 359 | "metadata": {}, 360 | "source": [ 361 | "You can inspect the contents within the notebook using the *head()* method, which returns the first five rows of the dataset. Pandas automatically stores data in structures called *DataFrames*. DataFrames are two-dimensional (rows and columns) and resemble a spreadsheet. The leftmost column is the row index and is not part of the *fires* dataset. " 362 | ] 363 | }, 364 | { 365 | "cell_type": "code", 366 | "execution_count": null, 367 | "metadata": {}, 368 | "outputs": [], 369 | "source": [ 370 | "fires.head()" 371 | ] 372 | }, 373 | { 374 | "cell_type": "markdown", 375 | "metadata": {}, 376 | "source": [ 377 | "You can access individual columns of data using the column name. 
For example, below you can extract the pixel brightness temperature (brt):" 378 | ] 379 | }, 380 | { 381 | "cell_type": "code", 382 | "execution_count": null, 383 | "metadata": {}, 384 | "outputs": [], 385 | "source": [ 386 | "fires[\"brt_t13(K)\"]" 387 | ] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": {}, 392 | "source": [ 393 | "---\n", 394 | "\n", 395 | "**Exercise 2:** Import an ASCII file\n", 396 | "\n", 397 | "1. Import the dataset \"20200901_20200930_Monterey.lev15.csv\" (in the data folder) and save it to a variable called *aeronet*.\n", 398 | "2. Print the first few lines using *.head()*\n", 399 | "3. Find a column that does not contain only missing values (-999) and (challenge!) calculate its mean using the syntax *variable\\[\"column\"\\].mean()*\n", 400 | "---\n", 401 | "**Solution:**" 402 | ] 403 | }, 404 | { 405 | "cell_type": "code", 406 | "execution_count": null, 407 | "metadata": {}, 408 | "outputs": [], 409 | "source": [] 410 | }, 411 | { 412 | "cell_type": "markdown", 413 | "metadata": {}, 414 | "source": [ 415 | "### Working with masks and masked arrays\n", 416 | "\n", 417 | "When working with data, sometimes there are values I want to remove. For instance, I may want to keep only the data below a certain threshold. You can subset the data using comparison operators:\n", 418 | "\n", 419 | "* less than: <\n", 420 | "* less than or equal to: <=\n", 421 | "* greater than: >\n", 422 | "* greater than or equal to: >=\n", 423 | "* equals: ==\n", 424 | "* not equals: !=\n", 425 | "\n", 426 | "Each comparison returns either True or False. 
For the *fires* dataset, you can find which elements meet some condition, such as examining only the larger fires that have a Fire Radiative Power (FRP) above 50 MW:" 427 | ] 428 | }, 429 | { 430 | "cell_type": "code", 431 | "execution_count": null, 432 | "metadata": {}, 433 | "outputs": [], 434 | "source": [ 435 | "masked_nums = (fires['frp(MW)'] > 50)\n", 436 | "print(masked_nums)" 437 | ] 438 | }, 439 | { 440 | "cell_type": "markdown", 441 | "metadata": {}, 442 | "source": [ 443 | "Sometimes you may want to filter by two or more conditions. For example, instead of filtering the FRP data, you may only want to examine values within a latitude and longitude domain. In Python, I can combine conditions element-wise using the and (&) and or (|) operators. Each condition must be wrapped in parentheses, because & and | take precedence over comparison operators. Below, I extract the data in a 5°x5° box around Monterey, California:" 444 | ] 445 | }, 446 | { 447 | "cell_type": "code", 448 | "execution_count": null, 449 | "metadata": {}, 450 | "outputs": [], 451 | "source": [ 452 | "masked_nums = (fires['Lat'] > 35.0) & (fires['Lat'] < 40.0) & (fires['Lon'] > -125.0) & (fires['Lon'] < -120.0)\n", 453 | "print(masked_nums)" 454 | ] 455 | }, 456 | { 457 | "cell_type": "markdown", 458 | "metadata": {}, 459 | "source": [ 460 | "The above mask can be used in place of an index. 
Below, you can create a new variable that takes the FRP values from *fires\\['frp(MW)'\\]* and subsets them with the boolean array *masked_nums*:" 461 | ] 462 | }, 463 | { 464 | "cell_type": "code", 465 | "execution_count": null, 466 | "metadata": {}, 467 | "outputs": [], 468 | "source": [ 469 | "monterey_fires = fires['frp(MW)'][masked_nums]\n", 470 | "print(monterey_fires)" 471 | ] 472 | }, 473 | { 474 | "cell_type": "markdown", 475 | "metadata": {}, 476 | "source": [ 477 | "From this new variable, you can compute the average FRP in this region and compare it to the global average for that day:" 478 | ] 479 | }, 480 | { 481 | "cell_type": "code", 482 | "execution_count": null, 483 | "metadata": {}, 484 | "outputs": [], 485 | "source": [ 486 | "monterey_fires.mean(), fires['frp(MW)'].mean()" 487 | ] 488 | }, 489 | { 490 | "cell_type": "markdown", 491 | "metadata": {}, 492 | "source": [ 493 | "You can use the *size* attribute to compare the number of elements in the original array and in the filtered array, which excludes values outside of our latitude and longitude bounds. You will notice that these two arrays have different sizes." 494 | ] 495 | }, 496 | { 497 | "cell_type": "code", 498 | "execution_count": null, 499 | "metadata": {}, 500 | "outputs": [], 501 | "source": [ 502 | "fires['frp(MW)'].size, monterey_fires.size" 503 | ] 504 | }, 505 | { 506 | "cell_type": "markdown", 507 | "metadata": {}, 508 | "source": [ 509 | "There are cases where you will want to preserve the size and shape of the original array. For these situations, you can use the NumPy *masked array* module. The syntax is *np.ma.array()*, with the keyword argument *mask=* set to the inverse (~) of *masked_nums*: in a masked array, True marks the values to hide, which is the opposite of the selection mask built above." 
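The contrast between boolean indexing and a masked array is easiest to see on a tiny array. Below is a minimal NumPy-only sketch. A hypothetical array *frp* stands in for the *fires* column. The key point is the inverted mask: *np.ma* hides entries where the mask is True.

```python
import numpy as np

# Hypothetical FRP values (MW) and a selection condition
frp = np.array([12.0, 80.0, 3.5, 150.0, 60.0])
keep = frp > 50                      # True where the value is wanted

# Boolean indexing drops elements, so the size shrinks
subset = frp[keep]
print(subset.size)                   # 3

# In np.ma, mask=True means "hide this value", hence the inverse (~keep)
frp_ma = np.ma.array(frp, mask=~keep, fill_value=-999)
print(frp_ma.size)                   # 5: the original size is preserved

# Statistics ignore masked entries, so both means agree
print(np.isclose(frp_ma.mean(), subset.mean()))  # True

# filled() swaps masked entries for fill_value, e.g. before saving to a file
print(frp_ma.filled())
```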
510 | ] 511 | }, 512 | { 513 | "cell_type": "code", 514 | "execution_count": null, 515 | "metadata": {}, 516 | "outputs": [], 517 | "source": [ 518 | "monterey_fires_ma = np.ma.array(fires['frp(MW)'], mask=~masked_nums, fill_value=-999)\n", 519 | "monterey_fires_ma" 520 | ] 521 | }, 522 | { 523 | "cell_type": "markdown", 524 | "metadata": {}, 525 | "source": [ 526 | "Then, you can calculate the mean value and confirm that it is the same as in the previous example:" 527 | ] 528 | }, 529 | { 530 | "cell_type": "code", 531 | "execution_count": null, 532 | "metadata": {}, 533 | "outputs": [], 534 | "source": [ 535 | "monterey_fires_ma.mean()" 536 | ] 537 | }, 538 | { 539 | "cell_type": "markdown", 540 | "metadata": {}, 541 | "source": [ 542 | "However, the key difference is the size: the masked array retains the size of the original, unfiltered data:" 543 | ] 544 | }, 545 | { 546 | "cell_type": "code", 547 | "execution_count": null, 548 | "metadata": {}, 549 | "outputs": [], 550 | "source": [ 551 | "monterey_fires_ma.size" 552 | ] 553 | }, 554 | { 555 | "cell_type": "markdown", 556 | "metadata": {}, 557 | "source": [ 558 | "---\n", 559 | "**Exercise 3:** Filtering data\n", 560 | "\n", 561 | "Using the dataset imported in the previous exercise (*aeronet*):\n", 562 | " \n", 563 | "1. Create a mask that filters the \"AOD_870nm\" column to only include values that are above 0.\n", 564 | "2. Create a new variable, *day_of_year*, with the mask applied to aeronet\\[\"Day_of_Year(Fraction)\"\\].\n", 565 | "3. Create a new variable, *aod_870*, with the mask applied to aeronet\\[\"AOD_870nm\"\\].\n", 566 | "4. 
Compare the mean value of *aeronet\\[\"AOD_870nm\"\\]* to *aod_870*.\n", 567 | " \n", 568 | "---\n", 569 | "**Solution**" 570 | ] 571 | }, 572 | { 573 | "cell_type": "code", 574 | "execution_count": null, 575 | "metadata": {}, 576 | "outputs": [], 577 | "source": [] 578 | }, 579 | { 580 | "cell_type": "markdown", 581 | "metadata": {}, 582 | "source": [ 583 | "### Basic figures and plots\n", 584 | "\n", 585 | "Python has several packages for visualizing remote sensing data, either as imagery or as plots of relevant analysis. Of these, the oldest and most widely used package is [Matplotlib](https://matplotlib.org/). Matplotlib plots are highly customizable, and additional toolkits can extend its functionality, such as creating maps with the [Cartopy](https://scitools.org.uk/cartopy/docs/latest/) package, which I will describe more in the next session.\n" 586 | ] 587 | }, 588 | { 589 | "cell_type": "code", 590 | "execution_count": null, 591 | "metadata": {}, 592 | "outputs": [], 593 | "source": [ 594 | "import matplotlib.pyplot as plt" 595 | ] 596 | }, 597 | { 598 | "cell_type": "markdown", 599 | "metadata": {}, 600 | "source": [ 601 | "Suppose you want to learn the global distribution of fire radiative power. From inspecting the frp(MW) column earlier, you saw that these values extend to many decimal places. Rather than use a continuous scale, I can instead group the data into 10 MW bins from 0 to 500 MW. Because *np.arange()* excludes its stop value, the stop is set to 510 so that 500 MW is the last bin edge:" 602 | ] 603 | }, 604 | { 605 | "cell_type": "code", 606 | "execution_count": null, 607 | "metadata": {}, 608 | "outputs": [], 609 | "source": [ 610 | "bins10MW = np.arange(0, 510, 10)" 611 | ] 612 | }, 613 | { 614 | "cell_type": "markdown", 615 | "metadata": {}, 616 | "source": [ 617 | "I can use these bins to create a histogram. 
The code below works as follows, with each line after *plt.figure()* adding another element to the blank figure. The entire block of code must be run at once and not split into multiple cells. 
\n", 618 | "\n", 619 | "1. *plt.figure()* creates a blank canvas.\n", 620 | "2. I add the histogram to the figure using *plt.hist()*, which counts the number of rows with fire radiative power in each of the bins defined above in the bins10MW variable. To do this, I pass the data (fires['frp(MW)']) and the bins (bins10MW) into *plt.hist()*.\n", 621 | "3. *plt.show()* tells Matplotlib that the plot is complete and renders it:" 622 | ] 623 | }, 624 | { 625 | "cell_type": "code", 626 | "execution_count": null, 627 | "metadata": {}, 628 | "outputs": [], 629 | "source": [ 630 | "plt.figure(figsize=[5,5])\n", 631 | "plt.hist(fires['frp(MW)'], bins=bins10MW)\n", 632 | "plt.show()" 633 | ] 634 | }, 635 | { 636 | "cell_type": "markdown", 637 | "metadata": {}, 638 | "source": [ 639 | "Below, you will remake this plot with some aesthetic improvements, such as x- and y-axis labels added with *set_xlabel()* and *set_ylabel()*. Because there are thousands more fires with fire radiative power below 100 MW than above it, the distribution is strongly right-skewed (possibly lognormal). The plot will be easier to interpret if I rescale the y-axis to a log scale while leaving the x-axis linear.\n", 640 | "\n", 641 | "The command *plt.subplot()* returns an Axes object, which I store in a variable (*ax*). The three digits passed in (111) correspond to the number of rows, the number of columns, and the index of the subplot. In this example, there is one row and one column, and therefore only one index." 
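The two-step histogram can also be sketched headlessly. The sketch below assumes Matplotlib's non-interactive Agg backend. A synthetic lognormal sample replaces the real FRP column, and its parameters are arbitrary. It checks the bin edges produced by *np.arange()* and confirms that only the y-axis is log-scaled.

```python
import matplotlib
matplotlib.use('Agg')   # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic, right-skewed stand-in for FRP values (not real data)
rng = np.random.default_rng(0)
frp = rng.lognormal(mean=2.5, sigma=1.0, size=5000)

bins10MW = np.arange(0, 510, 10)   # edges 0, 10, ..., 500 (stop value excluded)

fig = plt.figure()
ax = plt.subplot(111)              # 1 row, 1 column, first (and only) subplot
counts, edges, patches = ax.hist(frp, bins=bins10MW)

ax.set_yscale('log')               # log y-axis; the x-axis stays linear
ax.set_xlabel('Fire Radiative Power (MW)')
ax.set_ylabel('Counts')

print(len(edges))                  # 51 edges define 50 bins
print(ax.get_xscale(), ax.get_yscale())  # linear log
plt.close(fig)
```

`fig, ax = plt.subplots()` creates the figure and a single Axes in one call. For a one-panel plot it is equivalent to `plt.figure()` followed by `plt.subplot(111)`.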
642 | ] 643 | }, 644 | { 645 | "cell_type": "code", 646 | "execution_count": null, 647 | "metadata": {}, 648 | "outputs": [], 649 | "source": [ 650 | "plt.figure()\n", 651 | "\n", 652 | "ax = plt.subplot(111)\n", 653 | "\n", 654 | "ax.hist(fires['frp(MW)'], bins=bins10MW)\n", 655 | "\n", 656 | "ax.set_yscale('log')\n", 657 | "\n", 658 | "ax.set_xlabel(\"Fire Radiative Power (MW)\")\n", 659 | "ax.set_ylabel(\"Counts\")\n", 660 | "\n", 661 | "plt.show()" 662 | ] 663 | }, 664 | { 665 | "cell_type": "markdown", 666 | "metadata": {}, 667 | "source": [ 668 | "You can also plot the data in two dimensions. For example, each row in *fires* has a latitude and longitude coordinate pair. I will plot these two coordinates using *scatter()*. The first argument is the x-coordinate and the second is the y-coordinate (the order matters). \n", 669 | "\n", 670 | "Some useful keyword arguments for *scatter()* are:\n", 671 | "\n", 672 | "* s: marker size, in points squared\n", 673 | "* c: color, either a predefined color name or a hexadecimal value\n", 674 | "* alpha: opacity, from 0 (fully transparent) to 1 (fully opaque).\n", 675 | "\n", 676 | "As in the previous example, I have chosen to label the latitude and longitude axes:" 677 | ] 678 | }, 679 | { 680 | "cell_type": "code", 681 | "execution_count": null, 682 | "metadata": {}, 683 | "outputs": [], 684 | "source": [ 685 | "fig = plt.figure()\n", 686 | "ax = plt.subplot(111)\n", 687 | "\n", 688 | "ax.scatter(fires['Lon'], fires['Lat'], s=0.5, c='black', alpha=0.1)\n", 689 | "\n", 690 | "ax.set_xlabel('Longitude')\n", 691 | "ax.set_ylabel('Latitude')\n", 692 | "\n", 693 | "plt.show()" 694 | ] 695 | }, 696 | { 697 | "cell_type": "markdown", 698 | "metadata": {}, 699 | "source": [ 700 | "You can almost see the outlines of the continents in the plot above. In the next session, you will learn how to overlay maps onto your plots." 
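The *scatter()* keywords *s*, *c*, and *alpha* can be tried headlessly. This sketch assumes the Agg backend and uses randomly generated longitude and latitude points in place of the fire detections. It also shows that *c* accepts a hexadecimal string, then reads the settings back from the collection that *scatter()* returns.

```python
import matplotlib
matplotlib.use('Agg')   # off-screen rendering; no display needed
import matplotlib.pyplot as plt
import numpy as np

# Random points standing in for fire detections (not real data)
rng = np.random.default_rng(1)
lon = rng.uniform(-180, 180, 200)
lat = rng.uniform(-60, 70, 200)

fig = plt.figure()
ax = plt.subplot(111)
# s: marker area in points**2; c: hex color (#000000 is black);
# alpha: 0 is fully transparent, 1 is fully opaque
points = ax.scatter(lon, lat, s=0.5, c='#000000', alpha=0.1)
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')

print(points.get_sizes())   # [0.5]
print(points.get_alpha())   # 0.1
plt.close(fig)
```

Very small, mostly transparent markers let dense clusters show up as darker regions, which is why the continents begin to appear in the full fire dataset.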
701 | ] 702 | }, 703 | { 704 | "cell_type": "markdown", 705 | "metadata": {}, 706 | "source": [ 707 | "---\n", 708 | "**Exercise 4:** Create a scatterplot\n", 709 | "\n", 710 | "Use the variables *aod_870* and *day_of_year* that you made in Exercise 3 to:\n", 711 | "\n", 712 | "1. Create a scatter plot with *day_of_year* on the x-axis and *aod_870* on the y-axis\n", 713 | "2. Add x-axis and y-axis labels using *.set_xlabel()* and *.set_ylabel()*\n", 714 | "3. Adjust the color and size of the scatterplot markers\n", 715 | "---\n", 716 | "**Solution**" 717 | ] 718 | }, 719 | { 720 | "cell_type": "code", 721 | "execution_count": null, 722 | "metadata": {}, 723 | "outputs": [], 724 | "source": [] 725 | }, 726 | { 727 | "cell_type": "markdown", 728 | "metadata": {}, 729 | "source": [ 730 | "## Summary:\n", 731 | "\n", 732 | "You learned:\n", 733 | "* Very basic built-in Python functions and operations\n", 734 | "* How to import three packages: NumPy, pandas, and Matplotlib\n", 735 | "* How to work with arrays and lists\n", 736 | "* How to create a simple plot\n", 737 | "\n", 738 | "Next lesson:\n", 739 | "* More advanced plots, such as maps\n", 740 | "* Importing scientific datasets, such as netCDF and GRIB" 741 | ] 742 | }, 743 | { 744 | "cell_type": "code", 745 | "execution_count": null, 746 | "metadata": {}, 747 | "outputs": [], 748 | "source": [] 749 | } 750 | ], 751 | "metadata": { 752 | "kernelspec": { 753 | "display_name": "Python 3.11.0 ('notebook_demo')", 754 | "language": "python", 755 | "name": "python3" 756 | }, 757 | "language_info": { 758 | "codemirror_mode": { 759 | "name": "ipython", 760 | "version": 3 761 | }, 762 | "file_extension": ".py", 763 | "mimetype": "text/x-python", 764 | "name": "python", 765 | "nbconvert_exporter": "python", 766 | "pygments_lexer": "ipython3", 767 | "version": "3.11.0" 768 | }, 769 | "vscode": { 770 | "interpreter": { 771 | "hash": "4589143d4cda0c8671911bd60c16dc1d10ec327722e7574bc882b745b51509b4" 772 | } 773 | } 774 | }, 775 | 
"nbformat": 4, 776 | "nbformat_minor": 4 777 | } 778 | -------------------------------------------------------------------------------- /02_Science_Data_Formats_and_Advanced_Plotting.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Lesson 2: Scientific Data Formats and Advanced Plotting\n", 8 | "\n", 9 | "Author: Rebekah Esmaili (rebekah.esmaili@gmail.com)\n", 10 | " \n", 11 | "---\n", 12 | "\n", 13 | "## Lesson Objectives\n", 14 | "* You will learn to:\n", 15 | " * Import relevant packages for scientific programming\n", 16 | " * Read netCDF and GRIB2 data\n", 17 | " * Create plots and maps\n", 18 | " \n", 19 | "\n", 20 | "![](img/flowchart.png)\n", 21 | "\n" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "\n", 29 | "## Importing NetCDF files\n", 30 | "\n", 31 | "NetCDF and HDF are self-describing formats: structured binary files that are well suited to storing large datasets. Computationally, it is faster to read binary datasets than text, which must be parsed before being stored in a computer’s memory. Because binary files are more compact, they are also cheaper for storing large, long-term satellite data. Furthermore, information about the data (\"metadata\") can be stored inside the files themselves.\n", 32 | "\n", 33 | "Datasets:\n", 34 | "* JRR-AOD_v2r3_j01_s202009152044026_e202009152045271_c202009152113150_thinned.nc: A netCDF file that contains Aerosol Optical Depth (AOD) retrieved from a NOAA-20 overpass on 15 Sep 2020. 
For this workshop, unused fields were removed.\n", 35 | "* gfs_3_20200915_0000_000.grb2: A GRIB2 file that contains Global Forecast System (GFS) analysis data\n", 36 | "* MOP03JM-201811-L3V95.6.3_thinned.nc: The Nov 2018 carbon monoxide (CO) monthly mean from the Measurements of Pollution in the Troposphere (MOPITT) instrument on the Terra satellite.\n", 37 | " * NOTE: For this tutorial, the file was converted to netCDF4 and unused variables were removed. The original file is the HDF5 file MOP03JM-201811-L3V95.6.3.he5, which can be obtained from https://earthdata.nasa.gov/.\n", 38 | "* [NOAA Extended Reconstructed SST version 5 dataset (ERSST)](https://psl.noaa.gov/data/gridded/data.noaa.ersst.v5.html): the global monthly mean sea surface temperature from 1854 to the present, reconstructed from ship and buoy observations, with statistical methods used to fill data gaps.\n", 39 | "\n", 40 | "Many environmental dataset names are quite long. However, each name is encoded to give information about the contents. For example:\n", 41 | "\n", 42 | "```\n", 43 | "JRR-AOD_v2r3_j01_s202009152044026_e202009152045271_c202009152113150.nc\n", 44 | "```\n", 45 | "You can learn several important features of the dataset without opening it:\n", 46 | "\n", 47 | "* Prefix indicates the mission (JRR, for JPSS Risk Reduction)\n", 48 | "* Product (Aerosol Optical Depth, or AOD)\n", 49 | "* Algorithm version and revision number (v2r3)\n", 50 | "* Satellite source (j01 for JPSS-1/NOAA-20)\n", 51 | "* Start (s), end (e), and creation (c) times, each followed by the year, month, day, hour, minute, and seconds (to one decimal place). 
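The naming pattern can also be decoded in code. Below is a minimal sketch that assumes every file follows exactly the layout in the bullet list above, with 15-digit time stamps whose last digit is tenths of a second. The regular expression and its group names (*mission*, *product*, and so on) are our own illustrative choices, not an official naming specification.

```python
import re
from datetime import datetime, timedelta

fname = 'JRR-AOD_v2r3_j01_s202009152044026_e202009152045271_c202009152113150.nc'

# Field layout as described above; the group names are illustrative labels
pattern = re.compile(
    r'(?P<mission>[A-Z]+)-(?P<product>[A-Z0-9]+)_'
    r'v(?P<version>\d+)r(?P<revision>\d+)_'
    r'(?P<satellite>[a-z0-9]+)_'
    r's(?P<start>\d{15})_e(?P<end>\d{15})_c(?P<created>\d{15})'
)


def parse_stamp(stamp):
    """Turn 'YYYYMMDDhhmmss' plus one tenths-of-a-second digit into a datetime."""
    base = datetime.strptime(stamp[:14], '%Y%m%d%H%M%S')
    return base + timedelta(seconds=int(stamp[14]) / 10)


fields = pattern.match(fname).groupdict()
for key in ('start', 'end', 'created'):
    fields[key] = parse_stamp(fields[key])

print(fields['mission'], fields['product'], fields['satellite'])  # JRR AOD j01
print(fields['start'])                    # 2020-09-15 20:44:02.600000
print(fields['end'] - fields['start'])    # time span covered by the file
```

Because `match()` only anchors at the start of the string, the `_thinned.nc` file name used in this lesson parses the same way.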
\n", 52 | "\n", 53 | "First, import three commonly used packages in Python:" 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": null, 59 | "metadata": {}, 60 | "outputs": [], 61 | "source": [ 62 | "import numpy as np\n", 63 | "import pandas as pd\n", 64 | "import matplotlib.pyplot as plt" 65 | ] 66 | }, 67 | { 68 | "attachments": {}, 69 | "cell_type": "markdown", 70 | "metadata": {}, 71 | "source": [ 72 | "To begin, you need to first import [xarray](http://xarray.pydata.org/en/stable/io.html) which is tailored to open netCDF4 files and work with large arrays (like numpy and pandas). The [netCDF4 package](https://unidata.github.io/netcdf4-python/netCDF4/index.html) can also be used to import files. The [h5netcdf](https://github.com/h5netcdf/h5netcdf) is useful because it combines features of both netcdf4 and h5py so you can use one reader for two different file types." 73 | ] 74 | }, 75 | { 76 | "cell_type": "code", 77 | "execution_count": null, 78 | "metadata": {}, 79 | "outputs": [], 80 | "source": [ 81 | "import xarray as xr" 82 | ] 83 | }, 84 | { 85 | "attachments": {}, 86 | "cell_type": "markdown", 87 | "metadata": {}, 88 | "source": [ 89 | "Use the open_dataset function to import the above dataset. The engine option is used to read the files. Some possible file readers are \"netcdf4\", \"scipy\", \"pydap\", \"h5netcdf\", \"pynio\", \"cfgrib\", \"pseudonetcdf\", \"zarr\" but you also must have the packages installed. 
" 90 | ] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "execution_count": null, 95 | "metadata": {}, 96 | "outputs": [], 97 | "source": [ 98 | "fname = 'data/JRR-AOD_v2r3_j01_s202009152044026_e202009152045271_c202009152113150_thinned.nc'\n", 99 | "aod_file_id = xr.open_dataset(fname, engine='h5netcdf')" 100 | ] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": {}, 105 | "source": [ 106 | "If you print the contents of the aod_file_id variable, you will get a long list of the global attributes, variables, dimensions, and much more." 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": null, 112 | "metadata": {}, 113 | "outputs": [], 114 | "source": [ 115 | "aod_file_id" 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "The output above is worth inspecting. Inside Jupyter Notebooks, xarray displays an interactive summary of the file contents; clicking on the arrows shows a preview of the metadata. Note that you can also use tools like [Panoply](https://www.giss.nasa.gov/tools/panoply/) to inspect the contents of a netCDF file outside of Python.\n", 123 | "\n", 124 | "* __Dimensions__: The dimensions are named Rows and Columns, which have sizes of 768 and 3200, respectively.\n", 125 | "\n", 126 | "* __Coordinates__: The coordinates are Latitude and Longitude. These are both two-dimensional.\n", 127 | "\n", 128 | "* __Variables__: This file has only one variable, which is AOD550. Its dimensions are also Rows and Columns.\n", 129 | "\n", 130 | "* __Attributes__: The file follows the netCDF4 [CF-1.5 conventions](https://cfconventions.org/). Some of the information that we saw in the file name is also present: this product is the *JPSS Risk Reduction Unique Aerosol Optical Depth* (title) *Level 2* product (processing_level), and the data were collected from the *NOAA-20* (satellite_name) *VIIRS* instrument (instrument_name). 
The *start* (time_coverage_start) and *end* (time_coverage_end) time metadata fields are consistent with the filename. I recommend that you read the netCDF file header contents, especially the first time you are working with new data. " 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "AOD is a unitless measure of the extinction of solar radiation by particles suspended in the atmosphere. High values of AOD can indicate the presence of dust, smoke, or other air pollutants, while low values indicate a cleaner atmosphere.\n", 138 | "\n", 139 | "Xarray syntax resembles both pandas and NumPy. Unlike NumPy arrays, xarray's N-D arrays are labeled: instead of having to remember index numbers, we can extract elements using their coordinate or variable names.\n", 140 | "\n", 141 | "Below I'll extract three important variables: AOD550, Latitude, and Longitude:" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": null, 147 | "metadata": {}, 148 | "outputs": [], 149 | "source": [ 150 | "AOD_550 = aod_file_id['AOD550']\n", 151 | "AOD_lat = aod_file_id['Latitude']\n", 152 | "AOD_lon = aod_file_id['Longitude']" 153 | ] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": {}, 158 | "source": [ 159 | "Let's print AOD_550 below. This variable contains only a portion of the original data array:" 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": null, 165 | "metadata": {}, 166 | "outputs": [], 167 | "source": [ 168 | "AOD_550" 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": {}, 174 | "source": [ 175 | "Xarray uses NumPy as a dependency, so we can use NumPy functions like *.mean()*. First, we have to make sure the data are in the right format. 
If you check the type of *AOD_550.values*, you can see it is a *numpy.ndarray*." 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": null, 181 | "metadata": {}, 182 | "outputs": [], 183 | "source": [ 184 | "type(AOD_550.values)" 185 | ] 186 | }, 187 | { 188 | "cell_type": "markdown", 189 | "metadata": {}, 190 | "source": [ 191 | "By default, xarray skips missing values (NaN) when computing statistics, so they are not included in the result:" 192 | ] 193 | }, 194 | { 195 | "cell_type": "code", 196 | "execution_count": null, 197 | "metadata": {}, 198 | "outputs": [], 199 | "source": [ 200 | "avgAOD = AOD_550.mean()\n", 201 | "print(avgAOD)" 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "metadata": {}, 207 | "source": [ 208 | "---\n", 209 | "**Exercise 1**: Importing netCDF files\n", 210 | "1. Open the file \"MOP03JM-201811-L3V95.6.3_thinned.nc\" using the xarray library\n", 211 | "2. Print the variable names\n", 212 | "3. What are the dimensions?\n", 213 | "---\n", 214 | "\n", 215 | "**Solution:**" 216 | ] 217 | }, 218 | { 219 | "cell_type": "code", 220 | "execution_count": null, 221 | "metadata": {}, 222 | "outputs": [], 223 | "source": [] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": {}, 228 | "source": [ 229 | "## Importing GRIB2 files\n", 230 | "\n", 231 | "GRIB2 files are binary datasets that take on a table-driven code form. \"Table driven\" means that the files require external tables to decode the data. Thus, they are not self-describing. Table-driven formats are a methodology for encoding binary data rather than a distinct file type. The Binary Universal Form for the Representation of meteorological data (BUFR) and GRIdded Binary (GRIB) are two common table-driven formats in the Earth sciences. \n", 232 | "\n", 233 | "American NWS models (e.g., GFS, NAM, and HRRR) and European models (e.g., ECMWF) are stored in GRIB2. 
While they share the same format, there are some differences in how each organization stores its data. 
Each GRIB2 message is stored as binary data, with a header describing the stored data followed by the variable values.\n", 234 | "\n", 235 | "Currently, some GRIB2 decoders have problems parsing the American datasets because the American models have multiple pressure dimensions (depending on the variable), while the European models have one. Still, you can inspect the data using the pygrib and cfgrib packages." 236 | ] 237 | }, 238 | { 239 | "cell_type": "code", 240 | "execution_count": null, 241 | "metadata": {}, 242 | "outputs": [], 243 | "source": [ 244 | "import pygrib" 245 | ] 246 | }, 247 | { 248 | "cell_type": "markdown", 249 | "metadata": {}, 250 | "source": [ 251 | "The pygrib package (Unidata) provides an interface between Python and the GRIB-API (ECMWF). ECMWF has since ended support for the GRIB-API as its primary GRIB2 encoder and decoder and now uses ecCodes. However, pygrib is still maintained by its developer (https://jswhit.github.io/pygrib/) and is useful for parsing NCEP weather forecast data." 252 | ] 253 | }, 254 | { 255 | "cell_type": "code", 256 | "execution_count": null, 257 | "metadata": {}, 258 | "outputs": [], 259 | "source": [ 260 | "filename = 'data/gfs_3_20200915_0000_000.grb2'\n", 261 | "gfs_grb2 = pygrib.open(filename)" 262 | ] 263 | }, 264 | { 265 | "cell_type": "markdown", 266 | "metadata": {}, 267 | "source": [ 268 | "This opens the file but does not extract the records:" 269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "execution_count": null, 274 | "metadata": {}, 275 | "outputs": [], 276 | "source": [ 277 | "type(gfs_grb2)" 278 | ] 279 | }, 280 | { 281 | "cell_type": "markdown", 282 | "metadata": {}, 283 | "source": [ 284 | "Below is a *for loop* in Python. The code block below iterates over each message in the open file and appends (using *.append*) its string representation to a list (*records*). Note that if you run this block again, the file has already been read to the end, so there will be no result. 
You will have to re-open the file (re-run the *pygrib.open* cell) and then re-run the block below.\n", 285 | "\n", 286 | "You can check the size of the final list using *len(records)*:" 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": null, 292 | "metadata": {}, 293 | "outputs": [], 294 | "source": [ 295 | "records = []\n", 296 | "for grb in gfs_grb2:\n", 297 | " records.append(str(grb))\n", 298 | " \n", 299 | "len(records)" 300 | ] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "metadata": {}, 305 | "source": [ 306 | "There are 522 individual product definitions (messages) in this file, so first let’s inspect the contents of one record:" 307 | ] 308 | }, 309 | { 310 | "cell_type": "code", 311 | "execution_count": null, 312 | "metadata": {}, 313 | "outputs": [], 314 | "source": [ 315 | "records[12]" 316 | ] 317 | }, 318 | { 319 | "cell_type": "markdown", 320 | "metadata": {}, 321 | "source": [ 322 | "From the output above, you can see that colons (:) separate the sections of the product definition in this GRIB2 message. The elements are *index* (1), *variable name* and *units* (2-3), and *spatial*, *vertical*, and *temporal* definitions (4-8). There is one record for each *pressure level* and *time*. You can then extract all the records for a variable using the *.select(name=\\[variable\\])* command. Below, you select all the Temperature records (there are 46, which you can see by using the *len(temps)* command). 
Since it is a long list, the code below stores the records in a variable instead of printing them:" 323 | ] 324 | }, 325 | { 326 | "cell_type": "code", 327 | "execution_count": null, 328 | "metadata": {}, 329 | "outputs": [], 330 | "source": [ 331 | "temps = gfs_grb2.select(name='Temperature')" 332 | ] 333 | }, 334 | { 335 | "cell_type": "markdown", 336 | "metadata": {}, 337 | "source": [ 338 | "If you want to extract the temperature at 85000 Pa (850 hPa), you can use the message index (*315*) to pull that record:" 339 | ] 340 | }, 341 | { 342 | "cell_type": "code", 343 | "execution_count": null, 344 | "metadata": {}, 345 | "outputs": [], 346 | "source": [ 347 | "temp = gfs_grb2[315]" 348 | ] 349 | }, 350 | { 351 | "cell_type": "markdown", 352 | "metadata": {}, 353 | "source": [ 354 | "Then, using *.values*, you can extract the data from the record:" 355 | ] 356 | }, 357 | { 358 | "cell_type": "code", 359 | "execution_count": null, 360 | "metadata": {}, 361 | "outputs": [], 362 | "source": [ 363 | "temp.values" 364 | ] 365 | }, 366 | { 367 | "cell_type": "markdown", 368 | "metadata": {}, 369 | "source": [ 370 | "You can also extract the grid information and other important metadata for this record. To see all available information, use the *.keys()* command:" 371 | ] 372 | }, 373 | { 374 | "cell_type": "code", 375 | "execution_count": null, 376 | "metadata": {}, 377 | "outputs": [], 378 | "source": [ 379 | "temp.keys()" 380 | ] 381 | }, 382 | { 383 | "cell_type": "markdown", 384 | "metadata": {}, 385 | "source": [ 386 | "The coordinates can be extracted using the *.latitudes* and *.longitudes* attributes. 
You can also extract the level, units, analysis date, and forecast time from the record:" 387 | ] 388 | }, 389 | { 390 | "cell_type": "code", 391 | "execution_count": null, 392 | "metadata": {}, 393 | "outputs": [], 394 | "source": [ 395 | "gfs_lat_all = temp.latitudes\n", 396 | "gfs_lon_all = temp.longitudes\n", 397 | "\n", 398 | "level = temp.level\n", 399 | "units = temp.units\n", 400 | "\n", 401 | "analysis_date = temp.analDate\n", 402 | "fcst_time = temp.forecastTime" 403 | ] 404 | }, 405 | { 406 | "cell_type": "markdown", 407 | "metadata": {}, 408 | "source": [ 409 | "Problem: the latitude array is much larger than either dimension of the temperature array. Why, and what can we do about it?" 410 | ] 411 | }, 412 | { 413 | "cell_type": "code", 414 | "execution_count": null, 415 | "metadata": {}, 416 | "outputs": [], 417 | "source": [ 418 | "temp.values.shape, gfs_lat_all.shape, gfs_lon_all.shape" 419 | ] 420 | }, 421 | { 422 | "cell_type": "markdown", 423 | "metadata": {}, 424 | "source": [ 425 | "We can troubleshoot by printing the values, which shows that each latitude value is repeated many times." 426 | ] 427 | }, 428 | { 429 | "cell_type": "code", 430 | "execution_count": null, 431 | "metadata": {}, 432 | "outputs": [], 433 | "source": [ 434 | "gfs_lat_all, gfs_lon_all" 434 ... 
460 | ] 461 | }, 462 | { 463 | "cell_type": "markdown", 464 | "metadata": {}, 465 | "source": [ 466 | "## Plotting 3-dimensional Data\n", 467 | "\n", 468 | "So far, we have only made line and scatter plots. Matplotlib also supports plotting spatial datasets. However, we often have to perform several array operations to ensure the x, y, and z coordinates have the same shape. Let's work with a Sea Surface Temperature (SST) dataset in the next example and plot 3-dimensional data." 469 | ] 470 | }, 471 | { 472 | "cell_type": "code", 473 | "execution_count": null, 474 | "metadata": {}, 475 | "outputs": [], 476 | "source": [ 477 | "import xarray as xr\n", 478 | "\n", 479 | "fname = 'data/sst.mon.ltm.1981-2010.nc'\n", 480 | "sst_file_id = xr.open_dataset(fname, engine='h5netcdf', decode_times=False)" 481 | ] 482 | }, 483 | { 484 | "cell_type": "markdown", 485 | "metadata": {}, 486 | "source": [ 487 | "Like before, you can inspect the contents by typing the variable name:" 488 | ] 489 | }, 490 | { 491 | "cell_type": "code", 492 | "execution_count": null, 493 | "metadata": {}, 494 | "outputs": [], 495 | "source": [ 496 | "sst_file_id" 497 | ] 498 | }, 499 | { 500 | "cell_type": "markdown", 501 | "metadata": {}, 502 | "source": [ 503 | "From the printed information above, we can see the following:\n", 504 | "\n", 505 | "* __Dimensions__: The dimensions are named lat, lon, and time, which have sizes of 89, 180, and 12, respectively.\n", 506 | "\n", 507 | "* __Coordinates__: The coordinates are also lat, lon, and time.\n", 508 | "\n", 509 | "* __Variables__: There are three variables: climatology_bounds, sst, and valid_yr_count." 510 | ] 511 | }, 512 | { 513 | "cell_type": "markdown", 514 | "metadata": {}, 515 | "source": [ 516 | "Let's import *sst*, which is a 3-dimensional variable (time, lat, lon). 
You will also need lat and lon, which are both one-dimensional:" 517 | ] 518 | }, 519 | { 520 | "cell_type": "code", 521 | "execution_count": null, 522 | "metadata": {}, 523 | "outputs": [], 524 | "source": [ 525 | "sst = sst_file_id[\"sst\"]\n", 526 | "sst_lat = sst_file_id[\"lat\"]\n", 527 | "sst_lon = sst_file_id[\"lon\"]" 528 | ] 529 | }, 530 | { 531 | "cell_type": "markdown", 532 | "metadata": {}, 533 | "source": [ 534 | "Let's inspect the shapes and see if the data are already formatted for plotting:" 535 | ] 536 | }, 537 | { 538 | "cell_type": "code", 539 | "execution_count": null, 540 | "metadata": {}, 541 | "outputs": [], 542 | "source": [ 543 | "sst_lat.shape, sst_lon.shape, sst.shape" 544 | ] 545 | }, 546 | { 547 | "cell_type": "markdown", 548 | "metadata": {}, 549 | "source": [ 550 | "Contour plots and mesh plots are two useful ways of looking at 3-dimensional data. Both plots are easiest to use when the x, y, and z coordinates share the same 2D grid. \n", 551 | "\n", 552 | "The shapes illustrate two problems:\n", 553 | "1. SST has a time dependency, so we need to pick just one month from the record because we can't plot all 12 on a single graph.\n", 554 | "2. Once a single month is selected, sst is 2D while lat and lon are 1D. You can use *np.meshgrid()* to expand the 1-dimensional x and y coordinates into two dimensions.\n", 555 | "\n", 556 | "\n", 557 | "Problem #1 can be solved by using the xarray index-selection command (.isel) to select just one month. I'll choose December; its index is 11 because Python indexing starts at 0." 558 | ] 559 | }, 560 | { 561 | "cell_type": "code", 562 | "execution_count": null, 563 | "metadata": {}, 564 | "outputs": [], 565 | "source": [ 566 | "sst = sst_file_id[\"sst\"].isel(time=11)" 567 | ] 568 | }, 569 | { 570 | "cell_type": "markdown", 571 | "metadata": {}, 572 | "source": [ 573 | "The np.meshgrid function will help with problem #2 above. 
The function is a little confusing at first, so I'll show a simple example. 
Suppose you have two simple arrays:" 574 | ] 575 | }, 576 | { 577 | "cell_type": "code", 578 | "execution_count": null, 579 | "metadata": {}, 580 | "outputs": [], 581 | "source": [ 582 | "tmp_x = [1, 2]\n", 583 | "tmp_y = [3, 4, 5]" 584 | ] 585 | }, 586 | { 587 | "cell_type": "markdown", 588 | "metadata": {}, 589 | "source": [ 590 | "*tmp_x* has two elements and *tmp_y* has three. If you create a mesh of the two variables, *np.meshgrid* returns two arrays, both with 3 rows and 2 columns: " 591 | ] 592 | }, 593 | { 594 | "cell_type": "code", 595 | "execution_count": null, 596 | "metadata": {}, 597 | "outputs": [], 598 | "source": [ 599 | "np.meshgrid(tmp_x, tmp_y)" 600 | ] 601 | }, 602 | { 603 | "cell_type": "markdown", 604 | "metadata": {}, 605 | "source": [ 606 | "Returning to the example, below is the meshgrid of the 1-dimensional latitude and longitude coordinates:" 607 | ] 608 | }, 609 | { 610 | "cell_type": "code", 611 | "execution_count": null, 612 | "metadata": {}, 613 | "outputs": [], 614 | "source": [ 615 | "X_sst, Y_sst = np.meshgrid(sst_lon, sst_lat)" 616 | ] 617 | }, 618 | { 619 | "cell_type": "markdown", 620 | "metadata": {}, 621 | "source": [ 622 | "Before plotting, check that all the dimensions match by comparing the shape of sst to X_sst:" 623 | ] 624 | }, 625 | { 626 | "cell_type": "code", 627 | "execution_count": null, 628 | "metadata": {}, 629 | "outputs": [], 630 | "source": [ 631 | "sst.shape, X_sst.shape" 632 | ] 633 | }, 634 | { 635 | "cell_type": "markdown", 636 | "metadata": {}, 637 | "source": [ 638 | "We've already learned how to use *plt.figure()* and *plt.subplot()* to generate the empty figure (*fig*) and axis (*ax*). \n", 639 | "\n", 640 | "Next, you call *ax.contourf* and input the X_sst, Y_sst, and sst variables. sst acts as a color value, which becomes the third dimension of the plot. 
You then store this object in a variable, *sst_plot*, so that you can pass it to *fig.colorbar* to map the colors to numeric values." 641 | ] 642 | }, 643 | { 644 | "cell_type": "code", 645 | "execution_count": null, 646 | "metadata": {}, 647 | "outputs": [], 648 | "source": [ 649 | "# contourf\n", 650 | "fig = plt.figure()\n", 651 | "ax = plt.subplot(111)\n", 652 | "sst_plot = ax.contourf(X_sst, Y_sst, sst)\n", 653 | "fig.colorbar(sst_plot, orientation='horizontal', ax=ax)\n", 654 | "plt.show()" 655 | ] 656 | }, 657 | { 658 | "cell_type": "markdown", 659 | "metadata": {}, 660 | "source": [ 661 | "Like contour plots, mesh plots are 2-dimensional plots that display three dimensions of information, using the x and y coordinates and z as a color scale. However, mesh plots do not interpolate the data or group it into contour levels; they display each grid cell as-is. Because many satellite datasets are swath-based, irregularly spaced data may need to be re-gridded before it can be displayed as a mesh. In the code block below, let’s compare how the sst data look using the pcolormesh command with the previous contour example. The only change to the code is the plotting function." 662 | ] 663 | }, 664 | { 665 | "cell_type": "code", 666 | "execution_count": null, 667 | "metadata": {}, 668 | "outputs": [], 669 | "source": [ 670 | "# pcolormesh\n", 671 | "fig = plt.figure()\n", 672 | "ax = plt.subplot(111)\n", 673 | "sst_plot = ax.pcolormesh(X_sst, Y_sst, sst, shading='auto')\n", 674 | "fig.colorbar(sst_plot, orientation='horizontal', ax=ax)\n", 675 | "plt.show()" 676 | ] 677 | }, 678 | { 679 | "attachments": {}, 680 | "cell_type": "markdown", 681 | "metadata": {}, 682 | "source": [ 683 | "You might notice that there is more structure in the mesh plot than in the filled contour plot. 
This is useful if you wish to examine fine structure and patterns.\n", 684 | "\n", 685 | "---\n", 686 | "**Exercise 2**: Plot 3-dimensional data\n", 687 | "\n", 688 | "Plot *AOD_lat*, *AOD_lon*, and *AOD_550* (which we imported from the \"JRR-AOD_v2r3_j01_...\" netCDF file):\n", 689 | "\n", 690 | "1. Check the dimensions for all variables using *.shape*.\n", 691 | "2. Do you need to generate a meshgrid with *np.meshgrid()*?\n", 692 | "3. Create a contour plot.\n", 693 | "\n", 694 | "---\n", 695 | "**Solution:**" 696 | ] 697 | }, 698 | { 699 | "cell_type": "code", 700 | "execution_count": null, 701 | "metadata": {}, 702 | "outputs": [], 703 | "source": [] 704 | }, 705 | { 706 | "attachments": {}, 707 | "cell_type": "markdown", 708 | "metadata": {}, 709 | "source": [ 710 | "## Adding maps to plots\n", 711 | "\n", 712 | "The package [Cartopy](https://scitools.org.uk/cartopy/docs/latest/) adds mapping functionality to Matplotlib. Cartopy provides an interface to obtain continent, country, and other feature outlines to overlay onto your plot. Cartopy also enables you to transform your data from one map projection to another by converting the data's coordinates into the target map's coordinates. Matplotlib natively supports six mathematical and map projections (Aitoff, Hammer, Lambert, Mollweide, polar, and rectilinear); combined with Cartopy, data can be transformed to a total of 33 possible projections." 713 | ] 714 | }, 715 | { 716 | "cell_type": "code", 717 | "execution_count": null, 718 | "metadata": {}, 719 | "outputs": [], 720 | "source": [ 721 | "from cartopy import crs as ccrs" 722 | ] 723 | }, 724 | { 725 | "cell_type": "markdown", 726 | "metadata": {}, 727 | "source": [ 728 | "Just like before, we need to convert the 1D lat and lon coordinates to 2D using meshgrid. We can check the shape to ensure all variables have the same dimensions."
729 | ] 730 | }, 731 | { 732 | "cell_type": "code", 733 | "execution_count": null, 734 | "metadata": {}, 735 | "outputs": [], 736 | "source": [ 737 | "gfs_temp = temp.values\n", 738 | "gfs_x, gfs_y = np.meshgrid(gfs_lon, gfs_lat)\n", 739 | "\n", 740 | "gfs_x.shape, gfs_y.shape, gfs_temp.shape" 741 | ] 742 | }, 743 | { 744 | "cell_type": "code", 745 | "execution_count": null, 746 | "metadata": {}, 747 | "outputs": [], 748 | "source": [ 749 | "fig = plt.figure(figsize=[10,5])\n", 750 | "ax = plt.subplot(projection=ccrs.PlateCarree())\n", 751 | "\n", 752 | "ax.pcolormesh(gfs_x, gfs_y, gfs_temp)\n", 753 | "\n", 754 | "ax.coastlines('50m')\n", 755 | "plt.show()" 756 | ] 757 | }, 758 | { 759 | "cell_type": "markdown", 760 | "metadata": {}, 761 | "source": [ 762 | "In the next example, you can switch from Plate Carrée to Orthographic. You must specify a projection twice: once with the *projection=* keyword and again with the *transform=* keyword. In the *plt.subplot* line, you define the destination coordinates (*ccrs.Orthographic*), which is how you want the axes to display the data. In the *ax.pcolormesh* line, you use the *transform* keyword argument to define the source coordinates (Plate Carrée), which are the coordinates the data are formatted in." 763 | ] 764 | }, 765 | { 766 | "cell_type": "code", 767 | "execution_count": null, 768 | "metadata": {}, 769 | "outputs": [], 770 | "source": [ 771 | "fig = plt.figure(figsize=[10,5])\n", 772 | "ax = plt.subplot(projection=ccrs.Orthographic(90, 0))\n", 773 | "\n", 774 | "ax.pcolormesh(gfs_x, gfs_y, gfs_temp, transform=ccrs.PlateCarree())\n", 775 | "\n", 776 | "ax.coastlines('50m')\n", 777 | "plt.show()" 778 | ] 779 | }, 780 | { 781 | "cell_type": "markdown", 782 | "metadata": {}, 783 | "source": [ 784 | "---\n", 785 | "**Exercise 3**: Adding maps to plots\n", 786 | "\n", 787 | "Using *AOD_lat*, *AOD_lon*, and *AOD_550* (which we imported from the \"JRR-AOD_v2r3_j01_...\" netCDF file):\n", 788 | "\n", 789 | "1. 
Create a *pcolormesh* plot.\n", 790 | "2. Add the coastlines to a standard Plate Carrée plot using the *projection=* option.\n", 791 | "\n", 792 | "---\n", 793 | "**Solution:**" 794 | ] 795 | }, 796 | { 797 | "cell_type": "code", 798 | "execution_count": null, 799 | "metadata": {}, 800 | "outputs": [], 801 | "source": [] 802 | }, 803 | { 804 | "cell_type": "markdown", 805 | "metadata": {}, 806 | "source": [ 807 | "## Summary:\n", 808 | "\n", 809 | "You learned:\n", 810 | "\n", 811 | "* How to import scientific data formats, like netCDF and GRIB2\n", 812 | "* How to work with arrays and lists\n", 813 | "* How to create simple maps\n", 814 | "\n", 815 | "Next lesson:\n", 816 | "* Obtain datasets from remote sources\n", 817 | "* Save data into text and binary files, and plots as images" 818 | ] 819 | }, 820 | { 821 | "cell_type": "code", 822 | "execution_count": null, 823 | "metadata": {}, 824 | "outputs": [], 825 | "source": [] 826 | } 827 | ], 828 | "metadata": { 829 | "kernelspec": { 830 | "display_name": "Python 3.11.0 ('notebook_demo')", 831 | "language": "python", 832 | "name": "python3" 833 | }, 834 | "language_info": { 835 | "codemirror_mode": { 836 | "name": "ipython", 837 | "version": 3 838 | }, 839 | "file_extension": ".py", 840 | "mimetype": "text/x-python", 841 | "name": "python", 842 | "nbconvert_exporter": "python", 843 | "pygments_lexer": "ipython3", 844 | "version": "3.11.0 | packaged by conda-forge | (main, Oct 25 2022, 06:18:27) [GCC 10.4.0]" 845 | }, 846 | "vscode": { 847 | "interpreter": { 848 | "hash": "4589143d4cda0c8671911bd60c16dc1d10ec327722e7574bc882b745b51509b4" 849 | } 850 | } 851 | }, 852 | "nbformat": 4, 853 | "nbformat_minor": 2 854 | } 855 | -------------------------------------------------------------------------------- /03_Remote_Datasets_and_Exporting.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Lesson 
3: Saving and Scripting\n", 8 | "Author: Rebekah Esmaili (rebekah.esmaili@gmail.com)\n", 9 | " \n", 10 | "---\n", 11 | "\n", 12 | "## Lesson Objectives\n", 13 | "* You will learn to:\n", 14 | " * Save data into text and binary files, and plots as images\n", 15 | " * Run Python scripts" 16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": null, 21 | "metadata": {}, 22 | "outputs": [], 23 | "source": [ 24 | "import pandas as pd\n", 25 | "import numpy as np\n", 26 | "import matplotlib.pyplot as plt\n", 27 | "import xarray as xr\n", 28 | "import cartopy.crs as ccrs" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "## Data Sources\n", 36 | "\n", 37 | "### Data Archives\n", 38 | "\n", 39 | "Most Earth-science agencies openly and freely disseminate satellite, model, and in situ observations. Typically, data are stored in online repositories where the holdings can be queried and ordered. Some examples are shown below.\n", 40 | "\n", 41 | "* [NASA](https://www.earthdata.nasa.gov/learn/find-data)\n", 42 | "* [NOAA](https://www.avl.class.noaa.gov/saa/products/welcome)\n", 43 | "* [ECMWF](https://www.ecmwf.int/en/forecasts/datasets)\n", 44 | "* [ESA](https://earth.esa.int/eogateway/catalog)\n", 45 | "\n", 46 | "### OPeNDAP/THREDDS catalogs\n", 47 | "\n", 48 | "Increasingly, agencies are offering data catalogs that allow users to read directly from the archive without placing an order. The Open-source Project for a [Network Data Access Protocol (OPeNDAP)](https://www.opendap.org/) is an example of a software tool that simplifies access. Instead of downloading data and then reading it into Python, we can access it directly using a URL. Xarray supports OPeNDAP.\n", 49 | "\n", 50 | "Here are a couple of useful OPeNDAP data catalogs:\n", 51 | "\n", 52 | "* [NOAA/Physical Sciences Lab](https://psl.noaa.gov/data/): datasets ranging from gridded climate data extending back hundreds of years to real-time wind profiler data at a single location. 
The data and derived products are organized by type and are available to scientists and the general public.\n", 53 | "* [NASA EOSDIS OPeNDAP Servers](https://www.earthdata.nasa.gov/engage/open-data-services-and-software/api/opendap/opendap-servers): OPeNDAP servers across the Earth Observing System Data and Information System (EOSDIS), which host many NASA satellite and reanalysis datasets.\n", 54 | "\n", 55 | "The example below shows the Level 3 Monthly Average Chlorophyll Concentration from the Ocean Biology Processing Group (NASA/GSFC/OBPG). As you can see, we are passing a URL, not a file location, into xarray. One caveat is that the engine **must be netcdf4**, because the h5netcdf engine does not currently support OPeNDAP." 56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": null, 61 | "metadata": {}, 62 | "outputs": [], 63 | "source": [ 64 | "url = 'http://oceandata.sci.gsfc.nasa.gov/opendap/Merged_ATV/L3SMI/2022/001/X20220012022031.L3m_MO_CHL_chlor_a_4km.nc'\n", 65 | "chl_opendap = xr.open_dataset(url, engine='netcdf4')\n", 66 | "chl_opendap" 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": null, 72 | "metadata": {}, 73 | "outputs": [], 74 | "source": [ 75 | "chlor_a = chl_opendap.chlor_a[::15, ::15]\n", 76 | "chlor_a" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": null, 82 | "metadata": {}, 83 | "outputs": [], 84 | "source": [ 85 | "map_proj = ccrs.PlateCarree()\n", 86 | "\n", 87 | "fig = plt.figure(figsize=[10,5])\n", 88 | "ax = plt.subplot(projection=map_proj)\n", 89 | "chlor_a.plot(vmin=0, vmax=.8, transform=map_proj, cmap='rainbow')\n", 90 | "ax.coastlines(\"10m\", color=\"k\")\n", 91 | "\n", 92 | "plt.show()\n" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": {}, 98 | "source": [ 99 | "\n", 100 | "### Accessing data from the cloud\n", 101 | "\n", 102 | "Many major cloud computing companies, such as Google, Microsoft, and Amazon, disseminate Earth science 
datasets. For instance, AWS supports an [open data program](https://aws.amazon.com/opendata/) that makes many climate datasets publicly available.\n", 103 | "\n", 104 | "The s3fs package is a file interface to Amazon S3 (Simple Storage Service) buckets, so you can browse and search for data. NOAA's Open Data Dissemination (NODD) program is increasing access to satellite data, including GOES and JPSS. In this example, we'll look at GOES-16 data." 105 | ] 106 | }, 107 | { 108 | "cell_type": "code", 109 | "execution_count": null, 110 | "metadata": {}, 111 | "outputs": [], 112 | "source": [ 113 | "import s3fs" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": null, 119 | "metadata": {}, 120 | "outputs": [], 121 | "source": [ 122 | "fs = s3fs.S3FileSystem(anon=True)\n", 123 | "url = 'noaa-goes16/ABI-L2-RSRF/2022/285/18/OR_ABI-L2-RSRF-M6_G16_s20222851800206_e20222851809514_c20222851901446.nc'\n", 124 | "remote_obj = fs.open(url, mode='rb')" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": null, 130 | "metadata": {}, 131 | "outputs": [], 132 | "source": [ 133 | "# Open the remote file object with xarray\n", 134 | "g16 = xr.open_dataset(remote_obj, engine='h5netcdf')" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": null, 140 | "metadata": {}, 141 | "outputs": [], 142 | "source": [ 143 | "data_proj = ccrs.PlateCarree()\n", 144 | "map_proj = ccrs.Orthographic(central_longitude=-72.3)\n", 145 | "\n", 146 | "fig = plt.figure()\n", 147 | "ax = plt.subplot(projection=map_proj)\n", 148 | "g16.RSR.plot(transform=data_proj, vmin=0, vmax=600)\n", 149 | "ax.coastlines(\"10m\", color=\"k\")\n", 150 | "\n", 151 | "plt.show()\n" 152 | ] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "metadata": {}, 157 | "source": [ 158 | "## Exporting data and figures\n" 159 | ] 160 | }, 161 | { 162 | "cell_type": "markdown", 163 | "metadata": {}, 164 | "source": [ 165 | "### Saving figures\n", 166 | "\n", 167 | "Normally, we end our plots with 
*plt.show()* to display them inline. To save a plot to a file instead, use *plt.savefig()*. An optional keyword argument, *bbox_inches* (for example, *bbox_inches='tight'*), controls the whitespace around the plot. " 168 | ] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": null, 173 | "metadata": {}, 174 | "outputs": [], 175 | "source": [ 176 | "data_proj = ccrs.PlateCarree()\n", 177 | "map_proj = ccrs.Orthographic(central_longitude=-72.3)\n", 178 | "\n", 179 | "fig = plt.figure()\n", 180 | "ax = plt.subplot(projection=map_proj)\n", 181 | "g16.RSR.plot(transform=data_proj, vmin=0, vmax=600)\n", 182 | "ax.coastlines(\"10m\", color=\"k\")\n", 183 | "\n", 184 | "# The only difference: save to a file instead of calling plt.show()\n", 185 | "plt.savefig('SRB.png')\n", 186 | "plt.close()\n" 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "metadata": {}, 192 | "source": [ 193 | "### Saving csv files\n", 194 | "\n", 195 | "The pandas *to_csv* method is convenient for quickly saving files. The option *index=False* prevents the DataFrame indices (which are displayed to the left of the DataFrame) from being written to the file." 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": null, 201 | "metadata": {}, 202 | "outputs": [], 203 | "source": [ 204 | "name = ['GOES-16', 'ICESat-2', 'Himawari']\n", 205 | "agency = ['NOAA', 'NASA', 'JAXA']\n", 206 | "orbit = ['GEO', 'LEO', 'GEO']\n", 207 | "\n", 208 | "df = pd.DataFrame({'name': name,\n", 209 | " 'agency': agency,\n", 210 | " 'orbit': orbit})\n", 211 | "\n", 212 | "df.to_csv('satellites.csv', index=False)" 213 | ] 214 | }, 215 | { 216 | "cell_type": "markdown", 217 | "metadata": {}, 218 | "source": [ 219 | "### Saving as a binary file\n", 220 | "\n", 221 | "NumPy binary files (.npz) are geared towards arrays, which can be multi-dimensional. A single .npz file can hold several named arrays, which is useful for quickly storing large datasets."
222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": null, 227 | "metadata": {}, 228 | "outputs": [], 229 | "source": [ 230 | "np.savez('satnames', name=name, agency=agency, orbit=orbit)" 231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "execution_count": null, 236 | "metadata": {}, 237 | "outputs": [], 238 | "source": [ 239 | "npzfile = np.load('satnames.npz')\n", 240 | "print(npzfile.files)\n", 241 | "npzfile.close()" 242 | ] 243 | }, 244 | { 245 | "attachments": {}, 246 | "cell_type": "markdown", 247 | "metadata": {}, 248 | "source": [ 249 | "## Scripting with Python\n", 250 | "\n", 251 | "### Running Python scripts from the command line\n", 252 | "\n", 253 | "Once you have finished editing your code, you will probably want to run it. There are two ways to run Python scripts:\n", 254 | "\n", 255 | "1. Using the command line interpreter\n", 256 | "2. Using IPython\n", 257 | "\n", 258 | "IPython is an interactive command line that allows you to run code in chunks. In fact, Jupyter Notebook is built on IPython, which explains the similarity in behavior.\n", 259 | " \n", 260 | "* Windows: I suggest using the Anaconda Prompt, which you can open from the Start menu or from Anaconda Navigator. \n", 261 | "* macOS/Linux: open the Terminal app. \n", 262 | "\n", 263 | "Once the command line is open, you start in a default location. For example, if you are using Windows and launch the Anaconda Prompt, you will see:\n", 264 | "\n", 265 | "```\n", 266 | "(base) C:\\Users\\>\n", 267 | "```\n", 268 | "\n", 269 | "Now, navigate to where your script is. To do this, change directories using the cd command. 
For example, if your code is stored in C:\\Documents\\Python, you can type:\n", 270 | "\n", 271 | "```\n", 272 | "\n", 273 | "cd C:\\Documents\\Python\n", 274 | "```\n", 275 | "\n", 276 | "The prompt will update to show:\n", 277 | "\n", 278 | "```\n", 279 | "(base) C:\\Documents\\Python>\n", 280 | "```\n", 281 | "\n", 282 | "Now that you are in the right place, you can call the Python interpreter. It reads your code, translates it into instructions your computer can execute, and runs them. Anaconda includes a Python 3 interpreter. On macOS/Linux you can call it as *python3*; on Windows the command is usually *python*. So, to run the script, type:\n", 283 | "\n", 284 | "```\n", 285 | "python3 hello_world.py\n", 286 | "```\n", 287 | "\n", 288 | "If successful, “Hello Earth” should print to your screen.\n", 289 | "\n", 290 | "A second method is to use IPython, which runs Python in interactive mode. Unlike the command line method, IPython lets you run code line by line. So, like Jupyter Notebook, you can copy and paste your code from the text editor into the IPython window in chunks. You can also run the entire script inside IPython by starting IPython and using the command %run \\[script name\\].py. Below is a capture from my terminal:\n", 291 | "\n", 292 | "```\n", 293 | "Python 3.7.6 (default, Jan 8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)]\n", 294 | "Type 'copyright', 'credits' or 'license' for more information\n", 295 | "IPython 7.12.0 -- An enhanced Interactive Python. Type '?' for help.\n", 296 | "\n", 297 | "In [1]: %run script_example.ipynb\n", 298 | "Hello Earth\n", 299 | "```\n", 300 | "\n", 301 | "One advantage of using IPython is that the variables created by the script are still in memory after it finishes running. You can then print or operate on those variables to debug or further develop your code. 
\n", 302 | "\n", 303 | "You may have noticed two differences between writing code in scripts and in notebooks: (1) output such as plots cannot be displayed inline, and (2) the program must run all the way to the end.\n", 304 | "\n", 305 | "\n", 306 | "### Handling output when scripting\n", 307 | "\n", 308 | "In the previous example, you printed text to the screen, but Python can also save figures and data to files. To save plots, replace *plt.show()* with the *plt.savefig()* command.\n", 309 | "\n", 310 | "When working on a remote machine, you can display your graphics directly using X11 forwarding. It is available by default on Linux, through XQuartz on Mac, and through PuTTY with an X server on Windows. \n", 311 | "\n", 312 | "I typically discourage this because satellite imagery tends to be very large and thus slow to display remotely. In my experience, it is usually faster to write an image to a file and then view the plot after it is fully rendered.\n", 313 | "\n", 314 | "## Summary:\n", 315 | "\n", 316 | "You learned:\n", 317 | "\n", 318 | "* How to access data from data archives, OPeNDAP, and cloud storage\n", 319 | "* How to save data and graphics\n", 320 | "* How to run scripts\n", 321 | "\n", 322 | "## Conclusion\n", 323 | "\n", 324 | "I hope you feel empowered to find relevant satellite data for your project and equipped with the tools to visualize it. Practice regularly (daily!) to improve your skills. Here are some ways you can continue your journey:\n", 325 | "\n", 326 | "* Download data. You can access data from ESA (https://earth.esa.int/eogateway/), NOAA’s THREDDS data server (https://www.ncei.noaa.gov/thredds/catalog.html), or NASA's [Earthdata](https://earthdata.nasa.gov/) portal.\n", 327 | "* Read. \n", 328 | " * [Project Pythia Foundations](https://foundations.projectpythia.org/landing-page.html), a free and excellent introduction to Python basics for Earth science. 
It covers the topics presented here in more detail.\n", 329 | " * [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/) (free), a more general book on data science with Python.\n", 330 | " * [Research Software Engineering with Python](https://merely-useful.tech/py-rse/) Free eBook to enhance your workflow\n", 331 | " * Python Programming and Visualization for Scientists by Alex DeCaria (not free)\n", 332 | " * Python Machine Learning by Wei-Meng Lee (not free)\n", 333 | " * [Earth Observation Using Python](https://www.wiley.com/en-us/Earth+Observation+using+Python%3A+A+Practical+Programming+Guide-p-9781119606888) by Rebekah Esmaili (not free)\n", 334 | "* Watch.\n", 335 | " * [CS Dojo](https://www.youtube.com/channel/UCxX9wt5FWQUAAz4UrysqK9A) on YouTube has a lot of short, fun Python tutorials.\n", 336 | " * [Coursera](https://www.coursera.org/learn/interactive-python-1?specialization=computer-fundamentals) has some fundamental interactive Python courses if you want more structure.\n", 337 | " * [Python for Climate and Meteorology](https://www.youtube.com/watch?v=uQZAEPnUZ5o) Another focused Python workshop taught at AMS, a little more advanced.\n", 338 | "* Connect with an online community, such as Pangeo (https://discourse.pangeo.io/)" 339 | ] 340 | }, 341 | { 342 | "cell_type": "code", 343 | "execution_count": null, 344 | "metadata": {}, 345 | "outputs": [], 346 | "source": [] 347 | } 348 | ], 349 | "metadata": { 350 | "kernelspec": { 351 | "display_name": "Python 3.11.0 ('notebook_demo')", 352 | "language": "python", 353 | "name": "python3" 354 | }, 355 | "language_info": { 356 | "codemirror_mode": { 357 | "name": "ipython", 358 | "version": 3 359 | }, 360 | "file_extension": ".py", 361 | "mimetype": "text/x-python", 362 | "name": "python", 363 | "nbconvert_exporter": "python", 364 | "pygments_lexer": "ipython3", 365 | "version": "3.11.0" 366 | }, 367 | "vscode": { 368 | "interpreter": { 369 | "hash": 
"4589143d4cda0c8671911bd60c16dc1d10ec327722e7574bc882b745b51509b4" 370 | } 371 | } 372 | }, 373 | "nbformat": 4, 374 | "nbformat_minor": 4 375 | } 376 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2022 Rebekah Esmaili 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Python for Earth Sciences 2 | 3 | Instructor: [Rebekah Esmaili](http://www.rebekahesmaili.com), PhD 4 | 5 | --- 6 | 7 | A crash course in Python focusing on reading and visualizing data-sets used in Earth sciences. 8 | 9 | This code is interactive! 
Click here: [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/modern-tools-workshop/AGU-python-workshop-2022/HEAD) 10 | 11 | --- 12 | 13 | ## Getting Started 14 | 15 | This workshop will cover: 16 | 17 | * Launching Jupyter Notebooks 18 | * Working with arrays using the NumPy package 19 | * Importing text datasets using the Pandas package 20 | * Creating simple graphics with Matplotlib 21 | * Importing scientific data formats, such as netCDF and GRIB2 22 | * Creating maps from datasets 23 | 24 | --- 25 | 26 | ### Installation Requirements 27 | 28 | "I am really new to Python!" 29 | 30 | * I recommend launching Binder, which is a "cloud version" of this course. No installation required! [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/modern-tools-workshop/AGU-python-workshop-2022/HEAD) 31 | * Need help with Binder? Video tutorial on [YouTube](https://youtu.be/3BrfFe4HsAw). 32 | 33 | "I have used Python before!" 34 | 35 | * If you wish to run the examples locally, I recommend installing [Anaconda](https://www.anaconda.com/products/individual). If you are having trouble with your installation, contact the instructor before the course or use Binder. 36 | * Need help installing Anaconda? Video tutorial on [YouTube](https://youtu.be/zxSQCXXvOIM). 37 | * Download the contents of this GitHub repository to your computer. 38 | * Launch Jupyter Notebooks from the Anaconda Navigator. This will open a window in your default browser. Navigate to the folder that contains the notebooks (*.ipynb) and click on the tutorial for the day. 39 | * New to Jupyter? Here's a video tutorial on [YouTube](https://youtu.be/gmMCuR9JPpY). 40 | * Additional packages: 41 | * Launch the Anaconda Prompt (Windows) or Terminal (macOS/Linux). 42 | * Use the environment.yml file to install the necessary packages.
You can do this in the terminal using: 43 | 44 | ```bash 45 | conda env create -f environment.yml 46 | ``` 47 | 48 | * Then, switch to the new environment once the installation is complete: 49 | 50 | ```bash 51 | conda activate python-workshop 52 | ``` 53 | * Note: The default environment is called 'base.' If you close the terminal, you will have to switch back to the environment using the above command again. 54 | 55 | I *do not* recommend: 56 | * Using Python on a remote server for this tutorial (I cannot help troubleshoot) 57 | * Using your operating system's Python or a shared Python installation unless you are advanced! 58 | 59 | --- 60 | ## Course Philosophy 61 | 62 | * Increase accessibility of satellite data and analysis 63 | * Teach Python using practical examples and real-world datasets 64 | * Promote reproducible and transparent scientific research 65 | 66 | ## Resources 67 | 68 | ### Packages and Tutorials 69 | 70 | Pandas 71 | * Short Introduction: https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html 72 | * Cookbook for more details: https://pandas.pydata.org/pandas-docs/stable/user_guide/cookbook.html#cookbook 73 | 74 | --- 75 | Matplotlib 76 | * Pyplot Tutorial: https://matplotlib.org/3.1.1/tutorials/introductory/pyplot.html 77 | 78 | --- 79 | Import/Export 80 | 81 | NetCDF files 82 | * Detailed tutorial on the netCDF4 package: https://unidata.github.io/netcdf4-python. 83 | * Xarray tutorial: https://xarray-contrib.github.io/xarray-tutorial/ 84 | 85 | HDF files 86 | * The package [h5py](https://www.h5py.org/) provides an interface for HDF5 files similar to netcdf4. 87 | * User manual at http://docs.h5py.org/en/stable/. 88 | * Xarray can also open HDF5 files that follow netCDF-4 conventions (for example, with engine='h5netcdf'). 89 | 90 | GRIB/GRIB2 files 91 | * World Meteorological Organization (WMO) standard format, commonly used for output from weather models such as ECMWF and GFS. 
92 | * Can be opened using [pygrib](https://github.com/jswhit/pygrib). 93 | * Example usage at https://jswhit.github.io/pygrib/docs/.
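pygrib reads both GRIB editions. If a download fails, though, you may end up with a file that is not GRIB at all, such as a saved HTML error page, and the resulting error can be confusing. As a quick sanity check before opening a file, the sketch below peeks at its first bytes using only the Python standard library.

It makes two assumptions. First, the file starts directly with a GRIB message. Second, in both GRIB1 and GRIB2 the eighth byte of the indicator section holds the edition number. `grib_edition` is a hypothetical helper written for this README; it is not part of pygrib.

```python
import os
import tempfile


def grib_edition(path):
    """Return the GRIB edition number (1 or 2) of the first message in a file.

    Hypothetical helper (not part of pygrib). Assumes the file begins
    directly with a GRIB message, i.e. its first four bytes are b"GRIB".
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GRIB":
        raise ValueError(f"{path} does not start with a GRIB message")
    # In both GRIB1 and GRIB2, octet 8 of the indicator section
    # (index 7 when counting from zero) stores the edition number.
    return header[7]


# Demo on a throwaway file holding only an 8-byte GRIB2-style indicator
# fragment ("GRIB", 2 reserved bytes, discipline 0, edition 2). This is
# not a complete GRIB message, so pygrib itself could not open it.
with tempfile.NamedTemporaryFile(suffix=".grb2", delete=False) as tmp:
    tmp.write(b"GRIB\x00\x00\x00\x02")
print(grib_edition(tmp.name))  # prints 2
os.remove(tmp.name)
```

On a real download, such as the GFS sample in the `data` folder, you could call `grib_edition()` first and only pass the file to `pygrib.open()` if no error is raised.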
94 | 95 | BUFR files 96 | * Another common table-driven WMO format. 97 | * Open with [python-bufr](https://github.com/pytroll/python-bufr), part of the pytroll project. 98 | --- 99 | 100 | ### General Python Resources 101 | 102 | Beginner Tutorials 103 | * YouTube series for absolute beginners [CS Dojo](https://www.youtube.com/watch?v=Z1Yd7upQsXY&list=PLBZBJbE_rGRWeh5mIBhD-hhDwSEDxogDg) 104 | * [Research Software Engineering with Python](https://merely-useful.tech/py-rse/) Free eBook to enhance your workflow. 105 | 106 | Intermediate Tutorials 107 | * [Project Pythia Foundations Online Textbook](https://foundations.projectpythia.org/landing-page.html) A community learning resource for Python-based computing in the geosciences. Highly recommended! 108 | * [Python for Climate and Meteorology](https://www.youtube.com/watch?v=uQZAEPnUZ5o) Another tutorial taught at AMS, a little more advanced. 109 | * Learn more about [Python for Atmosphere and Ocean Scientists](https://carpentries-lab.github.io/python-aos-lesson/) using Software Carpentry lesson plans. 110 | * [Earth Observation using Python](https://www.wiley.com/en-us/Earth+Observation+using+Python%3A+A+Practical+Programming+Guide-p-9781119606888) is a book I wrote that builds on the content of the workshop. 111 | 112 | ## Granting Permission for Reuse 113 | These course materials are designed for use as part of the AGU Scientific Workshop and are the intellectual property of the instructor. However, I encourage others to reuse or adapt the material for their research communities or use cases. If you do not significantly modify the original material, please send me an email letting me know and acknowledge me in your content.
-------------------------------------------------------------------------------- /Solutions_to_Exercises.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Solutions to Exercises\n", 8 | "Author: Rebekah Esmaili (rebekah.esmaili@gmail.com)\n", 9 | " \n", 10 | "---" 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 1, 16 | "metadata": {}, 17 | "outputs": [], 18 | "source": [ 19 | "import numpy as np\n", 20 | "import pandas as pd\n", 21 | "import matplotlib.pyplot as plt\n", 22 | "import xarray as xr\n", 23 | "from cartopy import crs as ccrs\n", 24 | "import scipy.interpolate\n", 25 | "import s3fs" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "# 1. Basic Data Analysis and Visualization\n", 33 | "\n", 34 | "**Exercise 1:** Learning to use notebooks\n", 35 | "\n", 36 | "1. Launch Jupyter Notebook and create a new notebook\n", 37 | "2. Rename the notebook\n", 38 | "3. Create a new cell and use *type()* to see if the following are floats and integers:\n", 39 | " * 2+2\n", 40 | " * 2\\*2.0\n", 41 | " * var_float/var_int\n", 42 | "---\n", 43 | "**Solution:**" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": 2, 49 | "metadata": {}, 50 | "outputs": [ 51 | { 52 | "data": { 53 | "text/plain": [ 54 | "(int, float, float)" 55 | ] 56 | }, 57 | "execution_count": 2, 58 | "metadata": {}, 59 | "output_type": "execute_result" 60 | } 61 | ], 62 | "source": [ 63 | "var_int = 8\n", 64 | "var_float = 15.0\n", 65 | "\n", 66 | "type(2+2), type(2*2.0), type(var_float/var_int)" 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "**Exercise 2:** Import an ascii file\n", 74 | "\n", 75 | "1. Import the dataset \"20200901_20200930_Monterey.lev15.csv\" and save it to a variable called *aeronet*.\n", 76 | "2. 
Print the column names\n", 77 | "3. Find a column that doesn't have only missing values (-999) and (challenge!) calculate the mean using the following syntax *variable\\[\"column\"\\].mean()*\n", 78 | "---\n", 79 | "**Solution:**" 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "execution_count": 3, 85 | "metadata": {}, 86 | "outputs": [], 87 | "source": [ 88 | "fname = \"data/20200901_20200930_Monterey.lev15.csv\"\n", 89 | "aeronet = pd.read_csv(fname, sep=',\\s*', engine='python')" 90 | ] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "execution_count": 4, 95 | "metadata": {}, 96 | "outputs": [ 97 | { 98 | "data": { 99 | "text/html": [ 100 | "
\n", 101 | "\n", 114 | "\n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | " \n", 206 | " \n", 207 | " \n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | " \n", 214 | " \n", 215 | " \n", 216 | " \n", 217 | " \n", 218 | " \n", 219 | " \n", 220 | " \n", 221 | " \n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | "
Date(dd:mm:yyyy)Time(hh:mm:ss)Day_of_YearDay_of_Year(Fraction)AOD_1640nmAOD_1020nmAOD_870nmAOD_865nmAOD_779nmAOD_675nm...Exact_Wavelengths_of_AOD(um)_380nmExact_Wavelengths_of_AOD(um)_340nmExact_Wavelengths_of_PW(um)_935nmExact_Wavelengths_of_AOD(um)_681nmExact_Wavelengths_of_AOD(um)_709nmExact_Wavelengths_of_AOD(um)_EmptyExact_Wavelengths_of_AOD(um)_Empty.1Exact_Wavelengths_of_AOD(um)_Empty.2Exact_Wavelengths_of_AOD(um)_Empty.3Exact_Wavelengths_of_AOD(um)_Empty.4
00.07129620:53:18245245.8703470.0611690.1670120.238173-999-9990.400838...0.37940.34090.9365-999-999-999-999-999-999-999
10.07129620:58:18245245.8738190.0611550.1684170.239952-999-9990.404648...0.37940.34090.9365-999-999-999-999-999-999-999
20.07129621:03:18245245.8772920.0631350.1731430.246827-999-9990.414668...0.37940.34090.9365-999-999-999-999-999-999-999
30.07129621:08:18245245.8807640.0617540.1705410.241485-999-9990.405998...0.37940.34090.9365-999-999-999-999-999-999-999
40.07129621:18:18245245.8877080.0590590.1639190.232041-999-9990.391191...0.37940.34090.9365-999-999-999-999-999-999-999
\n", 264 | "

5 rows × 113 columns

\n", 265 | "
" 266 | ], 267 | "text/plain": [ 268 | " Date(dd:mm:yyyy) Time(hh:mm:ss) Day_of_Year Day_of_Year(Fraction) \\\n", 269 | "0 0.071296 20:53:18 245 245.870347 \n", 270 | "1 0.071296 20:58:18 245 245.873819 \n", 271 | "2 0.071296 21:03:18 245 245.877292 \n", 272 | "3 0.071296 21:08:18 245 245.880764 \n", 273 | "4 0.071296 21:18:18 245 245.887708 \n", 274 | "\n", 275 | " AOD_1640nm AOD_1020nm AOD_870nm AOD_865nm AOD_779nm AOD_675nm ... \\\n", 276 | "0 0.061169 0.167012 0.238173 -999 -999 0.400838 ... \n", 277 | "1 0.061155 0.168417 0.239952 -999 -999 0.404648 ... \n", 278 | "2 0.063135 0.173143 0.246827 -999 -999 0.414668 ... \n", 279 | "3 0.061754 0.170541 0.241485 -999 -999 0.405998 ... \n", 280 | "4 0.059059 0.163919 0.232041 -999 -999 0.391191 ... \n", 281 | "\n", 282 | " Exact_Wavelengths_of_AOD(um)_380nm Exact_Wavelengths_of_AOD(um)_340nm \\\n", 283 | "0 0.3794 0.3409 \n", 284 | "1 0.3794 0.3409 \n", 285 | "2 0.3794 0.3409 \n", 286 | "3 0.3794 0.3409 \n", 287 | "4 0.3794 0.3409 \n", 288 | "\n", 289 | " Exact_Wavelengths_of_PW(um)_935nm Exact_Wavelengths_of_AOD(um)_681nm \\\n", 290 | "0 0.9365 -999 \n", 291 | "1 0.9365 -999 \n", 292 | "2 0.9365 -999 \n", 293 | "3 0.9365 -999 \n", 294 | "4 0.9365 -999 \n", 295 | "\n", 296 | " Exact_Wavelengths_of_AOD(um)_709nm Exact_Wavelengths_of_AOD(um)_Empty \\\n", 297 | "0 -999 -999 \n", 298 | "1 -999 -999 \n", 299 | "2 -999 -999 \n", 300 | "3 -999 -999 \n", 301 | "4 -999 -999 \n", 302 | "\n", 303 | " Exact_Wavelengths_of_AOD(um)_Empty.1 Exact_Wavelengths_of_AOD(um)_Empty.2 \\\n", 304 | "0 -999 -999 \n", 305 | "1 -999 -999 \n", 306 | "2 -999 -999 \n", 307 | "3 -999 -999 \n", 308 | "4 -999 -999 \n", 309 | "\n", 310 | " Exact_Wavelengths_of_AOD(um)_Empty.3 Exact_Wavelengths_of_AOD(um)_Empty.4 \n", 311 | "0 -999 -999 \n", 312 | "1 -999 -999 \n", 313 | "2 -999 -999 \n", 314 | "3 -999 -999 \n", 315 | "4 -999 -999 \n", 316 | "\n", 317 | "[5 rows x 113 columns]" 318 | ] 319 | }, 320 | "execution_count": 4, 321 | "metadata": {}, 322 | 
"output_type": "execute_result" 323 | } 324 | ], 325 | "source": [ 326 | "aeronet.head()" 327 | ] 328 | }, 329 | { 330 | "cell_type": "code", 331 | "execution_count": 5, 332 | "metadata": {}, 333 | "outputs": [ 334 | { 335 | "data": { 336 | "text/plain": [ 337 | "-1.7637041143410852" 338 | ] 339 | }, 340 | "execution_count": 5, 341 | "metadata": {}, 342 | "output_type": "execute_result" 343 | } 344 | ], 345 | "source": [ 346 | "aeronet[\"AOD_1640nm\"].mean()" 347 | ] 348 | }, 349 | { 350 | "cell_type": "markdown", 351 | "metadata": {}, 352 | "source": [ 353 | "---\n", 354 | "**Exercise 3:** Filtering data\n", 355 | "\n", 356 | "Using the dataset imported in the previous exercise (*aeronet*):\n", 357 | " \n", 358 | "1. Create a mask that filters the \"AOD_870nm\" column to only include values that are above 0.\n", 359 | "2. Create a new variable, *day_of_year*, with the mask applied to aeronet\\[\"Day_of_Year(Fraction)\"\\].\n", 360 | "3. Create a new variable, *aod_870*, with the mask applied to aeronet\\[\"AOD_870nm\"\\].\n", 361 | "4. 
Compare the mean value of *aeronet\\[\"AOD_870nm\"\\]* to *aod_870*.\n", 362 | " \n", 363 | "---\n", 364 | "**Solution**" 365 | ] 366 | }, 367 | { 368 | "cell_type": "code", 369 | "execution_count": 6, 370 | "metadata": {}, 371 | "outputs": [], 372 | "source": [ 373 | "mask_aod = (aeronet[\"AOD_870nm\"] > 0 )" 374 | ] 375 | }, 376 | { 377 | "cell_type": "code", 378 | "execution_count": 7, 379 | "metadata": {}, 380 | "outputs": [], 381 | "source": [ 382 | "day_of_year = aeronet[\"Day_of_Year(Fraction)\"][mask_aod]\n", 383 | "aod_870 = aeronet[\"AOD_870nm\"][mask_aod]" 384 | ] 385 | }, 386 | { 387 | "cell_type": "code", 388 | "execution_count": 8, 389 | "metadata": {}, 390 | "outputs": [ 391 | { 392 | "data": { 393 | "text/plain": [ 394 | "(-0.3344563769379846, 0.6341813957322987)" 395 | ] 396 | }, 397 | "execution_count": 8, 398 | "metadata": {}, 399 | "output_type": "execute_result" 400 | } 401 | ], 402 | "source": [ 403 | "aeronet[\"AOD_870nm\"].mean(), aod_870.mean()" 404 | ] 405 | }, 406 | { 407 | "cell_type": "markdown", 408 | "metadata": {}, 409 | "source": [ 410 | "---\n", 411 | "**Exercise 4:** Create a scatterplot\n", 412 | "\n", 413 | "Use the variables *aod_870* and *day_of_year* that you made in Exercise 3 to:\n", 414 | "\n", 415 | "1. Create a scatter plot showing the *day_of_year* (x-axis) and *aod_870* (y-axis)\n", 416 | "2. Add y-axis and x-axis labels using *.set_xlabel()* and *.set_ylabel()*\n", 417 | "3. Adjust the color and size of the scatterplot\n", 418 | "\n", 419 | "---\n", 420 | "**Solution**" 421 | ] 422 | }, 423 | { 424 | "cell_type": "code", 425 | "execution_count": 9, 426 | "metadata": {}, 427 | "outputs": [ 428 | { 429 | "data": { 430 | "image/png": "", 431 | "text/plain": [ 432 | "
" 433 | ] 434 | }, 435 | "metadata": {}, 436 | "output_type": "display_data" 437 | } 438 | ], 439 | "source": [ 440 | "fig = plt.figure() \n", 441 | "ax = plt.subplot()\n", 442 | "ax.scatter(day_of_year, aod_870, s=0.5)\n", 443 | "ax.set_xlabel('Day of Year')\n", 444 | "ax.set_ylabel('AOD')\n", 445 | "plt.show()" 446 | ] 447 | }, 448 | { 449 | "cell_type": "markdown", 450 | "metadata": {}, 451 | "source": [ 452 | "# 2. Scientific Data Formats and Advanced Plotting\n", 453 | "\n", 454 | "---\n", 455 | "**Exercise 1**: Importing netCDF files\n", 456 | "1. Open the file \"MOP03JM-201811-L3V95.6.3_thinned.nc\" using the xarray library\n", 457 | "2. Print the variable names\n", 458 | "3. What are the dimensions?\n", 459 | "---\n", 460 | "\n", 461 | "**Solution:**" 462 | ] 463 | }, 464 | { 465 | "cell_type": "code", 466 | "execution_count": 10, 467 | "metadata": {}, 468 | "outputs": [], 469 | "source": [ 470 | "fname = 'data/MOP03JM-201811-L3V95.6.3_thinned.nc' \n", 471 | "mop_file_id = xr.open_dataset(fname, engine='h5netcdf')" 472 | ] 473 | }, 474 | { 475 | "cell_type": "code", 476 | "execution_count": 11, 477 | "metadata": {}, 478 | "outputs": [ 479 | { 480 | "data": { 481 | "text/html": [ 482 | "
\n", 483 | "\n", 484 | "\n", 485 | "\n", 486 | "\n", 487 | "\n", 488 | "\n", 489 | "\n", 490 | "\n", 491 | "\n", 492 | "\n", 493 | "\n", 494 | "\n", 495 | "\n", 496 | "\n", 497 | "
<xarray.Dataset>\n",
 846 |        "Dimensions:                    (YDim: 180, XDim: 360)\n",
 847 |        "Dimensions without coordinates: YDim, XDim\n",
 848 |        "Data variables:\n",
 849 |        "    Latitude                   (YDim) float32 ...\n",
 850 |        "    Longitude                  (XDim) float32 ...\n",
 851 |        "    RetrievedCOTotalColumnDay  (XDim, YDim) float32 ...
" 852 | ], 853 | "text/plain": [ 854 | "\n", 855 | "Dimensions: (YDim: 180, XDim: 360)\n", 856 | "Dimensions without coordinates: YDim, XDim\n", 857 | "Data variables:\n", 858 | " Latitude (YDim) float32 ...\n", 859 | " Longitude (XDim) float32 ...\n", 860 | " RetrievedCOTotalColumnDay (XDim, YDim) float32 ..." 861 | ] 862 | }, 863 | "execution_count": 11, 864 | "metadata": {}, 865 | "output_type": "execute_result" 866 | } 867 | ], 868 | "source": [ 869 | "mop_file_id" 870 | ] 871 | }, 872 | { 873 | "cell_type": "markdown", 874 | "metadata": {}, 875 | "source": [ 876 | "---\n", 877 | "**Exercise 2**: Plot 3-dimensional data\n", 878 | "\n", 879 | "Plot *AOD_lat*, *AOD_lon*, and *AOD_550* (which we imported from the \"JRR-AOD_v2r3_j01_...\" netCDF file) as follows:\n", 880 | "\n", 881 | "1. Check the dimensions for all variables using *.shape*.\n", 882 | "2. Do you need to generate a meshgrid with *np.meshgrid()*?\n", 883 | "3. Create a contour plot\n", 884 | "\n", 885 | "---\n", 886 | "**Solution:**" 887 | ] 888 | }, 889 | { 890 | "cell_type": "code", 891 | "execution_count": 12, 892 | "metadata": {}, 893 | "outputs": [], 894 | "source": [ 895 | "fname='data/JRR-AOD_v2r3_j01_s202009152044026_e202009152045271_c202009152113150_thinned.nc'\n", 896 | "aod_file_id = xr.open_dataset(fname, engine='h5netcdf')\n", 897 | "\n", 898 | "AOD_550 = aod_file_id['AOD550']\n", 899 | "AOD_lat = aod_file_id['Latitude']\n", 900 | "AOD_lon = aod_file_id['Longitude']" 901 | ] 902 | }, 903 | { 904 | "cell_type": "code", 905 | "execution_count": 13, 906 | "metadata": {}, 907 | "outputs": [ 908 | { 909 | "data": { 910 | "text/plain": [ 911 | "((768, 3200), (768, 3200))" 912 | ] 913 | }, 914 | "execution_count": 13, 915 | "metadata": {}, 916 | "output_type": "execute_result" 917 | } 918 | ], 919 | "source": [ 920 | "AOD_lat.shape, AOD_lon.shape" 921 | ] 922 | }, 923 | { 924 | "cell_type": "code", 925 | "execution_count": 14, 926 | "metadata": {}, 927 | "outputs": [ 928 | { 929 | "data": { 930 | 
"image/png": "", 931 | "text/plain": [ 932 | "
" 933 | ] 934 | }, 935 | "metadata": {}, 936 | "output_type": "display_data" 937 | } 938 | ], 939 | "source": [ 940 | "fig = plt.figure() \n", 941 | "ax = plt.subplot()\n", 942 | "co_plot = ax.contourf(AOD_lon, AOD_lat, AOD_550)\n", 943 | "fig.colorbar(co_plot, orientation='horizontal')\n", 944 | "plt.show()" 945 | ] 946 | }, 947 | { 948 | "cell_type": "markdown", 949 | "metadata": {}, 950 | "source": [ 951 | "---\n", 952 | "**Exercise 3**: Adding maps to plots\n", 953 | "\n", 954 | "Using *AOD_lat*, *AOD_lon*, and *AOD_550* (which we imported from the \"JRR-AOD_v2r3_j01_...\" netCDF file):\n", 955 | "\n", 956 | "1. Create a *contourf* plot (same as Exercise 2)\n", 957 | "2. Add the coastlines to a standard Plate Carrée plot using the *projection=* option.\n", 958 | "\n", 959 | "---\n", 960 | "**Solution**:" 961 | ] 962 | }, 963 | { 964 | "cell_type": "code", 965 | "execution_count": 15, 966 | "metadata": {}, 967 | "outputs": [ 968 | { 969 | "data": { 970 | "image/png": "", 971 | "text/plain": [ 972 | "
" 973 | ] 974 | }, 975 | "metadata": {}, 976 | "output_type": "display_data" 977 | } 978 | ], 979 | "source": [ 980 | "fig = plt.figure()\n", 981 | "ax = plt.subplot(projection=ccrs.PlateCarree())\n", 982 | "\n", 983 | "ax.contourf(AOD_lon, AOD_lat, AOD_550)\n", 984 | "\n", 985 | "ax.coastlines('50m')\n", 986 | "plt.show()" 987 | ] 988 | } 989 | ], 990 | "metadata": { 991 | "kernelspec": { 992 | "display_name": "Python 3.11.0 ('notebook_demo')", 993 | "language": "python", 994 | "name": "python3" 995 | }, 996 | "language_info": { 997 | "codemirror_mode": { 998 | "name": "ipython", 999 | "version": 3 1000 | }, 1001 | "file_extension": ".py", 1002 | "mimetype": "text/x-python", 1003 | "name": "python", 1004 | "nbconvert_exporter": "python", 1005 | "pygments_lexer": "ipython3", 1006 | "version": "3.11.0" 1007 | }, 1008 | "vscode": { 1009 | "interpreter": { 1010 | "hash": "4589143d4cda0c8671911bd60c16dc1d10ec327722e7574bc882b745b51509b4" 1011 | } 1012 | } 1013 | }, 1014 | "nbformat": 4, 1015 | "nbformat_minor": 4 1016 | } 1017 | -------------------------------------------------------------------------------- /data/JRR-AOD_v2r3_j01_s202009152044026_e202009152045271_c202009152113150_thinned.nc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/modern-tools-workshop/AGU-python-workshop-2022/3bcc622e8b92ba9bfb61e188860f75298371df24/data/JRR-AOD_v2r3_j01_s202009152044026_e202009152045271_c202009152113150_thinned.nc -------------------------------------------------------------------------------- /data/MOP03JM-201811-L3V95.6.3_thinned.nc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/modern-tools-workshop/AGU-python-workshop-2022/3bcc622e8b92ba9bfb61e188860f75298371df24/data/MOP03JM-201811-L3V95.6.3_thinned.nc -------------------------------------------------------------------------------- /data/gfs_3_20200915_0000_000.grb2: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/modern-tools-workshop/AGU-python-workshop-2022/3bcc622e8b92ba9bfb61e188860f75298371df24/data/gfs_3_20200915_0000_000.grb2 -------------------------------------------------------------------------------- /data/sst.mon.ltm.1981-2010.nc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/modern-tools-workshop/AGU-python-workshop-2022/3bcc622e8b92ba9bfb61e188860f75298371df24/data/sst.mon.ltm.1981-2010.nc -------------------------------------------------------------------------------- /environment.yml: -------------------------------------------------------------------------------- 1 | name: python-workshop 2 | channels: 3 | - conda-forge 4 | dependencies: 5 | - python=3.9 6 | - numpy 7 | - matplotlib 8 | - pandas 9 | - cartopy 10 | - h5netcdf 11 | - netcdf4 12 | - pydap 13 | - pyproj 14 | - eccodes 15 | - cython 16 | - pygrib 17 | - s3fs 18 | - scipy 19 | - xarray 20 | -------------------------------------------------------------------------------- /img/flowchart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/modern-tools-workshop/AGU-python-workshop-2022/3bcc622e8b92ba9bfb61e188860f75298371df24/img/flowchart.png -------------------------------------------------------------------------------- /sample_script.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | import numpy as np 5 | import pandas as pd 6 | import matplotlib.pyplot as plt 7 | from netCDF4 import Dataset 8 | from cartopy import crs as ccrs 9 | 10 | # Open file 11 | fname='data/JRR-AOD_v2r3_j01_s202009152044026_e202009152045271_c202009152113150_thinned.nc' 12 | aod_file_id = Dataset(fname) 13 | 14 | # Import variables 15 | AOD_550 = aod_file_id.variables['AOD550'][:,:] 16 | 
AOD_lat = aod_file_id.variables['Latitude'][:,:] 17 | AOD_lon = aod_file_id.variables['Longitude'][:,:] 18 | 19 | # Make figure 20 | fig = plt.figure() 21 | ax = plt.subplot() 22 | co_plot = ax.contourf(AOD_lon, AOD_lat, AOD_550) 23 | fig.colorbar(co_plot, orientation='horizontal') 24 | plt.savefig("AOD_plot.png") 25 | --------------------------------------------------------------------------------
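The plotting half of sample_script.py can be tried without the JRR-AOD data file. The following is a minimal sketch, assuming synthetic latitude/longitude grids and a made-up AOD field as stand-ins for the `Latitude`, `Longitude`, and `AOD550` variables; none of the values come from the real granule. It also mimics the fact that netCDF4 returns variables as masked arrays by default (fill values masked), which `contourf` leaves blank. The Agg backend is chosen here as an assumption so the script can run on a machine without a display, as it would in a batch job.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: write files, no window needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins for the netCDF Longitude/Latitude variables:
# a small 2-D grid roughly off the US west coast (synthetic values only).
lon_1d = np.linspace(-125.0, -115.0, 50)
lat_1d = np.linspace(30.0, 40.0, 40)
AOD_lon, AOD_lat = np.meshgrid(lon_1d, lat_1d)  # both shaped (40, 50)

# Synthetic, smoothly varying "AOD" field.
aod = 0.5 + 0.4 * np.sin(np.radians(AOD_lon) * 10) * np.cos(np.radians(AOD_lat) * 10)

# netCDF4 masks fill values by default; imitate that by marking one
# corner of the grid as missing data.
AOD_550 = np.ma.masked_where((AOD_lat > 38.0) & (AOD_lon < -122.0), aod)

# Same plotting steps as sample_script.py.
fig = plt.figure()
ax = plt.subplot()
co_plot = ax.contourf(AOD_lon, AOD_lat, AOD_550)  # masked cells stay unfilled
fig.colorbar(co_plot, orientation='horizontal')
plt.savefig("AOD_plot_synthetic.png")
plt.close(fig)  # free the figure when running many plots in a loop
```

To get the map version from Exercise 3, the notebook's solution swaps `plt.subplot()` for `plt.subplot(projection=ccrs.PlateCarree())` and calls `ax.coastlines('50m')`. Note that drawing coastlines may download Natural Earth shapefiles the first time it runs.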