├── animated.gif
├── .gitignore
├── README.md
└── AEOCCG-AccessingAusCoverDataOnTheNCI.ipynb

/animated.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ternaustralia/aeoccg-examples/HEAD/animated.gif

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
 1 | # Byte-compiled / optimized / DLL files
 2 | __pycache__/
 3 | *.py[cod]
 4 | *$py.class
 5 | 
 6 | # C extensions
 7 | *.so
 8 | 
 9 | # Distribution / packaging
10 | .Python
11 | env/
12 | build/
13 | develop-eggs/
14 | dist/
15 | downloads/
16 | eggs/
17 | .eggs/
18 | lib/
19 | lib64/
20 | parts/
21 | sdist/
22 | var/
23 | *.egg-info/
24 | .installed.cfg
25 | *.egg
26 | 
27 | # PyInstaller
28 | # Usually these files are written by a python script from a template
29 | # before PyInstaller builds the exe, so as to inject date/other info into it.
30 | *.manifest
31 | *.spec
32 | 
33 | # Installer logs
34 | pip-log.txt
35 | pip-delete-this-directory.txt
36 | 
37 | # Unit test / coverage reports
38 | htmlcov/
39 | .tox/
40 | .coverage
41 | .coverage.*
42 | .cache
43 | nosetests.xml
44 | coverage.xml
45 | *.cover
46 | .hypothesis/
47 | 
48 | # Translations
49 | *.mo
50 | *.pot
51 | 
52 | # Django stuff:
53 | *.log
54 | local_settings.py
55 | 
56 | # Flask stuff:
57 | instance/
58 | .webassets-cache
59 | 
60 | # Scrapy stuff:
61 | .scrapy
62 | 
63 | # Sphinx documentation
64 | docs/_build/
65 | 
66 | # PyBuilder
67 | target/
68 | 
69 | # IPython Notebook
70 | .ipynb_checkpoints
71 | 
72 | # pyenv
73 | .python-version
74 | 
75 | # celery beat schedule file
76 | celerybeat-schedule
77 | 
78 | # dotenv
79 | .env
80 | 
81 | # virtualenv
82 | venv/
83 | ENV/
84 | 
85 | # Spyder project settings
86 | .spyderproject
87 | 
88 | # Rope project settings
89 | .ropeproject
90 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Accessing TERN AusCover Data on the NCI and RDS
10 | 
11 | #### A Technical Capacity Building Webinar for AEOCCG
12 | 
13 | _Presented by [Peter Scarth](mailto:p.scarth@uq.edu.au?subject=AEOCCG%20webinar%20information) (Joint Remote Sensing Research Program) with lots of help and input from Kelsey Druken and Claire Trenham from the [NCI](http://nci.org.au/about-nci/contact/nci-staff-2/)._
14 | 
15 | ### Abstract
16 | 
17 | Have you ever wondered how to access some of the TERN AusCover remotely sensed data available on the NCI, using web services directly within your analysis environment?
18 | 
19 | This webinar will give several examples of direct access, using OPeNDAP on THREDDS and GDAL/rasterio over THREDDS and HTTP, where you'll be able to query and directly interact with Landsat, MODIS and Himawari datasets in both IPython notebooks and GIS packages. By the end of the webinar, you'll be able to discover some of the many data sets openly available on the NCI and query them using web services.
20 | 
21 | #### Background on RDS <> TERN AusCover <> NCI <> AGDC?
22 | - [TERN AusCover](http://auscover.org.au) - provides a national expert network and a data delivery service for provision of Australian biophysical remote sensing data time-series, continental-scale map products, and selected high-resolution datasets over TERN sites. **It is a virtual network for earth observation data sets across Australia that are typically produced by universities, state government agencies _(e.g. NSW OEH and Qld DSITI)_, and federal agencies _(e.g. GA, BOM and CSIRO)_.**
23 | - [Research Data Storage Infrastructure (RDSI)](https://www.rds.edu.au) - supports collaborative access to research data assets of national significance (including national reference collections) through distributed data centre development. **They provide the online storage capacity to host Australian EO data sets online.**
24 | - [National Computational Infrastructure (NCI)](http://nci.org.au/) - is Australia's national research computing facility, including the Southern Hemisphere's most highly-integrated supercomputer and filesystems, Australia's highest performance research cloud, and one of the nation's largest data catalogues. **They provide the compute capacity for some of the AusCover data sets, interactive environments, and cataloguing and training services.**
25 | - [Australian Geoscience Data Cube (AGDC)](http://www.datacube.org.au/) - provides an integrated gridded data analysis environment for earth observation satellite and related data from multiple satellite and other acquisition systems. **This project facilitates the operational processing and interactive analysis of national EO data sets and is used to produce the Landsat and MODIS collections on the NCI.**
26 | 
27 | These groups work together to improve the processing, storage, cataloguing, discovery and analysis of EO-based data across Australia.
28 | 
29 | #### The NCI VDI
30 | Data downloading and analysis by many users has potential risks _(apart from the data being too big for this to be feasible!)_. **Bringing scientists to the data can help mitigate these issues by ensuring everyone is working on the same data.** To support this workflow, the NCI runs a Virtual Desktop Environment with:
31 | - Tools to support climate data analysis & visualisation
32 | - A virtual laboratory to access, process & analyse data
33 | - Analysis of input data in a consistent format
34 | - Workflow tools that allow the science community to implement their own analyses without dealing directly with filesystems & HPC
35 | - A range of standard software tools connected to the global Lustre filesystem and HPC
36 | - Desktops: 32GB RAM, 140GB local scratch, 8 vCPUs
37 | - [How can I get VDI access?](http://nci.org.au/access/getting-access-to-the-national-facility/allocation-schemes/)
38 | 
39 | ### NCI and RDS TERN AusCover Services
40 | 
41 | NCI has a multi-element system for metadata catalogues and data services:
42 | - [GeoNetwork](http://geonetwork.nci.org.au): Find metadata records (akin to CSIRO DAP)
43 | - [THREDDS](http://dap.nci.org.au) Data Service: download, remotely access, or view data
44 | - OPeNDAP is one of the protocols served, and permits subsetting and remote access to files
45 | - Other protocols include HTTP download and Open Geospatial Consortium Web Services to stream JPEG, TIFF etc.
46 | - Geoserver, ERDDAP, Hyrax, others… and filesystem access
47 | - PROMS (provenance), DOI minting (citation)
48 | 
49 | [TERN AusCover](http://qld.auscover.org.au/public/html/index.html) also uses [THREDDS](http://qld.auscover.org.au/thredds/catalog.html), [Geoserver](http://qld.auscover.org.au/geoserver/web/), [FTP](ftp://qld.auscover.org.au/) and [HTTP](http://qld.auscover.org.au/public/data/) services to deliver data in a variety of formats, and is working to refine online discovery, subsetting and reformatting tools for both raster and vector data. Keep an eye out for the new website launching soon.
50 | 
51 | 
52 | ## Presentation Outline
53 | 
54 | For these examples I'll be using a [Jupyter Notebook](http://jupyter.org/) with code in Python.
55 | - _The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text, and has support for over 40 programming languages, including Python, R, Julia and Scala._
56 | - If you've never used Jupyter Notebooks before, I highly recommend installing [Anaconda](https://www.continuum.io/downloads).
57 | - _As an aside, many of the packages used by JRSRP and partners, such as [RIOS](http://rioshome.org/), [RSGISLIB](http://www.rsgislib.org/) and [PyLidar](http://pylidar.org/), can be installed into this environment from the [OSGEO Conda index](https://conda.anaconda.org/osgeo)._
58 | 
59 | This notebook will outline some simple online interaction with some of the **JRSRP Landsat seasonal mosaics**. We'll treat the data hosted on http://qld.auscover.org.au as files and use the [rasterio](https://www.mapbox.com/blog/rasterio-announce/) package to interact with the data and undertake some typical remote sensing tasks. Finally, we'll build a simple example to extract and analyse a time series of imagery across an agricultural research property in the Burdekin (from ~5 TB of raster data hosted online).
60 | 
61 | Then we'll look at how you'd **access some of the GA Landsat data** produced out of the AGDC and hosted on the NCI, using the OPeNDAP protocol via THREDDS. [Link to Hosted Notebook](https://github.com/nci/nci-notebooks/blob/master/Python_Examples/Python_GDAL_NetCDF.ipynb)
62 | 
63 | Finally, we'll check out a pretty cool notebook that uses the NCI THREDDS Data Server and queries the **CSIRO AusCover MODIS** data sets to extract a time series of imagery. [Link to hosted Notebook](https://github.com/nci/nci-notebooks/blob/master/Data_Access/Using_Siphon/Python_Siphon_II.ipynb)
64 | 
65 | 
66 | ### Additional NCI Resources
67 | 
68 | We won't have much time to look at how you query and explore the THREDDS catalog to [access data](https://github.com/nci/nci-notebooks/blob/master/Data_Access/Using_Thredds/THREDDS_DataAccess.ipynb) or find [WMS and WCS service endpoints](https://github.com/nci/nci-notebooks/blob/master/Data_Access/Using_Thredds/THREDDS_WMS_WCS.ipynb), so I'd strongly encourage you to follow these links if you want to find out more.
69 | 
70 | Similarly, **accessing many of these data sets is easy using your desktop GIS package**. This will be the topic of another TERN AusCover session later in the year, but for now have a look at the [NCI QGIS examples](https://github.com/nci/nci-notebooks/tree/master/QGIS_Examples).
71 | 
72 | 
73 | This all looks a little trickier than firing up your desktop remote sensing package, but you do get a highly flexible open source analysis environment that gives you the ability to perform reproducible research, and to operationalise your algorithms nationally with ease.
74 | See the links below for training information, more Jupyter notebooks or NCI help:
75 | - https://training.nci.org.au
76 | - https://github.com/nci/nci-notebooks
77 | - http://nci.org.au/user-support/getting-help/
--------------------------------------------------------------------------------
/AEOCCG-AccessingAusCoverDataOnTheNCI.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# Accessing AusCover Data on the NCI and RDS\n",
\n", 8 | "\n", 9 | "
\n", 10 | "\n", 11 | "\n", 12 | "\n", 13 | "
\n", 14 | "\n", 15 | "#### A Technical Capacity Building Webinar for AEOCCG\n", 16 | "\n", 17 | "_Presented by [Peter Scarth](mailto:p.scarth@uq.edu.au?subject=AEOCCG%20webinar%20information) (Joint Remote Sensing Research Program) with lots of help and input from Kelsey Druken and Claire Trenham from the [NCI](http://nci.org.au/about-nci/contact/nci-staff-2/)._\n", 18 | "\n", 19 | "### Abstract\n", 20 | "\n", 21 | "Have you ever wondered how to access some of the AusCover remotely sensed data which is available on the NCI using some of the available web services directly within your analysis environment?\n", 22 | "\n", 23 | "This webinar will give several examples using direct access using opendap on thredds and gdal/rasterio on thredds and http where you'll be able to query and directly interact with Landsat, MODIS and Himawari datasets in both iPython notebooks and in GIS packages. By the end of the webinar, you’ll be able to discover some of the many data sets openly available on the NCI and query them using web services.\n", 24 | "\n", 25 | "#### Background on RDS <> AusCover <> NCI <> AGDC?\n", 26 | " - [AusCover](http://auscover.org.au) - provides a national expert network and a data delivery service for provision of Australian biophysical remote sensing data time-series, continental-scale map products, and selected high-resolution datasets over TERN sites. **It is a virtual network for earth observation data sets across Australia that are typically produced by universities, state government agencies _(e.g. NSW OEH and Qld DSITI)_, and federal agencies _(e.g. GA, BOM and CSIRO)_ ).**\n", 27 | " - [Research Data Storage Infrastructure (RDSI)](https://www.rds.edu.au) - supports collaborative access to research data assets of national significance (including national reference collections) through distributed data centre development. **They provide the online storage capacity to host Australian EO data sets online.**\n", 28 | " - [National Computational Infrastructure (NCI)](http://nci.org.au/) - is Australia’s national research computing facility including the Southern Hemisphere’s most highly-integrated supercomputer and filesystems, Australia’s highest performance research cloud, and one of the nation’s largest data catalogues. **They provide the compute capacity for some of the Auscover data sets, interactive environments, and cataloging and training services.** \n", 29 | " - [Australian Geoscience Data Cube (AGDC)](http://www.datacube.org.au/) - provides an integrated gridded data analysis environment for earth observation satellite and related data from multiple satellite and other acquisition systems. **This project facilitates the operational processing and interactive analysis of national EO data sets and is used to produce the Landsat and MODIS collections on the NCI.**\n", 30 | "\n", 31 | "These groups work together to improve the processing, storage, cataloging, discovery and analysis of EO based data across Australia.\n", 32 | "\n", 33 | "#### The NCI VDI\n", 34 | "Data downloading and analysis by many users has potential risks _(apart from the data being too big for this to be feasible!)_. 
 34 |     "Data downloading and analysis by many users has potential risks _(apart from the data being too big for this to be feasible!)_. **Bringing scientists to the data can help mitigate these issues by ensuring everyone is working on the same data.** To support this workflow, the NCI runs a Virtual Desktop Environment with:\n",
 35 |     " - Tools to support climate data analysis & visualisation\n",
 36 |     " - A virtual laboratory to access, process & analyse data\n",
 37 |     " - Analysis of input data in a consistent format\n",
 38 |     " - Workflow tools that allow the science community to implement their own analyses without dealing directly with filesystems & HPC\n",
 39 |     " - A range of standard software tools connected to the global Lustre filesystem and HPC\n",
 40 |     " - Desktops: 32GB RAM, 140GB local scratch, 8 vCPUs\n",
 41 |     " - [How can I get VDI access?](http://nci.org.au/access/getting-access-to-the-national-facility/allocation-scheme/)\n",
 42 |     "\n",
 43 |     "### NCI and RDS AusCover Services\n",
 44 |     "\n",
 45 |     "NCI has a multi-element system for metadata catalogues and data services:\n",
 46 |     " - [GeoNetwork](http://geonetwork.nci.org.au/geonetwork): Find metadata records (akin to CSIRO DAP)\n",
 47 |     " - [THREDDS](http://dap.nci.org.au) Data Service: download, remotely access, or view data\n",
 48 |     " - OPeNDAP is one of the protocols served, and permits subsetting and remote access to files\n",
 49 |     " - Other protocols include HTTP download and Open Geospatial Consortium Web Services to stream JPEG, TIFF etc.\n",
 50 |     " - Geoserver, ERDDAP, Hyrax, others… and filesystem access\n",
 51 |     " - PROMS (provenance), DOI minting (citation)\n",
 52 |     "\n",
 53 |     "[AusCover](http://qld.auscover.org.au/public/html/index.html) also uses [THREDDS](http://qld.auscover.org.au/thredds/catalog.html), [Geoserver](http://qld.auscover.org.au/geoserver/web/), [FTP](ftp://qld.auscover.org.au/) and [HTTP](http://qld.auscover.org.au/public/data/) services to deliver data in a variety of formats, and is working to refine online discovery, subsetting and reformatting tools for both raster and vector data. Keep an eye out for the new website launching soon.\n",
 54 |     "\n",
 55 |     "\n",
 56 |     "## Presentation Outline\n",
 57 |     "\n",
 58 |     "For these examples I'll be using a [Jupyter Notebook](http://jupyter.org/) with code in Python.\n",
 59 |     " - _The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text, and has support for over 40 programming languages, including Python, R, Julia and Scala._\n",
 60 |     " - If you've never used Jupyter Notebooks before, I highly recommend installing [Anaconda](https://www.continuum.io/downloads).\n",
 61 |     " - _As an aside, many of the packages used by JRSRP and partners, such as [RIOS](http://rioshome.org/), [RSGISLIB](http://www.rsgislib.org/) and [PyLidar](http://pylidar.org/), can be installed into this environment from the [OSGEO Conda index](https://conda.anaconda.org/osgeo)._\n",
 62 |     "\n",
 63 |     "This notebook will outline some simple online interaction with some of the **JRSRP Landsat seasonal mosaics**. We'll treat the data hosted on http://qld.auscover.org.au as files and use the [rasterio](https://www.mapbox.com/blog/rasterio-announce/) package to interact with the data and undertake some typical remote sensing tasks. Finally, we'll build a simple example to extract and analyse a time series of imagery across an agricultural research property in the Burdekin (from ~5 TB of raster data hosted online).\n",
 64 |     "\n",
 65 |     "Then we'll look at how you'd **access some of the GA Landsat data** produced out of the AGDC and hosted on the NCI, using the OPeNDAP protocol via THREDDS. [Link to Hosted Notebook](https://github.com/nci/Notebooks/blob/master/Python_Examples/Python_GDAL_NetCDF.ipynb)\n",
 66 |     "\n",
 67 |     "Finally, we'll check out a pretty cool notebook that uses the NCI THREDDS Data Server and queries the **CSIRO AusCover MODIS** data sets to extract a time series of imagery. [Link to hosted Notebook](https://github.com/nci/Notebooks/blob/master/Data_Access/Using_Siphon/Python_Siphon_II.ipynb)\n",
 68 |     "\n",
 69 |     "\n",
 70 |     "### Additional NCI Resources\n",
 71 |     "\n",
 72 |     "We won't have time to look at how you query and explore the THREDDS catalog to [access data](https://github.com/nci/Notebooks/blob/master/Data_Access/Using_Thredds/THREDDS_DataAccess.ipynb) or find [WMS and WCS service endpoints](https://github.com/nci/Notebooks/blob/master/Data_Access/Using_Thredds/THREDDS_WMS_WCS.ipynb), so I'd strongly encourage you to follow these links if you want to find out more.\n",
 73 |     "\n",
 74 |     "Similarly, **accessing many of these data sets is easy using your desktop GIS package**. This will be the topic of another AusCover session later in the year, but for now have a look at the [NCI QGIS examples](https://github.com/nci/Notebooks/tree/master/QGIS_Examples).\n",
 75 |     "\n",
 76 |     "\n",
 77 |     "This all looks a little trickier than firing up your desktop remote sensing package, but you do get a highly flexible open source analysis environment that gives you the ability to perform reproducible research, and to operationalise your algorithms nationally with ease.\n",
 78 |     "See the links below for training information, more Jupyter notebooks or NCI help:\n",
 79 |     " - https://training.nci.org.au\n",
 80 |     " - https://github.com/nci/nci-notebooks\n",
 81 |     " - http://nci.org.au/user-support/getting-help/\n"
 82 |    ]
 83 |   },
 84 |   {
 85 |    "cell_type": "markdown",
 86 |    "metadata": {},
 87 |    "source": [
 88 |     "***\n",
 89 |     "### Import python packages\n",
 90 |     "There are a number of Python packages that help with interacting with raster data. Here I use [rasterio](https://github.com/mapbox/rasterio), a GDAL and Numpy-based Python library designed to make your work with geospatial raster data more productive, more fun — more [Zen](https://www.python.org/dev/peps/pep-0020/).\n"
 91 |    ]
 92 |   },
 93 |   {
 94 |    "cell_type": "code",
 95 |    "execution_count": null,
 96 |    "metadata": {
 97 |     "collapsed": true
 98 |    },
 99 |    "outputs": [],
100 |    "source": [
101 |     "%matplotlib inline\n",
102 |     "import requests\n",
103 |     "import rasterio\n",
104 |     "import numpy\n",
105 |     "import matplotlib.pyplot as plt"
106 |    ]
107 |   },
108 |   {
109 |    "cell_type": "markdown",
110 |    "metadata": {},
111 |    "source": [
112 |     "***\n",
113 |     "## Opening a dataset and discovering some information about the file\n",
114 |     "\n",
115 |     "**Note 1:** This does not yet load/extract any data, just opens the file.\n",
116 |     "\n",
117 |     "**Note 2:** rasterio, like GDAL, is perfectly happy with virtual file systems such as zip files, memory buffers, streaming data, or data hosted on an HTTP or FTP server. Here we are connecting to an 18GB file over HTTP, so we use the **/vsicurl/** driver.\n"
118 |    ]
119 |   },
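Those virtual file systems compose, too. As a minimal sketch, and assuming a hypothetical zipped GeoTIFF hosted somewhere reachable, **/vsizip/** can be chained over **/vsicurl/**:

```python
# Sketch: GDAL virtual filesystems chain together, so /vsizip/ can sit on top of
# /vsicurl/ to read a GeoTIFF straight out of a remote zip (URL and filename are hypothetical).
import rasterio

zippedPath = '/vsizip//vsicurl/http://example.org/archive.zip/image.tif'
with rasterio.open(zippedPath) as src:
    print(src.count, src.width, src.height)
```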
120 |   {
121 |    "cell_type": "code",
122 |    "execution_count": null,
123 |    "metadata": {
124 |     "collapsed": false
125 |    },
126 |    "outputs": [],
127 |    "source": [
128 |     "# The paths to the South Australian Seasonal Landsat Mosaics on the Queensland RDSI Node\n",
129 |     "refDataPath = '/vsicurl/http://qld.auscover.org.au/public/data/landsat\\\n",
130 |     "/surface_reflectance/sa/l8olre_sa_m201412201502_dbia2.tif'\n",
131 |     "\n",
132 |     "# Open the terrain corrected surface reflectance data\n",
133 |     "refDataSet = rasterio.open(refDataPath)\n",
134 |     "\n",
135 |     "# The dataset object contains metadata about the raster data\n",
136 |     "print \"Bands:\\t \", refDataSet.count\n",
137 |     "print \"Height:\\t \", refDataSet.height\n",
138 |     "print \"Width:\\t \", refDataSet.width\n",
139 |     "print \"BoundingBox: \", refDataSet.bounds\n",
140 |     "print \"CRS:\\t \", refDataSet.crs"
141 |    ]
142 |   },
143 |   {
144 |    "cell_type": "markdown",
145 |    "metadata": {},
146 |    "source": [
147 |     "### A note on converting between real world and transformed coordinates\n",
148 |     " - Most of the time you'll need to work out which image pixel corresponds to a real-world coordinate.\n",
149 |     " - That means working out how to do coordinate transformations.\n",
150 |     " - Here we use the **affine** transformation attached to the dataset object."
151 |    ]
152 |   },
153 |   {
154 |    "cell_type": "code",
155 |    "execution_count": null,
156 |    "metadata": {
157 |     "collapsed": false
158 |    },
159 |    "outputs": [],
160 |    "source": [
161 |     "# Some dummy coordinates as an example\n",
162 |     "col, row = 0, 0\n",
163 |     "easting, northing = (-460755.0, -2716865.0)\n",
164 |     "\n",
165 |     "# Convert from image to world coordinates\n",
166 |     "print \"Forward Transformation:\\n\", refDataSet.affine\n",
167 |     "print \"\\nConvert (%s %s) to Coordinates:\\t\" % (col, row), refDataSet.affine * (col, row)\n",
168 |     "\n",
169 |     "# Convert from world to image coordinates (the inverse transform returns (column, row))\n",
170 |     "print \"\\nInverse Transformation:\\n\", ~refDataSet.affine\n",
171 |     "print \"\\nConvert (%s %s) to Column and Row:\\t\" % (easting, northing), ~refDataSet.affine * (easting, northing)\n"
172 |    ]
173 |   },
174 |   {
175 |    "cell_type": "markdown",
176 |    "metadata": {},
177 |    "source": [
178 |     "### Reading in a data subset from the remote file system\n",
179 |     "\n",
180 |     " - First we compute the area we want to extract from the image\n",
181 |     " - Then we pull the data subset from the server\n",
182 |     " - In all these examples, **subsets are extracted without having to read the entire file**, so you only pull the data you are working with back to your browser. In the case of this example, the (compressed) file is 18GB in size, but we just read the portion we want, so the subset is available in under a second."
183 |    ]
184 |   },
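As an aside, newer rasterio releases wrap this inverse-affine arithmetic for you. A minimal sketch, assuming rasterio >= 1.0 is installed (a newer version than used in this webinar):

```python
# Sketch: an equivalent windowed read with rasterio >= 1.0, which builds the
# window directly from real-world bounds instead of hand-computed indices.
import rasterio
from rasterio.windows import from_bounds

with rasterio.open(refDataPath) as src:
    # from_bounds takes (left, bottom, right, top) in the dataset's CRS
    window = from_bounds(527643.414, -3565421.083, 574184.998, -3541553.603,
                         transform=src.transform)
    subset = src.read([5, 4, 3], window=window)
```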
185 |   {
186 |    "cell_type": "code",
187 |    "execution_count": null,
188 |    "metadata": {
189 |     "collapsed": false
190 |    },
191 |    "outputs": [],
192 |    "source": [
193 |     "# This is a subset over Port Augusta, SA with the bounding box in EPSG:3577 (Australian Albers)\n",
194 |     "topLeft = (527643.414,-3541553.603)\n",
195 |     "bottomRight = (574184.998,-3565421.083)\n",
196 |     "\n",
197 |     "# Compute the image (column, row) coordinates from the real world coordinates using an inverse transform\n",
198 |     "col1, row1 = ~refDataSet.affine * topLeft\n",
199 |     "col2, row2 = ~refDataSet.affine * bottomRight\n",
200 |     "\n",
201 |     "# Read in the SWIR, NIR and Red bands; rasterio windows are ((row_start, row_stop), (col_start, col_stop))\n",
202 |     "refDataSubset = refDataSet.read([5,4,3], window=((row1,row2), (col1,col2)))\n",
203 |     "\n",
204 |     "# Print some information on the subset\n",
205 |     "print \"Data extract shape: {0}, dtype: {1}\".format(refDataSubset.shape, refDataSubset.dtype)"
206 |    ]
207 |   },
208 |   {
209 |    "cell_type": "markdown",
210 |    "metadata": {},
211 |    "source": [
212 |     "### Displaying an RGB Image\n",
213 |     "\n",
214 |     " - Like any remote sensing package, we have to stretch the data into 8 bit space and set up an informative band combination.\n",
215 |     " - For more info on common band combinations: http://landsat.usgs.gov/L8_band_combos.php\n"
216 |    ]
217 |   },
218 |   {
219 |    "cell_type": "code",
220 |    "execution_count": null,
221 |    "metadata": {
222 |     "collapsed": false
223 |    },
224 |    "outputs": [],
225 |    "source": [
226 |     "# Stretch and scale the data from 0 to 255 as an unsigned 8 bit integer for display\n",
227 |     "refDataSubsetScaled=numpy.clip(refDataSubset / 5000.0 * 255.0, 0, 255).astype('uint8')\n",
228 |     "\n",
229 |     "# Plot the image using matplotlib\n",
230 |     "plt.figure(figsize=(16,6))\n",
231 |     "\n",
232 |     "# Rearrange the axes so that it works with matplotlib, which likes BIP rather than BSQ band ordering\n",
233 |     "plt.imshow(numpy.rollaxis(refDataSubsetScaled,0,3))\n",
234 |     "\n",
235 |     "# Add a title to the plot\n",
236 |     "plt.title('Port Augusta Area - Landsat Surface Reflectance (Bands 5,4,3)', fontsize=20)"
237 |    ]
238 |   },
239 |   {
240 |    "cell_type": "markdown",
241 |    "metadata": {},
242 |    "source": [
243 |     "### Computing NDVI from the extracted Subset\n",
244 |     "\n",
245 |     " - Everyone uses NDVI as a \"Hello World\" remote sensing example, so I will as well.\n",
246 |     " - Note that these data use 32767 as a nodata value, so we need to mask these out from the calculation.\n",
247 |     " - We also embed a simple histogram of the NDVI to show the spread of values."
248 |    ]
249 |   },
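As an alternative to the NaN masking used in the next cell, numpy masked arrays propagate the nodata mask through the arithmetic. A minimal sketch using the refDataSubset array from above:

```python
# Sketch: handle the 32767 nodata value with numpy masked arrays instead of NaNs.
import numpy

nir = numpy.ma.masked_equal(refDataSubset[1].astype(float), 32767)
red = numpy.ma.masked_equal(refDataSubset[2].astype(float), 32767)
ndvi_masked = (nir - red) / (nir + red)   # nodata pixels stay masked throughout
```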
250 |   {
251 |    "cell_type": "code",
252 |    "execution_count": null,
253 |    "metadata": {
254 |     "collapsed": false
255 |    },
256 |    "outputs": [],
257 |    "source": [
258 |     "# Extract the NIR (band 4) and Red (band 3) bands from the 3D array\n",
259 |     "b4=refDataSubset[1].astype(float)\n",
260 |     "b3=refDataSubset[2].astype(float)\n",
261 |     "\n",
262 |     "# Ignore divide-by-zero and invalid-value warnings\n",
263 |     "numpy.seterr(divide='ignore', invalid='ignore')\n",
264 |     "\n",
265 |     "# Calculate NDVI\n",
266 |     "ndvi = (b4 - b3) / (b4 + b3)\n",
267 |     "\n",
268 |     "# Make the NoData values NaN so they get masked from the analysis\n",
269 |     "ndvi[b4 == 32767] = numpy.nan\n",
270 |     "\n",
271 |     "# Plot the result using matplotlib\n",
272 |     "fig = plt.figure(figsize=(16,6))\n",
273 |     "\n",
274 |     "# Select an appropriate colourmap\n",
275 |     "plt.imshow(ndvi,cmap='brg')\n",
276 |     "\n",
277 |     "# Display a colorbar\n",
278 |     "plt.colorbar()\n",
279 |     "\n",
280 |     "# Add a title to the plot\n",
281 |     "plt.title('Port Augusta Area - Landsat NDVI Map and Histogram', fontsize=20)\n",
282 |     "\n",
283 |     "# Inset a histogram of the NDVI values\n",
284 |     "a = plt.axes([.62, .65, .10, .20])\n",
285 |     "\n",
286 |     "# Compute the histogram\n",
287 |     "n, bins, patches = plt.hist(ndvi.ravel(), bins=200, range=(-1, 1),facecolor='grey',histtype='stepfilled')\n",
288 |     "\n",
289 |     "# Add a title to the histogram plot\n",
290 |     "plt.title('Distribution', fontsize=16, color='black')\n",
291 |     "\n",
292 |     "# Remove the Y axis labels for clarity\n",
293 |     "a.yaxis.set_visible(False)"
294 |    ]
295 |   },
296 |   {
297 |    "cell_type": "markdown",
298 |    "metadata": {},
299 |    "source": [
300 |     "***\n",
301 |     "### Clipping that file locally for additional analysis\n",
302 |     " - The [GDAL utilities](http://www.gdal.org/gdal_utilities.html) are perfect for this type of application\n",
303 |     " - First we'll use [gdal_translate](http://www.gdal.org/gdal_translate.html) on the command line to clip out this same subset from the 18GB parent file on the server, resample it to 250m by pixel averaging (let's say we want to compare to MODIS) and change the format of the file to [KEA](http://kealib.org/)\n",
304 |     " - Then we'll check the file information using [gdalinfo](http://www.gdal.org/gdalinfo.html)\n",
305 |     " "
306 |    ]
307 |   },
308 |   {
309 |    "cell_type": "code",
310 |    "execution_count": null,
311 |    "metadata": {
312 |     "collapsed": false
313 |    },
314 |    "outputs": [],
315 |    "source": [
316 |     "# Run the gdal_translate utility\n",
317 |     "! gdal_translate -of KEA -projwin 527643 -3541553 574184 -3565421 -tr 250 250 -r average \\\n",
318 |     "'/vsicurl/http://qld.auscover.org.au/public/data/landsat/surface_reflectance/sa/l8olre_sa_m201412201502_dbia2.tif' \\\n",
319 |     "l8olre_portaugusta_m201412201502_dbia2.kea"
320 |    ]
321 |   },
322 |   {
323 |    "cell_type": "code",
324 |    "execution_count": null,
325 |    "metadata": {
326 |     "collapsed": false
327 |    },
328 |    "outputs": [],
329 |    "source": [
330 |     "# Run the gdalinfo utility\n",
331 |     "! gdalinfo l8olre_portaugusta_m201412201502_dbia2.kea"
332 |    ]
333 |   },
334 |   {
335 |    "cell_type": "markdown",
336 |    "metadata": {
337 |     "collapsed": true
338 |    },
339 |    "source": [
340 |     "***\n",
341 |     "### Run an analysis of fractional cover data over the same area\n",
342 |     " - These data contain values representing the percentage of a pixel that is covered by green vegetation, dead vegetation and bare ground. \n",
343 |     " - A common task is to ascertain how much bare ground above a certain threshold exists in an area\n",
344 |     " - For this example we simply repeat the extraction process again using the corresponding fractional cover data set\n",
345 |     " - It's then a simple matter to plot a **cumulative histogram** of bare ground values for the reporting"
346 |    ]
347 |   },
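The cell below builds a barePercent array; once it exists, the threshold question in the list above is essentially a one-liner. A minimal sketch, with an illustrative 60% threshold:

```python
# Sketch: percentage of the subset exceeding a bare-ground threshold
# (barePercent comes from the cell below; the 60% threshold is illustrative).
import numpy

threshold = 60
fractionAbove = 100.0 * numpy.mean(barePercent > threshold)
print('%.1f%% of the area exceeds %d%% bare ground' % (fractionAbove, threshold))
```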
\n", 343 | " - A common task is to ascertain how much bare ground above a certain threshold exists in an area\n", 344 | " - For this example we simply repeat the extraction process again using the corresponding fractional cover data set\n", 345 | " - It's then a simple matter to plot a **cumulative histogram** of bare ground values for the reporting" 346 | ] 347 | }, 348 | { 349 | "cell_type": "code", 350 | "execution_count": null, 351 | "metadata": { 352 | "collapsed": false 353 | }, 354 | "outputs": [], 355 | "source": [ 356 | "# Open the fractional cover data on the server\n", 357 | "fcDataPath = '/vsicurl/http://qld.auscover.org.au/public/data/landsat\\\n", 358 | "/seasonal_fractional_cover/fractional_cover/sa/lztmre_sa_m201412201502_dima2.tif'\n", 359 | "fcDataSet = rasterio.open(fcDataPath)\n", 360 | "\n", 361 | "# Extract the first band (Bare Ground) from the data on the Auscover server\n", 362 | "fcDataSubset = fcDataSet.read(1, window=((x1,x2), (y1,y2)))\n", 363 | "\n", 364 | "# Print some information on the extracted data\n", 365 | "print \"Data extract shape: {0}, dtype: {1}\".format(fcDataSubset.shape, fcDataSubset.dtype)\n", 366 | "\n", 367 | "# Scale the bare ground data as percent from 0 to 100\n", 368 | "barePercent = numpy.clip(fcDataSubset - 100.0, 0, 100)\n", 369 | "\n", 370 | "# Setup the matplotlib figure\n", 371 | "fig = plt.figure(figsize=(9,5))\n", 372 | "\n", 373 | "# Compute the cumulative histogram and plot\n", 374 | "n, bins, patches = plt.hist\\\n", 375 | " (barePercent.ravel(), bins=100, range=(0, 100),color='brown',\\\n", 376 | " cumulative=True, normed=1,histtype='step',linewidth=3)\n", 377 | "\n", 378 | "# Add in labels and grids to the histogram\n", 379 | "plt.xlabel('Bare Ground')\n", 380 | "plt.ylabel('Percent Area')\n", 381 | "plt.title('Cumulative Distribution of Bare Ground', fontsize=20)\n", 382 | "plt.grid(True)" 383 | ] 384 | }, 385 | { 386 | "cell_type": "markdown", 387 | "metadata": {}, 388 | "source": [ 389 | "### Lets have a look at the bare ground image to locate those highly bare areas" 390 | ] 391 | }, 392 | { 393 | "cell_type": "code", 394 | "execution_count": null, 395 | "metadata": { 396 | "collapsed": false 397 | }, 398 | "outputs": [], 399 | "source": [ 400 | "# Plot the result using matplotlib\n", 401 | "fig = plt.figure(figsize=(16,6))\n", 402 | "\n", 403 | "# Select an apprppriate colourmap\n", 404 | "plt.imshow(barePercent,cmap='gist_earth')\n", 405 | "\n", 406 | "# Display a colorbar\n", 407 | "plt.colorbar()\n", 408 | "\n", 409 | "# Add a title to the plot\n", 410 | "plt.title('Port Augusta Area - Landsat Bare Ground', fontsize=20)" 411 | ] 412 | }, 413 | { 414 | "cell_type": "markdown", 415 | "metadata": {}, 416 | "source": [ 417 | "***\n", 418 | "## Time series analysis example\n", 419 | "**We'll look at the Spyglass research property in Queensland**\n", 420 | "\n", 421 | " - This requires a catalog service or a good filename convention\n", 422 | " - We need to:\n", 423 | " - work out what image data we need\n", 424 | " - open the subsets\n", 425 | " - calculate statistics for each subset\n", 426 | " - Plot the results.\n", 427 | " - This can also be slow over the web depending on your internet connection speed and the size of the subset due to the face we download over 100 subsets.\n", 428 | " - It's still faster than downloading 2TB of data then trying to analyise that on your desktop!\n", 429 | "\n", 430 | "\n", 431 | "\n", 432 | "_Aside: This image is embedded using Geoserver to render on the fly_\n", 433 | "\n" 434 | ] 435 
436 |   {
437 |    "cell_type": "code",
438 |    "execution_count": null,
439 |    "metadata": {
440 |     "collapsed": false
441 |    },
442 |    "outputs": [],
443 |    "source": [
444 |     "%%time\n",
445 |     "# Import some additional python packages needed for this example\n",
446 |     "from xml.etree import ElementTree\n",
447 |     "from IPython.display import display, HTML\n",
448 |     "from scipy.interpolate import interp1d\n",
449 |     "\n",
450 |     "# Download and parse the THREDDS catalog XML to find all the seasonal fractional cover images in Queensland.\n",
451 |     "catalogUrl = 'http://qld.auscover.org.au/thredds/catalog/auscover/landsat\\\n",
452 |     "/seasonal_fractional_cover/fractional_cover/qld/catalog.xml'\n",
453 |     "root = ElementTree.XML(requests.get(catalogUrl).text.encode('utf-8'))\n",
454 |     "\n",
455 |     "# Build a list of datasets in the catalog XML\n",
456 |     "datasets=root.find(\"{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}dataset\")\n",
457 |     "\n",
458 |     "# Define the property bounding box in EPSG:3577 (Australian Albers)\n",
459 |     "# This corresponds to the subset displayed above\n",
460 |     "paddockTopLeft = (1420000,-2150000)\n",
461 |     "paddockBottomRight = (1450000,-2170000)\n",
462 |     "\n",
463 |     "# Print some statistics about property size\n",
464 |     "print \"Analysing %0.0f Ha of Imagery\" % \\\n",
465 |     "    ((paddockBottomRight[0]-paddockTopLeft[0]) * (paddockTopLeft[1]-paddockBottomRight[1]) / 10000)\n",
466 |     "\n",
467 |     "\n",
468 |     "# Setup empty lists to hold the property statistics\n",
469 |     "seasonalDates= []\n",
470 |     "seasonalBareGroundMedian = []\n",
471 |     "seasonalBareGroundMean = []\n",
472 |     "seasonalBareGroundStd = []\n",
473 |     "seasonalBareGroundMin = []\n",
474 |     "seasonalBareGroundMax = []\n",
475 |     "\n",
476 |     "# Loop through all the datasets found in the catalog\n",
477 |     "for dataset in datasets:\n",
478 |     "    \n",
479 |     "    # Get the name of the file\n",
480 |     "    fileName = dataset.get('name')\n",
481 |     "    if fileName is not None:\n",
482 |     "        # Open the dataset on the server using rasterio\n",
483 |     "        fileUrl = \"/vsicurl/http://qld.auscover.org.au/thredds/fileServer/\" + str(dataset.get('urlPath'))\n",
484 |     "        tsDataSet = rasterio.open(fileUrl)\n",
485 |     "\n",
486 |     "        # Compute the image (column, row) coordinates from the real world coordinates using the inverse transform\n",
487 |     "        col3, row3 = ~tsDataSet.affine * paddockTopLeft\n",
488 |     "        col4, row4 = ~tsDataSet.affine * paddockBottomRight\n",
489 |     "        \n",
490 |     "        # Read the property subset into a numpy array\n",
491 |     "        tsDataSubset = tsDataSet.read(1, window=((row3,row4), (col3,col4)))\n",
492 |     "        \n",
493 |     "        # Remove the nodata values (0) from the flattened array and rescale from 0 to 100\n",
494 |     "        tsDataSubset = numpy.ravel(tsDataSubset[tsDataSubset != 0] - 100)\n",
495 |     "        \n",
496 |     "        # Check that we have some data available for the time period\n",
497 |     "        if tsDataSubset.size > 1:\n",
498 |     "        \n",
499 |     "            # Compute some statistics for the region and add to the lists\n",
500 |     "            # Median\n",
501 |     "            seasonalBareGroundMedian.append(numpy.median(tsDataSubset))\n",
502 |     "            # Mean\n",
503 |     "            seasonalBareGroundMean.append(numpy.mean(tsDataSubset))\n",
504 |     "            # Standard Deviation\n",
505 |     "            seasonalBareGroundStd.append(numpy.std(tsDataSubset))\n",
506 |     "            # 5th percentile, a robust minimum\n",
507 |     "            seasonalBareGroundMin.append(numpy.percentile(tsDataSubset,5))\n",
508 |     "            # 95th percentile, a robust maximum\n",
509 |     "            seasonalBareGroundMax.append(numpy.percentile(tsDataSubset,95))\n",
510 |     "\n",
511 |     "            # Lazy date calculation. Should really be using datetime here!\n",
512 |     "            \n",
513 |     "            # Starting year from the filename\n",
514 |     "            year = float(fileName.split('_')[2][1:5])\n",
515 |     "            # Starting month from the filename\n",
516 |     "            month = float(fileName.split('_')[2][5:7])\n",
517 |     "            # Add the fractional year to the data list\n",
518 |     "            seasonalDates.append(year + month / 12)\n",
519 |     "\n",
520 |     "\n",
521 |     "# Print a handy table of all of our statistics\n",
522 |     "propertyStats = [seasonalDates,seasonalBareGroundMin,seasonalBareGroundMedian,\\\n",
523 |     "                 seasonalBareGroundMean,seasonalBareGroundMax,seasonalBareGroundStd]\n",
524 |     "\n",
525 |     "# This hack inserts a formatted HTML table inline into our notebook\n",
526 |     "display(HTML('<table>{}</table>'.format(''.join(\n",
527 |     "    '<tr>{}</tr>'.format(''.join('<td>{}</td>'.format(str(_)) for _ in row)) for row in propertyStats))))\n",
528 |     "\n",
529 |     "\n",
530 |     "\n",
531 |     "# Instead of just plotting the points, let's try some interpolation to make a smooth curve\n",
532 |     "\n",
533 |     "# Make a linearly spaced array of fractional years spanning the range of our data\n",
534 |     "interpDates = numpy.linspace(min(seasonalDates), max(seasonalDates), num=500, endpoint=True)\n",
535 |     "\n",
536 |     "# Build a function to do linear interpolation of the maximum bare ground\n",
537 |     "linearFunction = interp1d(seasonalDates, seasonalBareGroundMax, kind='linear')\n",
538 |     "\n",
539 |     "# Build a function to do cubic interpolation of the maximum bare ground\n",
540 |     "cubicFunction = interp1d(seasonalDates, seasonalBareGroundMax, kind='cubic')\n",
541 |     "\n",
542 |     "# Fit a polynomial to the maximum bare ground\n",
543 |     "fittedPolynomial = numpy.poly1d(numpy.polyfit(seasonalDates, seasonalBareGroundMax, 2))\n",
544 |     "\n",
545 |     "# Setup the figure\n",
546 |     "fig = plt.figure(figsize=(16,8))\n",
547 |     "\n",
548 |     "# Plot the points, the linear interpolation and the cubic interpolation\n",
549 |     "plt.plot(seasonalDates, seasonalBareGroundMax, 'o',\\\n",
550 |     "         interpDates, cubicFunction(interpDates),'-',\\\n",
551 |     "         interpDates, linearFunction(interpDates), '--',\\\n",
552 |     "         seasonalDates, fittedPolynomial(seasonalDates),'.')\n",
553 |     "\n",
554 |     "# Add annotation to the plot\n",
555 |     "plt.legend(['Bare Ground Observations', 'Cubic interpolation', 'Linear interpolation', 'Fitted Polynomial'], loc='best')\n",
556 |     "plt.xlabel('Season Start Date')\n",
557 |     "plt.ylabel('Bare Ground (%)')\n",
558 |     "plt.title('Maximum (95th percentile) Bare Ground across Spyglass', fontsize=16)\n",
559 |     "plt.grid(True)\n",
560 |     "plt.ticklabel_format(useOffset=False)"
561 |    ]
562 |   },
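The cell above collects means and standard deviations but only plots the robust maximum. A minimal sketch of a complementary plot built from those same lists:

```python
# Sketch: plot the seasonal mean bare ground with a +/- one standard deviation band
# (uses the seasonalDates, seasonalBareGroundMean and seasonalBareGroundStd lists
#  built above, and the plt imported at the top of the notebook).
import numpy

mean = numpy.array(seasonalBareGroundMean)
std = numpy.array(seasonalBareGroundStd)

plt.figure(figsize=(16, 6))
plt.plot(seasonalDates, mean, '-o', label='Mean bare ground')
plt.fill_between(seasonalDates, mean - std, mean + std, alpha=0.3, label='+/- 1 std dev')
plt.legend(loc='best')
plt.xlabel('Season Start Date')
plt.ylabel('Bare Ground (%)')
```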
563 |   {
564 |    "cell_type": "markdown",
565 |    "metadata": {},
566 |    "source": [
567 |     "***\n",
568 |     "# Aside - Accessing field data via services is possible as well!\n",
569 |     " - There are several ways, e.g. WMS, WCS and WFS\n",
570 |     " - This super simple method uses WFS GeoJSON and the Folium library"
571 |    ]
572 |   },
573 |   {
574 |    "cell_type": "code",
575 |    "execution_count": null,
576 |    "metadata": {
577 |     "collapsed": false
578 |    },
579 |    "outputs": [],
580 |    "source": [
581 |     "# Import folium, a Leaflet-based Python mapping library\n",
582 |     "import folium\n",
583 |     "\n",
584 |     "# The URL from Geoserver for the GeoJSON TLS Sites\n",
585 |     "geojsonUrl = 'http://qld.auscover.org.au/geoserver/aus/ows?\\\n",
586 |     "service=WFS&version=1.0.0&request=GetFeature&\\\n",
587 |     "typeName=aus:tls_sites_display&maxFeatures=100&outputFormat=application%2Fjson'\n",
588 |     "\n",
589 |     "# GET the data from the server as text\n",
590 |     "tlsSites = requests.get(geojsonUrl).text\n",
591 |     "\n",
592 |     "# Build the base Map object\n",
593 |     "tlsMap = folium.Map(tiles='OpenStreetMap',location=[-27, 135], zoom_start=4)\n",
594 |     "\n",
595 |     "# Add the GeoJSON from the server\n",
596 |     "tlsMap.choropleth(geo_str=tlsSites)\n",
597 |     "\n",
598 |     "# Show the map\n",
599 |     "tlsMap"
600 |    ]
601 |   },
602 |   {
603 |    "cell_type": "markdown",
604 |    "metadata": {},
605 |    "source": [
606 |     "***\n",
607 |     "# NCI Data Access: Python NetCDF Landsat8\n",
608 |     "\n",
609 |     "**The following will go through how to:**
\n", 610 | " 1. Access published netCDF data through NCI's THREDDS Data Server (using OPeNDAP)\n", 611 | " 2. Extract/view data\n", 612 | " 3. Save data subset to new file\n", 613 | "\n", 614 | "\n", 615 | "### Import python libraries\n", 616 | "\n", 617 | "There are several Python libraries available to work with netCDF and HDF file formats. This tutorial will use `netCDF4` but others, such as `h5py`, `cdms2`, and `gdal` can also be used. For more information on these other libraries, please see the main tutorial page. \n" 618 | ] 619 | }, 620 | { 621 | "cell_type": "code", 622 | "execution_count": null, 623 | "metadata": { 624 | "collapsed": true 625 | }, 626 | "outputs": [], 627 | "source": [ 628 | "%matplotlib inline\n", 629 | "import numpy\n", 630 | "from netCDF4 import Dataset\n", 631 | "import matplotlib.pyplot as plt \n" 632 | ] 633 | }, 634 | { 635 | "cell_type": "markdown", 636 | "metadata": {}, 637 | "source": [ 638 | "## Open/read file\n", 639 | "**Note:** This does not yet load/extract any data, just opens the file.\n", 640 | "\n", 641 | "### The 'Dataset' function is used to open a file with Python's netCDF4 library. \n", 642 | "For local files, this will be the filepath (i.e., /g/data...) while for remote access, this will be the OPeNDAP data URL. For instructions on how to find the OPeNDAP URL, please see: [THREDDS Data Access](https://nbviewer.jupyter.org/github/kdruken/Notebooks/blob/master/THREDDS_DataAccess.ipynb)\n", 643 | "\n", 644 | "#### Accessing data remotely (OPeNDAP)\n", 645 | "#### After opening the file with the OPeNDAP address, the file can be handled in the same manner as a local file. \n", 646 | "\n" 647 | ] 648 | }, 649 | { 650 | "cell_type": "code", 651 | "execution_count": null, 652 | "metadata": { 653 | "collapsed": true 654 | }, 655 | "outputs": [], 656 | "source": [ 657 | "url = 'http://dapds00.nci.org.au/thredds/dodsC/rs0/tiles/EPSG3577/LS8_OLI_TIRS_NBAR/LS8_OLI_TIRS_NBAR_3577_-10_-27_2013.nc'\n", 658 | "f = Dataset(url, 'r')" 659 | ] 660 | }, 661 | { 662 | "cell_type": "markdown", 663 | "metadata": {}, 664 | "source": [ 665 | "## Browse information about the file" 666 | ] 667 | }, 668 | { 669 | "cell_type": "markdown", 670 | "metadata": {}, 671 | "source": [ 672 | "### File dimensions" 673 | ] 674 | }, 675 | { 676 | "cell_type": "code", 677 | "execution_count": null, 678 | "metadata": { 679 | "collapsed": false 680 | }, 681 | "outputs": [], 682 | "source": [ 683 | "for item in f.dimensions:\n", 684 | " print f.dimensions[item].name, f.dimensions[item].size" 685 | ] 686 | }, 687 | { 688 | "cell_type": "markdown", 689 | "metadata": {}, 690 | "source": [ 691 | "### File variables" 692 | ] 693 | }, 694 | { 695 | "cell_type": "code", 696 | "execution_count": null, 697 | "metadata": { 698 | "collapsed": false 699 | }, 700 | "outputs": [], 701 | "source": [ 702 | "vars = f.variables.keys()\n", 703 | "for item in vars:\n", 704 | " print 'Variable: \\t', item\n", 705 | " print 'Dimensions: \\t', f[item].dimensions\n", 706 | " print 'Shape: \\t', f[item].shape, '\\n'" 707 | ] 708 | }, 709 | { 710 | "cell_type": "markdown", 711 | "metadata": {}, 712 | "source": [ 713 | "## Extracting data (using index values)\n", 714 | "A nice feature of netCDF/HDF file formats is that you can extract subsets without having to read the entire file (or variable). The example below demonstrates the simplest subsetting example by directly specifying the subset indices. 
" 715 | ] 716 | }, 717 | { 718 | "cell_type": "code", 719 | "execution_count": null, 720 | "metadata": { 721 | "collapsed": false 722 | }, 723 | "outputs": [], 724 | "source": [ 725 | "# Read variables (but not yet extract)\n", 726 | "band2 = f['band_2']\n", 727 | "y = f['y']\n", 728 | "x = f['x']\n", 729 | "t = f['time']\n", 730 | "\n", 731 | "# Subset indices\n", 732 | "x1, x2 = 1000,3999\n", 733 | "y1, y2 = 0,3000\n", 734 | "t1 = 9\n", 735 | "\n", 736 | "# Extract\n", 737 | "band2_subset = band2[t1, y1:y2, x1:x2]\n", 738 | "y_subset = y[y1:y2]\n", 739 | "x_subset = x[x1:x2]\n" 740 | ] 741 | }, 742 | { 743 | "cell_type": "markdown", 744 | "metadata": {}, 745 | "source": [ 746 | "## Plot data" 747 | ] 748 | }, 749 | { 750 | "cell_type": "code", 751 | "execution_count": null, 752 | "metadata": { 753 | "collapsed": false 754 | }, 755 | "outputs": [], 756 | "source": [ 757 | "# Set figure size\n", 758 | "plt.figure(figsize=(12,6))\n", 759 | "\n", 760 | "# Plot data subset with equal axes and colorbar\n", 761 | "plt.contourf(x_subset, y_subset, band2_subset)\n", 762 | "plt.axis('equal')\n", 763 | "cbar = plt.colorbar()\n", 764 | "\n", 765 | "# Add figure title and labels\n", 766 | "# We can make use of the defined variable attributes to do this\n", 767 | "plt.title(band2.long_name+'\\n', fontsize=18)\n", 768 | "plt.xlabel(x.long_name+' ('+x.units+') ', fontsize=16)\n", 769 | "plt.ylabel(y.long_name+' ('+y.units+') ', fontsize=16)\n", 770 | "\n", 771 | "# Adjust tick mark size\n", 772 | "cbar.ax.tick_params(labelsize=16) \n", 773 | "plt.tick_params(labelsize=16)" 774 | ] 775 | }, 776 | { 777 | "cell_type": "markdown", 778 | "metadata": {}, 779 | "source": [ 780 | "## Plot subset as RGB image\n", 781 | "For more info on common band combinations: http://landsat.usgs.gov/L8_band_combos.php\n" 782 | ] 783 | }, 784 | { 785 | "cell_type": "code", 786 | "execution_count": null, 787 | "metadata": { 788 | "collapsed": false 789 | }, 790 | "outputs": [], 791 | "source": [ 792 | "# Read in bands\n", 793 | "band4_subset = f['band_4'][t1, y1:y2, x1:x2]\n", 794 | "band6_subset = f['band_6'][t1, y1:y2, x1:x2]\n", 795 | "band7_subset = f['band_7'][t1, y1:y2, x1:x2]\n", 796 | "\n", 797 | "# Bands must be clipped (value of 6000 was chosen in this case) and scaled to values between (0, 255) to plot as RGB image.\n", 798 | "b4 = band4_subset.clip(0, 6000) / 6000. * 255\n", 799 | "b6 = band6_subset.clip(0, 6000) / 6000. * 255\n", 800 | "b7 = band7_subset.clip(0, 6000) / 6000. 
* 255\n",
800 |     "b7 = band7_subset.clip(0, 6000) / 6000. * 255\n",
801 |     "\n",
802 |     "## Combine the bands of interest into numpy NxNx3 dimensional array\n",
803 |     "# Note: The data type must be converted to 'uint8' to plot as image\n",
804 |     "rgb = numpy.stack((b7, b6, b4), axis=2).astype('uint8')\n",
805 |     "print \"New array shape: {0}, dtype: {1}\".format(rgb.shape, rgb.dtype)\n",
806 |     "\n",
807 |     "# Set figure size\n",
808 |     "plt.figure(figsize=(12,12))\n",
809 |     "\n",
810 |     "# Plot image (imshow's extent is [left, right, bottom, top])\n",
811 |     "plt.imshow(rgb, extent=[x_subset[0], x_subset[-1], y_subset[-1], y_subset[0]])\n",
812 |     "\n",
813 |     "# Add figure title and labels\n",
814 |     "# We can make use of the defined variable attributes to do this\n",
815 |     "plt.title('Landsat 8 False Colour: Bands (7, 6, 4) \\n', fontsize=20)\n",
816 |     "plt.xlabel(x.long_name+' ('+x.units+') ', fontsize=16)\n",
817 |     "plt.ylabel(y.long_name+' ('+y.units+') ', fontsize=16)\n",
818 |     "\n",
819 |     "\n",
820 |     "# Adjust tick mark size\n",
821 |     "plt.tick_params(labelsize=16)"
822 |    ]
823 |   },
824 |   {
825 |    "cell_type": "code",
826 |    "execution_count": null,
827 |    "metadata": {
828 |     "collapsed": true
829 |    },
830 |    "outputs": [],
831 |    "source": [
832 |     "# Close the file\n",
833 |     "f.close()"
834 |    ]
835 |   },
836 |   {
837 |    "cell_type": "markdown",
838 |    "metadata": {},
839 |    "source": [
840 |     "***\n",
841 |     "# Using the NCI THREDDS Data Server with Siphon\n",
842 |     "\n",
843 |     "Siphon is a collection of Python utilities for downloading data from Unidata data technologies. More information on installing and using Unidata's Siphon can be found at:\n",
844 |     "https://github.com/Unidata/siphon\n",
845 |     "\n",
846 |     "**The following will go through how to:**\n",
847 |     "- Use Siphon to find and query Landsat data hosted at NCI\n",
848 |     "- Use Siphon to find and query MODIS data hosted on the NCI THREDDS Data Server\n",
849 |     "\n",
850 |     "\n",
851 |     "## Browse for data\n",
852 |     "\n",
853 |     "Begin by going to NCI's Geonetwork page: http://geonetwork.nci.org.au/\n",
854 |     "\n",
855 |     "This page contains the metadata records for NCI Data Collections as well as information on where to find the data.\n",
856 |     "\n",
857 |     "\n",
858 |     "\n",
859 |     "In this example, we will search for Landsat data:\n",
860 |     "\n",
861 |     "\n",
862 |     "\n",
863 |     "If we click on the first result, we see a brief overview of the metadata record. **Note:** For the full record, navigate to the upper-right corner of your browser to change to the \"Full view\" (eyeball icon).\n",
864 |     "\n",
865 |     "One of the options under **Download and links** is the NCI THREDDS Data Server Catalog page:\n",
866 |     "\n",
867 |     "\n",
868 |     "\n",
869 |     "By navigating to this link, the available (public) data subcollections and datasets will be visible:\n",
870 |     "\n",
871 |     "\n",
872 |     "\n",
873 |     "\n",
874 |     "In this example, let's navigate to the **LANDSAT data: scenes and tiles/ tiles/ EPSG3577/ LS8_OLI_TIRS_NBAR/** dataset:\n",
875 |     "\n",
876 |     "\n",
877 |     "\n",
878 |     "## Using Siphon\n",
879 |     "\n",
880 |     "Once a parent dataset directory is selected, Siphon can be used to search and use the data access methods and services provided by THREDDS. For example, Siphon will return a list of data endpoints for the OPeNDAP data URL, NetCDF Subset Service (NCSS), Web Map Service (WMS), Web Coverage Service (WCS), and the HTTP link for direct download. 
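Condensed, the pattern the next few cells walk through step by step looks like this. A minimal sketch, using the same catalog URL given below:

```python
# Sketch: list the service endpoints Siphon exposes for the first dataset in a catalog
# (same LS8 catalog URL as used in the cells below).
from siphon.catalog import TDSCatalog

cat = TDSCatalog('http://dapds00.nci.org.au/thredds/catalog/rs0/tiles/EPSG3577'
                 '/LS8_OLI_TIRS_NBAR/catalog.xml')
first = list(cat.datasets.values())[0]
for service, url in sorted(first.access_urls.items()):
    print(service, url)
```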
\n", 881 | "\n", 882 | "### Import python packages\n" 883 | ] 884 | }, 885 | { 886 | "cell_type": "code", 887 | "execution_count": null, 888 | "metadata": { 889 | "collapsed": false 890 | }, 891 | "outputs": [], 892 | "source": [ 893 | "%matplotlib inline\n", 894 | "from netCDF4 import Dataset \n", 895 | "from siphon import catalog, ncss\n", 896 | "import datetime\n", 897 | "import matplotlib.pyplot as plt " 898 | ] 899 | }, 900 | { 901 | "cell_type": "markdown", 902 | "metadata": {}, 903 | "source": [ 904 | "### Provide the top-level URL from the THREDDS page above\n", 905 | "Note: You can leave the '.html' but you will receive a message saying it was changed to '.xml'. \n", 906 | "\n", 907 | "\n" 908 | ] 909 | }, 910 | { 911 | "cell_type": "code", 912 | "execution_count": null, 913 | "metadata": { 914 | "collapsed": true 915 | }, 916 | "outputs": [], 917 | "source": [ 918 | "catalog_url = 'http://dapds00.nci.org.au/thredds/catalog/rs0/tiles/EPSG3577/LS8_OLI_TIRS_NBAR/catalog.xml'" 919 | ] 920 | }, 921 | { 922 | "cell_type": "markdown", 923 | "metadata": {}, 924 | "source": [ 925 | "### Now we use Siphon to list all the available datasets under this catalog" 926 | ] 927 | }, 928 | { 929 | "cell_type": "code", 930 | "execution_count": null, 931 | "metadata": { 932 | "collapsed": false 933 | }, 934 | "outputs": [], 935 | "source": [ 936 | "tds = catalog.TDSCatalog(catalog_url)\n", 937 | "datasets = list(tds.datasets)\n", 938 | "endpts = tds.datasets.values()" 939 | ] 940 | }, 941 | { 942 | "cell_type": "markdown", 943 | "metadata": {}, 944 | "source": [ 945 | "#### Some of the datasets...\n" 946 | ] 947 | }, 948 | { 949 | "cell_type": "code", 950 | "execution_count": null, 951 | "metadata": { 952 | "collapsed": false 953 | }, 954 | "outputs": [], 955 | "source": [ 956 | "datasets[:10]" 957 | ] 958 | }, 959 | { 960 | "cell_type": "markdown", 961 | "metadata": {}, 962 | "source": [ 963 | "#### And their associated endpoints for data services:" 964 | ] 965 | }, 966 | { 967 | "cell_type": "code", 968 | "execution_count": null, 969 | "metadata": { 970 | "collapsed": false 971 | }, 972 | "outputs": [], 973 | "source": [ 974 | "for key, value in endpts[0].access_urls.items():\n", 975 | " print key, value" 976 | ] 977 | }, 978 | { 979 | "cell_type": "markdown", 980 | "metadata": {}, 981 | "source": [ 982 | "### Now we can use Siphon along with some form of query method to find some data\n", 983 | "\n", 984 | "#### This example will use Shapely to find intersecting Polygon shapes" 985 | ] 986 | }, 987 | { 988 | "cell_type": "code", 989 | "execution_count": null, 990 | "metadata": { 991 | "collapsed": false 992 | }, 993 | "outputs": [], 994 | "source": [ 995 | "from shapely.geometry import Polygon\n", 996 | "from shapely.wkt import loads\n", 997 | "\n", 998 | "query = (136, 138, -29.3, -27.8)\n", 999 | "query = Polygon([[136, -29.3], [136, -27.8], [138, -27.8], [138, -29.3]])\n", 1000 | "\n", 1001 | "# What this query looks like in WKT\n", 1002 | "print query.wkt" 1003 | ] 1004 | }, 1005 | { 1006 | "cell_type": "markdown", 1007 | "metadata": {}, 1008 | "source": [ 1009 | "#### Loop through the datasets and check if the Landsat's geospatial bounds (which is in a WKT polygon format) intersects with the query\n" 1010 | ] 1011 | }, 1012 | { 1013 | "cell_type": "code", 1014 | "execution_count": null, 1015 | "metadata": { 1016 | "collapsed": false 1017 | }, 1018 | "outputs": [], 1019 | "source": [ 1020 | "%%time \n", 1021 | "\n", 1022 | "matches = []\n", 1023 | "for dataset in endpts:\n", 1024 | " dap = 
dataset.access_urls['OPENDAP']\n",
1025 |     "    with Dataset(dap) as f:\n",
1026 |     "        bounds = loads(f.geospatial_bounds.encode())\n",
1027 |     "        if bounds.intersects(query):\n",
1028 |     "            print dap\n",
1029 |     "            matches.append(dap)"
1030 |    ]
1031 |   },
1032 |   {
1033 |    "cell_type": "markdown",
1034 |    "metadata": {},
1035 |    "source": [
1036 |     "### Let's take a quick look at what was found\n",
1037 |     "\n",
1038 |     "(Because we are accessing data remotely through OPeNDAP, let's look at a lower resolution so it doesn't exceed memory limits.)"
1039 |    ]
1040 |   },
1041 |   {
1042 |    "cell_type": "code",
1043 |    "execution_count": null,
1044 |    "metadata": {
1045 |     "collapsed": false
1046 |    },
1047 |    "outputs": [],
1048 |    "source": [
1049 |     "plt.figure(figsize=(12,8))\n",
1050 |     "\n",
1051 |     "for match in matches:\n",
1052 |     "    with Dataset(match) as f:\n",
1053 |     "        x = f.variables['x'][::50]\n",
1054 |     "        y = f.variables['y'][::50]\n",
1055 |     "        t = f.variables['time'][::5]\n",
1056 |     "        \n",
1057 |     "        for i in range(0, len(t)):\n",
1058 |     "            b2 = f.variables['band_2'][i,::50,::50]\n",
1059 |     "            plt.pcolormesh(x, y, b2)"
1060 |    ]
1061 |   },
1062 |   {
1063 |    "cell_type": "markdown",
1064 |    "metadata": {},
1065 |    "source": [
1066 |     "***\n",
1067 |     "## Now we'll use the same process to look at a MODIS time series\n",
1068 |     "\n",
1069 |     "#### Start by defining the parent catalog URL from NCI's THREDDS Data Server\n",
1070 |     "#### Then use Siphon to explore the available datasets and data service endpoints"
1071 |    ]
1072 |   },
1073 |   {
1074 |    "cell_type": "code",
1075 |    "execution_count": null,
1076 |    "metadata": {
1077 |     "collapsed": true
1078 |    },
1079 |    "outputs": [],
1080 |    "source": [
1081 |     "url = 'http://dapds00.nci.org.au/thredds/catalog/u39/public/data/modis/fractionalcover-clw/v2.2/netcdf/catalog.xml'\n",
1082 |     "\n",
1083 |     "tds = catalog.TDSCatalog(url)\n",
1084 |     "datasets = list(tds.datasets)\n",
1085 |     "endpts = tds.datasets.values()"
1086 |    ]
1087 |   },
1088 |   {
1089 |    "cell_type": "markdown",
1090 |    "metadata": {},
1091 |    "source": [
1092 |     "### We can create a small function that uses Siphon's Netcdf Subset Service (NCSS) to extract a spatial request (defined by a lat/lon box)"
1093 |    ]
1094 |   },
1095 |   {
1096 |    "cell_type": "code",
1097 |    "execution_count": null,
1098 |    "metadata": {
1099 |     "collapsed": true
1100 |    },
1101 |    "outputs": [],
1102 |    "source": [
1103 |     "def get_data(dataset, bbox):\n",
1104 |     "    nc = ncss.NCSS(dataset.access_urls['NetcdfSubset'])\n",
1105 |     "    query = nc.query()\n",
1106 |     "    query.lonlat_box(north=bbox[3],south=bbox[2],east=bbox[1],west=bbox[0])\n",
1107 |     "    query.variables('bs')\n",
1108 |     "    \n",
1109 |     "    data = nc.get_data(query)\n",
1110 |     "    \n",
1111 |     "    lon = data['longitude'][:]\n",
1112 |     "    lat = data['latitude'][:]\n",
1113 |     "    bs = data['bs'][0,:,:]\n",
1114 |     "    t = data['time'][:]\n",
1115 |     "    \n",
1116 |     "    time_base = datetime.date(year=1800, month=1, day=1)   # 'time' units are days since 1800-01-01\n",
1117 |     "    time = time_base + datetime.timedelta(t[0])\n",
1118 |     "    \n",
1119 |     "    return lon, lat, bs, time"
1120 |    ]
1121 |   },
1122 |   {
1123 |    "cell_type": "markdown",
1124 |    "metadata": {},
1125 |    "source": [
1126 |     "### Test the function on a single file and view result"
1127 |    ]
1128 |   },
1129 |   {
1130 |    "cell_type": "code",
1131 |    "execution_count": null,
1132 |    "metadata": {
1133 |     "collapsed": false
1134 |    },
1135 |    "outputs": [],
1136 |    "source": [
1137 |     "bbox = (135, 140, -31, -27)\n",
1138 |     "lon, lat, bs, t = get_data(endpts[0], bbox)\n",
1139 |     "\n",
1140 |     "plt.figure(figsize=(10,10))\n",
1141 |     "plt.imshow(bs, extent=bbox, cmap='gist_earth', origin='upper')\n",
1142 |     "\n",
1143 |     "plt.xlabel('longitude (degrees)', fontsize=14)\n",
1144 |     "plt.ylabel('latitude (degrees)', fontsize=14)\n",
1145 |     "print \"Date: \", t"
1146 |    ]
1147 |   },
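NCSS queries can also be filtered in time, which is useful where a dataset spans multiple timesteps (each MODIS file here holds a single date, so this is illustrative). A minimal sketch using Siphon's query.time_range:

```python
# Sketch: add a time window to an NCSS query (dates are illustrative;
# assumes the endpts list and the ncss import from the cells above).
import datetime
from siphon import ncss

nc = ncss.NCSS(endpts[0].access_urls['NetcdfSubset'])
query = nc.query()
query.lonlat_box(north=-27, south=-31, east=140, west=135)
query.variables('bs')
query.time_range(datetime.datetime(2001, 1, 1), datetime.datetime(2001, 12, 31))
data = nc.get_data(query)
```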
1148 |   {
1149 |    "cell_type": "markdown",
1150 |    "metadata": {},
1151 |    "source": [
1152 |     "### Loop and query over the collection and save every plot as a PNG image file"
1153 |    ]
1154 |   },
1155 |   {
1156 |    "cell_type": "code",
1157 |    "execution_count": null,
1158 |    "metadata": {
1159 |     "collapsed": false
1160 |    },
1161 |    "outputs": [],
1162 |    "source": [
1163 |     "bbox = (135, 140, -31, -27)\n",
1164 |     "plt.figure(figsize=(10,10))\n",
1165 |     "\n",
1166 |     "for endpt in endpts[:15]:\n",
1167 |     "    try:\n",
1168 |     "        lon, lat, bs, t = get_data(endpt, bbox)\n",
1169 |     "\n",
1170 |     "        plt.imshow(bs, extent=bbox, cmap='gist_earth', origin='upper')\n",
1171 |     "        plt.clim(vmin=-2, vmax=100)\n",
1172 |     "\n",
1173 |     "        plt.tick_params(labelsize=14)\n",
1174 |     "        plt.xlabel('longitude (degrees)', fontsize=14)\n",
1175 |     "        plt.ylabel('latitude (degrees)', fontsize=14)\n",
1176 |     "\n",
1177 |     "        plt.title(\"Date: \"+str(t), fontsize=16, weight='bold')\n",
1178 |     "        plt.savefig(endpt.name+\".png\")\n",
1179 |     "        plt.cla()\n",
1180 |     "    except Exception:   # skip any dataset the subset request fails on\n",
1181 |     "        pass\n",
1182 |     "\n",
1183 |     "plt.close()"
1184 |    ]
1185 |   },
1186 |   {
1187 |    "cell_type": "markdown",
1188 |    "metadata": {},
1189 |    "source": [
1190 |     "### Convert the series of PNG files above into a GIF\n",
1191 |     "Uses ImageMagick's convert on the command line"
1192 |    ]
1193 |   },
1194 |   {
1195 |    "cell_type": "code",
1196 |    "execution_count": null,
1197 |    "metadata": {
1198 |     "collapsed": false
1199 |    },
1200 |    "outputs": [],
1201 |    "source": [
1202 |     "# Run the command line conversion\n",
1203 |     "! convert -fuzz 50% -layers optimize -delay 50 -loop 0 *.png animated.gif\n",
1204 |     "# Delete the individual PNG files after creation of the GIF\n",
1205 |     "! rm *.png"
1206 |    ]
1207 |   },
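If ImageMagick isn't available, the same GIF can be assembled in Python. A minimal sketch, assuming Pillow is installed and run before the rm step deletes the PNG frames:

```python
# Sketch: build the animated GIF with Pillow instead of ImageMagick's convert.
import glob
from PIL import Image

frames = [Image.open(p) for p in sorted(glob.glob('*.png'))]
frames[0].save('animated.gif', save_all=True, append_images=frames[1:],
               duration=500, loop=0)
```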
1208 |   {
1209 |    "cell_type": "markdown",
1210 |    "metadata": {},
1211 |    "source": [
1212 |     "### Show the animation of the temporal imagery\n",
1213 |     "<img src=\"animated.gif\">"
1214 |    ]
1215 |   },
1216 |   {
1217 |    "cell_type": "markdown",
1218 |    "metadata": {},
1219 |    "source": [
1220 |     "### We can also use Siphon to extract a single point by creating another function"
1221 |    ]
1222 |   },
1223 |   {
1224 |    "cell_type": "code",
1225 |    "execution_count": null,
1226 |    "metadata": {
1227 |     "collapsed": true
1228 |    },
1229 |    "outputs": [],
1230 |    "source": [
1231 |     "def get_point(dataset, lat, lon):\n",
1232 |     "    nc = ncss.NCSS(dataset.access_urls['NetcdfSubset'])\n",
1233 |     "    query = nc.query()\n",
1234 |     "    query.lonlat_point(lon, lat)\n",
1235 |     "    query.variables('bs')\n",
1236 |     "    \n",
1237 |     "    data = nc.get_data(query)\n",
1238 |     "    bs = data['bs'][0]\n",
1239 |     "    date = data['date'][0]\n",
1240 |     "    \n",
1241 |     "    return bs, date"
1242 |    ]
1243 |   },
1244 |   {
1245 |    "cell_type": "markdown",
1246 |    "metadata": {},
1247 |    "source": [
1248 |     "### Test this function on a point"
1249 |    ]
1250 |   },
1251 |   {
1252 |    "cell_type": "code",
1253 |    "execution_count": null,
1254 |    "metadata": {
1255 |     "collapsed": false
1256 |    },
1257 |    "outputs": [],
1258 |    "source": [
1259 |     "bs, date = get_point(endpts[4], -27.75, 137)\n",
1260 |     "print bs, date"
1261 |    ]
1262 |   },
1263 |   {
1264 |    "cell_type": "markdown",
1265 |    "metadata": {},
1266 |    "source": [
1267 |     "### Time series example\n",
1268 |     " - We use our function to drill the data at the point for each timestep\n",
1269 |     " - Then we can show the time series on a plot"
1270 |    ]
1271 |   },
1272 |   {
1273 |    "cell_type": "code",
1274 |    "execution_count": null,
1275 |    "metadata": {
1276 |     "collapsed": true
1277 |    },
1278 |    "outputs": [],
1279 |    "source": [
1280 |     "data = []\n",
1281 |     "for endpt in endpts[::20]:\n",
1282 |     "    bs, date = get_point(endpt, -27.75, 137)\n",
1283 |     "    data.append([date, bs])"
1284 |    ]
1285 |   },
1286 |   {
1287 |    "cell_type": "code",
1288 |    "execution_count": null,
1289 |    "metadata": {
1290 |     "collapsed": false
1291 |    },
1292 |    "outputs": [],
1293 |    "source": [
1294 |     "sortOrder = numpy.argsort(numpy.array(data)[:,0])\n",
1295 |     "BS = numpy.array(data)[sortOrder,1]\n",
1296 |     "Date = numpy.array(data)[sortOrder,0]\n",
1297 |     "\n",
1298 |     "plt.figure(figsize=(12,6))\n",
1299 |     "plt.plot(Date, BS, '-o', linewidth=2, markersize=8)\n",
1300 |     "\n",
1301 |     "plt.tick_params(labelsize=14)\n",
1302 |     "plt.xlabel('date', fontsize=14)\n",
1303 |     "plt.ylabel('fractional cover of bare soil (%)', fontsize=14)\n",
1304 |     "plt.title('Lat, Lon: -27.75, 137', fontsize=16)"
1305 |    ]
1306 |   },
1307 |   {
1308 |    "cell_type": "code",
1309 |    "execution_count": null,
1310 |    "metadata": {
1311 |     "collapsed": true
1312 |    },
1313 |    "outputs": [],
1314 |    "source": []
1315 |   }
1316 |  ],
1317 |  "metadata": {
1318 |   "kernelspec": {
1319 |    "display_name": "Python 2",
1320 |    "language": "python",
1321 |    "name": "python2"
1322 |   },
1323 |   "language_info": {
1324 |    "codemirror_mode": {
1325 |     "name": "ipython",
1326 |     "version": 2
1327 |    },
1328 |    "file_extension": ".py",
1329 |    "mimetype": "text/x-python",
1330 |    "name": "python",
1331 |    "nbconvert_exporter": "python",
1332 |    "pygments_lexer": "ipython2",
1333 |    "version": "2.7.12"
1334 |   }
1335 |  },
1336 |  "nbformat": 4,
1337 |  "nbformat_minor": 0
1338 | }
1339 | 
--------------------------------------------------------------------------------