├── .gitignore ├── CHANGES.txt ├── Dockerfile ├── LICENSE ├── README.md ├── docker-compose.yml ├── notebooks ├── cc-tutorial1_clustering.ipynb ├── cluster_gridsearch.ipynb ├── ethiopia-rf.ipynb ├── final_clusters.ipynb ├── fit_LSTM.ipynb ├── fit_LSTM_labeled.ipynb ├── fit_lstm_test.ipynb ├── rukwa-classified.ipynb └── satts-tutorial1_clustering.ipynb ├── requirements.txt ├── rukwa-mask.py ├── satts-tutorial1_clustering.ipynb ├── satts ├── __init__.py ├── tsclust.py ├── tsmask.py ├── tspredict.py ├── tstrain.py └── version.py └── setup.py /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | .idea 3 | -------------------------------------------------------------------------------- /CHANGES.txt: -------------------------------------------------------------------------------- 1 | 0.1.0: 2 | - initial release 3 | -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | FROM developmentseed/geolambda:latest 2 | 3 | RUN \ 4 | yum makecache fast; 5 | 6 | RUN pip3 install --upgrade pip 7 | RUN pip3 install cython 8 | RUN pip3 install pyyaml h5py 9 | 10 | ENV \ 11 | PYCURL_SSL_LIBRARY=nss 12 | 13 | # install requirements 14 | WORKDIR /build 15 | COPY requirements*txt /build/ 16 | RUN \ 17 | pip3 install -r requirements.txt; 18 | #pip3 install -r requirements-dev.txt 19 | 20 | # Jupyter and Tensorboard ports 21 | EXPOSE 8888 6006 22 | 23 | # Store notebooks in this mounted directory 24 | VOLUME /notebooks 25 | 26 | CMD ["/run_jupyter.sh"] 27 | 28 | # install app 29 | COPY . /build 30 | RUN \ 31 | pip3 install . 
-v; \ 32 | rm -rf /build/*; 33 | 34 | WORKDIR /home/geolambda 35 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2018 Development Seed 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy 4 | of this software and associated documentation files (the "Software"), to deal 5 | in the Software without restriction, including without limitation the rights 6 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 7 | copies of the Software, and to permit persons to whom the Software is 8 | furnished to do so, subject to the following conditions: 9 | 10 | The above copyright notice and this permission notice shall be included in all 11 | copies or substantial portions of the Software. 12 | 13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 19 | SOFTWARE. 20 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Pixel-level clustering and classification of multi-spectral, multi-temporal earth observation data 2 | 3 | This library contains classes and functions to generate datasets corresponding to spatial features from a time-series of satellite images. 
The impetus for this project was to develop an easy-to-use, high-level interface to numerous Python modules for the clustering and classification of land cover/land use (LULC) types, with an initial focus on classifying individual crop types in challenging geographies using a time-series of multi-spectral earth observation (EO) images. The use of a time-series of EO images better captures the dynamic nature of the appearance of crops and other LULC classes through a growing season, enabling more accurate model predictions. The functions and methods provided in this library can be used to generate EO reflectance time-series datasets and models for arbitrary vector data, e.g. points or polygons. 4 | 5 | ## Using this library 6 | The library is divided into several components: 7 | 8 | 9 | 1. `tsmask`: provides functions to create masked numpy arrays corresponding to areas of interest, as well as a `BandTimeSeries` object initialized using the masked array. Specific functions and objects include: 10 | 11 | - `rasterize` utilizes the `osgeo` library and the underlying `gdal` functionality to rasterize vector features from a shapefile and output a .tif file sharing the same relevant metadata and dimensions as the reference image from which it was created. A `check_rasterize` function is also provided to confirm that the features were correctly "burned" into the raster layer. The resulting image can be characterized as a land cover "mask". 12 | 13 | - `mask_to_array` generates a 3D numpy array from the output of `rasterize`. Each element of the 3D array is a 2D array representing band reflectance values for a given date. Values in the 3D array that are not no-data values correspond to a land cover class burned in using `rasterize`. 14 | 15 | - `BandTimeSeries` objects contain information about time-series of reflectance values for samples in a given land cover class, and methods to operate on and format the reflectance time-series. 
`BandTimeSeries` objects are initialized using an output from the `mask_to_array` function, along with arguments specifying the land cover class of the object, and the variable (band) name of the reflectance time-series. The `time_series_data_frame` method allows for interpolation of the time-series. 16 | 17 | 18 | 2. `tsclust`: provides a `TimeSeriesSample` class that is useful for generating a dataset from all or a subset of data contained in a `BandTimeSeries` and formatting it for direct use in the functions and classes provided in the [`tslearn`](https://tslearn.readthedocs.io/en/latest/) library. 19 | 20 | - `TimeSeriesSample` takes n_samples of the data in a `BandTimeSeries` and optionally smooths the time-series using Savitzky-Golay (Savgol) signal smoothing. The `ts_dataset` method generates an object that can be used directly in the time series clustering and classification algorithms provided in the `tslearn` library. 21 | 22 | - `cluster_time_series` performs either `GlobalAlignmentKernelKMeans` or `TimeSeriesKMeans` (both from the `tslearn` library) on a `TimeSeriesSample` object. The user specifies the number of clusters as well as the distance metric used if the clustering algorithm is `TimeSeriesKMeans` (dynamic time warping or soft dynamic time warping). Silhouette scores computed on the resulting clusters can optionally be returned. Alternative sets of hyperparameters for `cluster_time_series` can be tested using the `cluster_grid_search` function. 23 | 24 | - `cluster_mean_quantiles` and `plot_clusters` provide methods for inspecting and visualizing cluster results. 25 | 26 | 3. `tstrain` provides functions for extracting training datasets comprising time-series of band reflectance values at known locations (x,y numpy array indices) from satellite scenes. 27 | 28 | - `random_ts_samples` takes n_samples from .csv files containing reflectance time-series data for a given land cover class. 29 | 30 | - `get_training_data` reads satellite scenes, e.g. 
scenes corresponding to an area of interest specified with [`sat-search`](https://github.com/sat-utils/sat-search) and downloaded and saved using the default directory structure of `sat-search load`, into numpy arrays using functionality from [`gippy`](https://gippy.readthedocs.io/en/latest/). The output is a long-form `pandas` dataframe with columns for date, feature (band-value), band reflectance value, the 2d array index, and a label corresponding to a sample's land cover class. 31 | 32 | - `format_training_data` takes the output of `get_training_data` and reshapes it into a 3D numpy array of shape (n_samples, n_timesteps, n_features) suitable for use in a `Keras` Sequential model. Both x and y (optionally one-hot encoded) are returned. 33 | 34 | ## Examples 35 | 36 | Coming soon: Two Jupyter notebook tutorials showcasing the functionality in this library. 37 | 38 | 39 | -------------------------------------------------------------------------------- /docker-compose.yml: -------------------------------------------------------------------------------- 1 | version: '2' 2 | 3 | services: 4 | 5 | base: 6 | build: 7 | context: . 
8 | image: 'temporal-crop-classification:latest' 9 | entrypoint: /bin/bash 10 | #env_file: .env 11 | volumes: 12 | - '.:/home/geolambda/work' 13 | 14 | test: 15 | image: 'developmentseed/temporal-crop-classification:latest' 16 | entrypoint: bash -c 'pytest test/' 17 | #env_file: .env 18 | volumes: 19 | - './test:/home/geolambda/test' 20 | -------------------------------------------------------------------------------- /notebooks/cluster_gridsearch.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 7, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import tsclust\n", 10 | "import pandas as pd" 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 8, 16 | "metadata": {}, 17 | "outputs": [], 18 | "source": [ 19 | "cropdf = pd.read_csv('/Users/jameysmith/Documents/sentinel2_tanz/aoiTS/lc_ndvi_ts/crop_ndvi_interp.csv')\n", 20 | "cropdf = cropdf.rename(columns={\"array_ind\": \"array_index\"})" 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 13, 26 | "metadata": {}, 27 | "outputs": [], 28 | "source": [ 29 | "# Number of unique pixels (time-series) is 83,403. 
This is the max 'n_samples' value\n", 30 | "pg = {\n", 31 | " 'time_seriesdf': [cropdf],\n", 32 | " 'n_samples': [10],\n", 33 | " 'cluster_alg': ['GAKM', 'TSKM'],\n", 34 | " 'n_clusters': list(range(2, 8)),\n", 35 | " 'smooth': [True],\n", 36 | " 'ts_var': ['ndvi'],\n", 37 | " 'window': [7],\n", 38 | " 'poly': [3],\n", 39 | " 'cluster_metric': ['dtw', 'softdtw'],\n", 40 | " 'score': [True]\n", 41 | "}" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 14, 47 | "metadata": { 48 | "scrolled": true 49 | }, 50 | "outputs": [ 51 | { 52 | "name": "stdout", 53 | "output_type": "stream", 54 | "text": [ 55 | "11.779 --> 10.628 --> 10.628 --> \n", 56 | "11.779 --> 10.628 --> 10.628 --> \n", 57 | "11.128 --> 9.907 --> 9.907 --> \n", 58 | "11.128 --> 9.907 --> 9.907 --> \n", 59 | "8.157 --> 7.027 --> 7.027 --> \n", 60 | "8.157 --> 7.027 --> 7.027 --> \n", 61 | "6.856 --> 5.577 --> 5.577 --> \n", 62 | "6.856 --> 5.577 --> 5.577 --> \n", 63 | "Resumed because of empty cluster\n", 64 | "Resumed because of empty cluster\n", 65 | "6.134 --> 4.177 --> 3.291 --> 3.291 --> \n", 66 | "Resumed because of empty cluster\n", 67 | "Resumed because of empty cluster\n", 68 | "6.134 --> 4.177 --> 3.291 --> 3.291 --> \n", 69 | "Resumed because of empty cluster\n", 70 | "Resumed because of empty cluster\n", 71 | "Resumed because of empty cluster\n", 72 | "Resumed because of empty cluster\n", 73 | "Resumed because of empty cluster\n", 74 | "Resumed because of empty cluster\n", 75 | "Resumed because of empty cluster\n", 76 | "Resumed because of empty cluster\n", 77 | "Resumed because of empty cluster\n", 78 | "Resumed because of empty cluster\n", 79 | "Resumed because of empty cluster\n", 80 | "Resumed because of empty cluster\n", 81 | "Resumed because of empty cluster\n", 82 | "Resumed because of empty cluster\n", 83 | "Resumed because of empty cluster\n", 84 | "Resumed because of empty cluster\n", 85 | "Resumed because of empty cluster\n", 86 | "Resumed because of empty 
cluster\n", 87 | "Resumed because of empty cluster\n", 88 | "Resumed because of empty cluster\n", 89 | "0.211 --> 0.064 --> 0.063 --> 0.063 --> \n", 90 | "17261.967 --> 17411.115 --> 17411.132 --> 17411.132 --> 17411.132 --> \n", 91 | "0.078 --> 0.038 --> 0.038 --> \n", 92 | "17385.340 --> 17438.902 --> 17439.120 --> 17439.126 --> 17439.127 --> 17439.127 --> 17439.127 --> \n", 93 | "0.076 --> 0.027 --> 0.022 --> 0.022 --> \n", 94 | "17443.264 --> 17487.108 --> 17487.176 --> 17487.178 --> 17487.178 --> \n", 95 | "0.056 --> 0.023 --> 0.018 --> 0.018 --> \n", 96 | "17468.368 --> 17507.731 --> 17507.799 --> 17507.800 --> 17507.800 --> \n", 97 | "0.030 --> 0.016 --> 0.016 --> \n", 98 | "17486.779 --> 17516.053 --> 17516.110 --> 17516.111 --> 17516.111 --> \n", 99 | "0.018 --> 0.007 --> 0.007 --> \n", 100 | "17497.761 --> 17525.153 --> 17525.212 --> 17525.213 --> 17525.213 --> \n" 101 | ] 102 | } 103 | ], 104 | "source": [ 105 | "# Grid search on crop land cover class\n", 106 | " " 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": 17, 112 | "metadata": {}, 113 | "outputs": [], 114 | "source": [ 115 | "# Get cluster dataframe corresponding to parameter combination with largest silhouette score\n", 116 | "lowscore = pg_dict['clusters'][pg_df['sil_score'].idxmax()]" 117 | ] 118 | } 119 | ], 120 | "metadata": { 121 | "kernelspec": { 122 | "display_name": "Python 3", 123 | "language": "python", 124 | "name": "python3" 125 | }, 126 | "language_info": { 127 | "codemirror_mode": { 128 | "name": "ipython", 129 | "version": 3 130 | }, 131 | "file_extension": ".py", 132 | "mimetype": "text/x-python", 133 | "name": "python", 134 | "nbconvert_exporter": "python", 135 | "pygments_lexer": "ipython3", 136 | "version": "3.6.5" 137 | } 138 | }, 139 | "nbformat": 4, 140 | "nbformat_minor": 2 141 | } 142 | -------------------------------------------------------------------------------- /notebooks/final_clusters.ipynb: 
-------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "This is markdown" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": null, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import tsclust\n", 17 | "import pandas as pd" 18 | ] 19 | }, 20 | { 21 | "cell_type": "code", 22 | "execution_count": null, 23 | "metadata": {}, 24 | "outputs": [], 25 | "source": [ 26 | "cropdf = pd.read_csv('/home/ec2-user/crop_ndvi_interp.csv')\n", 27 | "cropdf = cropdf.rename(columns={\"array_ind\": \"array_index\"})" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 3, 33 | "metadata": {}, 34 | "outputs": [], 35 | "source": [ 36 | "cropts = tsclust.TimeSeriesSample(cropdf, n_samples=10000, ts_var='ndvi', seed=0).smooth()" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 4, 42 | "metadata": {}, 43 | "outputs": [ 44 | { 45 | "name": "stdout", 46 | "output_type": "stream", 47 | "text": [ 48 | "0.148 --> 0.057 --> 0.055 --> 0.054 --> 0.053 --> 0.053 --> 0.053 --> 0.053 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> 0.052 --> \n", 49 | "0.134 --> 0.053 --> 0.049 --> 0.048 --> 0.048 --> 0.048 --> 0.048 --> 0.048 --> 0.048 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 
0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> 0.047 --> \n", 50 | "0.093 --> 0.047 --> 0.046 --> 0.045 --> 0.044 --> 0.044 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> 0.043 --> \n" 51 | ] 52 | } 53 | ], 54 | "source": [ 55 | "clust4 = tsclust.cluster_time_series(cropts, cluster_alg='TSKM', n_clusters=4, cluster_metric='dtw')\n", 56 | "clust5 = tsclust.cluster_time_series(cropts, cluster_alg='TSKM', n_clusters=5, cluster_metric='dtw')\n", 57 | "clust6 = tsclust.cluster_time_series(cropts, cluster_alg='TSKM', n_clusters=6, cluster_metric='dtw')" 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": 5, 63 | "metadata": {}, 64 | "outputs": [], 65 | "source": [ 66 | "clust4.to_csv('/home/ec2-user/final_clusters/4_clusters.csv', index=False)\n", 67 | "clust5.to_csv('/home/ec2-user/final_clusters/5_clusters.csv', index=False)\n", 68 | "clust6.to_csv('/home/ec2-user/final_clusters/6_clusters.csv', index=False)" 69 | ] 70 | } 71 | ], 72 | "metadata": { 73 | "kernelspec": { 74 | "display_name": "Python 3", 75 | "language": "python", 76 | "name": "python3" 77 | }, 78 | "language_info": { 79 | "codemirror_mode": { 80 | "name": "ipython", 81 | "version": 3 82 | }, 83 | "file_extension": ".py", 84 | "mimetype": "text/x-python", 85 | "name": "python", 86 | "nbconvert_exporter": "python", 87 | "pygments_lexer": "ipython3", 88 | "version": "3.6.5" 89 | } 90 | }, 91 | "nbformat": 4, 92 | "nbformat_minor": 2 93 | } 94 | -------------------------------------------------------------------------------- /notebooks/fit_LSTM.ipynb: 
-------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [ 8 | { 9 | "name": "stderr", 10 | "output_type": "stream", 11 | "text": [ 12 | "Using TensorFlow backend.\n" 13 | ] 14 | } 15 | ], 16 | "source": [ 17 | "import tstrain\n", 18 | "import tsclust\n", 19 | "import pandas as pd\n", 20 | "from keras.models import Sequential\n", 21 | "from keras.layers import Dense\n", 22 | "from keras.layers import LSTM" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": 3, 28 | "metadata": {}, 29 | "outputs": [], 30 | "source": [ 31 | "### Create ndvi bands for each date in Sentinel-2 asset dataset ###\n", 32 | "\n", 33 | "# File path containing Sentinel-2 bands intersecting with AOI, organized by date \n", 34 | "#fp = '/home/ec2-user/Sentinel-2A' # <- all bands all dates\n", 35 | "fp = '/home/ec2-user/prediction_scenes/Sentinel-2A' # <- First 6 dates only; NDVI, green and blue bands \n", 36 | "\n", 37 | "# Band dictionary to match asset names with variables\n", 38 | "asset_dict = {'B02': 'blue',\n", 39 | " 'B03': 'green',\n", 40 | " 'B04': 'red',\n", 41 | " 'B08': 'nir'}\n", 42 | "\n", 43 | "# For now, create ndvi index only\n", 44 | "#indices = ['ndvi']" 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": 3, 50 | "metadata": {}, 51 | "outputs": [], 52 | "source": [ 53 | "# # Perform calculation and save images to appropriate directory\n", 54 | "# tstrain.calulate_indices(fp, asset_dict, indices)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 4, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "(10000, 82)" 66 | ] 67 | }, 68 | "execution_count": 4, 69 | "metadata": {}, 70 | "output_type": "execute_result" 71 | } 72 | ], 73 | "source": [ 74 | "# Clustered NDVI time-series data for cropped area; 5 clusters\n", 75 | "clust5 = 
pd.read_csv('/home/ec2-user/sample/5_clusters.csv')\n", 76 | "clust5.shape" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": 5, 82 | "metadata": {}, 83 | "outputs": [ 84 | { 85 | "name": "stderr", 86 | "output_type": "stream", 87 | "text": [ 88 | "/home/ec2-user/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:15: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version\n", 89 | "of pandas will change to not sort by default.\n", 90 | "\n", 91 | "To accept the future behavior, pass 'sort=True'.\n", 92 | "\n", 93 | "To retain the current behavior and silence the warning, pass sort=False\n", 94 | "\n", 95 | " from ipykernel import kernelapp as app\n" 96 | ] 97 | } 98 | ], 99 | "source": [ 100 | "# Combine samples from the clustered cropped area and other land cover classes into a single dataset\n", 101 | "# for model fitting. `lcts` is the file path to .csv files containing NDVI time-series from vegetation,\n", 102 | "# urban, and water land cover classes.\n", 103 | "lcts = '/home/ec2-user/sample/land_cover_samples'\n", 104 | "\n", 105 | "# Take n_samples of each non-crop land cover class\n", 106 | "noncrop_samples = tstrain.random_ts_samples(lcts, n_samples=10000, seed=0)\n", 107 | "\n", 108 | "# Rename and drop columns to allow concatenation of crop and non-crop samples\n", 109 | "clust5 = clust5.rename(columns={'cluster': 'label'})\n", 110 | "clust5 = clust5.drop(['lc'], axis=1)\n", 111 | "\n", 112 | "# Combine datasets\n", 113 | "dlist = [clust5, noncrop_samples]\n", 114 | "allsamples = pd.concat(dlist, ignore_index=True)" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": 6, 120 | "metadata": {}, 121 | "outputs": [], 122 | "source": [ 123 | "# Using raster index locations from `allsamples` (Step 1), extract band reflectance values from a time-series of \n", 124 | "# scenes contained in a directory generated using the default sat-search directory structure\n", 125 | "\n", 126 | "# 
Extract training data from Sentinel-2 time-series\n", 127 | "training_data = tstrain.get_training_data(fp, asset_dict, allsamples, standardize=False)" 128 | ] 129 | }, 130 | { 131 | "cell_type": "code", 132 | "execution_count": 7, 133 | "metadata": {}, 134 | "outputs": [ 135 | { 136 | "data": { 137 | "text/html": [ 138 | "
" 287 | ], 288 | "text/plain": [ 289 | " value label\n", 290 | "date ind feature \n", 291 | "2016-11-16 (2428, 4479) blue 0.1081 veg\n", 292 | " green 0.1010 veg\n", 293 | " ndvi 0.2294 veg\n", 294 | "2017-02-14 (2428, 4479) blue 0.0854 veg\n", 295 | " green 0.0870 veg\n", 296 | " ndvi 0.6892 veg\n", 297 | "2017-04-05 (2428, 4479) blue 0.0766 veg\n", 298 | " green 0.0841 veg\n", 299 | " ndvi 0.7618 veg\n", 300 | "2017-05-05 (2428, 4479) blue 0.0815 veg\n", 301 | " green 0.0804 veg\n", 302 | " ndvi 0.7272 veg\n", 303 | "2017-05-15 (2428, 4479) blue 0.0772 veg\n", 304 | " green 0.0740 veg\n", 305 | " ndvi 0.7265 veg\n", 306 | "2017-05-25 (2428, 4479) blue 0.0760 veg\n", 307 | " green 0.0706 veg\n", 308 | " ndvi 0.6695 veg\n", 309 | "2016-11-16 (2429, 4441) blue 0.1048 veg\n", 310 | " green 0.0963 veg" 311 | ] 312 | }, 313 | "execution_count": 7, 314 | "metadata": {}, 315 | "output_type": "execute_result" 316 | } 317 | ], 318 | "source": [ 319 | "i = training_data.set_index(['date', 'ind', 'feature'])\n", 320 | "i.head(20)" 321 | ] 322 | }, 323 | { 324 | "cell_type": "code", 325 | "execution_count": 11, 326 | "metadata": {}, 327 | "outputs": [], 328 | "source": [ 329 | "#training_data.to_csv('/home/ec2-user/training_data.csv')\n", 330 | "#training_data.to_csv('/home/ec2-user/training_data_few_bands.csv')" 331 | ] 332 | }, 333 | { 334 | "cell_type": "code", 335 | "execution_count": 8, 336 | "metadata": {}, 337 | "outputs": [], 338 | "source": [ 339 | "# Use first 6 dates in time series (Nov 16, 2016 through May 25, 2017)\n", 340 | "dates = training_data.date.unique()\n", 341 | "datesub = dates[0:6]\n", 342 | "trainsub = training_data[training_data['date'].isin(datesub)]" 343 | ] 344 | }, 345 | { 346 | "cell_type": "code", 347 | "execution_count": 9, 348 | "metadata": {}, 349 | "outputs": [ 350 | { 351 | "data": { 352 | "text/plain": [ 353 | "array(['2016-11-16', '2017-02-14', '2017-04-05', '2017-05-05',\n", 354 | " '2017-05-15', '2017-05-25'], dtype=object)" 355 | ] 
356 | }, 357 | "execution_count": 9, 358 | "metadata": {}, 359 | "output_type": "execute_result" 360 | } 361 | ], 362 | "source": [ 363 | "datesub" 364 | ] 365 | }, 366 | { 367 | "cell_type": "code", 368 | "execution_count": 10, 369 | "metadata": {}, 370 | "outputs": [], 371 | "source": [ 372 | "# Fit an LSTM recurrent neural network. In this 'toy' example, a total of 25,000 samples are used to fit a model,\n", 373 | "# including 10,000 from the clustered \"cropped\" class, and 5,000 from each of the \"water\", \"urban\" and\n", 374 | "# \"vegetation\" classes. The bands (features) include red, blue, green, and nir. Y labels are numerically\n", 375 | "# encoded, and converted to \"one-hot\" vectors.\n", 376 | "\n", 377 | "# Format training data into correct 3D array of shape (n_samples, n_timesteps, n_features) required to fit a\n", 378 | "# Keras LSTM model. N_features corresponds to number of bands included in training data\n", 379 | "\n", 380 | "class_codes, x, y = tstrain.format_training_data(trainsub)" 381 | ] 382 | }, 383 | { 384 | "cell_type": "code", 385 | "execution_count": 12, 386 | "metadata": {}, 387 | "outputs": [ 388 | { 389 | "data": { 390 | "text/plain": [ 391 | "{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 'urban', 6: 'veg', 7: 'water'}" 392 | ] 393 | }, 394 | "execution_count": 12, 395 | "metadata": {}, 396 | "output_type": "execute_result" 397 | } 398 | ], 399 | "source": [ 400 | "class_codes" 401 | ] 402 | }, 403 | { 404 | "cell_type": "code", 405 | "execution_count": 14, 406 | "metadata": {}, 407 | "outputs": [], 408 | "source": [ 409 | "# Split training and test data\n", 410 | "x_train, x_test, y_train, y_test = tstrain.split_train_test(x, y, seed=0)" 411 | ] 412 | }, 413 | { 414 | "cell_type": "code", 415 | "execution_count": 16, 416 | "metadata": {}, 417 | "outputs": [], 418 | "source": [ 419 | "# def standardize_features(x_train, x_test):\n", 420 | "# '''Standardize features of 3D array formatted for keras sequential model'''\n", 421 | "\n", 422 | "# mu 
= x_train.mean(axis=(0, 1))\n", 423 | "# sd = x_train.std(axis=(0, 1))\n", 424 | "\n", 425 | "# x_train_norm = (x_train - mu) / sd\n", 426 | "# x_test_norm = (x_test - mu) / sd\n", 427 | "\n", 428 | "# return mu, sd, x_train_norm, x_test_norm" 429 | ] 430 | }, 431 | { 432 | "cell_type": "code", 433 | "execution_count": 15, 434 | "metadata": {}, 435 | "outputs": [], 436 | "source": [ 437 | "# Standardize features\n", 438 | "mu, sd, x_train_norm, x_test_norm = tstrain.standardize_features(x_train, x_test)" 439 | ] 440 | }, 441 | { 442 | "cell_type": "code", 443 | "execution_count": 17, 444 | "metadata": {}, 445 | "outputs": [ 446 | { 447 | "name": "stdout", 448 | "output_type": "stream", 449 | "text": [ 450 | "Epoch 1/50\n", 451 | " - 24s - loss: 0.6634 - categorical_accuracy: 0.7589\n", 452 | "Epoch 2/50\n", 453 | " - 21s - loss: 0.4108 - categorical_accuracy: 0.8467\n", 454 | "Epoch 3/50\n", 455 | " - 21s - loss: 0.3545 - categorical_accuracy: 0.8662\n", 456 | "Epoch 4/50\n", 457 | " - 21s - loss: 0.3263 - categorical_accuracy: 0.8755\n", 458 | "Epoch 5/50\n", 459 | " - 21s - loss: 0.3030 - categorical_accuracy: 0.8853\n", 460 | "Epoch 6/50\n", 461 | " - 21s - loss: 0.2863 - categorical_accuracy: 0.8896\n", 462 | "Epoch 7/50\n", 463 | " - 21s - loss: 0.2685 - categorical_accuracy: 0.8964\n", 464 | "Epoch 8/50\n", 465 | " - 21s - loss: 0.2539 - categorical_accuracy: 0.9024\n", 466 | "Epoch 9/50\n", 467 | " - 21s - loss: 0.2416 - categorical_accuracy: 0.9057\n", 468 | "Epoch 10/50\n", 469 | " - 21s - loss: 0.2365 - categorical_accuracy: 0.9074\n", 470 | "Epoch 11/50\n", 471 | " - 21s - loss: 0.2256 - categorical_accuracy: 0.9119\n", 472 | "Epoch 12/50\n", 473 | " - 21s - loss: 0.2214 - categorical_accuracy: 0.9129\n", 474 | "Epoch 13/50\n", 475 | " - 21s - loss: 0.2125 - categorical_accuracy: 0.9166\n", 476 | "Epoch 14/50\n", 477 | " - 21s - loss: 0.2071 - categorical_accuracy: 0.9188\n", 478 | "Epoch 15/50\n", 479 | " - 21s - loss: 0.2015 - categorical_accuracy: 
0.9212\n", 480 | "Epoch 16/50\n", 481 | " - 21s - loss: 0.1958 - categorical_accuracy: 0.9220\n", 482 | "Epoch 17/50\n", 483 | " - 21s - loss: 0.1936 - categorical_accuracy: 0.9231\n", 484 | "Epoch 18/50\n", 485 | " - 21s - loss: 0.1893 - categorical_accuracy: 0.9243\n", 486 | "Epoch 19/50\n", 487 | " - 21s - loss: 0.1835 - categorical_accuracy: 0.9271\n", 488 | "Epoch 20/50\n", 489 | " - 21s - loss: 0.1827 - categorical_accuracy: 0.9270\n", 490 | "Epoch 21/50\n", 491 | " - 21s - loss: 0.1773 - categorical_accuracy: 0.9295\n", 492 | "Epoch 22/50\n", 493 | " - 21s - loss: 0.1738 - categorical_accuracy: 0.9303\n", 494 | "Epoch 23/50\n", 495 | " - 21s - loss: 0.1774 - categorical_accuracy: 0.9299\n", 496 | "Epoch 24/50\n", 497 | " - 21s - loss: 0.1691 - categorical_accuracy: 0.9340\n", 498 | "Epoch 25/50\n", 499 | " - 21s - loss: 0.1660 - categorical_accuracy: 0.9329\n", 500 | "Epoch 26/50\n", 501 | " - 21s - loss: 0.1639 - categorical_accuracy: 0.9330\n", 502 | "Epoch 27/50\n", 503 | " - 21s - loss: 0.1719 - categorical_accuracy: 0.9301\n", 504 | "Epoch 28/50\n", 505 | " - 21s - loss: 0.1601 - categorical_accuracy: 0.9371\n", 506 | "Epoch 29/50\n", 507 | " - 21s - loss: 0.1558 - categorical_accuracy: 0.9364\n", 508 | "Epoch 30/50\n", 509 | " - 21s - loss: 0.1558 - categorical_accuracy: 0.9363\n", 510 | "Epoch 31/50\n", 511 | " - 21s - loss: 0.1561 - categorical_accuracy: 0.9366\n", 512 | "Epoch 32/50\n", 513 | " - 21s - loss: 0.1528 - categorical_accuracy: 0.9371\n", 514 | "Epoch 33/50\n", 515 | " - 21s - loss: 0.1497 - categorical_accuracy: 0.9393\n", 516 | "Epoch 34/50\n", 517 | " - 21s - loss: 0.1492 - categorical_accuracy: 0.9388\n", 518 | "Epoch 35/50\n", 519 | " - 21s - loss: 0.1444 - categorical_accuracy: 0.9398\n", 520 | "Epoch 36/50\n", 521 | " - 21s - loss: 0.1485 - categorical_accuracy: 0.9386\n", 522 | "Epoch 37/50\n", 523 | " - 21s - loss: 0.1478 - categorical_accuracy: 0.9393\n", 524 | "Epoch 38/50\n", 525 | " - 21s - loss: 0.1517 - 
categorical_accuracy: 0.9381\n", 526 | "Epoch 39/50\n", 527 | " - 21s - loss: 0.1417 - categorical_accuracy: 0.9408\n", 528 | "Epoch 40/50\n", 529 | " - 21s - loss: 0.1401 - categorical_accuracy: 0.9429\n", 530 | "Epoch 41/50\n", 531 | " - 21s - loss: 0.1365 - categorical_accuracy: 0.9454\n", 532 | "Epoch 42/50\n", 533 | " - 21s - loss: 0.1389 - categorical_accuracy: 0.9440\n", 534 | "Epoch 43/50\n", 535 | " - 21s - loss: 0.1371 - categorical_accuracy: 0.9435\n", 536 | "Epoch 44/50\n", 537 | " - 21s - loss: 0.1337 - categorical_accuracy: 0.9449\n", 538 | "Epoch 45/50\n", 539 | " - 21s - loss: 0.1329 - categorical_accuracy: 0.9466\n", 540 | "Epoch 46/50\n", 541 | " - 21s - loss: 0.1362 - categorical_accuracy: 0.9444\n", 542 | "Epoch 47/50\n", 543 | " - 21s - loss: 0.1365 - categorical_accuracy: 0.9424\n", 544 | "Epoch 48/50\n", 545 | " - 21s - loss: 0.1294 - categorical_accuracy: 0.9474\n", 546 | "Epoch 49/50\n", 547 | " - 21s - loss: 0.1316 - categorical_accuracy: 0.9459\n", 548 | "Epoch 50/50\n", 549 | " - 21s - loss: 0.1272 - categorical_accuracy: 0.9464\n" 550 | ] 551 | }, 552 | { 553 | "data": { 554 | "text/plain": [ 555 | "" 556 | ] 557 | }, 558 | "execution_count": 17, 559 | "metadata": {}, 560 | "output_type": "execute_result" 561 | } 562 | ], 563 | "source": [ 564 | "# Train LSTM model\n", 565 | "n_timesteps = len(trainsub['date'].unique())\n", 566 | "n_features = len(trainsub['feature'].unique())\n", 567 | "\n", 568 | "model = Sequential()\n", 569 | "model.add(LSTM(32, activation='relu', return_sequences=True, input_shape=(n_timesteps, n_features)))\n", 570 | "model.add(LSTM(32, activation='relu', return_sequences=True))\n", 571 | "model.add(LSTM(32, activation='relu', return_sequences=True))\n", 572 | "model.add(LSTM(32, activation='relu', return_sequences=True))\n", 573 | "model.add(LSTM(32))\n", 574 | "model.add(Dense(activation='softmax', units=y.shape[1]))\n", 575 | "model.compile(loss='categorical_crossentropy', optimizer='adam', 
metrics=['categorical_accuracy'])\n", 576 | "model.fit(x_train_norm, y_train, epochs=50, batch_size=32, verbose=2)" 577 | ] 578 | }, 579 | { 580 | "cell_type": "code", 581 | "execution_count": 18, 582 | "metadata": {}, 583 | "outputs": [ 584 | { 585 | "name": "stdout", 586 | "output_type": "stream", 587 | "text": [ 588 | "7191/7191 [==============================] - 2s 215us/step\n" 589 | ] 590 | }, 591 | { 592 | "data": { 593 | "text/plain": [ 594 | "0.9413155333055208" 595 | ] 596 | }, 597 | "execution_count": 18, 598 | "metadata": {}, 599 | "output_type": "execute_result" 600 | } 601 | ], 602 | "source": [ 603 | "# Model accuracy\n", 604 | "_, accuracy = model.evaluate(x_test_norm, y_test, batch_size=32)\n", 605 | "accuracy" 606 | ] 607 | }, 608 | { 609 | "cell_type": "code", 610 | "execution_count": 19, 611 | "metadata": {}, 612 | "outputs": [ 613 | { 614 | "data": { 615 | "text/html": [ 616 | "
[Confusion matrix rendered as an HTML table; the markup was lost in extraction. The same table is reproduced in the text/plain output of this cell.]
" 745 | ], 746 | "text/plain": [ 747 | " 0 1 2 3 4 urban veg water recall\n", 748 | "0 377 33 3 1 0 6 2 0 0.893365\n", 749 | "1 72 325 16 6 10 0 2 0 0.754060\n", 750 | "2 18 25 453 1 63 1 2 0 0.804618\n", 751 | "3 5 8 1 57 0 0 0 0 0.802817\n", 752 | "4 11 13 101 2 350 0 12 0 0.715746\n", 753 | "urban 7 3 0 0 0 1179 1 0 0.990756\n", 754 | "veg 5 2 1 0 8 1 1985 0 0.991508\n", 755 | "water 0 0 0 0 0 0 0 2023 1.000000" 756 | ] 757 | }, 758 | "execution_count": 19, 759 | "metadata": {}, 760 | "output_type": "execute_result" 761 | } 762 | ], 763 | "source": [ 764 | "# Confusion matrix\n", 765 | "tstrain.conf_mat(x_test_norm, y_test, model, class_codes)" 766 | ] 767 | }, 768 | { 769 | "cell_type": "code", 770 | "execution_count": 20, 771 | "metadata": {}, 772 | "outputs": [], 773 | "source": [ 774 | "# serialize model to JSON\n", 775 | "model_json = model.to_json()\n", 776 | "with open(\"/home/ec2-user/model_improved.json\", \"w\") as json_file:\n", 777 | " json_file.write(model_json)" 778 | ] 779 | }, 780 | { 781 | "cell_type": "code", 782 | "execution_count": 21, 783 | "metadata": {}, 784 | "outputs": [ 785 | { 786 | "name": "stdout", 787 | "output_type": "stream", 788 | "text": [ 789 | "Saved model to disk\n" 790 | ] 791 | } 792 | ], 793 | "source": [ 794 | "# serialize weights to HDF5\n", 795 | "model.save_weights(\"/home/ec2-user/model.h5\")\n", 796 | "print(\"Saved model to disk\")" 797 | ] 798 | }, 799 | { 800 | "cell_type": "code", 801 | "execution_count": 22, 802 | "metadata": { 803 | "scrolled": true 804 | }, 805 | "outputs": [ 806 | { 807 | "name": "stderr", 808 | "output_type": "stream", 809 | "text": [ 810 | "/home/ec2-user/anaconda3/lib/python3.6/site-packages/matplotlib/cbook/deprecation.py:107: MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance. In a future version, a new instance will always be created and returned. 
Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.\n", 811 | " warnings.warn(message, mplDeprecation, stacklevel=1)\n" 812 | ] 813 | }, 814 | { 815 | "data": { 816 | "image/png": "\n", 817 | "text/plain": [ 818 | "
" 819 | ] 820 | }, 821 | "metadata": {}, 822 | "output_type": "display_data" 823 | } 824 | ], 825 | "source": [ 826 | "cropclusts = clust5.rename(columns={'label': 'cluster'})\n", 827 | "tsclust.plot_clusters(cropclusts, fill=False)" 828 | ] 829 | }, 830 | { 831 | "cell_type": "code", 832 | "execution_count": null, 833 | "metadata": {}, 834 | "outputs": [], 835 | "source": [] 836 | } 837 | ], 838 | "metadata": { 839 | "kernelspec": { 840 | "display_name": "Python 3", 841 | "language": "python", 842 | "name": "python3" 843 | }, 844 | "language_info": { 845 | "codemirror_mode": { 846 | "name": "ipython", 847 | "version": 3 848 | }, 849 | "file_extension": ".py", 850 | "mimetype": "text/x-python", 851 | "name": "python", 852 | "nbconvert_exporter": "python", 853 | "pygments_lexer": "ipython3", 854 | "version": "3.6.5" 855 | } 856 | }, 857 | "nbformat": 4, 858 | "nbformat_minor": 2 859 | } 860 | -------------------------------------------------------------------------------- /notebooks/fit_LSTM_labeled.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [ 8 | { 9 | "name": "stderr", 10 | "output_type": "stream", 11 | "text": [ 12 | "Using TensorFlow backend.\n" 13 | ] 14 | } 15 | ], 16 | "source": [ 17 | "import tstrain\n", 18 | "import tsclust\n", 19 | "import pandas as pd\n", 20 | "from keras.models import Sequential\n", 21 | "from keras.layers import Dense\n", 22 | "from keras.layers import LSTM" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": 3, 28 | "metadata": {}, 29 | "outputs": [], 30 | "source": [ 31 | "# Training data - no crop labels\n", 32 | "#td = pd.read_csv('/home/ec2-user/training_data_large.csv')\n", 33 | "td = pd.read_csv('/home/ec2-user/training_data_few_bands.csv') # <- only ndvi, green and blue bands" 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 4, 39 
| "metadata": {}, 40 | "outputs": [], 41 | "source": [ 42 | "td = td.drop(['Unnamed: 0'], axis=1)" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 5, 48 | "metadata": {}, 49 | "outputs": [], 50 | "source": [ 51 | "# Some integer cluster labels were read in as strings; convert them back to integers\n", 52 | "td.loc[td['label'] == '0', 'label'] = 0\n", 53 | "td.loc[td['label'] == '1', 'label'] = 1\n", 54 | "td.loc[td['label'] == '2', 'label'] = 2\n", 55 | "td.loc[td['label'] == '3', 'label'] = 3\n", 56 | "td.loc[td['label'] == '4', 'label'] = 4" 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": 6, 62 | "metadata": {}, 63 | "outputs": [ 64 | { 65 | "data": { 66 | "text/plain": [ 67 | "array(['veg', 'water', 4, 2, 0, 1, 3, 'urban'], dtype=object)" 68 | ] 69 | }, 70 | "execution_count": 6, 71 | "metadata": {}, 72 | "output_type": "execute_result" 73 | } 74 | ], 75 | "source": [ 76 | "td.label.unique()" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": 7, 82 | "metadata": {}, 83 | "outputs": [], 84 | "source": [ 85 | "# Group clusters 2 and 4 as \"maize\"; group clusters 0 and 1 as \"crop_2\"; leave cluster 3 alone, call it \"crop_3\"\n", 86 | "td.loc[td['label'] == 2, 'label'] = \"maize\"\n", 87 | "td.loc[td['label'] == 4, 'label'] = \"maize\"\n", 88 | "td.loc[td['label'] == 0, 'label'] = \"crop_2\"\n", 89 | "td.loc[td['label'] == 1, 'label'] = \"crop_2\"\n", 90 | "td.loc[td['label'] == 3, 'label'] = \"crop_3\"" 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": 8, 96 | "metadata": {}, 97 | "outputs": [ 98 | { 99 | "data": { 100 | "text/plain": [ 101 | "array(['veg', 'water', 'maize', 'crop_2', 'crop_3', 'urban'], dtype=object)" 102 | ] 103 | }, 104 | "execution_count": 8, 105 | "metadata": {}, 106 | "output_type": "execute_result" 107 | } 108 | ], 109 | "source": [ 110 | "td.label.unique()" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 9, 116 | "metadata": {}, 117 | "outputs": [], 118 | "source": [ 119 | "# 
Use the first 6 dates in the time series (Nov 16, 2016 through May 25, 2017)\n", 120 | "dates = td.date.unique()\n", 121 | "datesub = dates[0:6]\n", 122 | "trainsub = td[td['date'].isin(datesub)]" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 13, 128 | "metadata": {}, 129 | "outputs": [], 130 | "source": [ 131 | "# Fit an LSTM recurrent neural network. In this 'toy' example, a total of 25,000 samples are used to fit the model,\n", 132 | "# including 10,000 from the clustered \"cropped\" class and 5,000 from each of the \"water\", \"urban\" and\n", 133 | "# \"vegetation\" classes. The bands (features) here are NDVI, green, and blue. Y labels are numerically\n", 134 | "# encoded and converted to \"one-hot\" vectors.\n", 135 | "\n", 136 | "# Format the training data into the 3D array of shape (n_samples, n_timesteps, n_features) required to fit a\n", 137 | "# Keras LSTM model. n_features is the number of bands included in the training data.\n", 138 | "\n", 139 | "class_codes, x, y = tstrain.format_training_data(trainsub)" 140 | ] 141 | }, 142 | { 143 | "cell_type": "code", 144 | "execution_count": 15, 145 | "metadata": {}, 146 | "outputs": [], 147 | "source": [ 148 | "# Split training and test data\n", 149 | "x_train, x_test, y_train, y_test = tstrain.split_train_test(x, y, seed=0)" 150 | ] 151 | }, 152 | { 153 | "cell_type": "code", 154 | "execution_count": 16, 155 | "metadata": {}, 156 | "outputs": [], 157 | "source": [ 158 | "# Standardize features\n", 159 | "mu, sd, x_train_norm, x_test_norm = tstrain.standardize_features(x_train, x_test)" 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": 17, 165 | "metadata": {}, 166 | "outputs": [], 167 | "source": [ 168 | "import numpy as np\n", 169 | "np.save('/home/ec2-user/mu.npy', mu)\n", 170 | "np.save('/home/ec2-user/sd.npy', sd)" 171 | ] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": 19, 176 | "metadata": {}, 177 | "outputs": [ 178 | { 179 | 
"name": "stdout", 180 | "output_type": "stream", 181 | "text": [ 182 | "Epoch 1/50\n", 183 | " - 25s - loss: 0.5378 - categorical_accuracy: 0.7764\n", 184 | "Epoch 2/50\n", 185 | " - 22s - loss: 0.3194 - categorical_accuracy: 0.8797\n", 186 | "Epoch 3/50\n", 187 | " - 22s - loss: 0.2674 - categorical_accuracy: 0.8996\n", 188 | "Epoch 4/50\n", 189 | " - 22s - loss: 0.2329 - categorical_accuracy: 0.9137\n", 190 | "Epoch 5/50\n", 191 | " - 22s - loss: 0.2193 - categorical_accuracy: 0.9179\n", 192 | "Epoch 6/50\n", 193 | " - 22s - loss: 0.1960 - categorical_accuracy: 0.9267\n", 194 | "Epoch 7/50\n", 195 | " - 22s - loss: 0.1796 - categorical_accuracy: 0.9318\n", 196 | "Epoch 8/50\n", 197 | " - 22s - loss: 0.1682 - categorical_accuracy: 0.9360\n", 198 | "Epoch 9/50\n", 199 | " - 22s - loss: 0.1584 - categorical_accuracy: 0.9412\n", 200 | "Epoch 10/50\n", 201 | " - 22s - loss: 0.1488 - categorical_accuracy: 0.9443\n", 202 | "Epoch 11/50\n", 203 | " - 22s - loss: 0.1448 - categorical_accuracy: 0.9466\n", 204 | "Epoch 12/50\n", 205 | " - 22s - loss: 0.1380 - categorical_accuracy: 0.9479\n", 206 | "Epoch 13/50\n", 207 | " - 22s - loss: 0.1282 - categorical_accuracy: 0.9517\n", 208 | "Epoch 14/50\n", 209 | " - 22s - loss: 0.1251 - categorical_accuracy: 0.9535\n", 210 | "Epoch 15/50\n", 211 | " - 22s - loss: 0.1187 - categorical_accuracy: 0.9553\n", 212 | "Epoch 16/50\n", 213 | " - 22s - loss: 0.1170 - categorical_accuracy: 0.9573\n", 214 | "Epoch 17/50\n", 215 | " - 22s - loss: 0.1099 - categorical_accuracy: 0.9582\n", 216 | "Epoch 18/50\n", 217 | " - 22s - loss: 0.1082 - categorical_accuracy: 0.9604\n", 218 | "Epoch 19/50\n", 219 | " - 22s - loss: 0.1025 - categorical_accuracy: 0.9614\n", 220 | "Epoch 20/50\n", 221 | " - 22s - loss: 0.1019 - categorical_accuracy: 0.9612\n", 222 | "Epoch 21/50\n", 223 | " - 22s - loss: 0.0971 - categorical_accuracy: 0.9631\n", 224 | "Epoch 22/50\n", 225 | " - 22s - loss: 0.0940 - categorical_accuracy: 0.9640\n", 226 | "Epoch 23/50\n", 227 | 
" - 22s - loss: 0.0950 - categorical_accuracy: 0.9652\n", 228 | "Epoch 24/50\n", 229 | " - 22s - loss: 0.0906 - categorical_accuracy: 0.9654\n", 230 | "Epoch 25/50\n", 231 | " - 22s - loss: 0.0910 - categorical_accuracy: 0.9656\n", 232 | "Epoch 26/50\n", 233 | " - 22s - loss: 0.0865 - categorical_accuracy: 0.9669\n", 234 | "Epoch 27/50\n", 235 | " - 22s - loss: 0.0843 - categorical_accuracy: 0.9679\n", 236 | "Epoch 28/50\n", 237 | " - 22s - loss: 0.0843 - categorical_accuracy: 0.9681\n", 238 | "Epoch 29/50\n", 239 | " - 22s - loss: 0.0842 - categorical_accuracy: 0.9676\n", 240 | "Epoch 30/50\n", 241 | " - 22s - loss: 0.0774 - categorical_accuracy: 0.9704\n", 242 | "Epoch 31/50\n", 243 | " - 22s - loss: 0.0802 - categorical_accuracy: 0.9699\n", 244 | "Epoch 32/50\n", 245 | " - 22s - loss: 0.0759 - categorical_accuracy: 0.9712\n", 246 | "Epoch 33/50\n", 247 | " - 22s - loss: 0.0779 - categorical_accuracy: 0.9707\n", 248 | "Epoch 34/50\n", 249 | " - 21s - loss: 0.0770 - categorical_accuracy: 0.9706\n", 250 | "Epoch 35/50\n", 251 | " - 22s - loss: 0.0753 - categorical_accuracy: 0.9717\n", 252 | "Epoch 36/50\n", 253 | " - 22s - loss: 0.0717 - categorical_accuracy: 0.9719\n", 254 | "Epoch 37/50\n", 255 | " - 22s - loss: 0.0724 - categorical_accuracy: 0.9734\n", 256 | "Epoch 38/50\n", 257 | " - 22s - loss: 0.0705 - categorical_accuracy: 0.9732\n", 258 | "Epoch 39/50\n", 259 | " - 22s - loss: 0.0685 - categorical_accuracy: 0.9736\n", 260 | "Epoch 40/50\n", 261 | " - 22s - loss: 0.0693 - categorical_accuracy: 0.9738\n", 262 | "Epoch 41/50\n", 263 | " - 22s - loss: 0.0665 - categorical_accuracy: 0.9736\n", 264 | "Epoch 42/50\n", 265 | " - 22s - loss: 0.0675 - categorical_accuracy: 0.9737\n", 266 | "Epoch 43/50\n", 267 | " - 22s - loss: 0.0636 - categorical_accuracy: 0.9757\n", 268 | "Epoch 44/50\n", 269 | " - 22s - loss: 0.0669 - categorical_accuracy: 0.9738\n", 270 | "Epoch 45/50\n", 271 | " - 22s - loss: 0.0622 - categorical_accuracy: 0.9759\n", 272 | "Epoch 46/50\n", 273 
| " - 21s - loss: 0.0643 - categorical_accuracy: 0.9750\n", 274 | "Epoch 47/50\n", 275 | " - 21s - loss: 0.0648 - categorical_accuracy: 0.9756\n", 276 | "Epoch 48/50\n", 277 | " - 21s - loss: 0.0638 - categorical_accuracy: 0.9755\n", 278 | "Epoch 49/50\n", 279 | " - 22s - loss: 0.0647 - categorical_accuracy: 0.9753\n", 280 | "Epoch 50/50\n", 281 | " - 22s - loss: 0.0578 - categorical_accuracy: 0.9774\n" 282 | ] 283 | }, 284 | { 285 | "data": { 286 | "text/plain": [ 287 | "" 288 | ] 289 | }, 290 | "execution_count": 19, 291 | "metadata": {}, 292 | "output_type": "execute_result" 293 | } 294 | ], 295 | "source": [ 296 | "# Train LSTM model\n", 297 | "n_timesteps = len(trainsub['date'].unique())\n", 298 | "n_features = len(trainsub['feature'].unique())\n", 299 | "\n", 300 | "model = Sequential()\n", 301 | "model.add(LSTM(32, activation='relu', return_sequences=True, input_shape=(n_timesteps, n_features)))\n", 302 | "model.add(LSTM(32, activation='relu', return_sequences=True))\n", 303 | "model.add(LSTM(32, activation='relu', return_sequences=True))\n", 304 | "model.add(LSTM(32, activation='relu', return_sequences=True))\n", 305 | "model.add(LSTM(32))\n", 306 | "model.add(Dense(activation='softmax', units=y.shape[1]))\n", 307 | "model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['categorical_accuracy'])\n", 308 | "model.fit(x_train_norm, y_train, epochs=50, batch_size=32, verbose=2)" 309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": 20, 314 | "metadata": {}, 315 | "outputs": [ 316 | { 317 | "name": "stdout", 318 | "output_type": "stream", 319 | "text": [ 320 | "7191/7191 [==============================] - 2s 217us/step\n" 321 | ] 322 | }, 323 | { 324 | "data": { 325 | "text/plain": [ 326 | "0.9732999582811848" 327 | ] 328 | }, 329 | "execution_count": 20, 330 | "metadata": {}, 331 | "output_type": "execute_result" 332 | } 333 | ], 334 | "source": [ 335 | "# Model accuracy\n", 336 | "_, accuracy = 
model.evaluate(x_test_norm, y_test, batch_size=32)\n", 337 | "accuracy" 338 | ] 339 | }, 340 | { 341 | "cell_type": "code", 342 | "execution_count": 21, 343 | "metadata": {}, 344 | "outputs": [ 345 | { 346 | "data": { 347 | "text/html": [ 348 | "
[Confusion matrix rendered as an HTML table; the markup was lost in extraction. The same table is reproduced in the text/plain output of this cell.]
" 439 | ], 440 | "text/plain": [ 441 | " crop_2 crop_3 maize urban veg water recall\n", 442 | "crop_2 798 7 31 6 11 0 0.935522\n", 443 | "crop_3 11 59 1 0 0 0 0.830986\n", 444 | "maize 51 5 978 0 18 0 0.929658\n", 445 | "urban 11 0 1 1177 1 0 0.989076\n", 446 | "veg 15 0 21 2 1964 0 0.981019\n", 447 | "water 0 0 0 0 0 2023 1.000000" 448 | ] 449 | }, 450 | "execution_count": 21, 451 | "metadata": {}, 452 | "output_type": "execute_result" 453 | } 454 | ], 455 | "source": [ 456 | "# Confusion matrix\n", 457 | "tstrain.conf_mat(x_test_norm, y_test, model, class_codes)" 458 | ] 459 | }, 460 | { 461 | "cell_type": "code", 462 | "execution_count": 22, 463 | "metadata": {}, 464 | "outputs": [], 465 | "source": [ 466 | "# serialize model to JSON\n", 467 | "model_json = model.to_json()\n", 468 | "with open(\"/home/ec2-user/model_labeled.json\", \"w\") as json_file:\n", 469 | " json_file.write(model_json)" 470 | ] 471 | }, 472 | { 473 | "cell_type": "code", 474 | "execution_count": 23, 475 | "metadata": {}, 476 | "outputs": [ 477 | { 478 | "name": "stdout", 479 | "output_type": "stream", 480 | "text": [ 481 | "Saved model to disk\n" 482 | ] 483 | } 484 | ], 485 | "source": [ 486 | "# serialize weights to HDF5\n", 487 | "model.save_weights(\"/home/ec2-user/model_labeled.h5\")\n", 488 | "print(\"Saved model to disk\")" 489 | ] 490 | }, 491 | { 492 | "cell_type": "code", 493 | "execution_count": null, 494 | "metadata": {}, 495 | "outputs": [], 496 | "source": [] 497 | } 498 | ], 499 | "metadata": { 500 | "kernelspec": { 501 | "display_name": "Python 3", 502 | "language": "python", 503 | "name": "python3" 504 | }, 505 | "language_info": { 506 | "codemirror_mode": { 507 | "name": "ipython", 508 | "version": 3 509 | }, 510 | "file_extension": ".py", 511 | "mimetype": "text/x-python", 512 | "name": "python", 513 | "nbconvert_exporter": "python", 514 | "pygments_lexer": "ipython3", 515 | "version": "3.6.5" 516 | } 517 | }, 518 | "nbformat": 4, 519 | "nbformat_minor": 2 520 | } 521 | 
-------------------------------------------------------------------------------- /notebooks/rukwa-classified.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [ 8 | { 9 | "name": "stderr", 10 | "output_type": "stream", 11 | "text": [ 12 | "Using TensorFlow backend.\n" 13 | ] 14 | } 15 | ], 16 | "source": [ 17 | "import tspredict\n", 18 | "import pandas as pd\n", 19 | "import numpy as np\n", 20 | "import os\n", 21 | "import time\n", 22 | "from keras.models import Sequential\n", 23 | "from keras.models import model_from_json" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "Load trained LSTM model. Model was trained on the first 6 dates in the 2017 growing season with <15% cloud cover. Features included blue, green, NDVI bands. " 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": 2, 36 | "metadata": {}, 37 | "outputs": [], 38 | "source": [ 39 | "# load json and create model\n", 40 | "json_file = open(\"/home/ec2-user/model_labeled.json\", 'r')\n", 41 | "loaded_model_json = json_file.read()\n", 42 | "json_file.close()\n", 43 | "model = model_from_json(loaded_model_json)" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": 3, 49 | "metadata": {}, 50 | "outputs": [ 51 | { 52 | "name": "stdout", 53 | "output_type": "stream", 54 | "text": [ 55 | "Loaded model from disk\n" 56 | ] 57 | } 58 | ], 59 | "source": [ 60 | "# load weights into model\n", 61 | "model.load_weights(\"/home/ec2-user/model_labeled.h5\")\n", 62 | "print(\"Loaded model from disk\")" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": 4, 68 | "metadata": {}, 69 | "outputs": [], 70 | "source": [ 71 | "# mean and standard deviation of features from training data - used to standardize features for model prediction\n", 72 | "mu = 
np.load('/home/ec2-user/mu.npy')\n", 73 | "sd = np.load('/home/ec2-user/sd.npy')" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": 5, 79 | "metadata": {}, 80 | "outputs": [], 81 | "source": [ 82 | "# File paths to Sentinel-2 tiles to predict (intersecting the Rukwa region)\n", 83 | "fp = '/home/ec2-user/sent-scenes-s3'\n", 84 | "tiles = [fp + '/' + t for t in os.listdir(fp)]" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 6, 90 | "metadata": {}, 91 | "outputs": [ 92 | { 93 | "name": "stdout", 94 | "output_type": "stream", 95 | "text": [ 96 | "/home/ec2-user/sent-scenes-s3/T36MTS\n", 97 | "/home/ec2-user/sent-scenes-s3/T35LRL\n", 98 | "/home/ec2-user/sent-scenes-s3/T35MRM\n", 99 | "/home/ec2-user/sent-scenes-s3/T36MUT\n", 100 | "/home/ec2-user/sent-scenes-s3/T36LTR\n", 101 | "/home/ec2-user/sent-scenes-s3/T36MTT\n", 102 | "/home/ec2-user/sent-scenes-s3/T35MRN\n", 103 | "/home/ec2-user/sent-scenes-s3/T36MVS\n", 104 | "/home/ec2-user/sent-scenes-s3/T36MUS\n", 105 | "/home/ec2-user/sent-scenes-s3/T36LUR\n" 106 | ] 107 | } 108 | ], 109 | "source": [ 110 | "for tile in tiles:\n", 111 | " print(tile)" 112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": null, 117 | "metadata": {}, 118 | "outputs": [ 119 | { 120 | "name": "stdout", 121 | "output_type": "stream", 122 | "text": [ 123 | "/home/ec2-user/sent-scenes-s3/T36MTS formatted\n", 124 | "/home/ec2-user/sent-scenes-s3/T36MTS predicted\n" 125 | ] 126 | } 127 | ], 128 | "source": [ 129 | "start_time = time.time()\n", 130 | "\n", 131 | "for tile in tiles:\n", 132 | " # Reshape bands in each scene to match input shape required by Keras sequential model\n", 133 | " formatted_scene = tspredict.format_scene(tile, mu, sd)\n", 134 | " \n", 135 | " print(tile + ' formatted')\n", 136 | " \n", 137 | " # refimg can be any band from the same Sentinel-2 tile, outimg for writing pred. 
scene to disk\n", 138 | " band_paths = []\n", 139 | " for path, subdirs, files in os.walk(tile):\n", 140 | " for name in files:\n", 141 | " band_paths.append(os.path.join(path, name))\n", 142 | " \n", 143 | " refimg = band_paths[0]\n", 144 | " outimg = tile + '/tile_predicted.tif'\n", 145 | " \n", 146 | " # Predict the full tile\n", 147 | " predicted_tile = tspredict.classify_scene(formatted_scene=formatted_scene, model=model, \n", 148 | " refimg=refimg, outimg=outimg)\n", 149 | " print(tile + ' predicted')\n", 150 | "\n", 151 | "print(\"--- %s seconds ---\" % (time.time() - start_time))" 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": null, 157 | "metadata": {}, 158 | "outputs": [], 159 | "source": [ 160 | "# from joblib import Parallel, delayed\n", 161 | "# import multiprocessing\n", 162 | "\n", 163 | "# def predict_tile(tile):\n", 164 | "# # Reshape bands in each scene to match input shape required by Keras sequential model\n", 165 | "# formatted_scene = tspredict.format_scene(tile, mu, sd)\n", 166 | " \n", 167 | "# # refimg can be any band from the same Sentinel-2 tile, outimg for writing pred. 
scene to disk\n", 168 | "# band_paths = []\n", 169 | "# for path, subdirs, files in os.walk(tile):\n", 170 | "# for name in files:\n", 171 | "# band_paths.append(os.path.join(path, name))\n", 172 | " \n", 173 | "# refimg = band_paths[0]\n", 174 | "# outimg = tile + '/tile_predicted.tif'\n", 175 | " \n", 176 | "# # Predict the full tile\n", 177 | "# predicted_tile = tspredict.classify_scene(formatted_scene=formatted_scene, model=model, \n", 178 | "# refimg=refimg, outimg=outimg)\n", 179 | "\n", 180 | "# # Perform computation in parallel\n", 181 | "# #Parallel(n_jobs=-1, backend=\"multiprocessing\")(map(delayed(predict_tile), tiles))\n", 182 | "# Parallel(n_jobs=-1)(delayed(predict_tile)(tile) for tile in tiles)" 183 | ] 184 | } 185 | ], 186 | "metadata": { 187 | "kernelspec": { 188 | "display_name": "Python 3", 189 | "language": "python", 190 | "name": "python3" 191 | }, 192 | "language_info": { 193 | "codemirror_mode": { 194 | "name": "ipython", 195 | "version": 3 196 | }, 197 | "file_extension": ".py", 198 | "mimetype": "text/x-python", 199 | "name": "python", 200 | "nbconvert_exporter": "python", 201 | "pygments_lexer": "ipython3", 202 | "version": "3.6.5" 203 | } 204 | }, 205 | "nbformat": 4, 206 | "nbformat_minor": 2 207 | } 208 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | gippy~=1.0.1b1 2 | pandas~=0.23.1 3 | numpy~=1.14.5 4 | scipy~=1.1.0 5 | matplotlib~=2.2.2 6 | Cython~=0.28.2 7 | tslearn~=0.1.18.3 8 | scikit-learn~=0.19.1 9 | tensorflow 10 | keras-preprocessing 11 | keras-applications 12 | keras 13 | jupyter 14 | -------------------------------------------------------------------------------- /rukwa-mask.py: -------------------------------------------------------------------------------- 1 | from satts import tspredict 2 | from importlib import reload 3 | import fiona 4 | import rasterio.mask 5 | import 
matplotlib.pyplot as plt 6 | import os 7 | import numpy as np 8 | import pandas as pd 9 | 10 | 11 | fp = '/Users/jameysmith/Documents/sentinel2_tanz/LSTM-predictions-rukwa/reproj/' 12 | 13 | images = [fp + img for img in os.listdir(fp) if not img.startswith('.')] 14 | 15 | poly = "/Users/jameysmith/Documents/sentinel2_tanz/rukwa_polygon/rukwa.geojson" 16 | 17 | with fiona.open(poly, 'r') as json: 18 | features = [feature["geometry"] for feature in json] 19 | 20 | with rasterio.open(img) as src: 21 | out_image, out_transform = rasterio.mask.mask(src, features, crop=True) 22 | out_meta = src.meta.copy() 23 | 24 | t = out_image[out_image == 2] 25 | 26 | 27 | def get_area(polygon, images, label_num): 28 | 29 | with fiona.open(polygon, 'r') as json: 30 | features = [feature["geometry"] for feature in json] 31 | 32 | pixel_count = np.empty((0, len(images)), int) 33 | 34 | for image in images: 35 | 36 | with rasterio.open(image) as src: 37 | out_image, out_transform = rasterio.mask.mask(src, features, crop=True) 38 | out_meta = src.meta.copy() 39 | 40 | count = len(out_image[out_image == label_num]) 41 | pixel_count = np.append(pixel_count, count) 42 | 43 | return pixel_count 44 | 45 | test = get_area(polygon=poly, images=images, label_num=2) 46 | 47 | 48 | img = '/Users/jameysmith/Documents/sentinel2_tanz/LSTM-predictions-rukwa/reproj/ruk-clipped.tif' 49 | 50 | dataset = rasterio.open(img) 51 | preds = dataset.read() 52 | 53 | 54 | u, c = np.unique(preds, return_counts=True) 55 | 56 | classes = { 57 | 0: "crop_2", 58 | 1: "crop_3", 59 | 2: "maize", 60 | 3: "urban", 61 | 4: "veg", 62 | 5: "water" 63 | } 64 | 65 | df = pd.DataFrame(np.asarray((u, c)).T, columns=['lc_class', 'pix_count']) 66 | df['area_ha'] = df['pix_count'] * 100 67 | df['area_ha'] = df['area_ha'] / 10000 68 | 69 | df = df.replace({"lc_class": classes}) 70 | df = df.drop([6]) -------------------------------------------------------------------------------- /satts/__init__.py: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/developmentseed/satTS/931efe3fb8e61bfb99d1f0962365f658f0f7877b/satts/__init__.py -------------------------------------------------------------------------------- /satts/tsclust.py: -------------------------------------------------------------------------------- 1 | from tslearn.utils import to_time_series_dataset 2 | from tslearn.clustering import silhouette_score 3 | import tslearn.clustering as clust 4 | from scipy import signal 5 | import itertools 6 | import pandas as pd 7 | import numpy as np 8 | import matplotlib.pyplot as plt 9 | from gippy import GeoImage 10 | import gippy.algorithms as alg 11 | import re 12 | from os import listdir, walk 13 | 14 | 15 | def calulate_indices(filepath, asset_dict, indices): 16 | ''' Create image files for indices 17 | 18 | :param filepath (str): Full path to directory containing satellite scenes in default structure created 19 | by sat-search load --download 20 | :param asset_dict (dict): Keys = asset (band) names in scene files (e.g. 'B01', 'B02'); Values = value names 21 | corresponding to keys (e.g. 'red', 'nir') 22 | :param indices (list): Which indices to generate? 
                           Options include any index included in gippy.alg.indices

    :return: None (writes files to disk)
    '''

    subdirs = [x[0] for x in walk(filepath)]
    subdirs = subdirs[1:len(subdirs)]

    for folder in subdirs:

        # Filepath points to folder of geotiffs of Sentinel 2 time-series of bands 4 (red) and 8 (nir)
        files = [folder + '/' + f for f in listdir(folder) if not f.startswith('.')]

        # Asset (band) names; raw string so that `\.` is not treated as a string escape
        pattern = r'[^_.]+(?=\.[^_.]*$)'
        bands = [re.search(pattern, f).group(0) for f in files]

        # Match band names
        bands = [asset_dict.get(band, band) for band in bands]

        img = GeoImage.open(filenames=files, bandnames=bands, nodata=0)

        for ind in indices:
            alg.indices(img, products=[ind], filename=folder + '/index_' + ind + '.tif')

        img = None


def apply_savgol(x, value, window, poly):
    """ Perform Savgol signal smoothing on time-series in dataframe group object (x)

    Parameters
    ----------
    x: (pd.DataFrame.groupby) Grouped dataframe object
    value (str): Name of value (variable) to smooth
    window (int): smoothing window - pass to signal.savgol_filter 'window_length' param
    poly (int): polynomial order used to fit samples - pass to signal.savgol_filter 'polyorder' param

    Returns
    -------
    x: "Smoothed" time-series
    """

    x[value] = signal.savgol_filter(x[value], window_length=window, polyorder=poly)

    return x


class TimeSeriesSample:

    def __init__(self, time_series_df, n_samples, ts_var, seed):
        # Take a random sample of `n_samples` pixels from the time-series dataframe
        self.ts_var = ts_var
        self.group = time_series_df.groupby(['lc', 'pixel', 'array_index'])
        self.arranged_group = np.arange(self.group.ngroups)

        # Ensure the same pixels are sampled on every run with the same `seed` and `n_samples`
        np.random.seed(seed)
        np.random.shuffle(self.arranged_group)

        # Take the random sample (copy so the date column can be modified without touching the input)
        self.sample = time_series_df[self.group.ngroup().isin(self.arranged_group[:n_samples])].copy()

        if self.sample['date'].dtype != 'O':
            self.sample['date'] = self.sample['date'].dt.strftime('%Y-%m-%d')

        self.sample_dates = self.sample['date'].unique()
        self.tslist = self.sample.groupby(['lc', 'pixel', 'array_index'])[self.ts_var].apply(list)
        self.dataset = None

    def smooth(self, window=7, poly=3):
        # Perform Savgol signal smoothing to each time-series
        self.sample = self.sample.groupby(['lc', 'pixel', 'array_index']).apply(apply_savgol, self.ts_var, window, poly)
        self.tslist = self.sample.groupby(['lc', 'pixel', 'array_index'])[self.ts_var].apply(list)
        return self

    @property
    def ts_dataset(self):
        #tslist = self.sample.groupby(['lc', 'pixel', 'array_index'])[self.ts_var].apply(list)
        self.dataset = to_time_series_dataset(self.tslist)
        return self.dataset


def cluster_time_series(ts_sample, cluster_alg, n_clusters, cluster_metric, score=False):

    # Dataframe to store cluster results
    clust_df = pd.DataFrame(ts_sample.tslist.tolist(), index=ts_sample.tslist.index).reset_index()
    clust_df.columns.values[3:] = ts_sample.sample_dates

    # Fit model
    if cluster_alg == "GAKM":
        km = clust.GlobalAlignmentKernelKMeans(n_clusters=n_clusters)
    elif cluster_alg == "TSKM":
        km = clust.TimeSeriesKMeans(n_clusters=n_clusters, metric=cluster_metric)
    else:
        # Without this branch an unknown value fails later with an unbound `km`
        raise ValueError('cluster_alg must be "GAKM" or "TSKM"')

    # Add predicted cluster labels to cluster results dataframe
    labels = km.fit_predict(ts_sample.ts_dataset)
    clust_df['cluster'] = labels

    if score:
        s = silhouette_score(ts_sample.ts_dataset, labels)
        return clust_df, s

    return clust_df


def cluster_grid_search(parameter_grid):
    ''' Perform grid search on cluster_time_series parameters

    :param parameter_grid: (dict) parameter grid containing all parameter values to explore; it must include
                           `score=[True]`, because both return values of cluster_time_series are unpacked below

    :return: 1) dictionary with cluster labels and silhouette scores 2) dataframe with parameter combinations
             and corresponding silhouette score
    '''

    # List of all possible parameter combinations
    d = []
    for vals in itertools.product(*parameter_grid.values()):
        d.append(dict(zip(parameter_grid, vals)))

    # Convert to data frame; use to store silhouette scores
    df = pd.DataFrame(d)
    df = df.drop(['ts_sample'], axis=1)

    # Perform grid search
    output = {'clusters': [], 'scores': []}
    for values in itertools.product(*parameter_grid.values()):
        # Run clustering function on all combinations of parameters in parameter grid
        clusters, score = cluster_time_series(**dict(zip(parameter_grid, values)))

        # 'clusters' = dataframes with cluster results; scores = silhouette scores of corresponding cluster results
        output['clusters'].append(clusters)
        output['scores'].append(score)

    # Add silhouette scores to dataframe
    df['sil_score'] = output['scores']

    return output, df


def cluster_mean_quantiles(df):
    '''Calculate mean and 10th, 90th percentile for each cluster at all dates in time series

    :param df: dataframe output from `cluster_time_series`

    :return: two dataframes: one for mean time-series per-cluster, one for quantile time-series per-cluster
    '''

    # Columns with ndvi values
    cols = df.columns[3:-1]

    # Cluster means at each time-step
    m = df.groupby('cluster', as_index=False)[cols].mean().T.reset_index()
    m = m.iloc[1:]
    m.rename(columns={'index':'date'}, inplace=True)
    m.set_index('date', drop=True, inplace=True)
    m.index = pd.to_datetime(m.index)

    # Cluster 10th and 90th percentile at each time-step
    q =
        df.groupby('cluster', as_index=False)[cols].quantile([.1, 0.9]).T.reset_index()
    q.rename(columns={'index':'date'}, inplace=True)
    q.set_index('date', drop=True, inplace=True)
    q.index = pd.to_datetime(q.index)

    return m, q


def plot_clusters(obj, index=None, fill=True, title=None, save=False, filename=None):

    # `obj` is either the dict returned by cluster_grid_search or a single cluster dataframe
    if isinstance(obj, dict):
        cluster_df = obj['clusters'][index]
    else:
        cluster_df = obj

    # Get cluster means and 10th, 90th quantiles
    m, q = cluster_mean_quantiles(cluster_df)

    # Plot cluster results
    nclusts = len(cluster_df.cluster.unique())
    color = iter(plt.cm.Set2(np.linspace(0, 1, nclusts)))

    fig = plt.figure(figsize=(10, 8))
    cnt = 0
    for i in range(0, nclusts):
        # Plot mean time-series for each cluster; the label gives the legend an entry per cluster
        c = next(color)
        plt.plot(m.index, m[i], color=c, label='cluster {}'.format(i))

        # Fill 10th and 90th quantile time-series of each cluster
        if fill:
            plt.fill_between(m.index, q.iloc[:, [cnt]].values.flatten(), q.iloc[:, [cnt+1]].values.flatten(),
                             alpha=0.5, edgecolor=c, facecolor=c)
            cnt += 2

    # Legend and title
    plt.legend(loc='upper left')
    plt.title(title)

    # Axis labels (reuse the axes that were just drawn on)
    ax = plt.gca()
    ax.set_xlabel('Date')
    ax.set_ylabel('NDVI')

    if save:
        if not filename.endswith('.png'):
            raise ValueError('File type should be .png')
        fig.savefig(filename)

--------------------------------------------------------------------------------
/satts/tsmask.py:
--------------------------------------------------------------------------------
from osgeo import ogr, gdal
import gippy
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np


def rasterize(shapefile, outimg, refimg, attribute):
    ''' Rasterize a shapefile containing land cover polygons.
        Shapefile should have an attribute
    called 'id' corresponding to unique land cover class or other label

    :param shapefile (str): file path to shapefile to be rasterized
    :param outimg (str): file path to rasterized image
    :param refimg (str): file path to a reference image. Used to fetch dimensions and other metadata for rasterized img
    :param attribute (str): name of attribute in `shapefile` to burn into raster layer

    :return: None (saves image to file)
    '''

    # Open reference raster image to grab projection info and metadata
    img = gdal.Open(refimg, gdal.GA_ReadOnly)

    # Fetch dimensions of reference raster
    ncol = img.RasterXSize
    nrow = img.RasterYSize

    # Projection and extent of raster reference
    proj = img.GetProjectionRef()
    ext = img.GetGeoTransform()

    # Close reference image
    img = None

    # Create raster mask
    memdrive = gdal.GetDriverByName('GTiff')
    outrast = memdrive.Create(outimg, ncol, nrow, 1, gdal.GDT_Byte)

    # Set rasterized image's projection and extent to input raster's projection and extent
    outrast.SetProjection(proj)
    outrast.SetGeoTransform(ext)

    # Fill output band with the 0 blank (no class) label
    b = outrast.GetRasterBand(1)
    b.Fill(0)

    # Open the shapefile
    polys = ogr.Open(shapefile)
    layer = polys.GetLayerByIndex(0)

    # Rasterize the shapefile layer to new dataset
    status = gdal.RasterizeLayer(outrast, [1], layer, None, None, [0], ['ALL_TOUCHED=TRUE', 'ATTRIBUTE=' + attribute])

    # Close rasterized dataset
    outrast = None


def check_rasterize(rasterized_file, plot=True):
    '''Checks how many pixels are in each class of a rasterized image

    :param rasterized_file (str): File path to a rasterized image
    :param plot (bool): Should the result of the rasterized layer be plotted?

    :return: None
    '''

    # Read rasterized image
    roi_ds = gdal.Open(rasterized_file, gdal.GA_ReadOnly)
    roi = roi_ds.GetRasterBand(1).ReadAsArray()

    # How many pixels are in each class?
    classes = np.unique(roi)

    # Iterate over all class labels in the ROI image, print num pixels/class
    for c in classes:
        print('Class {c} contains {n} pixels'.format(c=c, n=(roi == c).sum()))

    if plot:
        plt.imshow(roi)


def mask_to_array(files, dates, mask, class_num, gain, missing_vals=None):
    ''' Generate a 3d array of values corresponding to a time-series of image masks for a land cover class

    :param files (list): List of files containing the image time-series (e.g. a stack of NDVI images)
    :param dates (list): List of dates corresponding to the image time-series
    :param mask (str): File path to a land cover mask
    :param class_num (int): ID number of land cover class in `mask`
    :param gain (float): Gain passed to gippy.GeoImage.open when reading the time-series
    :param missing_vals: Value to treat as no-data; matching cells are set to NaN (default None = leave as-is)

    :return: 3d array of time-series mask for a specified land cover class
    '''

    # Grab dimensions to set empty array
    ts = gippy.GeoImage.open(filenames=files, bandnames=dates, nodata=0, gain=gain)

    nbands = ts.nbands()
    nrows = ts.ysize()
    ncols = ts.xsize()

    # Close connection
    ts = None

    arr = np.empty((nbands, nrows, ncols))

    for band in range(0, nbands):
        # Open image time-series
        ndvi_ts = gippy.GeoImage.open(filenames=files, bandnames=dates, nodata=0, gain=gain)

        # Open rasterized landcover
        land_cover = gippy.GeoImage.open(filenames=[mask], bandnames=(['land_cover']), nodata=0)

        # Create land cover mask
        lc_mask = ndvi_ts.add_mask(land_cover['land_cover'] == class_num)

        # Read mask for time-step[band] into np.array
        lc_mask = lc_mask[band].read()

        # Deal with no-data values
        if missing_vals is not None:
            lc_mask[lc_mask == missing_vals] =
                np.nan

        # Store the masked band for this time-step in the output array
        arr[band] = lc_mask

        # Close image connections
        ndvi_ts = None
        land_cover = None

    return arr


class BandTimeSeries:
    """Time-series of image band values for a (masked) land cover class"""

    def __init__(self, mask, lc_class, ts_var, dates):
        """
        :param mask (numpy array): 3D numpy array corresponding to masked time-series for an image band or index
        :param lc_class (str): name of land cover class
        :param ts_var (str): name of variable contained in masked time-series (e.g. 'red', 'ndvi')
        :param dates (list): list of dates corresponding to time-series
        """
        self.land_cover_class = lc_class
        self.mask = mask
        self.ts_var = ts_var
        if len(dates) == len(mask):
            self.ts_dates = dates
        else:
            raise ValueError('length of dates must match number of time-steps in mask')

        # 2D time-series array of shape (num_timesteps, num_non-nan-pixels)
        mask_vals = self.mask[np.logical_not(np.isnan(self.mask))]
        self.ts_matrix = mask_vals.reshape((len(self.mask), int(mask_vals.shape[0] / len(self.mask))))
        self.num_timesteps = self.ts_matrix.shape[0]
        self.num_timeseries = self.ts_matrix.shape[1]

    def mask_indices(self):
        """Get the indices of non-nan values in crop mask
        :return: list of length #non-nan cells with each element a tuple: (rowindex, colindex)
        """
        w = np.argwhere(np.logical_not(np.isnan(self.mask)))
        wdf = pd.DataFrame(w)
        wsub = wdf.loc[wdf[0] == 0, [1, 2]]
        ind = list(zip(wsub[1], wsub[2]))

        return ind

    def time_series_dataframe(self, frequency, interpolate=True):
        """Create dataframe with band-value time-series for each pixel in land cover class
        :param interpolate (bool): Should time-series be interpolated?
        :param frequency (str): interpolation frequency, e.g.
                                '1d' for daily, '5d' for 5 days
        :return: Dataframe with band-value time-series per-pixel/land cover class
        """

        # Array indices (from original image) of non-nan values
        lc_ind = self.mask_indices()

        # Transpose time-series matrix of dim (# time steps, # non-nan pixels)
        mat_transpose = self.ts_matrix.T

        # Convert to dataframe, change col names to dates
        ts_df = pd.DataFrame(mat_transpose)
        ts_df.columns = self.ts_dates

        # append array indices as column
        ts_df['array_index'] = lc_ind

        # Create land cover value and pixel value columns
        ts_df['lc'] = self.land_cover_class
        ts_df['pixel'] = ts_df.index

        # Convert to long-format and sort
        ts_df = pd.melt(ts_df, id_vars=['lc', 'pixel', 'array_index'], var_name='date', value_name=self.ts_var)
        ts_df = ts_df.sort_values(['lc', 'pixel', 'date'])

        # Convert date column to datetime object (can be used as datetime index for interpolation)
        ts_df['date'] = pd.to_datetime(ts_df['date'], format="%Y-%m-%d")

        if interpolate:
            ts_df = ts_df.set_index('date').groupby(['lc', 'pixel', 'array_index'])
            ts_df = ts_df.resample(frequency)[self.ts_var].asfreq().interpolate(method='linear').reset_index()

        return ts_df
--------------------------------------------------------------------------------
/satts/tspredict.py:
--------------------------------------------------------------------------------
import os
import gippy
import numpy as np
from osgeo import gdal


def format_scene(file_path, mu, sd):

    # Folders containing band values for a given date
    scenes = [file_path + '/' + f for f in os.listdir(file_path) if not f.startswith('.')]
    scenes.sort()

    # Sorted to ensure the 2D arrays are placed in same order as features in the trained model
    all_dates = []
    for s in scenes:
        bands = [s + '/' + b for b in
                 os.listdir(s) if not b.startswith('.')]
        bands.sort()
        all_dates.append(bands)

    # Get dimensions for the final 3D input array for Keras model
    get_shape = gippy.GeoImage.open(filenames=[all_dates[0][0]])

    n_samples = get_shape.xsize() * get_shape.ysize()
    n_timesteps = len(scenes)
    n_features = len(all_dates[0])

    # Close image
    get_shape = None

    # All band values for all dates in time-series
    full_scene = np.empty([n_samples, n_timesteps, n_features])
    for date in range(0, len(all_dates)):
        geoimg = gippy.GeoImage.open(filenames=all_dates[date], nodata=0, gain=0.0001)

        scene_vals = np.empty([n_samples, n_features])
        for i in range(0, geoimg.nbands()):
            arr = geoimg[i].read()
            flat = arr.flatten()
            scene_vals[:, i] = flat

        geoimg = None

        full_scene[:, date, :] = scene_vals

    # Normalize data with mu and sd from model training data
    full_norm = (full_scene - mu) / sd

    return full_norm


def classify_scene(formatted_scene, model, refimg, outimg):
    '''Predict land cover for full Sentinel-2 scene

    -> Use a band (not an index) for reference image
    '''

    img = gdal.Open(refimg, gdal.GA_ReadOnly)

    # For masking no-data values
    arr = np.array(img.GetRasterBand(1).ReadAsArray())

    # Fetch dimensions of reference raster
    ncol = img.RasterXSize
    nrow = img.RasterYSize

    # Projection and extent of raster reference
    proj = img.GetProjectionRef()
    ext = img.GetGeoTransform()

    # Close reference image
    img = None

    # Allocate memory for prediction image
    memdrive = gdal.GetDriverByName('GTiff')
    outrast = memdrive.Create(outimg, ncol, nrow, 1, gdal.GDT_Int16)

    # Set prediction image's projection and extent to input image projection and extent
    outrast.SetProjection(proj)
    outrast.SetGeoTransform(ext)

    # Model
    predictions
    preds = model.predict(formatted_scene)
    pred_bool = (preds > 0.5)
    pred_class = pred_bool.argmax(axis=1)

    # Reshape to match 2D image array
    pred_mat = pred_class.reshape(nrow, ncol)

    # Mask no-data values
    pred_mat[arr == 0.] = 9999

    # Fill output image with the predicted class values
    b = outrast.GetRasterBand(1)
    b.WriteArray(pred_mat)

    outrast = None

--------------------------------------------------------------------------------
/satts/tstrain.py:
--------------------------------------------------------------------------------
import pandas as pd
import gippy
from gippy import GeoImage
import gippy.algorithms as alg
import os
import re
import numpy as np
from keras.utils import to_categorical
from sklearn.utils import shuffle
from sklearn import preprocessing
from sklearn.metrics import confusion_matrix


def random_ts_samples(file_path, n_samples, seed=None):
    ''' Sample of locations for each land cover class included in file_path

    :param file_path (str): Full path to directory containing .csv files with location data for land class samples
    :param n_samples (int): Number of samples to select from full dataset
    :param seed (int): Set a seed to generate same dataset repeatedly

    :return: pd.DataFrame with locations for n_samples of each land cover class
    '''

    # CSV files containing the time-series to be sampled
    ts_files = [file_path + '/' + file for file in os.listdir(file_path) if not file.startswith('.')]

    np.random.seed(seed)

    # Sample each dataframe corresponding to a land cover class, store in list
    dfs = []
    for file in ts_files:
        df = pd.read_csv(file)
        g = df.groupby('pixel')
        a = np.arange(g.ngroups)
        np.random.shuffle(a)
        s = df[g.ngroup().isin(a[:n_samples])]

        dfs.append(s)

    # Convert list
    to single sample dataframe, convert to same shape as cluster results
    lc_samples = pd.concat(dfs)
    lc_samples = lc_samples.rename(columns={'lc': 'label', 'array_ind': 'array_index'})
    lc_samples = lc_samples.pivot_table(index=['array_index', 'label', 'pixel'], columns='date', values='ndvi').reset_index()

    return lc_samples


def calulate_indices(filepath, asset_dict, indices):
    ''' Create image files for indices

    :param filepath (str): Full path to directory containing satellite scenes in default structure created
                           by sat-search load --download
    :param asset_dict (dict): Keys = asset (band) names in scene files (e.g. 'B01', 'B02'); Values = value names
                              corresponding to keys (e.g. 'red', 'nir')
    :param indices (list): Which indices to generate? Options include any index included in gippy.alg.indices

    :return: None (writes files to disk)
    '''

    subdirs = [x[0] for x in os.walk(filepath)]
    subdirs = subdirs[1:len(subdirs)]

    for folder in subdirs:

        # Filepath points to folder of geotiffs of Sentinel 2 time-series of bands 4 (red) and 8 (nir)
        files = [folder + '/' + f for f in os.listdir(folder) if not f.startswith('.')]

        # Asset (band) names; raw string so that `\.` is not treated as a string escape
        pattern = r'[^_.]+(?=\.[^_.]*$)'
        bands = [re.search(pattern, f).group(0) for f in files]

        # Match band names
        bands = [asset_dict.get(band, band) for band in bands]

        img = GeoImage.open(filenames=files, bandnames=bands, nodata=0)

        for ind in indices:
            alg.indices(img, products=[ind], filename=folder + '/index_' + ind + '.tif')

        img = None


def get_training_data(asset_dir, asset_dict, samples_df, standardize=True):
    ''' Create a dataset of n_features (bands) at each sample location for n_timesteps

    :param asset_dir (str): File path to directory containing satellite scenes downloaded using the default
                            output of sat-search load
    :param asset_dict (dict): Keys = asset (band) names in scene files (e.g. 'B01', 'B02'); Values = value names
                              corresponding to keys (e.g. 'red', 'nir')
    :param samples_df (pd.DataFrame): pd.DataFrame with sample locations for each land cover class
    :param standardize (bool): Standardize each feature to zero mean and unit variance using
                               sklearn.preprocessing.scale()

    :return: pd.DataFrame with time-series of n_features (band reflectance values) for each sample location
    '''

    # Array indices corresponding to sample locations
    ind = list(samples_df.array_index)
    ind = [elem.strip('()').split(',') for elem in ind]
    ind = [list(map(int, elem)) for elem in ind]
    sample_ind = np.array([*ind])

    # Class labels
    labels = samples_df.label

    # Full file-path for every asset in `asset_dir` (directory structure = default output of sat-search)
    file_paths = []
    for path, subdirs, files in os.walk(asset_dir):
        for name in files:
            # Address .DS_Store file issue
            if not name.startswith('.'):
                file_paths.append(os.path.join(path, name))

    # Scene dates
    dates = [re.findall(r'\d\d\d\d-\d\d-\d\d', f) for f in file_paths]
    dates = [date for sublist in dates for date in sublist]

    # Asset (band) names
    pattern = r'[^_.]+(?=\.[^_.]*$)'
    bands = [re.search(pattern, f).group(0) for f in file_paths]

    # Match band names
    bands = [asset_dict.get(band, band) for band in bands]

    samples_list = []
    for i in range(0, len(file_paths)):

        img = gippy.GeoImage.open(filenames=[file_paths[i]], bandnames=[bands[i]], nodata=0, gain=0.0001)
        bandvals = img.read()

        # Extract values at sample indices for band[i] in time-step[i]
        sample_values = bandvals[sample_ind[:, 0], sample_ind[:, 1]]

        # Store extracted band values as dataframe
        d = {'feature': bands[i],
             'value': sample_values,
             'date': dates[i],
             'label':
                 labels,
             'ind': [*sample_ind]}

        # Necessary due to varying column lengths
        samp = pd.DataFrame(dict([(k, pd.Series(v)) for k, v in d.items()])).ffill()

        # Zero mean and unit variance for feature
        if standardize:
            samp['value'] = preprocessing.scale(samp['value'])

        samples_list.append(samp)

    # Combine all samples into single, long-form dataframe
    training = pd.concat(samples_list)

    # Reshape for time-series generation
    training['ind'] = tuple(list(training['ind']))
    training = training.sort_values(by=['ind', 'date', 'feature'])

    return training


def format_training_data(training_data, one_hot=True, seed=None):
    ''' Format time-series of reflectance data for fitting a Keras Sequential model

    :param training_data (pd.DataFrame): output of get_training_data
    :param one_hot (bool): Format response variable to one-hot encoded vectors?
    :param seed (int): Set a seed to generate same train/test datasets repeatedly

    :return: codes (dict) for Y-labels, X (feature) matrix, Y (response) matrix
    '''

    np.random.seed(seed)

    # Create 3D numpy array from sample values
    i = training_data.set_index(['date', 'ind', 'feature'])
    shape = list(map(len, i.index.levels))
    arr = np.full(shape, np.nan)
    # `MultiIndex.labels` was renamed to `codes` in pandas 0.24; index with a tuple of code arrays
    arr[tuple(i.index.codes)] = i.values[:, 0].flat

    # Keras LSTM shape: [n_samples, n_timesteps, n_features]
    x = arr.swapaxes(0, 1)

    # Data labels (Y values); first encode labels as int
    training_data['label'] = training_data['label'].astype('category')

    # Store categorical codes
    label_codes = dict(enumerate(training_data['label'].cat.categories))

    # Convert labels to int
    training_data['label'] = training_data['label'].cat.codes.astype('str').astype('int')

    # Get Y
    group = training_data.groupby('ind')

    y =
        group.apply(lambda x: x['label'].unique())
    y = y.apply(pd.Series)
    y = y[0].values

    if one_hot:
        y = to_categorical(y, num_classes=len(training_data['label'].unique()))

    return label_codes, x, y


def split_train_test(x, y, seed=0, prop_train=0.8):
    ''' Generate training and test datasets for keras LSTM

    :param x (np.array): dataset of shape (n_samples, n_timesteps, n_features)
    :param y (np.array): data labels of shape (n_samples, n_classes); likely one-hot encoded vectors
    :param seed (int): for shuffling data
    :param prop_train (float): proportion of dataset to use for training; default = 0.8 for 80/20 train/test split

    :return: x and y matrices for train/test sets
    '''
    x, y = shuffle(x, y, random_state=seed)

    x_train, x_test = x[0:int(x.shape[0] * prop_train)], x[int(x.shape[0] * prop_train):len(x)]
    y_train, y_test = y[0:int(y.shape[0] * prop_train)], y[int(y.shape[0] * prop_train):len(y)]

    return x_train, x_test, y_train, y_test


def standardize_features(x_train, x_test):
    '''Standardize features of 3D array formatted for keras sequential model'''

    # Per-feature statistics from the training set only; the test set reuses them
    mu = x_train.mean(axis=(0, 1))
    sd = x_train.std(axis=(0, 1))

    x_train_norm = (x_train - mu) / sd
    x_test_norm = (x_test - mu) / sd

    return mu, sd, x_train_norm, x_test_norm


def conf_mat(x_test, y_test, model, label_dict):

    # Model predictions on test set
    predictions = model.predict(x_test)
    y_pred = (predictions > 0.5)

    # Generate confusion matrix (rows = true classes, columns = predicted classes)
    matrix = confusion_matrix(y_test.argmax(axis=1), y_pred.argmax(axis=1))

    # Add data labels and per-class recall
    tp = matrix.diagonal()
    cm = pd.DataFrame(matrix, index=list(label_dict.values()), columns=list(label_dict.values()))
    cm['recall'] = tp / cm.sum(axis=1)

    return cm
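The per-feature standardization in `standardize_features` can be checked on synthetic data without Keras or gippy. The following is a minimal sketch, not part of the package: `standardize_3d` is a hypothetical stand-in that repeats the same arithmetic, and the array sizes (100 samples, 12 dates, 4 bands) are assumptions chosen only for illustration.

```python
import numpy as np


def standardize_3d(x_train, x_test):
    """Standardize each feature of (n_samples, n_timesteps, n_features) arrays.

    Hypothetical helper mirroring standardize_features: the statistics come
    from the training set only and are reused to scale the test set.
    """
    mu = x_train.mean(axis=(0, 1))   # one mean per feature
    sd = x_train.std(axis=(0, 1))    # one standard deviation per feature
    return mu, sd, (x_train - mu) / sd, (x_test - mu) / sd


# Synthetic data (assumed sizes): 100 samples, 12 dates, 4 bands
rng = np.random.RandomState(0)
x = rng.normal(loc=5.0, scale=2.0, size=(100, 12, 4))

# Same 80/20 slicing as split_train_test, without the shuffle
n_train = int(x.shape[0] * 0.8)
x_train, x_test = x[:n_train], x[n_train:]

mu, sd, x_train_norm, x_test_norm = standardize_3d(x_train, x_test)

print(mu.shape, sd.shape)               # one value per feature
print(x_train_norm.mean(axis=(0, 1)))   # close to zero for every feature
print(x_train_norm.std(axis=(0, 1)))    # close to one for every feature
```

The returned `mu` and `sd` are the values that `format_scene` in tspredict.py expects, so that a full scene is normalized with the training statistics rather than its own.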
--------------------------------------------------------------------------------
/satts/version.py:
--------------------------------------------------------------------------------
__version__ = '0.1.0'

--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
from setuptools import setup, find_packages
from imp import load_source
from os import path
import io

# The package directory in this repository is `satts`, so the version file lives there
__version__ = load_source('satts.version', 'satts/version.py').__version__

here = path.abspath(path.dirname(__file__))

# get the dependencies and installs
with io.open(path.join(here, 'requirements.txt'), encoding='utf-8') as f:
    all_reqs = f.read().split('\n')

# Plain requirements go to install_requires; only `git+` entries become dependency links
install_requires = [x.strip() for x in all_reqs if 'git+' not in x]
dependency_links = [x.strip().replace('git+', '') for x in all_reqs if 'git+' in x]

setup(
    name='cropclass',
    author='',
    author_email='',
    version=__version__,
    description='python-seed',
    url='https://github.com/',
    license='MIT',
    classifiers=[
        'Intended Audience :: Developers',
        'License :: OSI Approved :: MIT License',
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3.5',
        'Programming Language :: Python :: 3.6',
    ],
    keywords='',
    # entry_points={
    #     'console_scripts': ['PACKAGENAME=PACKAGENAME.main:cli'],
    # },
    packages=find_packages(exclude=['docs', 'tests*']),
    include_package_data=True,
    install_requires=install_requires,
    dependency_links=dependency_links,
)

--------------------------------------------------------------------------------
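setup.py reads the version through `imp.load_source`. The `imp` module has been deprecated since Python 3.4 and was removed in Python 3.12, so one possible alternative is to parse the version file as text. The following is a minimal sketch under that assumption: `read_version` is a hypothetical helper, not something the repository provides, and the temporary file only stands in for `satts/version.py`.

```python
import os
import re
import tempfile


def read_version(version_file):
    """Return the __version__ string assigned in `version_file` (hypothetical helper)."""
    with open(version_file, encoding='utf-8') as f:
        match = re.search(r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", f.read(), re.M)
    if match is None:
        raise RuntimeError('No __version__ found in ' + version_file)
    return match.group(1)


# Demonstrate on a throwaway file shaped like satts/version.py
with tempfile.TemporaryDirectory() as tmp:
    version_path = os.path.join(tmp, 'version.py')
    with open(version_path, 'w', encoding='utf-8') as f:
        f.write("__version__ = '0.1.0'\n")
    version = read_version(version_path)

print(version)
```

Parsing the file as text avoids importing the package at install time, which matters here because the package imports heavy dependencies such as gippy and keras.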