├── .gitignore
├── LICENSE
├── README.md
├── bias_correct.py
├── data
│   ├── merra_example.nc
│   └── prism_example.nc
├── merra_prism_example.py
├── preprocess.bash
├── qmap.py
└── spatial_scaling.py


/.gitignore:
--------------------------------------------------------------------------------
*.pyc

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2022 TJ Vandal

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Bias Correction Spatial Disaggregation

This code was used in this 2019 paper: [Intercomparison of machine learning methods for statistical downscaling: the case of daily and extreme precipitation](https://link.springer.com/article/10.1007/s00704-018-2613-3)


## Prism Downscaling MERRA-2 - Precipitation

Requirements
----------------
- python2.7
- xarray (http://xarray.pydata.org/en/stable/index.html)
- climate data operators (cdo) (https://code.zmaw.de/projects/cdo)

### Data
MERRA-2 - A reanalysis dataset provided by NASA's Global Modeling and Assimilation
Office. We extract precipitation from the land product to downscale. Reanalysis datasets
are used to test a downscaling model's skill against an observed dataset.
https://gmao.gsfc.nasa.gov/reanalysis/MERRA-2/

Prism - The prism 4km precipitation dataset is aggregated to 16km, which will be our
observations.
http://www.prism.oregonstate.edu/

### Preprocessing of data
- Interpolate missing values
- Upscale Prism and remap to MERRA
- Merge all years into single files, 1 per data source

```bash
cd data
prism='prism_example.nc'
merra='merra_example.nc'
prism_upscaled='prism_upscaled.nc'
merra_filled='merra_filled.nc'

cdo griddes $merra > merra_grid
# Setting and then resetting the missing value seems to be what allows fillmiss to work.
# If the missing value is not reset, merra_prism_example.py won't read the file correctly.
cdo setmissval,nan $prism temp_miss.nc
cdo fillmiss temp_miss.nc tmp.nc
cdo setmissval,-9999 tmp.nc tmp_filled.nc
cdo remapbil,merra_grid -gridboxmean,3,3 tmp_filled.nc $prism_upscaled
cdo fillmiss $merra $merra_filled
# remove all intermediate files
rm temp_miss.nc tmp.nc tmp_filled.nc
```

### Bias Correction
```bash
python ../merra_prism_example.py $prism_upscaled $merra_filled ppt PRECTOTLAND merra_bc.nc
```

### Spatial Disaggregation - Scaling
#### Remap Bias Corrected Merra to the High Resolution Prism
```bash
cdo griddes $prism > prism_grid
cdo remapbil,prism_grid merra_bc.nc merra_bc_interp.nc
```
#### Interpolate upscaled Prism to Original Resolution
```bash
cdo remapbil,prism_grid $prism_upscaled prism_reinterpolated.nc
```
#### Compute scaling Factors
```bash
cdo ydayavg prism_reinterpolated.nc prism_interpolated_ydayavg.nc
cdo ydayavg $prism prism_ydayavg.nc
cdo div prism_ydayavg.nc prism_interpolated_ydayavg.nc scale_factors.nc
```

#### Execute Spatial Scaling
```bash
python ../spatial_scaling.py merra_bc_interp.nc scale_factors.nc merra_bcsd.nc
```

#### Masking (optional)
The dataset provided does not contain any bodies of water, but
when downscaling North America the ocean is filled with interpolated values.
After spatial scaling we'll want to replace filled values with NaN. Here, we
build a dataset with 1's over land and NaN over bodies of water.
```bash
cdo seltimestep,1 -div -addc,1 $prism -addc,1 $prism mask.nc
cdo mul mask.nc merra_bcsd.nc merra_bcsd_masked.nc
```

--------------------------------------------------------------------------------
/bias_correct.py:
--------------------------------------------------------------------------------
import pickle
import os, sys

import numpy as np
import xarray as xray  # the xray package was renamed to xarray
from joblib import Parallel, delayed

from qmap import QMap

np.seterr(invalid='ignore')

def mapper(x, y, train_num, step=0.01):
    # fit the quantile map on the training rows only, then map every modeled row
    qmap = QMap(step=step)
    qmap.fit(x[:train_num], y[:train_num], axis=0)
    return qmap.predict(y)

def nanarray(size):
    arr = np.empty(size)
    arr[:] = np.nan
    return arr

def convert_to_float32(ds):
    # cast float64 variables to float32 in place; the dataset is also returned
    for var in ds.data_vars:
        if ds[var].dtype == 'float64':
            ds[var] = ds[var].astype('float32', copy=False)
    return ds

class BiasCorrectDaily():
    """ A class which can perform bias correction on daily data

    The process applied is based on the bias correction process applied by
    the NASA NEX team
    (https://nex.nasa.gov/nex/static/media/other/NEX-GDDP_Tech_Note_v1_08June2015.pdf)
    This process does NOT require temporal disaggregation from monthly to daily time steps.
    Instead, pooling is used to capture a greater range of variability.
    """
    def __init__(self, pool=15, max_train_year=np.inf, step=0.1):
        self.pool = pool
        self.max_train_year = max_train_year
        self.step = step

    def bias_correction(self, obs, modeled, obs_var, modeled_var, njobs=1):
        """
        Parameters
        ---------------------------------------------------------------
        obs: :py:class:`~xarray.Dataset`, required
            A baseline gridded low resolution observed dataset.
            This should include high quality gridded observations.
            lat and lon are expected as dimensions.
        modeled: :py:class:`~xarray.Dataset`, required
            A gridded low resolution climate variable to be bias corrected. This may include
            reanalysis or GCM datasets. It is recommended that the lat and lon dimensions
            match or are very similar to those of obs.
        obs_var: str, required
            The variable name in Dataset obs to model
        modeled_var: str, required
            The variable name in Dataset modeled to bias correct
        njobs: int, optional
            The number of processes to execute in parallel
        """
        # Select intersecting time periods
        d1 = obs.time.values
        d2 = modeled.time.values
        intersection = np.intersect1d(d1, d2)
        obs = obs.loc[dict(time=intersection)]
        modeled = modeled.loc[dict(time=intersection)]

        dayofyear = obs['time.dayofyear']
        lat_vals = modeled.lat.values
        lon_vals = modeled.lon.values

        # initialize the output data array
        mapped_data = np.zeros(shape=(intersection.shape[0], lat_vals.shape[0],
                                      lon_vals.shape[0]))
        # loop through each day of the year, 1 to 366
        for day in np.unique(dayofyear.values):
            print("Day = %i" % day)
            # select days +- pool, wrapping around the year boundary; subtracting 1
            # before the modulo keeps the window in 1..366 and centred on `day`
            # (the original "+ 366" form shifted the window forward by one day)
            dayrange = (np.arange(day-self.pool, day+self.pool+1) - 1) % 366 + 1
            days = np.in1d(dayofyear, dayrange)
            subobs = obs.loc[dict(time=days)]
            submodeled = modeled.loc[dict(time=days)]

            # which rows correspond to these days
            sub_curr_day_rows = np.where(day == subobs['time.dayofyear'].values)[0]
            curr_day_rows = np.where(day == obs['time.dayofyear'].values)[0]
            train_num = np.where(subobs['time.year'] <= self.max_train_year)[0][-1]
            mapped_times = subobs['time'].values[sub_curr_day_rows]

            jobs = [] # list to collect jobs
            for i, lat in enumerate(lat_vals):
                X_lat = subobs.sel(lat=lat, lon=lon_vals, method='nearest')[obs_var].values
                Y_lat = submodeled.sel(lat=lat, lon=lon_vals)[modeled_var].values
                jobs.append(delayed(mapper)(X_lat, Y_lat, train_num, self.step))

            print("Running jobs %i" % len(jobs))
            # select only those days which correspond to the current day of the year
            day_mapped = np.asarray(Parallel(n_jobs=njobs)(jobs))[:, sub_curr_day_rows]
            day_mapped = np.swapaxes(day_mapped, 0, 1)
            mapped_data[curr_day_rows, :, :] = day_mapped

        # put data into a data array
        dr = xray.DataArray(mapped_data, coords=[obs['time'].values, lat_vals, lon_vals],
                            dims=['time', 'lat', 'lon'])
        dr.attrs['gridtype'] = 'latlon'
        ds = xray.Dataset({'bias_corrected': dr})
        ds = ds.reindex_like(modeled)
        modeled = modeled.merge(ds) # merging aids in preserving netcdf structure
        # delete modeled variable to save space
        del modeled[modeled_var]
        return modeled


--------------------------------------------------------------------------------
/data/merra_example.nc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tjvandal/bcsd-python/9565b5fcd96ae872cd8076d2a94e13b478c4077e/data/merra_example.nc

--------------------------------------------------------------------------------
/data/prism_example.nc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tjvandal/bcsd-python/9565b5fcd96ae872cd8076d2a94e13b478c4077e/data/prism_example.nc

--------------------------------------------------------------------------------
/merra_prism_example.py:
--------------------------------------------------------------------------------
import os
import time
import argparse

import xarray as xr
import numpy as np

from bias_correct import BiasCorrectDaily, convert_to_float32

parser = argparse.ArgumentParser()
parser.add_argument("fobserved", help="Netcdf file containing an upscaled " \
                    "version of the observed dataset", type=str)
parser.add_argument("fmodeled", help="Netcdf file of a GCM or Reanalysis dataset",
                    type=str)
parser.add_argument("var1", help="Variable name of the observed dataset")
parser.add_argument("var2", help="Variable name of the modeled dataset")
parser.add_argument("ofile", help="File to save bias corrected dataset")
parser.add_argument("--njobs", help="Number of processes to run in parallel",
                    default=1, type=int)
args = parser.parse_args()
args = vars(args)

# example values only; the script reads the command-line arguments above instead
f_observed = 'data/prism_upscaled.nc'
f_modeled = 'data/merra_filled.nc'
obs_var = 'ppt'
modeled_var = 'PRECTOTLAND'


obs_data = xr.open_dataset(args['fobserved'])

print("loading observations")
obs_data.load()
obs_data = obs_data.dropna('time', how='all')
# old-style xarray resample signature (daily mean); newer xarray releases
# spell this obs_data.resample(time="D").mean()
obs_data = obs_data.resample("D", "time")
obs_data = convert_to_float32(obs_data)

print("loading modeled")
modeled_data = xr.open_dataset(args['fmodeled'])
del modeled_data['time_bnds']
modeled_data.load()
modeled_data = modeled_data.resample("D", "time")
modeled_data = convert_to_float32(modeled_data)

print("starting bcsd")
t0 = time.time()
bc = BiasCorrectDaily(max_train_year=2001, pool=2)
corrected = bc.bias_correction(obs_data, modeled_data, args['var1'],
                               args['var2'], njobs=args['njobs'])
print("running time: %s" % (time.time() - t0))
corrected.to_netcdf(args['ofile'])

--------------------------------------------------------------------------------
/preprocess.bash:
--------------------------------------------------------------------------------
cd ~/repos/bcsd-python/data/
prism='prism_example.nc'
merra='merra_example.nc'
prism_upscaled='prism_upscaled.nc'
merra_filled='merra_filled.nc'

cdo griddes $merra > merra_map
cdo fillmiss $prism tmp_filled.nc
cdo remapbil,merra_map -gridboxmean,3,3 tmp_filled.nc $prism_upscaled
cdo fillmiss $merra $merra_filled
rm tmp_filled.nc

--------------------------------------------------------------------------------
/qmap.py:
--------------------------------------------------------------------------------
import numpy as np

class QMap():
    def __init__(self, step=0.01):
        # spacing, in percent, between the percentiles used to build the map
        self.step = step

    def fit(self, x, y, axis=None):
        if axis not in (None, 0):
            raise ValueError("Axis should be None or 0")
        self.axis = axis
        steps = np.arange(0, 100, self.step)
        # percentiles of the observed (x) and modeled (y) distributions
        self.x_map = np.percentile(x, steps, axis=axis)
        self.y_map = np.percentile(y, steps, axis=axis)
        return self

    def predict(self, y):
        # for each modeled value, find the index of the closest modeled percentile
        idx = [np.abs(val - self.y_map).argmin(axis=self.axis) for val in y]
        if self.axis == 0:
            # look up the observed percentile at that index, column by column
            out = np.asarray([self.x_map[k, range(y.shape[1])] for k in idx])
        else:
            out = self.x_map[idx]
        return out

def test_qmap():
    np.random.seed(0)
    x = np.random.normal(10, size=(10,20))
    y = np.random.normal(100, size=(10, 20))
    mapped = np.zeros(x.shape)
    for j in range(x.shape[1]):
        qmap = QMap()
        qmap.fit(x[:,j], y[:,j])
        mapped[:, j] = qmap.predict(y[:,j])

if __name__ == "__main__":
    test_qmap()

--------------------------------------------------------------------------------
/spatial_scaling.py:
--------------------------------------------------------------------------------
import xarray as xr
import argparse

# parse arguments
parser = argparse.ArgumentParser()
parser.add_argument('bias_corrected', help="The bias corrected gcm or reanalysis file.")
parser.add_argument('scale_file', help="Netcdf file with scaling factors.")
parser.add_argument('fout', help='BCSD output file')
args = parser.parse_args()
args = vars(args)

scale = xr.open_dataset(args['scale_file'])
bc = xr.open_dataset(args['bias_corrected'])


scaledayofyear = scale['time.dayofyear']

# align indices
print("Grouping")
scale = scale.groupby('time.dayofyear').mean('time')
scale['lat'] = bc.lat
scale['lon'] = bc.lon

daydata = []
for key, val in bc.groupby('time.dayofyear'):
    print(key)
    # multiply interpolated by scaling factor
    # (day 366 reuses the factors for day 365)
    if key == 366:
        key = 365
    daydata += [val.bias_corrected * scale.sel(dayofyear=key)]

# join all days
bcsd = xr.concat(daydata, 'time')
order = bcsd.time.argsort()
bcsd = bcsd.sel(time=bcsd.time[order])

bcsd.to_netcdf(args['fout'])

--------------------------------------------------------------------------------
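
As a rough illustration of the scaling step in spatial_scaling.py, the sketch below does the same multiplication on plain NumPy arrays instead of xarray datasets. The function name `apply_scale_factors`, its arguments, and the assumption that exactly 365 scale-factor fields are available (one per day of year) are hypothetical and not part of the repository.

```python
import numpy as np

def apply_scale_factors(values, dayofyear, factors):
    """Multiply each daily field by the scale factor for its day of year.

    values:    array of shape (time, lat, lon), interpolated bias-corrected data
    dayofyear: sequence of ints in 1..366, one per time step
    factors:   array of shape (365, lat, lon), assumed to hold one field per day
    """
    values = np.asarray(values, dtype=float)
    factors = np.asarray(factors, dtype=float)
    days = np.array(dayofyear, dtype=int)  # np.array copies, so the input is untouched
    days[days == 366] = 365                # leap days reuse day 365, as in spatial_scaling.py
    return values * factors[days - 1]      # day d lives at index d - 1

# three daily 2x2 fields of ones, falling on days 1, 365 and 366
values = np.ones((3, 2, 2))
# hypothetical factors: the field for day d is filled with the value d
factors = np.arange(1, 366, dtype=float)[:, None, None] * np.ones((365, 2, 2))
scaled = apply_scale_factors(values, [1, 365, 366], factors)
```

With these made-up inputs, the fields for days 365 and 366 end up identical, which mirrors the `if key == 366` branch in the script.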