├── .gitignore
├── LICENSE
├── README.md
├── bias_correct.py
├── data
    ├── merra_example.nc
    └── prism_example.nc
├── merra_prism_example.py
├── preprocess.bash
├── qmap.py
└── spatial_scaling.py


/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2022 TJ Vandal
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Bias Correction Spatial Disaggregation
 2 | 
 3 | This code was used in this 2019 paper:
 4 | [Intercomparison of machine learning methods for statistical downscaling: the case of daily and extreme precipitation](https://link.springer.com/article/10.1007/s00704-018-2613-3)
 5 | 
 6 | 
 7 | ## Prism Downscaling MERRA-2 - Precipitation
 8 | 
 9 | Requirements
10 | ----------------
11 | - python2.7
12 | - xarray (http://xarray.pydata.org/en/stable/index.html)
13 | - climate data operators (cdo) (https://code.zmaw.de/projects/cdo)
14 | 
15 | ### Data
16 | MERRA-2 - A reanalysis dataset provided by NASA's Global Modeling and Assimilation
17 | Office. We extract precipitation from the land product to downscale. Reanalysis datasets
18 | are used to test a downscaling model's skill against an observed dataset.
19 | https://gmao.gsfc.nasa.gov/reanalysis/MERRA-2/
20 | 
21 | Prism - The prism 4km precipitation dataset is aggregated to 16km, which will be our
22 | observations. http://www.prism.oregonstate.edu/
23 | 
24 | ### Preprocessing of data
25 | - Interpolate missing values
26 | - Upscale Prism and remap to MERRA
27 | - Merge all years into single files, 1 per data source
28 | 
29 | ```bash
30 | cd data
31 | prism='prism_example.nc'
32 | merra='merra_example.nc'
33 | prism_upscaled='prism_upscaled.nc'
34 | merra_filled='merra_filled.nc'
35 | 
36 | cdo griddes $merra > merra_grid
37 | # This trick of setting and resetting the missing value seems to allow fillmiss to work
38 | #     if resetting is not done then merra_prism_example.py won't read the file correctly
39 | cdo setmissval,nan $prism temp_miss.nc
40 | cdo fillmiss temp_miss.nc tmp.nc
41 | cdo setmissval,-9999 tmp.nc tmp_filled.nc
42 | cdo remapbil,merra_grid -gridboxmean,3,3 tmp_filled.nc $prism_upscaled
43 | cdo fillmiss $merra $merra_filled
44 | rm temp_miss.nc tmp.nc tmp_filled.nc
45 | ```
46 | 
47 | ### Bias Correction
48 | ```bash
49 | python ../merra_prism_example.py $prism_upscaled $merra_filled ppt PRECTOTLAND merra_bc.nc
50 | ```
51 | 
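As a rough illustration of what this step does for a single grid cell, the sketch below applies empirical quantile mapping to a 1-D series with plain NumPy. It mirrors the nearest-percentile lookup in `qmap.py` but is not the repository's implementation: the function name `quantile_map`, the synthetic gamma-distributed data, and the `step=1.0` default are assumptions made for the example.

```python
import numpy as np

def quantile_map(obs_train, mod_train, mod_values, step=1.0):
    """Map modeled values onto the observed distribution (1-D arrays).

    Hypothetical helper for illustration only.
    """
    steps = np.arange(0, 100, step)            # percentiles 0, step, ..., < 100
    obs_q = np.percentile(obs_train, steps)    # observed quantiles
    mod_q = np.percentile(mod_train, steps)    # modeled quantiles
    # For each modeled value, find the nearest modeled quantile ...
    idx = np.abs(mod_values[:, None] - mod_q[None, :]).argmin(axis=1)
    # ... and return the observed value at the same percentile.
    return obs_q[idx]

rng = np.random.RandomState(0)
obs = rng.gamma(2.0, 2.0, size=1000)   # synthetic "observations"
mod = 2.0 * obs + 1.0                  # synthetic "model" with a known bias
corrected = quantile_map(obs, mod, mod)
```

Because the percentile grid stops below 100, values above the highest fitted percentile are clipped to it, so extremes in the corrected series are bounded by the training data.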
52 | ### Spatial Disaggregation - Scaling
53 | #### Remap Bias Corrected Merra to the High Resolution Prism
54 | ```bash
55 | cdo griddes $prism > prism_grid
56 | cdo remapbil,prism_grid merra_bc.nc merra_bc_interp.nc 
57 | ```
58 | #### Interpolate upscaled Prism to Original Resolution
59 | ```bash
60 | cdo remapbil,prism_grid $prism_upscaled prism_reinterpolated.nc
61 | ```
62 | #### Compute scaling Factors
63 | ```bash
64 | cdo ydayavg prism_reinterpolated.nc prism_interpolated_ydayavg.nc
65 | cdo ydayavg $prism prism_ydayavg.nc
66 | cdo div prism_ydayavg.nc prism_interpolated_ydayavg.nc scale_factors.nc
67 | ```
68 | 
69 | #### Execute Spatial Scaling
70 | ```bash
71 | python ../spatial_scaling.py merra_bc_interp.nc scale_factors.nc merra_bcsd.nc
72 | ```
73 | 
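The arithmetic behind the scaling factors and their application can be sketched with NumPy on toy arrays. This is an illustration under stated assumptions, not the code in `spatial_scaling.py`: the helper names `scale_factors` and `apply_scaling`, the `(time, lat, lon)` layout, and the synthetic data are all made up for the example.

```python
import numpy as np

def scale_factors(fine_obs, smooth_obs, dayofyear):
    """Ratio of fine to smoothed day-of-year climatologies.

    Arrays are (time, lat, lon); hypothetical helper for illustration.
    """
    days = np.unique(dayofyear)
    fine_clim = np.stack([fine_obs[dayofyear == d].mean(axis=0) for d in days])
    smooth_clim = np.stack([smooth_obs[dayofyear == d].mean(axis=0) for d in days])
    return days, fine_clim / smooth_clim

def apply_scaling(bc_interp, dayofyear, days, factors):
    """Multiply each time step by the factor for its day of year."""
    lookup = np.searchsorted(days, dayofyear)   # position of each day in `days`
    return bc_interp * factors[lookup]

rng = np.random.RandomState(0)
dayofyear = np.array([1, 2, 3, 1, 2, 3])            # two toy "years"
fine_obs = rng.uniform(1.0, 5.0, size=(6, 2, 2))    # strictly positive values
# Smoothed field: every cell holds the spatial mean of that time step.
smooth_obs = np.broadcast_to(
    fine_obs.mean(axis=(1, 2), keepdims=True), fine_obs.shape)

days, factors = scale_factors(fine_obs, smooth_obs, dayofyear)
bcsd = apply_scaling(smooth_obs, dayofyear, days, factors)
```

With real precipitation the smoothed climatology can be zero in dry cells, so the division may need guarding. The sketch does not handle that case.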
74 | #### Masking (optional)
75 | The dataset provided does not contain any bodies of water, but
76 | when downscaling North America the ocean is filled with interpolated values.
77 | After spatial scaling we'll want to replace those filled values with NaN. Here, we
78 | build a dataset with 1's over land and NaN over bodies of water.
79 | ```bash
80 | cdo seltimestep,1 -div -addc,1 $prism -addc,1 $prism mask.nc
81 | cdo mul mask.nc merra_bcsd.nc merra_bcsd_masked.nc
82 | ```
83 | 
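The masking trick above divides the field (plus one) by itself. A small NumPy sketch of the same arithmetic, using made-up values rather than the example data, shows why this yields 1 over land and NaN over water:

```python
import numpy as np

# Toy first time step of the fine-resolution observations; NaN marks water.
first_step = np.array([[0.0, 2.5],
                       [np.nan, 1.0]])

# (x + 1) / (x + 1) is 1 wherever x is valid and NaN wherever x is missing.
# The +1 keeps dry land cells (x == 0) from becoming 0/0.
mask = (first_step + 1.0) / (first_step + 1.0)

downscaled = np.full((3, 2, 2), 4.0)   # stand-in for the downscaled values
masked = downscaled * mask             # broadcasts the 2-D mask over time
```

Multiplying by the mask leaves land values unchanged and turns every time step of a water cell into NaN.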


--------------------------------------------------------------------------------
/bias_correct.py:
--------------------------------------------------------------------------------
  1 | import pickle
  2 | import os, sys
  3 | 
  4 | import numpy as np
  5 | import xarray as xray  # xray was renamed to xarray; the alias keeps the calls below unchanged
  6 | from joblib import Parallel, delayed
  7 | 
  8 | from qmap import QMap
  9 | 
 10 | np.seterr(invalid='ignore')
 11 | 
 12 | def mapper(x, y, train_num, step=0.01):
 13 |     qmap = QMap(step=step)
 14 |     qmap.fit(x[:train_num], y[:train_num], axis=0)
 15 |     return qmap.predict(y)
 16 | 
 17 | def nanarray(size):
 18 |     arr = np.empty(size)
 19 |     arr[:] = np.nan
 20 |     return arr
 21 | 
 22 | def convert_to_float32(ds):
 23 |     for var in ds.data_vars:
 24 |         if ds[var].dtype == 'float64':
 25 |             ds[var] = ds[var].astype('float32', copy=False)
 26 |     return ds
 27 | 
 28 | class BiasCorrectDaily():
 29 |     """ A class which can perform bias correction on daily data
 30 | 
 31 |     The process applied is based on the bias correction process applied by
 32 |     the NASA NEX team
 33 |     (https://nex.nasa.gov/nex/static/media/other/NEX-GDDP_Tech_Note_v1_08June2015.pdf)
 34 |     This process does NOT require temporal disaggregation from monthly to daily time steps.
 35 |     Instead pooling is used to capture a greater range of variability
 36 |     """
 37 |     def __init__(self, pool=15, max_train_year=np.inf, step=0.1):
 38 |         self.pool = pool
 39 |         self.max_train_year = max_train_year
 40 |         self.step = step
 41 | 
 42 |     def bias_correction(self, obs, modeled, obs_var, modeled_var, njobs=1):
 43 |         """
 44 |         Parameters
 45 |         ---------------------------------------------------------------
 46 |         obs: :py:class:`~xarray.Dataset`, required
 47 |             A baseline gridded low resolution observed dataset. This should include
 48 |             high quality gridded observations. lat and lon are expected as dimensions.
 49 |         modeled: :py:class:`~xarray.Dataset`, required
 50 |             A gridded low resolution climate variable to be bias corrected. This may include
 51 |             reanalysis or GCM datasets. It is recommended that the lat and lon dimensions
 52 |             match or are very similar to those of obs.
 53 |         obs_var: str, required
 54 |             The name of the variable in Dataset obs to use as the reference
 55 |         modeled_var: str, required
 56 |             The name of the variable in Dataset modeled to bias correct
 57 |         njobs: int, optional
 58 |             The number of processes to execute in parallel
 59 |         """
 60 |         # Select intersecting time periods
 61 |         d1 = obs.time.values
 62 |         d2 = modeled.time.values
 63 |         intersection = np.intersect1d(d1, d2)
 64 |         obs = obs.loc[dict(time=intersection)]
 65 |         modeled = modeled.loc[dict(time=intersection)]
 66 | 
 67 |         dayofyear = obs['time.dayofyear']
 68 |         lat_vals = modeled.lat.values
 69 |         lon_vals = modeled.lon.values
 70 | 
 71 |         # initialize the output data array
 72 |         mapped_data = np.zeros(shape=(intersection.shape[0], lat_vals.shape[0], 
 73 |                                       lon_vals.shape[0]))
 74 |         # loop through each day of the year, 1 to 366
 75 |         for day in np.unique(dayofyear.values):
 76 |             print "Day = %i" % day
 77 |             # select days +- pool, wrapping around the year so values stay in 1..366
 78 |             dayrange = (np.arange(day-self.pool, day+self.pool+1) - 1) % 366 + 1
 79 |             days = np.in1d(dayofyear, dayrange)
 80 |             subobs = obs.loc[dict(time=days)]
 81 |             submodeled = modeled.loc[dict(time=days)]
 82 | 
 83 |             # which rows correspond to these days
 84 |             sub_curr_day_rows = np.where(day == subobs['time.dayofyear'].values)[0]
 85 |             curr_day_rows = np.where(day == obs['time.dayofyear'].values)[0]
 86 |             train_num = np.where(subobs['time.year'] <= self.max_train_year)[0][-1] + 1  # exclusive slice end
 87 |             mapped_times = subobs['time'].values[sub_curr_day_rows]
 88 | 
 89 |             jobs = [] # list to collect jobs
 90 |             for i, lat in enumerate(lat_vals):
 91 |                 X_lat = subobs.sel(lat=lat, lon=lon_vals, method='nearest')[obs_var].values
 92 |                 Y_lat = submodeled.sel(lat=lat, lon=lon_vals)[modeled_var].values
 93 |                 jobs.append(delayed(mapper)(X_lat, Y_lat, train_num, self.step))
 94 | 
 95 |             print "Running jobs", len(jobs)
 96 |             # select only those days which correspond to the current day of the year
 97 |             day_mapped = np.asarray(Parallel(n_jobs=njobs)(jobs))[:, sub_curr_day_rows]
 98 |             day_mapped = np.swapaxes(day_mapped, 0, 1)
 99 |             mapped_data[curr_day_rows, :, :] = day_mapped
100 | 
101 |         # put data into a data array
102 |         dr = xray.DataArray(mapped_data, coords=[obs['time'].values, lat_vals, lon_vals],
103 |                        dims=['time', 'lat', 'lon'])
104 |         dr.attrs['gridtype'] = 'latlon'
105 |         ds = xray.Dataset({'bias_corrected': dr}) 
106 |         ds = ds.reindex_like(modeled)
107 |         modeled = modeled.merge(ds) # merging aids in preserving netcdf structure
108 |         # delete modeled variable to save space
109 |         del modeled[modeled_var]
110 |         return modeled
111 | 
112 | 


--------------------------------------------------------------------------------
/data/merra_example.nc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tjvandal/bcsd-python/9565b5fcd96ae872cd8076d2a94e13b478c4077e/data/merra_example.nc


--------------------------------------------------------------------------------
/data/prism_example.nc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tjvandal/bcsd-python/9565b5fcd96ae872cd8076d2a94e13b478c4077e/data/prism_example.nc


--------------------------------------------------------------------------------
/merra_prism_example.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | import time
 3 | import argparse
 4 | 
 5 | import xarray as xr
 6 | import numpy as np
 7 | 
 8 | from bias_correct import BiasCorrectDaily, convert_to_float32
 9 | 
10 | parser = argparse.ArgumentParser()
11 | parser.add_argument("fobserved", help="Netcdf file containing an upscaled " \
12 |                     "version of the observed dataset", type=str)
13 | parser.add_argument("fmodeled", help="Netcdf file of a GCM or Reanalysis dataset",
14 |                     type=str)
15 | parser.add_argument("var1", help="Variable name of the observed dataset")
16 | parser.add_argument("var2", help="Variable name of the modeled dataset")
17 | parser.add_argument("ofile", help="File to save bias corrected dataset")
18 | parser.add_argument("--njobs", help="Number of processes to execute in parallel",
19 |                    default=1, type=int)
20 | args = parser.parse_args()
21 | args = vars(args)
22 | 
23 | f_observed = 'data/prism_upscaled.nc'
24 | f_modeled = 'data/merra_filled.nc'
25 | obs_var = 'ppt'
26 | modeled_var = 'PRECTOTLAND'
27 | 
28 | 
29 | obs_data = xr.open_dataset(args['fobserved'])
30 | 
31 | print "loading observations"
32 | obs_data.load()
33 | obs_data = obs_data.dropna('time', how='all')
34 | obs_data = obs_data.resample("D", "time")
35 | obs_data = convert_to_float32(obs_data)
36 | 
37 | print "loading modeled"
38 | modeled_data = xr.open_dataset(args['fmodeled'])
39 | del modeled_data['time_bnds']
40 | modeled_data.load()
41 | modeled_data = modeled_data.resample("D", "time")
42 | modeled_data = convert_to_float32(modeled_data)
43 | 
44 | print "starting bcsd"
45 | t0 = time.time()
46 | bc = BiasCorrectDaily(max_train_year=2001, pool=2)
47 | corrected = bc.bias_correction(obs_data, modeled_data, args['var1'],
48 |                                args['var2'], njobs=args['njobs'])
49 | print "running time:", (time.time() - t0)
50 | corrected.to_netcdf(args['ofile'])
51 | 


--------------------------------------------------------------------------------
/preprocess.bash:
--------------------------------------------------------------------------------
 1 | cd ~/repos/bcsd-python/data/
 2 | prism='prism_example.nc'
 3 | merra='merra_example.nc'
 4 | prism_upscaled='prism_upscaled.nc'
 5 | merra_filled='merra_filled.nc'
 6 | 
 7 | cdo griddes $merra > merra_map
 8 | cdo fillmiss $prism tmp_filled.nc
 9 | cdo remapbil,merra_map -gridboxmean,3,3 tmp_filled.nc $prism_upscaled
10 | cdo fillmiss $merra $merra_filled
11 | rm tmp_filled.nc
12 | 


--------------------------------------------------------------------------------
/qmap.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | 
 3 | class QMap():
 4 |     def __init__(self, step=0.01):
 5 |         self.step = step
 6 | 
 7 |     def fit(self, x, y, axis=None):
 8 |         if axis not in (None, 0):
 9 |             raise ValueError("Axis should be None or 0")
10 |         self.axis = axis
11 |         steps = np.arange(0, 100, self.step)
12 |         self.x_map = np.percentile(x, steps, axis=axis)
13 |         self.y_map = np.percentile(y, steps, axis=axis)
14 |         return self
15 | 
16 |     def predict(self, y):
17 |         idx = [np.abs(val - self.y_map).argmin(axis=self.axis) for val in y]
18 |         if self.axis == 0:
19 |             out = np.asarray([self.x_map[k, range(y.shape[1])] for k in idx])
20 |         else:
21 |             out = self.x_map[idx]
22 |         return out
23 | 
24 | def test_qmap():
25 |     np.random.seed(0)
26 |     x = np.random.normal(10, size=(10,20))
27 |     y = np.random.normal(100, size=(10, 20))
28 |     mapped = np.zeros(x.shape)
29 |     for j in range(x.shape[1]):
30 |         qmap = QMap()
31 |         qmap.fit(x[:,j], y[:,j])
32 |         mapped[:, j] = qmap.predict(y[:,j])
33 | 
34 | if __name__ == "__main__":
35 |     test_qmap()
36 | 


--------------------------------------------------------------------------------
/spatial_scaling.py:
--------------------------------------------------------------------------------
 1 | import xarray as xr
 2 | import argparse
 3 | 
 4 | # parse arguments
 5 | parser = argparse.ArgumentParser()
 6 | parser.add_argument('bias_corrected', help="The bias corrected gcm or reanalysis file.")
 7 | parser.add_argument('scale_file', help="Netcdf file with the scaling factors.")
 8 | parser.add_argument('fout', help='BCSD output file')
 9 | args = parser.parse_args()
10 | args = vars(args)
11 | 
12 | scale = xr.open_dataset(args['scale_file'])
13 | bc = xr.open_dataset(args['bias_corrected'])
14 | 
15 | 
16 | scaledayofyear = scale['time.dayofyear']
17 | 
18 | # align indices
19 | print "Grouping"
20 | scale = scale.groupby('time.dayofyear').mean('time')
21 | scale['lat'] = bc.lat
22 | scale['lon'] = bc.lon
23 | 
24 | daydata = []
25 | for key, val in bc.groupby('time.dayofyear'):
26 |     print key
27 |     # multiply interpolated by scaling factor
28 |     if key == 366:
29 |         key = 365
30 |     daydata += [val.bias_corrected * scale.sel(dayofyear=key)]
31 | 
32 | # join all days
33 | bcsd = xr.concat(daydata, 'time')
34 | order = bcsd.time.argsort()
35 | bcsd = bcsd.sel(time=bcsd.time[order])
36 | 
37 | bcsd.to_netcdf(args['fout'])
38 | 


--------------------------------------------------------------------------------