├── .gitignore ├── .ipynb_checkpoints ├── 2019_Bano_CI-checkpoint.ipynb ├── 2020_Bano_CD-checkpoint.ipynb └── 2021_Bano_NSD-checkpoint.ipynb ├── 2018_Bano_CI.ipynb ├── 2018_Bano_CI.pdf ├── 2019_Bano_CI.ipynb ├── 2019_Bano_CI.pdf ├── 2020_Bano_CD.ipynb ├── 2020_Bano_CI.ipynb ├── 2020_Bano_CI.pdf ├── 2020_Bano_GMD.ipynb ├── 2020_Bano_GMD_FULL.ipynb ├── 2022_Bano_GMD.ipynb ├── DeepDownscaling.Rproj └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .RData 4 | .Ruserdata 5 | -------------------------------------------------------------------------------- /2018_Bano_CI.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SantanderMetGroup/DeepDownscaling/fb55b2dd479fea92c1c632f85ce97063d74a3bb4/2018_Bano_CI.pdf -------------------------------------------------------------------------------- /2019_Bano_CI.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SantanderMetGroup/DeepDownscaling/fb55b2dd479fea92c1c632f85ce97063d74a3bb4/2019_Bano_CI.pdf -------------------------------------------------------------------------------- /2020_Bano_CD.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## On the suitability of deep convolutional neural networks for continental-wide downscaling of climate change projections\n", 8 | "### *Climate Dynamics*\n", 9 | "### J. Baño-Medina, R. Manzanas and J. M. Gutiérrez" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "This notebook reproduces the results presented in *On the suitability of deep convolutional neural networks for continental-wide downscaling of climate change projections*, submitted to *Climate Dynamics* by *J. Baño-Medina, R. Manzanas and J. M. Gutiérrez*. That paper focuses on the suitability of statistical downscaling (SD) methods (in particular different configurations of generalized linear models (GLM) and convolutional neural networks (CNN)) for climate change applications under the perfect-prognosis approach. Throughout this notebook we deploy the code necessary to test the 3 key assumptions that have to be fullfilled for any SD method to be applied for climate change purposes (the reader is referred to the paper for more details). To do this, we build on the experimental framework developed in the experiment 2 of the [COST action VALUE](http://www.value-cost.eu/), which focuses on producing high-resolution climate change projections of temperature and precipitation over Europe by downscaling the 12th run of the EC-Earth global climate model (GCM). The technical specifications of the machine used to run the code presented herein can be found at the end of the notebook. **Please note that about 5 days were required to run the full notebook**.\n", 17 | "\n", 18 | "**Note:** This notebook is written in the free programming language `R`(version 3.6.1) and builds on [climate4R](https://github.com/SantanderMetGroup/climate4R), a suite of `R` packages developed by the [Santander Meteorology Group](http://meteo.unican.es) for transparent climate data access, post processing (including bias correction and downscaling) and visualization. For details on climate4R (C4R hereafter), the interested reader is referred to [Iturbide et al. 
2019](https://www.sciencedirect.com/science/article/pii/S1364815218303049?via%3Dihub)." 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "## 1. Loading libraries" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "The C4R libraries that are needed to run this notebook can be installed either through the `devtools` package (e.g. `devtools::install_github(\"SantanderMetGroup/loadeR\")` for `loadeR`) or with *conda* (version 1.3.0); see detailed instructions [here](https://github.com/SantanderMetGroup/climate4R). The deep learning models used in this work are implemented in [`downscaleR.keras`](https://github.com/SantanderMetGroup/downscaleR.keras), an extension of `downscaleR` which integrates *keras* into C4R." 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": null, 38 | "metadata": {}, 39 | "outputs": [], 40 | "source": [ 41 | "options(java.parameters = \"-Xmx8g\")\n", 42 | "\n", 43 | "# loading C4R and auxiliary packages needed to run this notebook\n", 44 | "library(climate4R.UDG) # version 0.2.1\n", 45 | "library(loadeR) # version 1.6.1\n", 46 | "library(loadeR.2nc)\n", 47 | "library(transformeR) # version 1.7.4\n", 48 | "library(downscaleR) # version 3.3.2\n", 49 | "library(visualizeR) # version 1.5.1\n", 50 | "library(climate4R.value) # version 0.0.2 (also relies on VALUE version 2.2.1)\n", 51 | "library(magrittr) # useful library to improve readability and maintainability of code\n", 52 | "library(gridExtra) # plotting functionalities\n", 53 | "library(RColorBrewer) # plotting functionalities\n", 54 | "library(sp) # plotting functionalities \n", 55 | "library(downscaleR.keras) # version 0.0.2 (relies on keras version 2.2.2 and tensorflow version 2.0.0)" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "In order to avoid errors while running the notebook, please set the path to your desired working directory and create three directories named \"Data\", \"figures\" and \"models\", which will contain the downscaled predictions, the figures, and the trained deep models, respectively. Moreover, since we will undertake two distinct studies here (one for precipitation and another for temperature), please create two more directories named \"precip\" and \"temperature\" within \"Data\" and \"models\". " 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": null, 68 | "metadata": {}, 69 | "outputs": [], 70 | "source": [ 71 | "path = \"yourworkingdirectory\" \n", 72 | "setwd(path)\n", 73 | "\n", 74 | "# creating the directories where the generated outputs will be stored\n", 75 | "dir.create(\"Data\")\n", 76 | "dir.create(\"Data/precip/\")\n", 77 | "dir.create(\"Data/temperature/\")\n", 78 | "dir.create(\"models\")\n", 79 | "dir.create(\"models/temperature/\")\n", 80 | "dir.create(\"models/precip/\")\n", 81 | "dir.create(\"figures\")\n" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": {}, 87 | "source": [ 88 | "## 2. Loading data\n", 89 | "\n", 90 | "In this section we describe how to load into your `R` session the 3 datasets involved in this study: ERA-Interim, E-OBS (version 14), and the 12th run of the EC-Earth. This work is framed under VALUE experiment 2, and therefore the data have been made publicly available by this initiative. A dedicated notebook explains in detail the different ways to access these data. 
In this study we rely on the [User Data Gateway (UDG)](http://meteo.unican.es/udg-tap/home), a THREDDS-based service from the Santander Climate Data Service (CDS) to load the data into our session (register [here](http://meteo.unican.es/udg-tap/signup) freely to get a user). We can log in using the `loginUDG` function from `loadeR`:" 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": null, 96 | "metadata": {}, 97 | "outputs": [], 98 | "source": [ 99 | "loginUDG(username = \"***\", password = \"***\") # log into the Santander CDS" 100 | ] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": {}, 105 | "source": [ 106 | "We use as predictor data the set of variables defined in VALUE's experiment 1: Temperature (ta), zonal (ua) and meridional (va) wind velocity, geopotential (z) and specific humidity (hus) at 1000, 850, 700 and 500 hPa levels." 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": null, 112 | "metadata": {}, 113 | "outputs": [], 114 | "source": [ 115 | "# vector containing the labels that identify each of the \n", 116 | "# predictor variables used in this work: e.g. 'z@500' stands for geopotential at 500 hPa\n", 117 | "variables <- c(\"z@500\",\"z@700\",\"z@850\",\"z@1000\",\n", 118 | " \"hus@500\",\"hus@700\",\"hus@850\",\"hus@1000\",\n", 119 | " \"ta@500\",\"ta@700\",\"ta@850\",\"ta@1000\",\n", 120 | " \"ua@500\",\"ua@700\",\"ua@850\",\"ua@1000\",\n", 121 | " \"va@500\",\"va@700\",\"va@850\",\"va@1000\")" 122 | ] 123 | }, 124 | { 125 | "cell_type": "markdown", 126 | "metadata": {}, 127 | "source": [ 128 | "The `loadGridData` function from `loadeR` can be used to load into our `R` session all the predictor datasets we are going to need. For each dataset there is a label identifying the desired dataset (type `?UDG.datasets()` for information on the full list of datasets available at UDG). For example, ERA-Interim can be loaded passing the *ECMWF_ERA-Interim-ESD* label to the `dataset` input argument in `loadGridData`. In the following chunk of code, we use `lapply` to create a list whose elements contain each of the predictors defined in the `variables` vector. Afterwards, we use `makeMultiGrid` from `transformeR` to create a multigrid encompassing all those variables into a single C4R object. Note that the domain (latitude-longitude) and temporal period are both given by the `latLim`, `lonLim` and `years` parameters of the `loadGridData` funtion." 
129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": null, 134 | "metadata": {}, 135 | "outputs": [], 136 | "source": [ 137 | "# ERA-Interim\n", 138 | "x <- lapply(variables, function(x) {\n", 139 | " loadGridData(dataset = \"ECMWF_ERA-Interim-ESD\",\n", 140 | " var = x,\n", 141 | " lonLim = c(-10,32),\n", 142 | " latLim = c(36,72), \n", 143 | " years = 1979:2008)\n", 144 | "}) %>% makeMultiGrid()\n", 145 | "\n", 146 | "# EC-Earth (historical) --> we use interpGrid to interpolate EC-Earth's resolution (1.125º) to the ERA-Interim's resolution (2º)\n", 147 | "xh <- lapply(variables, function(z) {\n", 148 | " loadGridData(dataset = \"CMIP5-subset_EC-EARTH_r12i1p1_historical\",\n", 149 | " var = z,\n", 150 | " lonLim = c(-10,32),\n", 151 | " latLim = c(36,72), \n", 152 | " years = 1979:2008) %>% \n", 153 | " interpGrid(new.coordinates = getGrid(x))\n", 154 | "}) %>% makeMultiGrid()\n", 155 | "\n", 156 | "# EC-Earth (RCP8.5) --> we use interpGrid to interpolate EC-Earth's resolution (1.125º) to the ERA-Interim's resolution (2º)\n", 157 | "xf <- lapply(variables, function(z) {\n", 158 | " loadGridData(dataset = \"CMIP5-subset_EC-EARTH_r12i1p1_rcp85\",\n", 159 | " var = z,\n", 160 | " lonLim = c(-10,32),\n", 161 | " latLim = c(36,72), \n", 162 | " years = 2071:2100) %>% \n", 163 | " interpGrid(new.coordinates = getGrid(x))\n", 164 | "}) %>% makeMultiGrid()" 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": {}, 170 | "source": [ 171 | "## 3. The perfect-prognosis assumption\n", 172 | "\n", 173 | "One of the key assumptions in 'perfect prognosis' downscaling is that the statistical distributions of reanalysis and GCM predictors should be compatible. To test this hypothesis, we define in the next block of code the `ksPanelPlot`function, which will be later used to apply a Kolmogorov-Smirnoff (KS) test comparing the distributions of reanalysis and GCM (historical) predictors. In particular, note that the `valueMeasure` function from `climate4R.value` is internally used to obtain both the distance score and the p-value from the KS test. Moreover, the input argument `type` allows for switching between the two types of standardization considered in this work (further details are given in the paper), 'harmonize+scaling' and 'scaling', which are applied using the `scaleGrid` function from `transformeR`. 
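\n\nAs a point of reference (this toy chunk is purely illustrative and not part of the downscaling workflow itself), the statistic returned by the KS test is the maximum distance between the two empirical cumulative distribution functions being compared, which can be reproduced for any pair of series with base `R`:\n\n```r\n# illustrative only: two-sample KS test on synthetic data\nset.seed(1)\nks.test(rnorm(1000), rnorm(1000, mean = 0.1))$statistic\n```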
" 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": null, 179 | "metadata": {}, 180 | "outputs": [], 181 | "source": [ 182 | "# function to apply a KS test comparing the distribution\n", 183 | "# of reanalysis and GCM (historical) predictors\n", 184 | "ksPanelPlot <- function(x.grid, y.grid,\n", 185 | " type = c(\"harmonize+scaling\",\"scaling\"),\n", 186 | " season,\n", 187 | " vars = getVarNames(x.grid)) {\n", 188 | " ks.score.list <- pval.list <- rep(list(bquote()), length(vars))\n", 189 | " x.grid2 <- scaleGrid(x.grid,type = \"standardize\")\n", 190 | " if (type == \"harmonize+scaling\") {\n", 191 | " y.grid2 <- scaleGrid(y.grid, ref = x, \n", 192 | " type = \"center\", \n", 193 | " spatial.frame = \"gridbox\", \n", 194 | " time.frame = \"monthly\") %>%\n", 195 | " scaleGrid(type = \"standardize\")\n", 196 | " }\n", 197 | " if (type == \"scaling\") { \n", 198 | " y.grid2 <- scaleGrid(y.grid,type = \"standardize\")\n", 199 | " }\n", 200 | " for (i in 1:length(vars)) {\n", 201 | " # We use valueMeasure from climate4R.value to compute the KS-statistic and the p-value \n", 202 | " ks.score.list[[i]] <- valueMeasure(y = subsetGrid(y.grid, var = vars[i]) %>% subsetGrid(season = season),\n", 203 | " x = subsetGrid(x.grid, var = vars[i]) %>% subsetGrid(season = season),\n", 204 | " measure.code = \"ts.ks.pval\")$\"Measure\" \n", 205 | " pval.list[[i]] <- valueMeasure(y = subsetGrid(y.grid2, var = vars[i]) %>% subsetGrid(season = season),\n", 206 | " x = subsetGrid(x.grid2, var = vars[i]) %>% subsetGrid(season = season),\n", 207 | " measure.code = \"ts.ks.pval\")$\"Measure\" %>% climatology() %>% map.stippling(condition = \"LT\",\n", 208 | " pch = 4,\n", 209 | " cex = 0.4,\n", 210 | " col = \"red\",\n", 211 | " which = i)\n", 212 | " }\n", 213 | " ksmap <- do.call(\"makeMultiGrid\", ks.score.list)\n", 214 | " return(list(\"map\" = ksmap, \"stippling\" = pval.list))\n", 215 | "}" 216 | ] 217 | }, 218 | { 219 | "cell_type": "markdown", 220 | "metadata": {}, 221 | "source": [ 222 | "Next, we plot maps showing the results obtained from the application of `ksPanelPlot` to ERA-Interim and EC-EARTH (historical) predictors over the whole year, boreal summer (Jun-Jul-Aug) and winter (Dec-Jan-Feb), for the two types of standardization considered." 
223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": null, 228 | "metadata": {}, 229 | "outputs": [], 230 | "source": [ 231 | "# plotting maps showing the results obtained from the KS test\n", 232 | "# comparing ERA-Interim and EC-EARTH (historical) predictors\n", 233 | "ppfigs <- lapply(c(\"harmonize+scaling\",\"scaling\"), FUN = function(j){\n", 234 | "lapply(list(1:12,c(6,7,8),c(12,1,2)), FUN = function(i){\n", 235 | " fig.info <- ksPanelPlot(x.grid = x, y.grid = xh,\n", 236 | " type = j,\n", 237 | " season = i) \n", 238 | " spatialPlot(fig.info$map, color.theme = \"BuPu\",\n", 239 | " at = seq(0,0.3,0.02),\n", 240 | " set.min = 0,set.max = 0.3,\n", 241 | " backdrop.theme = \"coastline\",\n", 242 | " sp.layout = fig.info$stippling)\n", 243 | " \n", 244 | " }) \n", 245 | "})\n", 246 | "grid.arrange(grobs = ppfigs[[1]]) # harmonize+scaling: list of 3 elements with maps for the whole year, summer and winter\n", 247 | "grid.arrange(grobs = ppfigs[[2]]) # scaling: list of 3 elements with maps for the whole year, summer and winter" 248 | ] 249 | }, 250 | { 251 | "cell_type": "markdown", 252 | "metadata": {}, 253 | "source": [ 254 | "According to these results, based on the KS statistic, we standardize the predictors from the historical and RCP8.5 scenarios by subtracting the monthly mean (using the observational grid, ERA-Interim, as reference) and then scale them with the mean and standard deviation of the harmonized predictors of the historical scenario. This is done with the `scaleGrid` function from `transformeR`." 255 | ] 256 | }, 257 | { 258 | "cell_type": "code", 259 | "execution_count": null, 260 | "metadata": {}, 261 | "outputs": [], 262 | "source": [ 263 | "# To perform the harmonization+scaling step prior to model training and prediction \n", 264 | "xf <- scaleGrid(xf, base = xh, ref = x, type = \"center\", spatial.frame = \"gridbox\", time.frame = \"monthly\") \n", 265 | "xh <- scaleGrid(xh, base = xh, ref = x, type = \"center\", spatial.frame = \"gridbox\", time.frame = \"monthly\") \n", 266 | "xf <- scaleGrid(xf, base = xh, type = \"standardize\")\n", 267 | "xh <- scaleGrid(xh, type = \"standardize\")" 268 | ] 269 | }, 270 | { 271 | "cell_type": "markdown", 272 | "metadata": {}, 273 | "source": [ 274 | "## 4. Temperature\n", 275 | "In this section we present the code needed to downscale temperature in the historical and RCP8.5 scenarios from the EC-EARTH model. \n", 276 | "\n", 277 | "First, we use `loadGridData` to load into our `R` session the predictand dataset, surface temperature from E-OBS (version 14) at 0.5º resolution. The spatial and temporal domains proposed by VALUE are considered. Once loaded, the data are saved as a netCDF file on our local machine using the `grid2nc` function." 
278 | ] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "execution_count": null, 283 | "metadata": {}, 284 | "outputs": [], 285 | "source": [ 286 | "# loading E-OBS temperature\n", 287 | "y <- loadGridData(dataset = \"E-OBS_v14_0.50regular\",\n", 288 | " var = \"tas\",\n", 289 | " lonLim = c(-10,32),\n", 290 | " latLim = c(36,72), \n", 291 | " years = 1979:2008)\n", 292 | "# saving to local directory\n", 293 | "grid2nc(y, NetCDFOutFile = \"./Data/temperature/tas_E-OBS_v14_0.50regular.nc4\")" 294 | ] 295 | }, 296 | { 297 | "cell_type": "markdown", 298 | "metadata": {}, 299 | "source": [ 300 | "### 4.1 Downscaling (temperature) with two local GLMs\n", 301 | "The following block of code allows us to build and apply the GLM1 and GLM4 models, which rely on local predictor information at neighbouring gridboxes (see the paper for details). To do so, we use the `downscaleChunk` function from `downscaleR`, which first trains the model using reanalysis and observations. Afterwards, the same function applies the learnt model to make predictions from a new dataset (i.e., the historical and RCP8.5 scenarios from EC-EARTH in this case). Note that the predictors are conveniently harmonized and scaled before entering the model using `scaleGrid`. The predictions are saved as netCDF files in the specified local directory (`grid2nc`). These netCDF files will be used later during the validation step." 302 | ] 303 | }, 304 | { 305 | "cell_type": "code", 306 | "execution_count": null, 307 | "metadata": {}, 308 | "outputs": [], 309 | "source": [ 310 | "# building GLM1 and GLM4 models to downscale temperature \n", 311 | "# from EC-EARTH (historical and RCP8.5 scenarios)\n", 312 | "\n", 313 | "# NOTE THAT YOU MAY HAVE TO RUN THE LOOP MANUALLY IF YOUR COMPUTER DOES NOT HAVE ENOUGH MEMORY CAPACITY\n", 314 | "glmName <- c(\"glm1\",\"glm4\")\n", 315 | "neighs <- c(1,4)\n", 316 | "scenario <- c(\"historical\",\"rcp85\")\n", 317 | "lapply(1:length(glmName), FUN = function(z) {\n", 318 | " s1 <- Sys.time() \n", 319 | " # downscaleChunk function from downscaleR, to build the models and to predict on the GCM projections \n", 320 | " p <- downscaleChunk(x = scaleGrid(x,type = \"standardize\"), \n", 321 | " y = y, newdata = list(xh,xf),\n", 322 | " method = \"GLM\", family = \"gaussian\", \n", 323 | " prepareData.args = list(local.predictors = list(n=neighs[z], vars = getVarNames(x)))) \n", 324 | " # save the predictions of a given GCM scenario and GLM configuration (GLM1 or GLM4) to the local directory \n", 325 | " lapply(2:length(p), FUN = function(zz) { \n", 326 | " grid2nc(p[[zz]],NetCDFOutFile = paste0(\"./Data/temperature/predictions_\",scenario[zz-1],\"_\",glmName[z],\".nc4\"))\n", 327 | " })\n", 328 | " s2 <- Sys.time()\n", 329 | " c(s1,s2) \n", 330 | "})" 331 | ] 332 | }, 333 | { 334 | "cell_type": "markdown", 335 | "metadata": {}, 336 | "source": [ 337 | "### 4.2 Downscaling (temperature) with a spatial GLM\n", 338 | "\n", 339 | "The next model we use to downscale temperature is a spatial GLM, referred to as GLMPC, which uses the leading principal components as predictors (see the paper for details). 
To do this, we split the whole domain into the 8 [PRUDENCE regions](http://ensemblesrt3.dmi.dk/quicklook/regions.html), whose coordinates are included in the `visualizeR` package:" 340 | ] 341 | }, 342 | { 343 | "cell_type": "code", 344 | "execution_count": null, 345 | "metadata": {}, 346 | "outputs": [], 347 | "source": [ 348 | "# loading PRUDENCE regions\n", 349 | "areas <- PRUDENCEregions\n", 350 | "n <- names(PRUDENCEregions)\n", 351 | "n_regions <- length(n)" 352 | ] 353 | }, 354 | { 355 | "cell_type": "markdown", 356 | "metadata": {}, 357 | "source": [ 358 | "The next block of code allows to reproduce Fig. 1 of the paper. We use `spatialPlot` from `visualizeR` to plot the map. In addition, we also plot the spatial resolution of predictor and predictand fields using `SpatialPoints` (from the `sp` library) inside `spatialPlot`." 359 | ] 360 | }, 361 | { 362 | "cell_type": "code", 363 | "execution_count": null, 364 | "metadata": {}, 365 | "outputs": [], 366 | "source": [ 367 | "# code to reproduce Fig. 1 of the paper\n", 368 | "coords_x <- expand.grid(x$xyCoords$x,x$xyCoords$y) ; names(coords_x) <- c(\"x\",\"y\") \n", 369 | "grid_clim <- climatology(subsetDimension(x,dimension = \"var\",indices = 1))\n", 370 | "coords_y <- expand.grid(y$xyCoords$x,y$xyCoords$y) ; names(coords_y) <- c(\"x\",\"y\")\n", 371 | "spatialPlot(grid_clim,at = seq(-2, 2, 0.1), set.min = 4, set.max = 8, \n", 372 | " backdrop.theme = \"coastline\", \n", 373 | " sp.layout = list(list(SpatialPoints(coords_x), first = FALSE, \n", 374 | " col = \"black\", pch = 19, cex = 0.4),\n", 375 | " list(SpatialPoints(coords_y), first = FALSE, \n", 376 | " col = \"gray\", pch = 19, cex = 0.1),\n", 377 | " list(areas[1], col = \"red\", lwd = 2),\n", 378 | " list(areas[2], col = \"brown\", lwd = 2),\n", 379 | " list(areas[3], col = \"orange\", lwd = 2),\n", 380 | " list(areas[4], col = \"darkolivegreen4\", lwd = 2),\n", 381 | " list(areas[5], col = \"purple\", lwd = 2),\n", 382 | " list(areas[6], col = \"deeppink\", lwd = 2),\n", 383 | " list(areas[7], col = \"gray47\", lwd = 2),\n", 384 | " list(areas[8], col = \"blue\", lwd = 2)),colorkey = FALSE)" 385 | ] 386 | }, 387 | { 388 | "cell_type": "markdown", 389 | "metadata": {}, 390 | "source": [ 391 | "At this point, we build a GLM model for each PRUDENCE region which uses as predictors the principal components explaining the 95% of the total variance over the region. To do this, we first use `downscaleTrain` to train the GLM based on reanalysis and observations (note that `prepareData` allows to easily compute the principal components required). Subsequently, `downscalePredict` is used to apply the learnt model to make predictions from EC-EARTH (both for the historical and RCP8.5 scenarios). The perdictions are merged into a single C4R object (`mergeGrid`) and saved locally as netCDF files (`grid2nc`)." 
392 | ] 393 | }, 394 | { 395 | "cell_type": "code", 396 | "execution_count": null, 397 | "metadata": {}, 398 | "outputs": [], 399 | "source": [ 400 | "# building the GLMPC model to downscale temperature \n", 401 | "# from EC-EARTH (historical and RCP8.5 scenarios)\n", 402 | "s1 <- Sys.time()\n", 403 | "p <- lapply(1:n_regions, FUN = function(i) { # to loop over the Prudence Regions\n", 404 | " xlim <- areas[n[i]]@bbox[1,]; ylim <- areas[n[i]]@bbox[2,] \n", 405 | " if (i == 6) xlim[2] <- xlim[2] + 0.5\n", 406 | " x <- subsetGrid(x,lonLim = xlim,latLim = ylim) # subset the latitude-longitude are of the given Prudence Region\n", 407 | " y <- loadGridData(dataset = \"E-OBS_v14_0.50regular\",\n", 408 | " var = \"tas\",\n", 409 | " lonLim = xlim,\n", 410 | " latLim = ylim,\n", 411 | " years = 1979:2008) \n", 412 | " # Compute the PCs that explain the 95% of the variance with prepareData, as indicated by the argument spatial.predictors\n", 413 | " xyT <- prepareData(x = scaleGrid(x,type = \"standardize\"), y = y,\n", 414 | " spatial.predictors = list(v.exp=0.95, which.combine = getVarNames(x)),\n", 415 | " combined.only = TRUE)\n", 416 | " # Build the GLM model \n", 417 | " model <- downscaleTrain(xyT,\n", 418 | " method = \"GLM\",\n", 419 | " family = \"gaussian\")\n", 420 | " \n", 421 | " lapply(1:2, FUN = function(z) { \n", 422 | " if (z == 1) {grid <- xh} else if (z == 2) {grid <- xf}\n", 423 | " grid <- subsetGrid(grid,lonLim = xlim,latLim = ylim) # subset the latitude-longitude are of the given Prudence Region for the GCM predictors \n", 424 | " xyt <- prepareNewData(grid,xyT)\n", 425 | " # Predict \n", 426 | " downscalePredict(xyt,model) %>% redim(drop = TRUE)\n", 427 | " })\n", 428 | "}) %>% unlist(recursive = FALSE)\n", 429 | "s2 <- Sys.time()\n", 430 | "\n", 431 | "# We bind the 8 PRUDENCE regions in a single C4R object\n", 432 | "lapply(c(\"historical\",\"rcp85\"), FUN = function(z){\n", 433 | " if (z == \"historical\") {ind <- seq(1,n_regions*2,2)} \n", 434 | " else if (z == \"rcp85\") {ind <- seq(2,n_regions*2,2)}\n", 435 | " p <- p[ind] \n", 436 | " p <- lapply(1:getShape(p[[1]],\"time\"), FUN = function(zz){ # for computational tractability we loop over time and after bind with bindGrid at the end of the loop\n", 437 | " lapply(1:length(p), FUN = function(z){\n", 438 | " subsetDimension(p[[z]],dimension = \"time\", indices = zz)\n", 439 | " }) %>% mergeGrid(aggr.fun = list(FUN = \"mean\",na.rm = TRUE)) # use mergeGrid to merge all the Prudence Regions predictions into one single C4R object\n", 440 | " }) %>% bindGrid(dimension = \"time\")\n", 441 | " p <- p[c(\"Variable\",\"Data\",\"xyCoords\",\"Dates\")]\n", 442 | " # save the predictions to local directory of a given GCM scenario \n", 443 | " grid2nc(p,NetCDFOutFile = paste0(\"./Data/temperature/predictions_\",z,\"_glmPC.nc4\"))\n", 444 | "})\n", 445 | "s3 <- Sys.time()\n", 446 | "c(s1,s2,s3)" 447 | ] 448 | }, 449 | { 450 | "cell_type": "markdown", 451 | "metadata": {}, 452 | "source": [ 453 | "### 4.3 Downscaling (temperature) with deep neural networks\n", 454 | "The following blocks of code explain how to build the CNN model used in the paper. 
We would like to encourage the reader to visit the [`downsaleR.keras` GitHub repository](https://github.com/SantanderMetGroup/downscaleR.keras) and/or the official [keras documentation](https://keras.io/getting_started/) to better understand this part of the notebook.\n", 455 | "\n", 456 | "First of all, we call the `prepareData.keras` function from `downscaleR.keras` to reshape the predictors and predictands so that they fit the type of network topology used. In our case the input layer is convolutionally connected to the first hidden layer (`first.connection = \"conv\"`) whereas the last hidden layer is fully connected to the output layer (`last.connection = \"dense\"`)." 457 | ] 458 | }, 459 | { 460 | "cell_type": "code", 461 | "execution_count": null, 462 | "metadata": {}, 463 | "outputs": [], 464 | "source": [ 465 | "s1 <- Sys.time() \n", 466 | "# reshaping predictand and predictors to fit our network topology\n", 467 | "xyT <- prepareData.keras(scaleGrid(x,type = \"standardize\"),\n", 468 | " y,\n", 469 | " first.connection = \"conv\",\n", 470 | " last.connection = \"dense\",\n", 471 | " channels = \"last\")" 472 | ] 473 | }, 474 | { 475 | "cell_type": "markdown", 476 | "metadata": {}, 477 | "source": [ 478 | "Our CNN is defined using the `keras` functions loaded via `downscaleR.keras`. The model consists of three convolutional hidden layers with 50, 25 and 10 feature maps with *ReLu* activation functions. " 479 | ] 480 | }, 481 | { 482 | "cell_type": "code", 483 | "execution_count": null, 484 | "metadata": {}, 485 | "outputs": [], 486 | "source": [ 487 | "# defining our CNN (see the keras documentation for details)\n", 488 | "inputs <- layer_input(shape = dim(xyT$x.global)[-1])\n", 489 | "l0 = inputs\n", 490 | "l1 = layer_conv_2d(l0 ,filters = 50, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 491 | "l2 = layer_conv_2d(l1,filters = 25, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 492 | "l3 = layer_conv_2d(l2,filters = 10, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 493 | "l4 = layer_flatten(l3)\n", 494 | "outputs = layer_dense(l4,units = dim(xyT$y$Data)[2])\n", 495 | "model <- keras_model(inputs = inputs, outputs = outputs)" 496 | ] 497 | }, 498 | { 499 | "cell_type": "markdown", 500 | "metadata": {}, 501 | "source": [ 502 | "The model is trained based on reanalysis and observations using the `downscaleTrain.keras` function. Note that an early-stopping criterion with a patience of 30 epochs is set up through the `callback_early_stopping` function. Once trained, the resulting model is saved to the local output directory using the `callback_model_checkpoint` function. 
" 503 | ] 504 | }, 505 | { 506 | "cell_type": "code", 507 | "execution_count": null, 508 | "metadata": {}, 509 | "outputs": [], 510 | "source": [ 511 | "# training the CNN model\n", 512 | "downscaleTrain.keras(obj = xyT,\n", 513 | " model = model,\n", 514 | " clear.session = TRUE,\n", 515 | " compile.args = list(\"loss\" = \"mse\",\n", 516 | " \"optimizer\" = optimizer_adam(lr = 0.0001)),\n", 517 | " fit.args = list(\"batch_size\" = 100,\n", 518 | " \"epochs\" = 1000,\n", 519 | " \"validation_split\" = 0.1,\n", 520 | " \"verbose\" = 1,\n", 521 | " \"callbacks\" = list(callback_early_stopping(patience = 30),\n", 522 | " callback_model_checkpoint(filepath=paste0('./models/temperature/CNN.h5'),\n", 523 | " monitor='val_loss', save_best_only=TRUE))))" 524 | ] 525 | }, 526 | { 527 | "cell_type": "markdown", 528 | "metadata": {}, 529 | "source": [ 530 | "Subsequently, the model is applied to make predictions from EC-EARTH (both for the historical and RCP8.5 scenarios) using the `downscalePredict.keras`. The results are locally saved as netCDF files (`grid2nc`)." 531 | ] 532 | }, 533 | { 534 | "cell_type": "code", 535 | "execution_count": null, 536 | "metadata": {}, 537 | "outputs": [], 538 | "source": [ 539 | "# using the CNN model to make temperature predictions from \n", 540 | "# EC-EARTH (both for historical and RCP8.5 scenarios)\n", 541 | "scenario <- c(\"historical\",\"rcp85\")\n", 542 | "lapply(scenario,FUN = function(z){\n", 543 | " # 'if' loop to distnguish between the historical and RCP8.5 scenario. \n", 544 | " if (z == \"historical\") {xy <- prepareNewData.keras(xh,xyT)} \n", 545 | " else if (z == \"rcp85\") {xy <- prepareNewData.keras(xf,xyT)}\n", 546 | " # Downscale with the CNN \n", 547 | " p <- downscalePredict.keras(xy,\n", 548 | " model = list(\"filepath\" = paste0(\"./models/temperature/CNN.h5\")),\n", 549 | " C4R.template = y,\n", 550 | " clear.session = TRUE)\n", 551 | " # save the predictions to local directory of a given GCM scenario \n", 552 | " grid2nc(p,NetCDFOutFile = paste0(\"./Data/temperature/predictions_\",z,\"_CNN.nc4\"))\n", 553 | "}) \n", 554 | "rm(xyT)\n", 555 | "s2 <- Sys.time() \n", 556 | "c(s1,s2)" 557 | ] 558 | }, 559 | { 560 | "cell_type": "markdown", 561 | "metadata": {}, 562 | "source": [ 563 | "### 4.4 Download temperature from the EC-Earth\n", 564 | "In this section we download EC-EARTH's temperature over continental Europe. For direct comparison purposes with the downscaled projections, we apply the E-OBS mask (note this dataset only provides data over land) to filter out sea points, and interpolate both models to a commo 0.5º regular grid. The resulting data are locally saved as netCDF files. 
" 565 | ] 566 | }, 567 | { 568 | "cell_type": "code", 569 | "execution_count": null, 570 | "metadata": {}, 571 | "outputs": [], 572 | "source": [ 573 | "# preparing E-OBS mask\n", 574 | "mask_h <- loadGridData(\"./Data/temperature/tas_E-OBS_v14_0.50regular.nc4\",var = \"tas\") %>% \n", 575 | " gridArithmetics(0) %>% gridArithmetics(1, operator = \"+\") \n", 576 | "mask_f <- subsetDimension(mask_h,dimension = \"time\", indices = 1:(getShape(mask_h,\"time\")-1))" 577 | ] 578 | }, 579 | { 580 | "cell_type": "code", 581 | "execution_count": null, 582 | "metadata": {}, 583 | "outputs": [], 584 | "source": [ 585 | "# post-processing EC-EARTH (temperature)\n", 586 | "labelsCM <- c(\"CMIP5-subset_EC-EARTH_r12i1p1_historical\", # UDG labels (see UDG.datasets())\n", 587 | " \"CMIP5-subset_EC-EARTH_r12i1p1_rcp85\") # UDG labels (see UDG.datasets())\n", 588 | "for (z in 1:length(labelsCM)) {\n", 589 | " if (z == 1) {years <- 1979:2008 ; mask <- mask_h}\n", 590 | " if (z == 2) {years <- 2071:2100 ; mask <- mask_f}\n", 591 | " grid <- loadGridData(dataset = labelsCM[z],\n", 592 | " var = \"tas\",\n", 593 | " lonLim = c(-10,32),\n", 594 | " latLim = c(36,72), \n", 595 | " years = years) %>% \n", 596 | " interpGrid(getGrid(y)) \n", 597 | " grid %>% gridArithmetics(mask) %>% \n", 598 | " grid2nc(NetCDFOutFile = paste0(\"./Data/temperature/tas_\",labelsCM[z],\".nc4\"))\n", 599 | " # interpGrid: We interpolate the EC-Earth's resolution to math that of the predictand resolution of interest: 0.5º\n", 600 | " # gridArithmetics: to mask the sea\n", 601 | " # grid2nc: save the EC-Earth's air temperature in netCDF file \n", 602 | "}" 603 | ] 604 | }, 605 | { 606 | "cell_type": "markdown", 607 | "metadata": {}, 608 | "source": [ 609 | "### 4.5 Validation of results\n", 610 | "The metrics used to validate the downscaled temperature predictions obtained in the previous sections are listed in Table 1 of the paper. Here, we explain how to replicate some of the results presented in the paper. In particular, we focuse on the biases for P02, Mean and P98 between 1) the historical and observed values and 2) the RCP8.5 and historical values. These pairs of datasets are coupled in a list below, named as `models`. `loadGridData` is used to load into `R` the corresponding netCDF files produced (and locally saved) along the previous sections and the `valueMeasure` function from `climate4R.value` is used to compute the associated biases. The result is a list of multigrid C4R objects containing the aforementioned validation metrics." 
611 | ] 612 | }, 613 | { 614 | "cell_type": "code", 615 | "execution_count": null, 616 | "metadata": {}, 617 | "outputs": [], 618 | "source": [ 619 | "# computing validation metrics\n", 620 | "models <- list(\n", 621 | " c(\"tas_E-OBS_v14_0.50regular\",NA),\n", 622 | " c(\"tas_E-OBS_v14_0.50regular\",\"tas_CMIP5-subset_EC-EARTH_r12i1p1_historical\"),\n", 623 | " c(\"tas_E-OBS_v14_0.50regular\",\"predictions_historical_glm1\"),\n", 624 | " c(\"tas_E-OBS_v14_0.50regular\",\"predictions_historical_glm4\"),\n", 625 | " c(\"tas_E-OBS_v14_0.50regular\",\"predictions_historical_glmPC\"),\n", 626 | " c(\"tas_E-OBS_v14_0.50regular\",\"predictions_historical_CNN\"),\n", 627 | " c(\"tas_CMIP5-subset_EC-EARTH_r12i1p1_historical\",NA),\n", 628 | " c(\"tas_CMIP5-subset_EC-EARTH_r12i1p1_historical\",\"tas_CMIP5-subset_EC-EARTH_r12i1p1_rcp85\"),\n", 629 | " c(\"predictions_historical_glm1\",\"predictions_rcp85_glm1\"),\n", 630 | " c(\"predictions_historical_glm4\",\"predictions_rcp85_glm4\"),\n", 631 | " c(\"predictions_historical_glmPC\",\"predictions_rcp85_glmPC\"),\n", 632 | " c(\"predictions_historical_CNN\",\"predictions_rcp85_CNN\")\n", 633 | ")\n", 634 | "index <- c(\"P02\",\"Mean\",\"P98\")\n", 635 | "validation.list <- lapply(1:length(models), FUN = function(zz){ # We loop over the grids to compute the validation indices of interest\n", 636 | " args <- list()\n", 637 | " if (!any(zz == c(1,7))) {\n", 638 | " args[[\"y\"]] <- loadGridData(paste0(\"./Data/temperature/\",models[[zz]][1],\".nc4\"),var = \"tas\")\n", 639 | " args[[\"x\"]] <- loadGridData(paste0(\"./Data/temperature/\",models[[zz]][2],\".nc4\"),var = \"tas\")\n", 640 | " } else {\n", 641 | " args[[\"grid\"]] <- loadGridData(paste0(\"./Data/temperature/\",models[[zz]][1],\".nc4\"),var = \"tas\") \n", 642 | " }\n", 643 | " lapply(1:length(index), FUN = function(z) {\n", 644 | " if (zz == 5) args[[\"y\"]] <- intersectGrid(args[[\"y\"]],args[[\"x\"]],type = \"spatial\")\n", 645 | " if (!any(zz == c(1,7))) args[[\"measure.code\"]] <- \"bias\"\n", 646 | " args[[\"index.code\"]] <- index[z]\n", 647 | " if (any(zz == c(1,7))) return(do.call(\"valueIndex\",args)$Index)\n", 648 | " if (!any(zz == c(1,7))) return(do.call(\"valueMeasure\",args)$Measure)\n", 649 | " }) %>% makeMultiGrid()\n", 650 | "})" 651 | ] 652 | }, 653 | { 654 | "cell_type": "markdown", 655 | "metadata": {}, 656 | "source": [ 657 | "Next, we use `spatialPlot` to plot the results (temperature pannels in Fig.3 of the paper)." 658 | ] 659 | }, 660 | { 661 | "cell_type": "code", 662 | "execution_count": null, 663 | "metadata": {}, 664 | "outputs": [], 665 | "source": [ 666 | "# replicating temperature pannels in Fig. 
3 of the paper\n", 667 | "nmes <- c(\"EC-EARTH\",\"GLM1\",\"GLM4\",\"GLMPC\",\"CNN\")\n", 668 | "cb <- rev(brewer.pal(n = 11, \"RdBu\"))\n", 669 | "cb[5:7] <- \"#FFFFFF\"; cb <- cb %>% colorRampPalette()\n", 670 | "val.plots <- lapply(1:length(nmes), FUN = function(zz) {\n", 671 | " lapply(1:length(index), FUN = function(z) {\n", 672 | " at <- \n", 673 | " spatialPlot(redim(subsetGrid(validation.list[[zz+1]],var = index[z]),drop = TRUE),\n", 674 | " backdrop.theme = \"coastline\",\n", 675 | " main = paste(nmes[zz],\"- bias\",index[z]),\n", 676 | " col.regions = cb,\n", 677 | " at = seq(-5,5,length.out = 21),\n", 678 | " set.min = -5, set.max = 5) \n", 679 | " }) \n", 680 | "}) %>% unlist(recursive = FALSE)\n", 681 | "pdf(file = \"./figures/fig01_temperature.pdf\",width = 10,height = 16)\n", 682 | "grid.arrange(grobs = val.plots, ncol = 3)\n", 683 | "dev.off() " 684 | ] 685 | }, 686 | { 687 | "cell_type": "markdown", 688 | "metadata": {}, 689 | "source": [ 690 | "We also plot the climate change signals obtained for the downscaled projections and for the raw EC-EARTH outputs (temperature pannels in Fig. 4 of the paper)." 691 | ] 692 | }, 693 | { 694 | "cell_type": "code", 695 | "execution_count": null, 696 | "metadata": {}, 697 | "outputs": [], 698 | "source": [ 699 | "# replicating temperature pannels in Fig. 4 of the paper\n", 700 | "delta.plots <- lapply(1:length(nmes), FUN = function(zz) {\n", 701 | " lapply(1:length(index), FUN = function(z) {\n", 702 | " if (zz == 1) {\n", 703 | " spatialPlot(redim(subsetGrid(validation.list[[zz+7]],var = index[z]),drop = TRUE),\n", 704 | " backdrop.theme = \"coastline\",\n", 705 | " main = paste(nmes[zz],\"- delta\",index[z]),\n", 706 | " col.regions = brewer.pal(n = 9, \"OrRd\") %>% colorRampPalette(),\n", 707 | " at = seq(0,10,0.5),\n", 708 | " set.min = 0, set.max = 10) \n", 709 | " } else {\n", 710 | " grid2 <- subsetGrid(validation.list[[zz+7]],var = index[z])\n", 711 | " grid1 <- intersectGrid(subsetGrid(validation.list[[8]],var = index[z]),grid2,type = \"spatial\")\n", 712 | " grid <- gridArithmetics(grid2,\n", 713 | " grid1,\n", 714 | " operator = \"-\")\n", 715 | " spatialPlot(grid,\n", 716 | " backdrop.theme = \"coastline\",\n", 717 | " main = paste(nmes[zz],\"- delta diff.\",index[z]),\n", 718 | " col.regions = cb,\n", 719 | " at = seq(-5, 5, length.out = 21),\n", 720 | " set.min = -5, set.max = 5)\n", 721 | " }\n", 722 | " }) \n", 723 | "}) %>% unlist(recursive = FALSE)\n", 724 | "pdf(file = \"./figures/fig02_temperature.pdf\",width = 10,height = 16)\n", 725 | "grid.arrange(grobs = delta.plots, ncol = 3)\n", 726 | "dev.off() " 727 | ] 728 | }, 729 | { 730 | "cell_type": "markdown", 731 | "metadata": {}, 732 | "source": [ 733 | "## 5. Precipitation\n", 734 | "As for the case of temperature (Section 3), we present in this section the code needed to replicate the downscaling of precipitation for the historical and RCP8.5 scenarios of EC-EARTH. 
We start by loading E-OBS precipitation and saving it locally:" 735 | ] 736 | }, 737 | { 738 | "cell_type": "code", 739 | "execution_count": null, 740 | "metadata": {}, 741 | "outputs": [], 742 | "source": [ 743 | "y <- loadGridData(dataset = \"E-OBS_v14_0.50regular\",\n", 744 | " var = \"pr\",\n", 745 | " lonLim = c(-10,32),\n", 746 | " latLim = c(36,72), \n", 747 | " years = 1979:2008)\n", 748 | "grid2nc(y,NetCDFOutFile = \"./Data/precip/pr_E-OBS_v14_0.50regular.nc4\")" 749 | ] 750 | }, 751 | { 752 | "cell_type": "markdown", 753 | "metadata": {}, 754 | "source": [ 755 | "### 5.1 Downscaling (precipitation) with two local GLMs\n", 756 | "\n", 757 | "Unlike for temperature, we need two independent GLMs to downscale precipitation: one for precipitation occurrence and another for precipitation amount. Therefore, we need to define a first binomial GLM (`family = binomial(link = \"logit\")`) which will produce binary deterministic (yes/no) predictions of occurrence plus a second gamma GLM (`family = Gamma(link = \"log\")` and `simulate = TRUE`) which will produce stochastic values of precipitaton amount. Both deterministic and stochastic series need to be multiplied (`gridArithmetics` function) to obtain the final predicted precipitation, which are locally saved in netCDF format. Note that the `prepareData.args` input list in `downscaleChunk` allows for specifyinig the type of predictors to be considered (local information at neighbouring gridboxes in this case)." 758 | ] 759 | }, 760 | { 761 | "cell_type": "code", 762 | "execution_count": null, 763 | "metadata": {}, 764 | "outputs": [], 765 | "source": [ 766 | "# building GLM1 and GLM4 models to downscale precipitation \n", 767 | "# from EC-EARTH (historical and RCP8.5 scenarios)\n", 768 | "\n", 769 | "# NOTE THAT YOU MAY HAVE TO RUN THE LOOP MANUALLY IF YOUR COMPUTER DO NOT HAVE ENOUGH MEMORY CAPACITY\n", 770 | "glmName <- c(\"glm1\",\"glm4\")\n", 771 | "neighs <- c(1,4)\n", 772 | "scenario <- c(\"historical\",\"rcp85\")\n", 773 | "lapply(1:length(glmName), FUN = function(z){ # GLM1 and GLM4\n", 774 | " s1 <- Sys.time() \n", 775 | " # Occurrence model (logistic regression) \n", 776 | " pred_ocu <- downscaleChunk(x = scaleGrid(x,type = \"standardize\"), \n", 777 | " y = binaryGrid(y,condition = \"GE\",threshold = 1), \n", 778 | " newdata = list(xh,xf),\n", 779 | " method = \"GLM\", \n", 780 | " family = binomial(link = \"logit\"), \n", 781 | " simulate = c(FALSE,TRUE),\n", 782 | " prepareData.args = list(local.predictors = list(n=neighs[z], vars = getVarNames(x))))\n", 783 | " # rainfall model (gamma regression with link logarithmic). 
We subtract 0.99 to shift the Gamma towards 0 (recall that a rainy day is >= 1mm/day)\n", 784 | " pred_amo <- downscaleChunk(x = scaleGrid(x,type = \"standardize\"), \n", 785 | " y = gridArithmetics(y,0.99,operator = \"-\"), \n", 786 | " newdata = list(xh,xf),\n", 787 | " method = \"GLM\", \n", 788 | " family = Gamma(link = \"log\"), \n", 789 | " simulate = c(FALSE,TRUE),\n", 790 | " condition = \"GT\", threshold = 0,\n", 791 | " prepareData.args = list(local.predictors = list(n=neighs[z], vars = getVarNames(x))))\n", 792 | " for (a in 1:length(pred_amo)) pred_amo[[a]] %<>% gridArithmetics(0.99,operator = \"+\") \n", 793 | " \n", 794 | " # Save the deterministic predictions \n", 795 | " lapply(c(2,3), FUN = function(zz) {\n", 796 | " # We transform the probabilities to binary values with binaryGrid\n", 797 | " pred_bin <- binaryGrid(pred_ocu[[zz]],ref.obs = binaryGrid(y,condition = \"GE\",threshold = 1),ref.pred = pred_ocu[[1]])\n", 798 | " # We recreate the precipitation series by multiplying the predictions from the gamma (amount) and logistic (occurrence) models. \n", 799 | " p <- gridArithmetics(pred_amo[[zz]],pred_bin)\n", 800 | " # We save the predictions of a given GLM model and scenario to a local directory \n", 801 | " grid2nc(p,NetCDFOutFile = paste0(\"./Data/precip/predictions_\",scenario[zz-1],\"_deterministic_\",glmName[z],\".nc4\"))\n", 802 | " })\n", 803 | " # Save the stochastic predictions \n", 804 | " lapply(c(4,5), FUN = function(zz) {\n", 805 | " # We recreate the precipitation series by multiplying the predictions from the gamma (amount) and logistic (occurrence) models. \n", 806 | " p <- gridArithmetics(pred_amo[[zz]],pred_ocu[[zz]])\n", 807 | " # We save the predictions of a given GLM model and scenario to a local directory \n", 808 | " grid2nc(p,NetCDFOutFile = paste0(\"./Data/precip/predictions_\",scenario[zz-3],\"_stochastic_\",glmName[z],\".nc4\"))\n", 809 | " })\n", 810 | " s2 <- Sys.time()\n", 811 | " c(s1,s2) \n", 812 | "})" 813 | ] 814 | }, 815 | { 816 | "cell_type": "markdown", 817 | "metadata": {}, 818 | "source": [ 819 | "### 5.2 Downscaling (precipitation) with a spatial GLM\n", 820 | "\n", 821 | "This section is equivalent to Section 4.2 but for precipitation. Therefore, as explained in the previous subsection, the main difference is the inclusion of two (instead of just one) GLMs for each PRUDENCE region, which are needed due to the mixed (binary/continuous) character of precipitation. Again, note that `prepareData` allows for easily computing the principal components needed as predictors within each region." 
822 | ] 823 | }, 824 | { 825 | "cell_type": "code", 826 | "execution_count": null, 827 | "metadata": {}, 828 | "outputs": [], 829 | "source": [ 830 | "# building the GLMPC model to downscale precipitation \n", 831 | "# from EC-EARTH (historical and RCP8.5 scenarios)\n", 832 | "s1 <- Sys.time()\n", 833 | "\n", 834 | "p <- lapply(1:n_regions, FUN = function(i) { # we loop over the Prudence Regions\n", 835 | " xlim <- areas[n[i]]@bbox[1,]; ylim <- areas[n[i]]@bbox[2,] \n", 836 | " if (i == 6) xlim[2] <- xlim[2] + 0.5\n", 837 | " x <- subsetGrid(x,lonLim = xlim,latLim = ylim)\n", 838 | " y <- loadGridData(dataset = \"E-OBS_v14_0.50regular\",var = \"pr\",lonLim = xlim,latLim = ylim,years = 1979:2008)\n", 839 | " # We train the logistic GLM \n", 840 | " xyT <- prepareData(x = scaleGrid(x,type = \"standardize\"), y = binaryGrid(y,condition = \"GE\",threshold = 1),\n", 841 | " spatial.predictors = list(v.exp=0.95, which.combine = getVarNames(x)),\n", 842 | " combined.only = TRUE) \n", 843 | " model <- downscaleTrain(xyT,method = \"GLM\",family = binomial(link = \"logit\"))\n", 844 | " # We predict on the train set, which is to be used to adjust the frequency of rainy days on the prediction set \n", 845 | " pred_ocu_train <- model$pred %>% redim(drop = TRUE)\n", 846 | " \n", 847 | " pred_ocu <- lapply(c(FALSE,TRUE), FUN = function(sim) {\n", 848 | " lapply(1:2, FUN = function(z) { \n", 849 | " if (z == 1) {grid <- xh} else if (z == 2) {grid <- xf}\n", 850 | " grid <- subsetGrid(grid,lonLim = xlim,latLim = ylim) # subset the latitude-longitude are of the given Prudence Region for the GCM predictors \n", 851 | " xyt <- prepareNewData(grid,xyT)\n", 852 | " # Predict \n", 853 | " pred_ocu <- downscalePredict(xyt,model,simulate = sim) %>% redim(drop = TRUE)\n", 854 | " if (!isTRUE(sim)) pred_ocu <- binaryGrid(pred_ocu,ref.obs = binaryGrid(y,condition = \"GE\",threshold = 1),ref.pred = pred_ocu_train) \n", 855 | " pred_ocu \n", 856 | " }) \n", 857 | " }) %>% unlist(recursive = FALSE)\n", 858 | " rm(model) # To free memory \n", 859 | " # We train the Gamma GLM \n", 860 | " xyT <- prepareData(x = scaleGrid(x,type = \"standardize\"), y = gridArithmetics(y,0.99,operator = \"-\"),\n", 861 | " spatial.predictors = list(v.exp=0.95, which.combine = getVarNames(x)),\n", 862 | " combined.only = TRUE)\n", 863 | " model <- downscaleTrain(xyT,method = \"GLM\",family = Gamma(link = \"log\"),condition = \"GT\", threshold = 0)\n", 864 | " pred <- lapply(c(FALSE,TRUE), FUN = function(sim) {\n", 865 | " lapply(1:2, FUN = function(z) { \n", 866 | " if (z == 1) {grid <- xh} else if (z == 2) {grid <- xf}\n", 867 | " grid <- subsetGrid(grid,lonLim = xlim,latLim = ylim) # subset the latitude-longitude are of the given Prudence Region for the GCM predictors \n", 868 | " xyt <- prepareNewData(grid,xyT)\n", 869 | " # Predict \n", 870 | " grid_amo <- downscalePredict(xyt,model,simulate = sim) %>% gridArithmetics(0.99,operator = \"+\") %>% redim(drop = TRUE)\n", 871 | " \n", 872 | " if (!isTRUE(sim) && z == 1) grid_ocu <- pred_ocu[[1]] \n", 873 | " if (!isTRUE(sim) && z == 2) grid_ocu <- pred_ocu[[2]] \n", 874 | " if (isTRUE(sim) && z == 1) grid_ocu <- pred_ocu[[3]] \n", 875 | " if (isTRUE(sim) && z == 2) grid_ocu <- pred_ocu[[4]] \n", 876 | " gridArithmetics(grid_amo,grid_ocu) \n", 877 | " })\n", 878 | " }) %>% unlist(recursive = FALSE) \n", 879 | "}) %>% unlist(recursive = FALSE) \n", 880 | "s2 <- Sys.time()\n", 881 | "\n", 882 | "# We bind the 8 PRUDENCE regions in a single C4R object\n", 883 | 
"lapply(c(\"deterministic\",\"stochastic\"), FUN = function(zz) {\n", 884 | " lapply(c(\"historical\",\"rcp85\"), FUN = function(z) {\n", 885 | " if (zz == \"deterministic\" && z == \"historical\") {ind <- seq(1,n_regions*4,4)} \n", 886 | " if (zz == \"deterministic\" && z == \"rcp85\") {ind <- seq(2,n_regions*4,4)} \n", 887 | " if (zz == \"stochastic\" && z == \"historical\") {ind <- seq(3,n_regions*4,4)} \n", 888 | " if (zz == \"stochastic\" && z == \"rcp85\") {ind <- seq(4,n_regions*4,4)} \n", 889 | " p <- p[ind] \n", 890 | " p <- lapply(1:getShape(p[[1]],\"time\"), FUN = function(zz){ # for computational tractability we loop over time and after bind with bindGrid at the end of the loop\n", 891 | " lapply(1:length(p), FUN = function(z){\n", 892 | " subsetDimension(p[[z]],dimension = \"time\", indices = zz)\n", 893 | " }) %>% mergeGrid(aggr.fun = list(FUN = \"mean\",na.rm = TRUE)) # use mergeGrid to merge all the Prudence Regions predictions into one single C4R object\n", 894 | " }) %>% bindGrid(dimension = \"time\")\n", 895 | " p <- p[c(\"Variable\",\"Data\",\"xyCoords\",\"Dates\")]\n", 896 | " # save the predictions to local directory of a given GCM scenario \n", 897 | " grid2nc(p,NetCDFOutFile = paste0(\"./Data/precip/predictions_\",z,\"_\",zz,\"_glmPC.nc4\"))\n", 898 | " })\n", 899 | "})\n", 900 | "s3 <- Sys.time()\n", 901 | "c(s1,s2,s3) " 902 | ] 903 | }, 904 | { 905 | "cell_type": "markdown", 906 | "metadata": {}, 907 | "source": [ 908 | "### 5.3 Downscaling (precipitation) with deep neural networks\n", 909 | "\n", 910 | "For the particular case of precipitation, there is a lack of data in the E-OBS dataset, especially for the period 2005-2008 over the (45º-49ºN, 16º-25ºE) domain. Therefore, and due to the multi-site nature of neural networks, we get rid of the days presenting no data in this region. To do so, we use the functions `filterNA`, `subsetGrid` and `intersectGrid` from `transformeR`. Beyond this particularity, the downscaling process is esentially the same presented for temperature in Section 3.3." 911 | ] 912 | }, 913 | { 914 | "cell_type": "code", 915 | "execution_count": null, 916 | "metadata": {}, 917 | "outputs": [], 918 | "source": [ 919 | "s1 <- Sys.time() \n", 920 | "# discarding days presenting missing data over \n", 921 | "# the (45º-49ºN, 16º-25ºE) domain in E-OBS\n", 922 | "ysub <- filterNA(subsetGrid(y,latLim = c(49,55), lonLim = c(16,25))) %>% intersectGrid(y,which.return = 2)\n", 923 | "xsub <- intersectGrid(x,ysub,which.return = 1)\n", 924 | "xyT <- prepareData.keras(scaleGrid(xsub,type = \"standardize\"),\n", 925 | " binaryGrid(gridArithmetics(ysub,0.99, operator = \"-\"),\n", 926 | " condition = \"GE\",\n", 927 | " threshold = 0,\n", 928 | " partial = TRUE),\n", 929 | " first.connection = \"conv\",\n", 930 | " last.connection = \"dense\",\n", 931 | " channels = \"last\")" 932 | ] 933 | }, 934 | { 935 | "cell_type": "markdown", 936 | "metadata": {}, 937 | "source": [ 938 | "We define our CNN, which consists of three convolutional hidden layers with 50, 25 and 1 feature maps with *ReLu* activation functions. The last hidden layer is fully connected to the output layer, which is a concatenation of 3 layers representing the 3 estimated parameters (*p* = probability of rain, *alpha* = shape factor, *beta* = scale factor) for each predictand gridbox. We use `downscaleTrain.keras` to infer the model, which optimizes the negative log-likelihood of a Bernouilli-Gamma distribution. 
This is specified through the custom `bernouilliGamma.loss_function` loss function from `downscaleR.keras`. Note that, unlike for the case of the GLMs, the occurrence and amount of rainfall are simultaneously estimated by our CNN. " 939 | ] 940 | }, 941 | { 942 | "cell_type": "code", 943 | "execution_count": null, 944 | "metadata": {}, 945 | "outputs": [], 946 | "source": [ 947 | "# defining and training our CNN (see the keras documentation for details)\n", 948 | "inputs <- layer_input(shape = dim(xyT$x.global)[-1])\n", 949 | "l0 = inputs\n", 950 | "l1 = layer_conv_2d(l0 ,filters = 50, kernel_size = c(3,3), activation = 'relu', padding = \"same\")\n", 951 | "l2 = layer_conv_2d(l1,filters = 25, kernel_size = c(3,3), activation = 'relu', padding = \"same\")\n", 952 | "l3 = layer_conv_2d(l2,filters = 1, kernel_size = c(3,3), activation = 'relu', padding = \"same\")\n", 953 | "l4 = layer_flatten(l3)\n", 954 | "parameter1 = layer_dense(l4,units = dim(xyT$y$Data)[2], activation = \"sigmoid\")\n", 955 | "parameter2 = layer_dense(l4,units = dim(xyT$y$Data)[2])\n", 956 | "parameter3 = layer_dense(l4,units = dim(xyT$y$Data)[2])\n", 957 | "outputs = layer_concatenate(list(parameter1,parameter2,parameter3))\n", 958 | "model <- keras_model(inputs = inputs, outputs = outputs)\n", 959 | "downscaleTrain.keras(obj = xyT,\n", 960 | " model = model,\n", 961 | " clear.session = TRUE,\n", 962 | " compile.args = list(\"loss\" = bernouilliGamma.loss_function(last.connection = \"dense\"),\n", 963 | " \"optimizer\" = optimizer_adam(lr = 0.0001)),\n", 964 | " fit.args = list(\"batch_size\" = 100,\n", 965 | " \"epochs\" = 1000,\n", 966 | " \"validation_split\" = 0.1,\n", 967 | " \"verbose\" = 1,\n", 968 | " \"callbacks\" = list(callback_early_stopping(patience = 30),\n", 969 | " callback_model_checkpoint(filepath=paste0('./models/precip/CNN.h5'),\n", 970 | " monitor='val_loss', save_best_only=TRUE))))" 971 | ] 972 | }, 973 | { 974 | "cell_type": "markdown", 975 | "metadata": {}, 976 | "source": [ 977 | "We now use the above model to make predictions from the training dataset, which will be used later to adjust the frequency of rainy days." 978 | ] 979 | }, 980 | { 981 | "cell_type": "code", 982 | "execution_count": null, 983 | "metadata": {}, 984 | "outputs": [], 985 | "source": [ 986 | "# predictions for the training dataset\n", 987 | "xyt <- prepareNewData.keras(scaleGrid(x,type = \"standardize\"),xyT)\n", 988 | "pred_ocu_train <- downscalePredict.keras(newdata = xyt,\n", 989 | " model = list(\"filepath\" = paste0(\"./models/precip/CNN.h5\"), \n", 990 | " \"custom_objects\" = c(\"custom_loss\" = bernouilliGamma.loss_function(last.connection = \"dense\"))),\n", 991 | " C4R.template = ysub,\n", 992 | " clear.session = TRUE) %>% \n", 993 | " subsetGrid(var = \"pr1\")\n", 994 | "rm(xyt)" 995 | ] 996 | }, 997 | { 998 | "cell_type": "markdown", 999 | "metadata": {}, 1000 | "source": [ 1001 | "Next, we use `downscalePredict.keras` to make predictions from both the historical and RCP8.5 scenarios. Note that the `bernouilliGamma.statistics` function is used to compute the expectation of the conditional daily distributions from the parameters inferred by the network in the output layer. The predictions are locally saved in netCDF format." 
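, "\n\n", "Recall that, for a Bernoulli-Gamma variable with these parameters, the expected daily precipitation is $p\\,\\alpha\\,\\beta$ (the probability of rain times the mean $\\alpha\\beta$ of the Gamma distribution); note also the `bias = 0.99` argument in the chunk below, which relates to the 0.99 offset applied to the observations during training."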
1002 | ] 1003 | }, 1004 | { 1005 | "cell_type": "code", 1006 | "execution_count": null, 1007 | "metadata": {}, 1008 | "outputs": [], 1009 | "source": [ 1010 | "# using the CNN model to make precipitation predictions from \n", 1011 | "# EC-EARTH (both for historical and RCP8.5 scenarios)\n", 1012 | "simulateName <- c(\"deterministic\",\"stochastic\")\n", 1013 | "simulateDeep <- c(FALSE,TRUE)\n", 1014 | "scenario <- c(\"rcp85\",\"historical\")\n", 1015 | "lapply(scenario,FUN = function(z){\n", 1016 | " # 'if' loop to distnguish between the historical and RCP8.5 scenario. \n", 1017 | " if (z == \"historical\") {xy <- prepareNewData.keras(xh,xyT)} \n", 1018 | " else if (z == \"rcp85\") {xy <- prepareNewData.keras(xf,xyT)}\n", 1019 | " pred <- downscalePredict.keras(xy,\n", 1020 | " model = list(\"filepath\" = paste0(\"./models/precip/CNN.h5\"), \n", 1021 | " \"custom_objects\" = c(\"custom_loss\" = bernouilliGamma.loss_function(last.connection = \"dense\"))),\n", 1022 | " C4R.template = ysub,\n", 1023 | " clear.session = TRUE)\n", 1024 | " rm(xy)\n", 1025 | " lapply(1:length(simulateDeep),FUN = function(zz) {\n", 1026 | " # We use the function bernouilliGamma.statistics to 1) compute the expectance or 2) sample from, the daily conditional Bernouilli-Gamma distributions \n", 1027 | " pred <- bernouilliGamma.statistics(p = subsetGrid(pred,var = \"pr1\"),\n", 1028 | " alpha = subsetGrid(pred,var = \"pr2\"),\n", 1029 | " beta = subsetGrid(pred,var = \"pr3\"),\n", 1030 | " simulate = simulateDeep[zz],\n", 1031 | " bias = 0.99)\n", 1032 | " pred_ocu <- subsetGrid(pred,var = \"probOfRain\") %>% redim(drop = TRUE)\n", 1033 | " pred_amo <- subsetGrid(pred,var = \"amountOfRain\") %>% redim(drop = TRUE)\n", 1034 | " if (!isTRUE(simulateDeep[zz])) {\n", 1035 | " pred_bin <- binaryGrid(pred_ocu,\n", 1036 | " ref.obs = binaryGrid(y,threshold = 1,condition = \"GE\"),\n", 1037 | " ref.pred = pred_ocu_train)\n", 1038 | " } else {\n", 1039 | " pred_bin <- pred_ocu\n", 1040 | " }\n", 1041 | " p <- gridArithmetics(pred_bin,pred_amo)\n", 1042 | " p$Variable$varName <- \"pr\"; attr(p$Variable,\"longname\") <- \"pr\"\n", 1043 | " # We save the CNN predictions to local directory \n", 1044 | " grid2nc(p,NetCDFOutFile = paste0(\"./Data/precip/predictions_\",z,\"_\",simulateName[zz],\"_CNN.nc4\"))\n", 1045 | " })\n", 1046 | "}) \n", 1047 | "rm(xyT,pred_ocu_train,ysub,xsub) # to save space\n", 1048 | "s2 <- Sys.time() \n", 1049 | "c(s1,s2)" 1050 | ] 1051 | }, 1052 | { 1053 | "cell_type": "markdown", 1054 | "metadata": {}, 1055 | "source": [ 1056 | "### 5.4 Download precipitation from the EC-Earth\n", 1057 | "\n", 1058 | "This section is equivalent to Section 4.4 but for precipitation." 
1059 | ] 1060 | }, 1061 | { 1062 | "cell_type": "code", 1063 | "execution_count": null, 1064 | "metadata": {}, 1065 | "outputs": [], 1066 | "source": [ 1067 | "# preparing E-OBS masks\n", 1068 | "mask_h <- loadGridData(\"./Data/precip/pr_E-OBS_v14_0.50regular.nc4\",var = \"pr\") %>% \n", 1069 | " gridArithmetics(0) %>% gridArithmetics(1,operator = \"+\") \n", 1070 | "mask_f <- subsetDimension(mask_h,dimension = \"time\", indices = 1:(getShape(mask_h,\"time\")-1))\n", 1071 | "\n", 1072 | "# post-processing EC-EARTH precipitation\n", 1073 | "labelsCM <- c(\"CMIP5-subset_EC-EARTH_r12i1p1_historical\",\n", 1074 | " \"CMIP5-subset_EC-EARTH_r12i1p1_rcp85\")\n", 1075 | "for (z in 1:length(labelsCM)) {\n", 1076 | " if (z == 1) {years <- 1979:2008 ; mask <- mask_h}\n", 1077 | " if (z == 2) {years <- 2071:2100 ; mask <- mask_f}\n", 1078 | " grid <- loadGridData(dataset = labelsCM[z],\n", 1079 | " var = \"pr\",\n", 1080 | " lonLim = c(-10,32),\n", 1081 | " latLim = c(36,72), \n", 1082 | " years = years) %>% \n", 1083 | " interpGrid(getGrid(y)) %>% \n", 1084 | " binaryGrid(condition = \"GE\",threshold = 1,partial = TRUE)\n", 1085 | " grid %>% gridArithmetics(mask) %>% \n", 1086 | " grid2nc(NetCDFOutFile = paste0(\"./Data/precip/pr_\",labelsCM[z],\".nc4\"))\n", 1087 | "}" 1088 | ] 1089 | }, 1090 | { 1091 | "cell_type": "markdown", 1092 | "metadata": {}, 1093 | "source": [ 1094 | "### 5.5 Validation of results\n", 1095 | "This section is equivalent to Section 4.5 but for precipitation. The metrics computed for this variable are the relative biases for the R01, SDII and P98 indices between 1) the historical and observed values and 2) the RCP8.5 and historical values. " 1096 | ] 1097 | }, 1098 | { 1099 | "cell_type": "code", 1100 | "execution_count": null, 1101 | "metadata": {}, 1102 | "outputs": [], 1103 | "source": [ 1104 | "# computing validation metrics\n", 1105 | "models <- list(\n", 1106 | " c(\"pr_E-OBS_v14_0.50regular\",NA),\n", 1107 | " c(\"pr_E-OBS_v14_0.50regular\",\"pr_CMIP5-subset_EC-EARTH_r12i1p1_historical\"),\n", 1108 | " c(\"pr_E-OBS_v14_0.50regular\",\"predictions_historical_deterministic_glm1\"),\n", 1109 | " c(\"pr_E-OBS_v14_0.50regular\",\"predictions_historical_deterministic_glm4\"),\n", 1110 | " c(\"pr_E-OBS_v14_0.50regular\",\"predictions_historical_deterministic_glmPC\"),\n", 1111 | " c(\"pr_E-OBS_v14_0.50regular\",\"predictions_historical_deterministic_CNN\"),\n", 1112 | " c(\"pr_CMIP5-subset_EC-EARTH_r12i1p1_historical\",NA),\n", 1113 | " c(\"pr_CMIP5-subset_EC-EARTH_r12i1p1_historical\",\"pr_CMIP5-subset_EC-EARTH_r12i1p1_rcp85\"),\n", 1114 | " c(\"predictions_historical_deterministic_glm1\",\"predictions_rcp85_deterministic_glm1\"),\n", 1115 | " c(\"predictions_historical_deterministic_glm4\",\"predictions_rcp85_deterministic_glm4\"),\n", 1116 | " c(\"predictions_historical_deterministic_glmPC\",\"predictions_rcp85_deterministic_glmPC\"),\n", 1117 | " c(\"predictions_historical_deterministic_CNN\",\"predictions_rcp85_deterministic_CNN\")\n", 1118 | ")\n", 1119 | "\n", 1120 | "sdModel <- c(\"glm1\",\"glm4\",\"glmPC\",\"CNN\")\n", 1121 | "index <- c(\"R01\",\"SDII\",\"P98\",\"P98\")\n", 1122 | "validation.list <- lapply(1:length(models), FUN = function(zz){\n", 1123 | " args <- list()\n", 1124 | " if (!any(zz == c(1,7))) {\n", 1125 | " args[[\"y\"]] <- loadGridData(paste0(\"./Data/precip/\",models[[zz]][1],\".nc4\"),var = \"pr\")\n", 1126 | " args[[\"x\"]] <- loadGridData(paste0(\"./Data/precip/\",models[[zz]][2],\".nc4\"),var = \"pr\")\n", 1127 | " } else {\n", 1128 | " 
args[[\"grid\"]] <- loadGridData(paste0(\"./Data/precip/\",models[[zz]][1],\".nc4\"),var = \"pr\")\n", 1129 | " }\n", 1130 | " lapply(1:length(index), FUN = function(z) {\n", 1131 | " if (any(z == c(3,4))) {\n", 1132 | " args[[\"condition\"]] <- \"GE\" \n", 1133 | " args[[\"threshold\"]] <- 1\n", 1134 | " args[[\"which.wetdays\"]] <- \"Independent\" \n", 1135 | " if (z == 4) {\n", 1136 | " if (any(zz == c(3,4,5,6))) {\n", 1137 | " args[[\"x\"]] <- loadGridData(paste0(\"./Data/precip/predictions_historical_stochastic_\",sdModel[zz-2],\".nc4\"),var = \"pr\")\n", 1138 | " } else if (any(zz == c(9,10,11,12))) {\n", 1139 | " args[[\"y\"]] <- loadGridData(paste0(\"./Data/precip/predictions_historical_stochastic_\",sdModel[zz-8],\".nc4\"),var = \"pr\")\n", 1140 | " args[[\"x\"]] <- loadGridData(paste0(\"./Data/precip/predictions_rcp85_stochastic_\",sdModel[zz-8],\".nc4\"),var = \"pr\")\n", 1141 | " }\n", 1142 | " }\n", 1143 | " }\n", 1144 | " if (zz == 5) args[[\"y\"]] <- intersectGrid(args[[\"y\"]],args[[\"x\"]],type = \"spatial\")\n", 1145 | " if (!any(zz == c(1,7))) args[[\"measure.code\"]] <- \"biasRel\"\n", 1146 | " args[[\"index.code\"]] <- index[z]\n", 1147 | " if (any(zz == c(1,7))) return(do.call(\"valueIndex\",args)$Index)\n", 1148 | " if (!any(zz == c(1,7))) return(do.call(\"valueMeasure\",args)$Measure)\n", 1149 | " }) %>% makeMultiGrid()\n", 1150 | "})" 1151 | ] 1152 | }, 1153 | { 1154 | "cell_type": "markdown", 1155 | "metadata": {}, 1156 | "source": [ 1157 | "The following block of code replicates the precipitation pannels shown in Fig. 3 of the paper:" 1158 | ] 1159 | }, 1160 | { 1161 | "cell_type": "code", 1162 | "execution_count": null, 1163 | "metadata": {}, 1164 | "outputs": [], 1165 | "source": [ 1166 | "# replicating precipitation pannels in Fig. 3 of the paper\n", 1167 | "nmes <- c(\"EC-EARTH\",\"GLM1\",\"GLM4\",\"GLMPC\",\"CNN\")\n", 1168 | "cb <- brewer.pal(n = 11, \"BrBG\")\n", 1169 | "cb[5:7] <- \"#FFFFFF\"; cb <- cb %>% colorRampPalette()\n", 1170 | "val.plots <- lapply(1:length(nmes), FUN = function(zz) {\n", 1171 | " lapply(1:length(index), FUN = function(z) {\n", 1172 | " spatialPlot(redim(subsetDimension(validation.list[[zz+1]],dimension = \"var\", indices = z),drop = TRUE),\n", 1173 | " backdrop.theme = \"coastline\",\n", 1174 | " main = paste(nmes[zz],\"- biasRel\",index[z]),\n", 1175 | " col.regions = cb,\n", 1176 | " at = seq(-0.5,0.5,length.out = 21),\n", 1177 | " set.min = -0.5, set.max = 0.5) \n", 1178 | " }) \n", 1179 | "}) %>% unlist(recursive = FALSE)\n", 1180 | "pdf(file = \"./figures/fig01_precip.pdf\",width = 13,height = 16)\n", 1181 | "grid.arrange(grobs = val.plots, ncol = 4)\n", 1182 | "dev.off() " 1183 | ] 1184 | }, 1185 | { 1186 | "cell_type": "markdown", 1187 | "metadata": {}, 1188 | "source": [ 1189 | "This last piece of code replicates the precipitation pannels shown in Fig. 4 of the paper:" 1190 | ] 1191 | }, 1192 | { 1193 | "cell_type": "code", 1194 | "execution_count": null, 1195 | "metadata": {}, 1196 | "outputs": [], 1197 | "source": [ 1198 | "# replicating precipitation pannels in Fig. 
4 of the paper\n", 1199 | "delta.plots <- lapply(1:length(nmes), FUN = function(zz) {\n", 1200 | " lapply(1:length(index), FUN = function(z) {\n", 1201 | " if (zz == 1) {\n", 1202 | " spatialPlot(redim(subsetDimension(validation.list[[zz+7]],dimension = \"var\", indices = z),drop = TRUE),\n", 1203 | " backdrop.theme = \"coastline\",\n", 1204 | " main = paste(nmes[zz],\"- delta\",index[z]),\n", 1205 | " col.regions = cb,\n", 1206 | " at = seq(-0.5, 0.5,length.out = 21),\n", 1207 | " set.min = -0.5, set.max = 0.5)\n", 1208 | " } else {\n", 1209 | " grid2 <- subsetDimension(validation.list[[zz+7]],dimension = \"var\", indices = z)\n", 1210 | " grid1 <- intersectGrid(subsetDimension(validation.list[[8]],dimension = \"var\", indices = z),grid2,type = \"spatial\")\n", 1211 | " grid <- gridArithmetics(grid2,\n", 1212 | " grid1,\n", 1213 | " operator = \"-\")\n", 1214 | " spatialPlot(grid,\n", 1215 | " backdrop.theme = \"coastline\",\n", 1216 | " main = paste(nmes[zz],\"- delta diff.\",index[z]),\n", 1217 | " col.regions = cb,\n", 1218 | " at = seq(-0.5, 0.5,length.out = 21),\n", 1219 | " set.min = -0.5, set.max = 0.5)\n", 1220 | " }\n", 1221 | " }) \n", 1222 | "}) %>% unlist(recursive = FALSE)\n", 1223 | "pdf(file = \"./figures/fig02_precip.pdf\",width = 13,height = 16)\n", 1224 | "grid.arrange(grobs = delta.plots, ncol = 4)\n", 1225 | "dev.off() " 1226 | ] 1227 | }, 1228 | { 1229 | "cell_type": "markdown", 1230 | "metadata": {}, 1231 | "source": [ 1232 | "### 6. Technical specifications \n", 1233 | "\n", 1234 | "This notebook was run on a machine with the following technical specifications:\n", 1235 | "\n", 1236 | "1. Virtual machine:\n", 1237 | " + Operating system: Ubuntu 18.04.3 LTS (64 bits)\n", 1238 | " + Memory: 60 GiB \n", 1239 | " + Processor: 2x Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (16 cores, 32 threads)" 1240 | ] 1241 | } 1242 | ], 1243 | "metadata": { 1244 | "kernelspec": { 1245 | "display_name": "R", 1246 | "language": "R", 1247 | "name": "ir" 1248 | }, 1249 | "language_info": { 1250 | "codemirror_mode": "r", 1251 | "file_extension": ".r", 1252 | "mimetype": "text/x-r-source", 1253 | "name": "R", 1254 | "pygments_lexer": "r", 1255 | "version": "3.6.2" 1256 | } 1257 | }, 1258 | "nbformat": 4, 1259 | "nbformat_minor": 2 1260 | } 1261 | -------------------------------------------------------------------------------- /2020_Bano_CI.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Understanding Deep Learning Decisions in Statistical Downscaling Models\n", 8 | "### 10th International Conference on Climate Informatics 2020\n", 9 | "### Jorge Baño-Medina\n", 10 | "\n", 11 | "\n", 12 | "GitHub repository at https://github.com/SantanderMetGroup/DeepDownscaling" 13 | ] 14 | }, 15 | { 16 | "cell_type": "markdown", 17 | "metadata": {}, 18 | "source": [ 19 | "This notebook reproduces the results presented in the paper *Understanding deep learning decisions in statistical downscaling models* by *Jorge Baño-Medina*, in the *10th International Conference on Climate Informatics 2020* (https://doi.org/10.5194/gmd-2019-278). The code developed herein address the computation of saliency maps relying on the prediction difference analysis technique to gain understanding about deep learning models regarding the application of statistical downscaling. 
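In a nutshell, prediction difference analysis scores the relevance of an input feature as the change in the model output when that feature is marginalised out, i.e. replaced by plausible alternative values and the resulting predictions averaged. The sketch below, with a hypothetical helper `relevance` (not part of `downscaleR.keras`), only illustrates this idea for a generic fitted model `f` and a single input vector; in the notebook, whole predictor channels are marginalised at once and the difference is measured on the parameters of the predictive distribution.

```r
# x: named numeric vector of input features; train.samples: data.frame with the
# same columns, holding alternative values observed in the training period
relevance <- function(f, x, train.samples) {
  base <- f(x)  # prediction with all features intact
  sapply(names(x), function(v) {
    # replace feature v by training values and average the perturbed predictions
    marginalised <- mean(sapply(train.samples[[v]], function(s) {
      x.mod <- x
      x.mod[v] <- s
      f(x.mod)
    }))
    base - marginalised  # a large |difference| flags a relevant feature
  })
}
```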
In particular we provide saliency maps providing information regarding the downscaling of both precipitation and surface air temperature. The technical specifications of the machine can be found at the end of the notebook. **The notebook takes around 3-4 weeks to fully reproduce the results**.\n", 20 | "\n", 21 | "**Note:** This notebook is written in the free programming language `R`(version 3.6.1) and builds on the `R` framework [`climate4R`](https://github.com/SantanderMetGroup/climate4R) (C4R hereafter, conda and docker installations available), a suite of `R` packages developed by the [Santander Met Group](http://meteo.unican.es) for transparent climate data access, post processing (including bias correction and downscaling) and visualization. The interested reader is referred to [Iturbide et al. 2019](https://www.sciencedirect.com/science/article/pii/S1364815218303049?via%3Dihub)." 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "## 1. Loading libraries\n", 29 | "\n", 30 | "All the working steps rely on the [climate4R](https://github.com/SantanderMetGroup/climate4R) (conda and docker installations available), which is a set of libraries especifically developed to handle climate data (`loadeR`,`loadeR.2nc`,`transformeR`,`downscaleR`,`visualizeR` and `climate4R.value`). In this study, [C4R](https://github.com/SantanderMetGroup/climate4R) is used for loading and post-processing, downscaling, validation and visualization. Different sectorial [notebooks](https://github.com/SantanderMetGroup/notebooks) are available illustrating the use of C4R functions. \n", 31 | "\n", 32 | "Deep learning models are included as an extension of the downscaleR package: [`downscaleR.keras`](https://github.com/SantanderMetGroup/downscaleR.keras) which integrates *keras* in the C4R framework. There is also a specific function devoted to the computation of saliency maps to provide interpretability of deep learning models.\n", 33 | "\n", 34 | "To install the associated C4R libraries you can proceed with the devtools package. Instructions can be found in the [climate4R](https://github.com/SantanderMetGroup/climate4R) github repository." 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": 2, 40 | "metadata": {}, 41 | "outputs": [], 42 | "source": [ 43 | "library(magrittr)\n", 44 | "library(loadeR) # version 1.6.1 \n", 45 | "library(transformeR) # version 1.7.4\n", 46 | "library(downscaleR.keras)# version 1.0.0 (relies on keras version 2.2.2 and tensorflow version 2.0.0)\n", 47 | "library(visualizeR) # version 1.5.1\n", 48 | "library(RColorBrewer)\n", 49 | "library(sp)\n", 50 | "library(gridExtra)" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "## 2. Loading data\n", 58 | "\n", 59 | "We rely on ERA-Interim and E-OBS as the predictor and predictand datasets in perfect conditions. In particular we consider a set of 20 large-scale predictors (see the `variables` R object in the following code chunk). All these data can be loaded remotely from the [Santander Climate Data Service](http://meteo.unican.es/cds) (register [here](http://meteo.unican.es/udg-tap/signup) freely to get a user), which provides access to various kinds of climate datasets (global and regional climate models, reanalysis, observations...). 
We will use the C4R packages [`loadeR`](https://github.com/SantanderMetGroup/loadeR) and [`transformeR`](https://github.com/SantanderMetGroup/transformeR) to load and postprocess the required information." 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": 4, 65 | "metadata": {}, 66 | "outputs": [], 67 | "source": [ 68 | "loginUDG(username = \"\", password = \"\") # login into the Santander CDS" 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": {}, 74 | "source": [ 75 | "We find the label associated to ERA-Interim via the `UDG.datasets()` function of loadeR: “ECMWF_ERA-Interim-ESD”. Then we load the predictors by calling `loadGridData` from loadeR. We use the period indicated in VALUE: 1979-2008." 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": null, 81 | "metadata": {}, 82 | "outputs": [], 83 | "source": [ 84 | "variables <- c(\"z@500\",\"z@700\",\"z@850\",\"z@1000\",\n", 85 | " \"hus@500\",\"hus@700\",\"hus@850\",\"hus@1000\",\n", 86 | " \"ta@500\",\"ta@700\",\"ta@850\",\"ta@1000\",\n", 87 | " \"ua@500\",\"ua@700\",\"ua@850\",\"ua@1000\",\n", 88 | " \"va@500\",\"va@700\",\"va@850\",\"va@1000\")\n", 89 | "x <- lapply(variables, function(x) {\n", 90 | " loadGridData(dataset = \"ECMWF_ERA-Interim-ESD\",\n", 91 | " var = x,\n", 92 | " lonLim = c(-10,32),\n", 93 | " latLim = c(36,72), \n", 94 | " years = 1979:2008)\n", 95 | "}) %>% makeMultiGrid()" 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": {}, 101 | "source": [ 102 | "## 3. Saliency Maps\n", 103 | "\n", 104 | "In this section we compute the saliency maps based on the prediction difference analysis technique. These can be directly computed with the `relevanceMaps` function of the `downscaleR.keras` library.\n", 105 | "\n", 106 | "In this study we evaluate interpretable maps for the following predictand gridpoints:" 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": null, 112 | "metadata": {}, 113 | "outputs": [], 114 | "source": [ 115 | "Paris <- c(2.25,48.75)\n", 116 | "Rome <- c(12.25,42.25)\n", 117 | "Copenhagen <- c(12.25,55.75)\n", 118 | "Alps <- c(6.75,46.25)\n", 119 | "stations <- rbind(Paris,Rome,Copenhagen,Alps)" 120 | ] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "metadata": {}, 125 | "source": [ 126 | "### 3.1. Precipitation\n", 127 | "To infer the saliency maps of the downscaling model devoted to precipitation we first download the precipitation variable of the E-OBS dataset with the `loadGridData` function." 128 | ] 129 | }, 130 | { 131 | "cell_type": "code", 132 | "execution_count": null, 133 | "metadata": {}, 134 | "outputs": [], 135 | "source": [ 136 | "y_pr <- loadGridData(dataset = \"E-OBS_v14_0.50regular\",\n", 137 | " var = \"pr\",\n", 138 | " lonLim = c(-10,32),\n", 139 | " latLim = c(36,72), \n", 140 | " years = 1979:2008)" 141 | ] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": {}, 146 | "source": [ 147 | "We define the train period and then train the model with `downscaleTrain.keras`. We save the model in an `.h5` file that will be loaded to compute the saliency maps. 
This model is a replicate of the `CNN1` model intercompared in the [*Configuration and intercomparison of deep learning neural models for statistical downscaling*](https://www.geosci-model-dev.net/13/2109/2020/gmd-13-2109-2020-discussion.html) paper that provides a [jupyter notebook](https://github.com/SantanderMetGroup/DeepDownscaling/blob/master/2020_Bano_GMD_FULL.ipynb) for reproducibility, and therefore we refer the reader for concrete details regarding this model." 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": null, 153 | "metadata": {}, 154 | "outputs": [], 155 | "source": [ 156 | "xT <- subsetGrid(x,years = 1979:2002)\n", 157 | "xT <- xT %>% scaleGrid(type = \"standardize\")\n", 158 | "yT <- subsetGrid(y_pr,years = 1979:2002)\n", 159 | "xyT <- prepareData.keras(xT,\n", 160 | " binaryGrid(gridArithmetics(yT,0.99,operator = \"-\"),\n", 161 | " condition = \"GE\",\n", 162 | " threshold = 0,\n", 163 | " partial = TRUE),\n", 164 | " first.connection = \"conv\",\n", 165 | " last.connection = \"dense\",\n", 166 | " channels = \"last\")\n", 167 | "inputs <- layer_input(shape = c(getShape(xT,\"lat\"),getShape(xT,\"lon\"),getShape(xT,\"var\")))\n", 168 | "l0 = inputs\n", 169 | "l1 = layer_conv_2d(inputs,filters = 50, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 170 | "l2 = layer_conv_2d(l1,filters = 25, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 171 | "l3 = layer_conv_2d(l2,filters = 1, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 172 | "l4 = layer_flatten(l3)\n", 173 | "l51 = layer_dense(l4,units = ncol(xyT$y$Data), activation = 'sigmoid') \n", 174 | "l52 = layer_dense(l4,units = ncol(xyT$y$Data), activation = 'linear') \n", 175 | "l53 = layer_dense(l4,units = ncol(xyT$y$Data), activation = 'linear') \n", 176 | "outputs <- layer_concatenate(list(l51,l52,l53)) \n", 177 | "model <- keras_model(inputs = inputs, outputs = outputs) \n", 178 | "downscaleTrain.keras(obj = xyT,\n", 179 | " model = model,\n", 180 | " clear.session = TRUE,\n", 181 | " compile.args = list(\"loss\" = bernouilliGammaLoss(last.connection = \"dense\"),\n", 182 | " \"optimizer\" = optimizer_adam(lr = 0.0001)),\n", 183 | " fit.args = list(\"batch_size\" = 100,\n", 184 | " \"epochs\" = 1000,\n", 185 | " \"validation_split\" = 0.1,\n", 186 | " \"verbose\" = 1,\n", 187 | " \"callbacks\" = list(callback_early_stopping(patience = 30),\n", 188 | " callback_model_checkpoint(filepath=paste0('CNN_pr.h5'),\n", 189 | " monitor='val_loss', save_best_only=TRUE))))\n", 190 | "\n" 191 | ] 192 | }, 193 | { 194 | "cell_type": "markdown", 195 | "metadata": {}, 196 | "source": [ 197 | "Once we have trained the model, we subset the test period (i.e., 2008) and compute the rest of the parameters required as input by the `relevanceMaps` function." 
198 | ] 199 | }, 200 | { 201 | "cell_type": "code", 202 | "execution_count": null, 203 | "metadata": {}, 204 | "outputs": [], 205 | "source": [ 206 | "xt <- scaleGrid(subsetGrid(x,years = 2008),\n", 207 | " subsetGrid(x,years = 1979:2002),\n", 208 | " type = \"standardize\") %>% redim(drop = TRUE)\n", 209 | "C4R.template <- subsetGrid(y_pr,years = 1979:2002)\n", 210 | "xy <- prepareData.keras(subsetGrid(x,years = 2008),\n", 211 | " subsetGrid(y_pr,years = 1979:2002),first.connection = \"conv\",\n", 212 | " last.connection = \"dense\",\n", 213 | " channels = \"last\")" 214 | ] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "metadata": {}, 219 | "source": [ 220 | "We call the `relevanceMaps` function that will provide a saliency map for every day of the test period. The parameter `parch` indicates that the input features will be marginalized by channel (i.e., providing the influence of every input variable independently). Their influence will be measured over the stations chosen, input in the function via the `outputCoords` parameter, by computing the difference in the expectance of a Bernouilli-Gamma distribution (see the `loss` parameter)." 221 | ] 222 | }, 223 | { 224 | "cell_type": "code", 225 | "execution_count": null, 226 | "metadata": {}, 227 | "outputs": [], 228 | "source": [ 229 | "rm(x,y_pr,xT,yT,xyT) # to free memory\n", 230 | "gc() # to free memory\n", 231 | "objMaps <- relevanceMaps(xt, \n", 232 | " obj = xy,\n", 233 | " C4R.template = C4R.template,\n", 234 | " outputCoords = stations,\n", 235 | " model = list(\"filepath\" = paste0(\"CNN_pr.h5\"), \n", 236 | " \"custom_objects\" = c(\"custom_loss\" = bernouilliGammaLoss(last.connection = \"dense\"))),\n", 237 | " loss = \"bernouilliGammaLoss\",\n", 238 | " parch = \"channel\",\n", 239 | " k=1,l=5,num_samples = 10)\n", 240 | "attr(objMaps$Variable,\"longname\") <- objMaps$Variable$varName\n", 241 | "objMaps$Variable$level <- objMaps$Variable$level*NA" 242 | ] 243 | }, 244 | { 245 | "cell_type": "markdown", 246 | "metadata": {}, 247 | "source": [ 248 | "We define a new function `fig` that will visualize the saliency maps by relying on the plotting functions of library `visualizeR`." 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": null, 254 | "metadata": {}, 255 | "outputs": [], 256 | "source": [ 257 | "fig <- function(objMaps) {\n", 258 | " cb <- brewer.pal(n = 9, \"PuBu\") \n", 259 | " cb[1:2] <- \"#FFFFFF\" ; cb <- cb %>% colorRampPalette()\n", 260 | " lapply(1:getShape(objMaps,dimension = \"member\"), FUN = function(z){\n", 261 | " grid <- subsetGrid(objMaps,members = z) \n", 262 | " grid$Data <- abs(grid$Data) \n", 263 | " grid <- grid %>% climatology()\n", 264 | " seqPlot <- seq(0,abs(max(grid$Data)+0.01*max(grid$Data)), length.out = 25)\n", 265 | " attr(grid,\"memberCoords\")$x <- attr(objMaps,\"memberCoords\")$x[z]\n", 266 | " attr(grid,\"memberCoords\")$y <- attr(objMaps,\"memberCoords\")$y[z]\n", 267 | " spatialPlot(grid,backdrop.theme = \"coastline\",\n", 268 | " col.regions = cb,\n", 269 | " at = seqPlot,\n", 270 | " set.min = seqPlot[1], set.max = seqPlot[length(seqPlot)],\n", 271 | " colorkey = TRUE,\n", 272 | " sp.layout = list(list(SpatialPoints(attr(grid,\"memberCoords\")), first = FALSE,\n", 273 | " col = \"red\", pch = 15, cex = 0.75))\n", 274 | " )\n", 275 | " })\n", 276 | "}" 277 | ] 278 | }, 279 | { 280 | "cell_type": "markdown", 281 | "metadata": {}, 282 | "source": [ 283 | "Finally we call the above defined function and save the saliency maps in a `.pdf` file." 
284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": null, 289 | "metadata": {}, 290 | "outputs": [], 291 | "source": [ 292 | "f <- fig(objMaps)\n", 293 | "pdf(file = paste0(\"fig_pr.pdf\"),width = 20,height = 20)\n", 294 | "grid.arrange(grobs = f, ncol = 2)\n", 295 | "dev.off() " 296 | ] 297 | }, 298 | { 299 | "cell_type": "markdown", 300 | "metadata": {}, 301 | "source": [ 302 | "### 3.2 Temperature\n", 303 | "In this section we infer the saliency maps of the downscaling model of temperature. As done in the last section with precipitation, we load the surface air temperature of the E-OBS dataset with `loadGridData`." 304 | ] 305 | }, 306 | { 307 | "cell_type": "code", 308 | "execution_count": null, 309 | "metadata": {}, 310 | "outputs": [], 311 | "source": [ 312 | "y_tas <- loadGridData(dataset = \"E-OBS_v14_0.50regular\",\n", 313 | " var = \"tas\",\n", 314 | " lonLim = c(-10,32),\n", 315 | " latLim = c(36,72), \n", 316 | " years = 1979:2008)" 317 | ] 318 | }, 319 | { 320 | "cell_type": "markdown", 321 | "metadata": {}, 322 | "source": [ 323 | "Now we repeat the steps to compute the saliency maps done with the downscaling model of precipitation: train the model, compute the maps and save the plots in a `.pdf` file. Note that the only exception, is that in this case the last hidden layer consists on 10 feature maps and that the loss function is the negative log-likelihood of a gaussian distribution." 324 | ] 325 | }, 326 | { 327 | "cell_type": "code", 328 | "execution_count": null, 329 | "metadata": {}, 330 | "outputs": [], 331 | "source": [ 332 | "xT <- subsetGrid(x,years = 1979:2002)\n", 333 | "xT <- xT %>% scaleGrid(type = \"standardize\")\n", 334 | "yT <- subsetGrid(y_tas,years = 1979:2002)\n", 335 | "xyT <- prepareData.keras(xT,yT,\n", 336 | " first.connection = \"conv\",\n", 337 | " last.connection = \"dense\",\n", 338 | " channels = \"last\")\n", 339 | "inputs <- layer_input(shape = c(getShape(xT,\"lat\"),getShape(xT,\"lon\"),getShape(xT,\"var\")))\n", 340 | "l0 = inputs\n", 341 | "l1 = layer_conv_2d(inputs,filters = 50, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 342 | "l2 = layer_conv_2d(l1,filters = 25, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 343 | "l3 = layer_conv_2d(l2,filters = 10, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 344 | "l4 = layer_flatten(l3)\n", 345 | "l51 = layer_dense(l4,units = ncol(xyT$y$Data), activation = 'linear') \n", 346 | "l52 = layer_dense(l4,units = ncol(xyT$y$Data), activation = 'linear') \n", 347 | "outputs <- layer_concatenate(list(l51,l52)) \n", 348 | "model <- keras_model(inputs = inputs, outputs = outputs) \n", 349 | "downscaleTrain.keras(obj = xyT,\n", 350 | " model = model,\n", 351 | " clear.session = TRUE,\n", 352 | " compile.args = list(\"loss\" = gaussianLoss(last.connection = \"dense\"),\n", 353 | " \"optimizer\" = optimizer_adam(lr = 0.0001)),\n", 354 | " fit.args = list(\"batch_size\" = 100,\n", 355 | " \"epochs\" = 1000,\n", 356 | " \"validation_split\" = 0.1,\n", 357 | " \"verbose\" = 1,\n", 358 | " \"callbacks\" = list(callback_early_stopping(patience = 30),\n", 359 | " callback_model_checkpoint(filepath=paste0('CNN_tas.h5'),\n", 360 | " monitor='val_loss', save_best_only=TRUE))))\n", 361 | "\n", 362 | "\n", 363 | "xt <- scaleGrid(subsetGrid(x,years = 2008),\n", 364 | " subsetGrid(x,years = 1979:2002),\n", 365 | " type = \"standardize\") %>% redim(drop = TRUE)\n", 366 | "C4R.template <- subsetGrid(y_tas,years = 1979:2002)\n", 367 | "xy 
<- prepareData.keras(subsetGrid(x,years = 2008),\n", 368 | " subsetGrid(y_tas,years = 1979:2002),first.connection = \"conv\",\n", 369 | " last.connection = \"dense\",\n", 370 | " channels = \"last\")\n", 371 | "rm(x,y_tas,xT,yT,xyT) # to free memory\n", 372 | "gc() # to free memory\n", 373 | "objMaps <- relevanceMaps(xt, \n", 374 | " obj = xy,\n", 375 | " C4R.template = C4R.template,\n", 376 | " outputCoords = stations,\n", 377 | " loss = \"gaussianLoss\",\n", 378 | " model = list(\"filepath\" = paste0(\"CNN_tas.h5\"), \n", 379 | " \"custom_objects\" = c(\"custom_loss\" = gaussianLoss(last.connection = \"dense\"))),\n", 380 | " parch = \"channel\",\n", 381 | " k=1,l=5,num_samples = 30)\n", 382 | "attr(objMaps$Variable,\"longname\") <- objMaps$Variable$varName\n", 383 | "objMaps$Variable$level <- objMaps$Variable$level*NA\n", 384 | "f <- fig(objMaps)\n", 385 | "pdf(file = paste0(\"fig_tas.pdf\"),width = 20,height = 20)\n", 386 | "grid.arrange(grobs = f, ncol = 2)\n", 387 | "dev.off() " 388 | ] 389 | }, 390 | { 391 | "cell_type": "markdown", 392 | "metadata": {}, 393 | "source": [ 394 | "## Technical especifications\n", 395 | "To perform all the stages involved in this study we relied on the machine described below.\n", 396 | "- Machine (HP-ProDesk-600-G2-MT)\n", 397 | " - Operating system: ubuntu 4.15.0-72-generic\n", 398 | " - Memory: 15.6 GiB\n", 399 | " - Processor: Intel® Core™ i7-6700 CPU @ 3.40GHz × 8\n", 400 | " - SO: 64 bits\n", 401 | " - Disc: 235.1 GiB" 402 | ] 403 | } 404 | ], 405 | "metadata": { 406 | "kernelspec": { 407 | "display_name": "R", 408 | "language": "R", 409 | "name": "ir" 410 | }, 411 | "language_info": { 412 | "codemirror_mode": "r", 413 | "file_extension": ".r", 414 | "mimetype": "text/x-r-source", 415 | "name": "R", 416 | "pygments_lexer": "r", 417 | "version": "3.6.2" 418 | } 419 | }, 420 | "nbformat": 4, 421 | "nbformat_minor": 4 422 | } 423 | -------------------------------------------------------------------------------- /2020_Bano_CI.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SantanderMetGroup/DeepDownscaling/fb55b2dd479fea92c1c632f85ce97063d74a3bb4/2020_Bano_CI.pdf -------------------------------------------------------------------------------- /2020_Bano_GMD_FULL.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "button": false, 7 | "new_sheet": false, 8 | "run_control": { 9 | "read_only": false 10 | } 11 | }, 12 | "source": [ 13 | "## Configuration and Intercomparison of Deep Learning Neural Models for Statistical Downscaling *(code for full reproducibility)*\n", 14 | "### *Geoscientific Model Development preprint*\n", 15 | "### J. Baño-Medina, R. Manzanas and J. M. Gutiérrez\n", 16 | "https://doi.org/10.5194/gmd-2019-278\n", 17 | "\n", 18 | "GitHub repository at https://github.com/SantanderMetGroup/DeepDownscaling" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "button": false, 25 | "new_sheet": false, 26 | "run_control": { 27 | "read_only": false 28 | } 29 | }, 30 | "source": [ 31 | "This notebook reproduces the results presented in the paper *Configuration and Intercomparison of Deep Learning Neural Models for Statistical Downscaling* by *J. Baño-Medina, R. Manzanas and J. M. Gutiérrez*, in *Geoscientific Model Development* (https://doi.org/10.5194/gmd-2019-278). 
In particular, the code developed herein concerns the downscaling of temperature and precipitation and therefore their particularities are treated throughout the notebook in two different sections. Note that the programming language is `R` and the technical specifications of the machine can be found at the end of the notebook. **The notebook takes around 5 days to fully reproduce the results. For this reason we refer the reader to a reduced version of the notebook in which we only consider a particular model configuration ([2020_Bano_GMD.ipynb](https://github.com/SantanderMetGroup/DeepDownscaling))**.\n", 32 | "\n", 33 | "**Note:** This notebook is written in the free programming language `R` and builds on the `R` framework [`climate4R`](https://github.com/SantanderMetGroup/climate4R) (C4R hereafter, conda and docker installations available), a suite of `R` packages developed by the [Santander Met Group](http://meteo.unican.es) for transparent climate data access, post processing (including bias correction and downscaling, via the [`downscaleR`](https://github.com/SantanderMetGroup/downscaleR) package; [Bedia et al. 2020](https://www.geosci-model-dev-discuss.net/gmd-2019-224/)) and visualization. The interested reader is referred to [Iturbide et al. 2019](https://www.sciencedirect.com/science/article/pii/S1364815218303049?via%3Dihub). " 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": { 39 | "button": false, 40 | "new_sheet": false, 41 | "run_control": { 42 | "read_only": false 43 | } 44 | }, 45 | "source": [ 46 | "## 1. Loading libraries" 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": { 52 | "button": false, 53 | "new_sheet": false, 54 | "run_control": { 55 | "read_only": false 56 | } 57 | }, 58 | "source": [ 59 | "\n", 60 | "All the working steps rely on the [climate4R](https://github.com/SantanderMetGroup/climate4R) (conda and docker installations available), which is a set of libraries specifically developed to handle climate data (`loadeR`,`transformeR`,`downscaleR`, `visualizeR` and `climate4R.value`). In this study, [C4R](https://github.com/SantanderMetGroup/climate4R) is used for loading and post-processing, downscaling, validation and visualization. Different sectorial [notebooks](https://github.com/SantanderMetGroup/notebooks) are available illustrating the use of C4R functions. \n", 61 | "\n", 62 | "Deep learning models are included as an extension of the downscaleR package: [`downscaleR.keras`](https://github.com/SantanderMetGroup/downscaleR.keras), which integrates *keras* in the C4R framework.\n", 63 | "\n", 64 | "Here we use a **specific version of the conda C4R installer (v1.3.0)** containing the package versions needed to fully reproduce the results of this notebook and also including the [`downscaleR.keras`](https://github.com/SantanderMetGroup/downscaleR.keras) package, which is not included in the standard C4R conda distribution (visit this [site](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) for more information on how to obtain conda). Thus, below we illustrate how to create a new environment with conda to install the C4R libraries (please visit the [`climate4R`](https://github.com/SantanderMetGroup/climate4R) github repository for more details regarding the installation procedures). Note that the packages can also be installed individually, but this is more time-consuming than using the conda installation.
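After installing and activating the environment, an optional sanity check from within `R` is to confirm that the installed versions match those noted next to the `library()` calls further below; a minimal sketch (the package list is just the subset used in this notebook):

```r
# optional: check that the conda environment provides the expected versions
pkgs <- c("loadeR", "transformeR", "downscaleR", "visualizeR",
          "climate4R.value", "downscaleR.keras", "keras", "tensorflow")
sapply(pkgs, function(p) as.character(packageVersion(p)))
```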
65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": null, 70 | "metadata": {}, 71 | "outputs": [], 72 | "source": [ 73 | "## --- type this in terminal to proceed with the installation of C4R with conda\n", 74 | "\n", 75 | "# conda create --name myEnvironment # create a new environment\n", 76 | "# conda activate myEnvironment # activate the environment\n", 77 | "# conda install -c conda-forge -c defaults -c santandermetgroup climate4r=1.3.0 # install climate4R v1.3.0 in the new environment\n", 78 | "# R # type 'R' to initialize R software" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "Once in R, we load the libraries:" 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": 2, 91 | "metadata": { 92 | "button": false, 93 | "collapsed": true, 94 | "new_sheet": false, 95 | "run_control": { 96 | "read_only": false 97 | } 98 | }, 99 | "outputs": [], 100 | "source": [ 101 | "# C4R framework and auxiliary packages\n", 102 | "\n", 103 | "library(loadeR) # version 1.6.1 \n", 104 | "library(transformeR) # version 1.7.4\n", 105 | "library(downscaleR) # version 3.1.3\n", 106 | "library(visualizeR) # version 1.5.1\n", 107 | "library(climate4R.value) # version 0.0.2 (also relies on VALUE version 2.2.1)\n", 108 | "library(magrittr)\n", 109 | "library(gridExtra)\n", 110 | "library(RColorBrewer)\n", 111 | "library(sp) \n", 112 | "library(downscaleR.keras) # version 0.0.2 (relies on keras version 2.2.2 and tensorflow version 2.0.0)" 113 | ] 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "metadata": { 118 | "button": false, 119 | "new_sheet": false, 120 | "run_control": { 121 | "read_only": false 122 | } 123 | }, 124 | "source": [ 125 | "In order to avoid possible errors while running the notebook, you have to set the path to your desired working directory and create two files named \"Data\" and \"models\", that will contain the downscaled predictions and the trained deep models, respectively. Moreover, as we perform 2 distinct studies, one for precipitation and other for temperature, you should create 2 new directories named \"precip\" and \"temperature\" within the previous created directories (i.e., \"Data\" and \"models\"). An example of the latter would be \"personalpath/Data/temperature\". The predictions and models infered will be automatically saved in these folders and therefore not creating them will end into saving errors across the notebook." 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": null, 131 | "metadata": { 132 | "button": false, 133 | "collapsed": true, 134 | "new_sheet": false, 135 | "run_control": { 136 | "read_only": false 137 | } 138 | }, 139 | "outputs": [], 140 | "source": [ 141 | "path = \"\"\n", 142 | "setwd(path)\n", 143 | "dir.create(\"Data\")\n", 144 | "dir.create(\"Data/precip/\")\n", 145 | "dir.create(\"models\")\n", 146 | "dir.create(\"models/temperature/\")\n", 147 | "dir.create(\"models/precip/\")" 148 | ] 149 | }, 150 | { 151 | "cell_type": "markdown", 152 | "metadata": { 153 | "button": false, 154 | "new_sheet": false, 155 | "run_control": { 156 | "read_only": false 157 | } 158 | }, 159 | "source": [ 160 | "## 2. 
Loading data" 161 | ] 162 | }, 163 | { 164 | "cell_type": "markdown", 165 | "metadata": { 166 | "button": false, 167 | "new_sheet": false, 168 | "run_control": { 169 | "read_only": false 170 | } 171 | }, 172 | "source": [ 173 | "As explained in the paper, we have considered 20 large-scale variables from the ERA-Interim reanalysis as predictors and surface temperature from E-OBS as predictand. All these data can be loaded remotely from the [Santander Climate Data Service](http://meteo.unican.es/cds) (register [here](https://meteo.unican.es/trac/wiki/udg) freely to get a user), which provides access to various kinds of climate datasets (global and regional climate models, reanalysis, observations...). We will use the C4R packages [`loadeR`](https://github.com/SantanderMetGroup/loadeR) and [`transformeR`](https://github.com/SantanderMetGroup/transformeR) to load and postprocess the required information." 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": null, 179 | "metadata": { 180 | "button": false, 181 | "collapsed": true, 182 | "new_sheet": false, 183 | "run_control": { 184 | "read_only": false 185 | } 186 | }, 187 | "outputs": [], 188 | "source": [ 189 | "loginUDG(username = \"\", password = \"\") # login into the Santander CDS" 190 | ] 191 | }, 192 | { 193 | "cell_type": "markdown", 194 | "metadata": { 195 | "button": false, 196 | "new_sheet": false, 197 | "run_control": { 198 | "read_only": false 199 | } 200 | }, 201 | "source": [ 202 | "We find the label associated to ERA-Interim via the `UDG.datasets()` function of loadeR: “ECMWF_ERA-Interim-ESD”. Then we load the predictors by calling `loadGridData` from loadeR." 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": null, 208 | "metadata": { 209 | "button": false, 210 | "collapsed": true, 211 | "new_sheet": false, 212 | "run_control": { 213 | "read_only": false 214 | } 215 | }, 216 | "outputs": [], 217 | "source": [ 218 | "variables <- c(\"z@500\",\"z@700\",\"z@850\",\"z@1000\",\n", 219 | " \"hus@500\",\"hus@700\",\"hus@850\",\"hus@1000\",\n", 220 | " \"ta@500\",\"ta@700\",\"ta@850\",\"ta@1000\",\n", 221 | " \"ua@500\",\"ua@700\",\"ua@850\",\"ua@1000\",\n", 222 | " \"va@500\",\"va@700\",\"va@850\",\"va@1000\")\n", 223 | "x <- lapply(variables, function(x) {\n", 224 | " loadGridData(dataset = \"ECMWF_ERA-Interim-ESD\",\n", 225 | " var = x,\n", 226 | " lonLim = c(-10,32), # 22 points en total\n", 227 | " latLim = c(36,72), # 19 points en total\n", 228 | " years = 1979:2008)\n", 229 | "}) %>% makeMultiGrid()" 230 | ] 231 | }, 232 | { 233 | "cell_type": "markdown", 234 | "metadata": { 235 | "button": false, 236 | "new_sheet": false, 237 | "run_control": { 238 | "read_only": false 239 | } 240 | }, 241 | "source": [ 242 | "We split into train (i.e., 1979-2002) and test (i.e., 2003-2008) datasets by using the `subsetGrid` function from transformeR." 243 | ] 244 | }, 245 | { 246 | "cell_type": "code", 247 | "execution_count": null, 248 | "metadata": { 249 | "button": false, 250 | "collapsed": true, 251 | "new_sheet": false, 252 | "run_control": { 253 | "read_only": false 254 | } 255 | }, 256 | "outputs": [], 257 | "source": [ 258 | "xT <- subsetGrid(x,years = 1979:2002)\n", 259 | "xt <- subsetGrid(x,years = 2003:2008)" 260 | ] 261 | }, 262 | { 263 | "cell_type": "markdown", 264 | "metadata": { 265 | "button": false, 266 | "new_sheet": false, 267 | "run_control": { 268 | "read_only": false 269 | } 270 | }, 271 | "source": [ 272 | "## 3. 
Temperature\n", 273 | "In this section we present the code needed to downscale temperature. Once the predictors were loaded above we proceed to download the predictand dataset: E-OBS version 14 at a resolution of 0.5º. The E-OBS dataset is also accesible through the UDG portal. Thus, we load the temperature by calling again `loadGridData` and split it into train and test periods." 274 | ] 275 | }, 276 | { 277 | "cell_type": "code", 278 | "execution_count": null, 279 | "metadata": { 280 | "button": false, 281 | "collapsed": true, 282 | "new_sheet": false, 283 | "run_control": { 284 | "read_only": false 285 | } 286 | }, 287 | "outputs": [], 288 | "source": [ 289 | "y <- loadGridData(dataset = \"E-OBS_v14_0.50regular\",\n", 290 | " var = \"tas\",\n", 291 | " lonLim = c(-10,32),\n", 292 | " latLim = c(36,72), \n", 293 | " years = 1979:2008)\n", 294 | "\n", 295 | "yT <- subsetGrid(y,years = 1979:2002)\n", 296 | "yt <- subsetGrid(y,years = 2003:2008)" 297 | ] 298 | }, 299 | { 300 | "cell_type": "markdown", 301 | "metadata": { 302 | "button": false, 303 | "new_sheet": false, 304 | "run_control": { 305 | "read_only": false 306 | } 307 | }, 308 | "source": [ 309 | "We can take a look at the grid's resolutions of ERA-Interim (2º) and E-OBS (0.5º) in order to better visualize the gap we try to bridge with the downscaling (Figure 1 of the manuscript)." 310 | ] 311 | }, 312 | { 313 | "cell_type": "code", 314 | "execution_count": null, 315 | "metadata": { 316 | "button": false, 317 | "collapsed": true, 318 | "new_sheet": false, 319 | "run_control": { 320 | "read_only": false 321 | } 322 | }, 323 | "outputs": [], 324 | "source": [ 325 | "cb <- colorRampPalette(brewer.pal(9, \"OrRd\"))(80)\n", 326 | "coords_x <- expand.grid(xt$xyCoords$x,xt$xyCoords$y) ; names(coords_x) <- c(\"x\",\"y\")\n", 327 | "coords_y <- expand.grid(yt$xyCoords$x,yt$xyCoords$y) ; names(coords_y) <- c(\"x\",\"y\")\n", 328 | "colsindex <- rev(brewer.pal(n = 9, \"RdBu\"))\n", 329 | "cb2 <- colorRampPalette(colsindex)\n", 330 | "\n", 331 | "pplot <- list()\n", 332 | "pplot[[1]] <- spatialPlot(climatology(subsetGrid(xt,var = \"ta@1000\")), backdrop.theme = \"coastline\",\n", 333 | " main = \"Temperature 1000 hPa (ERA-Interim)\",\n", 334 | " col.regions = cb2,\n", 335 | " at = seq(-3, 15, 1),\n", 336 | " set.min = -3, set.max = 15, colorkey = TRUE,\n", 337 | " sp.layout = list(list(SpatialPoints(coords_x), \n", 338 | " first = FALSE, col = \"black\", \n", 339 | " pch = 20, cex = 1)))\n", 340 | "pplot[[2]] <- spatialPlot(climatology(yt), backdrop.theme = \"coastline\", \n", 341 | " main = \"Temperature (E-OBS)\",\n", 342 | " col.regions = cb,\n", 343 | " at = seq(-3, 15, 1),\n", 344 | " set.min = -3, set.max = 15, colorkey = TRUE,\n", 345 | " sp.layout = list(list(SpatialPoints(coords_y), \n", 346 | " first = FALSE, col = \"black\", \n", 347 | " pch = 20, cex = 1)))\n", 348 | "\n", 349 | "lay = rbind(c(1,2))\n", 350 | "grid.arrange(grobs = pplot, layout_matrix = lay)" 351 | ] 352 | }, 353 | { 354 | "cell_type": "markdown", 355 | "metadata": { 356 | "button": false, 357 | "new_sheet": false, 358 | "run_control": { 359 | "read_only": false 360 | } 361 | }, 362 | "source": [ 363 | "We can visualize some statistics of the train and test distributions, such as the climatology, or the percentiles 02th and 98th in order to gain knowledge about the observed data (Figure 2 of the manuscript). 
To compute the statistics we use `valueIndex` and `valueMeasure` functions from `climate4R.value` , a wrapper to the [`VALUE`](https://github.com/SantanderMetGroup/VALUE) package, which was developed in the [COST action VALUE](http://www.value-cost.eu/) for validation purposes. If interested in knowing more about the plotting options used here, the reader is referred to the documentation of the `spatialPlot` function." 364 | ] 365 | }, 366 | { 367 | "cell_type": "code", 368 | "execution_count": null, 369 | "metadata": { 370 | "button": false, 371 | "collapsed": true, 372 | "new_sheet": false, 373 | "run_control": { 374 | "read_only": false 375 | } 376 | }, 377 | "outputs": [], 378 | "source": [ 379 | "cb <- colorRampPalette(brewer.pal(9, \"OrRd\"))(80)\n", 380 | "colsindex <- rev(brewer.pal(n = 9, \"RdBu\"))\n", 381 | "cb2 <- colorRampPalette(colsindex)\n", 382 | "\n", 383 | "pplot <- at <- list()\n", 384 | "n1 <- 0; n2 <- 3\n", 385 | "indexNames <- c(\"Climatology\", \"P02\", \"P98\")\n", 386 | "for (indexName in indexNames) {\n", 387 | " if (indexName == \"Climatology\") {\n", 388 | " indexTrain <- valueIndex(yT,index.code = \"Mean\")$Index %>% redim() \n", 389 | " indexTest <- valueIndex(yt,index.code = \"Mean\")$Index %>% redim()\n", 390 | " at[[1]] <- seq(-3, 15, 1); at[[2]] <- seq(-2, 2, 0.1)\n", 391 | " }\n", 392 | " if (indexName == \"P02\") {\n", 393 | " indexTrain <- valueIndex(yT,index.code = \"P02\")$Index %>% redim() \n", 394 | " indexTest <- valueIndex(yt,index.code = \"P02\")$Index %>% redim()\n", 395 | " at[[1]] <- seq(-20, 10, 1); at[[2]] <- seq(-2, 2, 0.1)\n", 396 | " }\n", 397 | " if (indexName == \"P98\") {\n", 398 | " indexTrain <- valueIndex(yT,index.code = \"P98\")$Index %>% redim() \n", 399 | " indexTest <- valueIndex(yt,index.code = \"P98\")$Index %>% redim()\n", 400 | " at[[1]] <- seq(10, 30, 1); at[[2]] <- seq(-2, 2, 0.1)\n", 401 | " }\n", 402 | " \n", 403 | " for (i in 1:2) {\n", 404 | " if (i == 1) {\n", 405 | " dataset <- \"(train)\"; index <- indexTrain; n1 <- n1 + 1; n <- n1\n", 406 | " value <- index$Data; colorbar <- cb\n", 407 | " }\n", 408 | " if (i == 2) {\n", 409 | " indexTest <- gridArithmetics(indexTest,indexTrain,operator = \"-\")\n", 410 | " dataset <- \"(test bias)\"; index <- indexTest; n2 <- n2 + 1; n <- n2\n", 411 | " value <- abs(index$Data); colorbar <- cb2\n", 412 | " }\n", 413 | " pplot[[n]] <- spatialPlot(climatology(index), backdrop.theme = \"coastline\", \n", 414 | " main = paste(indexName,paste0(dataset,\":\"),\n", 415 | " round(mean(value, na.rm = TRUE),digits = 2)),\n", 416 | " col.regions = colorbar,\n", 417 | " at = at[[i]],\n", 418 | " set.min = at[[i]][1], set.max = at[[i]][length(at[[i]])], \n", 419 | " colorkey = TRUE)\n", 420 | " }\n", 421 | "}\n", 422 | "\n", 423 | "lay = rbind(c(1,2,3),\n", 424 | " c(4,5,6))\n", 425 | "grid.arrange(grobs = pplot, layout_matrix = lay)" 426 | ] 427 | }, 428 | { 429 | "cell_type": "markdown", 430 | "metadata": { 431 | "button": false, 432 | "new_sheet": false, 433 | "run_control": { 434 | "read_only": false 435 | } 436 | }, 437 | "source": [ 438 | "Once the data is loaded we standardize the predictors by calling `scaleGrid` function of transformeR." 
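Note that the test predictors are scaled with the mean and standard deviation of the training period (passed as the reference grid in the first call below), so both periods share the same scaling. Per gridbox and variable this amounts to the following; a sketch on made-up plain vectors, not the `scaleGrid` implementation:

```r
# toy series for a single gridbox/variable
x.train <- c(273.2, 275.9, 280.1, 278.4)
x.test  <- c(281.0, 276.5)

mu    <- mean(x.train)
sigma <- sd(x.train)

x.train.std <- (x.train - mu) / sigma
x.test.std  <- (x.test  - mu) / sigma  # training statistics reused for the test period
```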
439 | ] 440 | }, 441 | { 442 | "cell_type": "code", 443 | "execution_count": null, 444 | "metadata": { 445 | "button": false, 446 | "collapsed": true, 447 | "new_sheet": false, 448 | "run_control": { 449 | "read_only": false 450 | } 451 | }, 452 | "outputs": [], 453 | "source": [ 454 | "xt <- scaleGrid(xt,xT, type = \"standardize\", spatial.frame = \"gridbox\") %>% redim(drop = TRUE)\n", 455 | "xT <- scaleGrid(xT, type = \"standardize\", spatial.frame = \"gridbox\") %>% redim(drop = TRUE)" 456 | ] 457 | }, 458 | { 459 | "cell_type": "markdown", 460 | "metadata": { 461 | "button": false, 462 | "new_sheet": false, 463 | "run_control": { 464 | "read_only": false 465 | } 466 | }, 467 | "source": [ 468 | "### 3.1 Downscaling with generalized linear models (temperature)" 469 | ] 470 | }, 471 | { 472 | "cell_type": "markdown", 473 | "metadata": { 474 | "button": false, 475 | "new_sheet": false, 476 | "run_control": { 477 | "read_only": false 478 | } 479 | }, 480 | "source": [ 481 | "To downscale via generalized linear models (GLM) we rely on the `downscaleR` package of `C4R`. In particular, we use the `downscaleChunk` function of `downscaleR`. In the case of temperature, the generalized linear model has a gaussian family with link identity which is, in fact, an ordinary least squares regression. Therefore, we input to the function the predictor (x), the number of local predictors to be used (neighbours), the predictand (y) and the test set to apply the infered relationship as newdata. Finally we save the predictions which will be loaded during validation. Note that `downscaleChunk` temporarily creates .rda files in your working directory, containing the predictions per chunk. " 482 | ] 483 | }, 484 | { 485 | "cell_type": "code", 486 | "execution_count": null, 487 | "metadata": { 488 | "button": false, 489 | "collapsed": true, 490 | "new_sheet": false, 491 | "run_control": { 492 | "read_only": false 493 | } 494 | }, 495 | "outputs": [], 496 | "source": [ 497 | "glmName <- c(\"glm1\",\"glm4\")\n", 498 | "neighs <- c(1,4)\n", 499 | "lapply(1:length(glmName), FUN = function(z) {\n", 500 | " pred <- downscaleChunk(x = xT, y = yT, newdata = list(xt),\n", 501 | " method = \"GLM\", family = \"gaussian\", simulate = FALSE,\n", 502 | " prepareData.args = list(local.predictors = list(n=neighs[z], \n", 503 | " vars = getVarNames(xT))))[[2]] %>% \n", 504 | " redim(drop = TRUE)\n", 505 | " save(pred,file = paste0(\"./Data/temperature/predictions_\",glmName[z],\".rda\"))\n", 506 | "})" 507 | ] 508 | }, 509 | { 510 | "cell_type": "markdown", 511 | "metadata": { 512 | "button": false, 513 | "new_sheet": false, 514 | "run_control": { 515 | "read_only": false 516 | } 517 | }, 518 | "source": [ 519 | "### 3.2 Downscaling with deep neural networks" 520 | ] 521 | }, 522 | { 523 | "cell_type": "markdown", 524 | "metadata": { 525 | "button": false, 526 | "new_sheet": false, 527 | "run_control": { 528 | "read_only": false 529 | } 530 | }, 531 | "source": [ 532 | "In the following code we define a function containing the deep learning topologies intercompared in the study (see Table 2 and Figure 3 of the manuscript for more information about these architectures)." 
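A practical difference between the "same" and "valid" padding options used in these topologies is the spatial size of the feature maps reaching the flatten layer: with "valid" padding every 3x3 convolution (stride 1) trims one gridbox from each border, whereas "same" padding preserves the input size. A quick back-of-the-envelope check with an illustrative helper `conv.out` (not a C4R function), assuming the 19 x 22 (lat x lon) ERA-Interim domain loaded above:

```r
# illustrative only: size of a spatial dimension after n 3x3 convolutions (stride 1)
conv.out <- function(size, n, kernel = 3, padding = "valid") {
  if (padding == "valid") size - n * (kernel - 1) else size
}
conv.out(19, n = 3)  # 13 latitude points after three "valid" convolutions
conv.out(22, n = 3)  # 16 longitude points
# e.g. CNN10 flattens 13 * 16 * 10 = 2080 units before the final dense layer,
# while the "same"-padded CNN1 keeps the full 19 x 22 domain
```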
533 | ] 534 | }, 535 | { 536 | "cell_type": "code", 537 | "execution_count": null, 538 | "metadata": { 539 | "button": false, 540 | "collapsed": true, 541 | "new_sheet": false, 542 | "run_control": { 543 | "read_only": false 544 | } 545 | }, 546 | "outputs": [], 547 | "source": [ 548 | "deepName <- c(\"CNN-LM\",\"CNN1\",\"CNN10\",\"CNN-PR\",\"CNNdense\")\n", 549 | "architectures <- function(architecture,input_shape,output_shape) {\n", 550 | " if (architecture == \"CNN-LM\") {\n", 551 | " inputs <- layer_input(shape = input_shape)\n", 552 | " x = inputs\n", 553 | " l1 = layer_conv_2d(x ,filters = 50, kernel_size = c(3,3), activation = 'linear', padding = \"same\")\n", 554 | " l2 = layer_conv_2d(l1,filters = 25, kernel_size = c(3,3), activation = 'linear', padding = \"same\")\n", 555 | " l3 = layer_conv_2d(l2,filters = 1, kernel_size = c(3,3), activation = 'linear', padding = \"same\")\n", 556 | " l4 = layer_flatten(l3)\n", 557 | " outputs = layer_dense(l4,units = output_shape)\n", 558 | " model <- keras_model(inputs = inputs, outputs = outputs)\n", 559 | " }\n", 560 | " \n", 561 | " if (architecture == \"CNN1\") {\n", 562 | " inputs <- layer_input(shape = input_shape)\n", 563 | " x = inputs\n", 564 | " l1 = layer_conv_2d(x ,filters = 50, kernel_size = c(3,3), activation = 'relu', padding = \"same\")\n", 565 | " l2 = layer_conv_2d(l1,filters = 25, kernel_size = c(3,3), activation = 'relu', padding = \"same\")\n", 566 | " l3 = layer_conv_2d(l2,filters = 1, kernel_size = c(3,3), activation = 'relu', padding = \"same\")\n", 567 | " l4 = layer_flatten(l3)\n", 568 | " outputs = layer_dense(l4,units = output_shape)\n", 569 | " model <- keras_model(inputs = inputs, outputs = outputs)\n", 570 | " }\n", 571 | " \n", 572 | " if (architecture == \"CNN10\") {\n", 573 | " inputs <- layer_input(shape = input_shape)\n", 574 | " x = inputs\n", 575 | " l1 = layer_conv_2d(x ,filters = 50, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 576 | " l2 = layer_conv_2d(l1,filters = 25, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 577 | " l3 = layer_conv_2d(l2,filters = 10, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 578 | " l4 = layer_flatten(l3)\n", 579 | " outputs = layer_dense(l4,units = output_shape)\n", 580 | " model <- keras_model(inputs = inputs, outputs = outputs)\n", 581 | " }\n", 582 | " \n", 583 | " if (architecture == \"CNN-PR\") {\n", 584 | " inputs <- layer_input(shape = input_shape)\n", 585 | " x = inputs\n", 586 | " l1 = layer_conv_2d(x ,filters = 10, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 587 | " l2 = layer_conv_2d(l1,filters = 25, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 588 | " l3 = layer_conv_2d(l2,filters = 50, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 589 | " l4 = layer_flatten(l3)\n", 590 | " outputs = layer_dense(l4,units = output_shape)\n", 591 | " model <- keras_model(inputs = inputs, outputs = outputs)\n", 592 | " }\n", 593 | " \n", 594 | " if (architecture == \"CNNdense\") {\n", 595 | " inputs <- layer_input(shape = input_shape)\n", 596 | " x = inputs\n", 597 | " l1 = layer_conv_2d(x ,filters = 50, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 598 | " l2 = layer_conv_2d(l1,filters = 25, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 599 | " l3 = layer_conv_2d(l2,filters = 10, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 600 | " l4 = layer_flatten(l3)\n", 601 | " l5 = 
layer_dense(l4,units = 50, activation = \"relu\")\n", 602 | " l6 = layer_dense(l5,units = 50, activation = \"relu\")\n", 603 | " outputs = layer_dense(l6,units = output_shape)\n", 604 | " model <- keras_model(inputs = inputs, outputs = outputs)\n", 605 | " }\n", 606 | " \n", 607 | " return(model)\n", 608 | "}" 609 | ] 610 | }, 611 | { 612 | "cell_type": "markdown", 613 | "metadata": { 614 | "button": false, 615 | "new_sheet": false, 616 | "run_control": { 617 | "read_only": false 618 | } 619 | }, 620 | "source": [ 621 | "We prepare the predictor and predictand datasets for integration with keras with the functions `prepareData.keras` and `prepareNewData.keras`" 622 | ] 623 | }, 624 | { 625 | "cell_type": "code", 626 | "execution_count": null, 627 | "metadata": { 628 | "button": false, 629 | "collapsed": true, 630 | "new_sheet": false, 631 | "run_control": { 632 | "read_only": false 633 | } 634 | }, 635 | "outputs": [], 636 | "source": [ 637 | "xy.T <- prepareData.keras(xT,yT,\n", 638 | " first.connection = \"conv\",\n", 639 | " last.connection = \"dense\",\n", 640 | " channels = \"last\")\n", 641 | "xy.t <- prepareNewData.keras(xt,xy.T)" 642 | ] 643 | }, 644 | { 645 | "cell_type": "markdown", 646 | "metadata": { 647 | "button": false, 648 | "new_sheet": false, 649 | "run_control": { 650 | "read_only": false 651 | } 652 | }, 653 | "source": [ 654 | "We loop over the topologies to train the deep models and predict over the test set. Unlike GLMs where there was a model per gridpoint, deep models perform multi-task, downscaling to all sites at a time." 655 | ] 656 | }, 657 | { 658 | "cell_type": "code", 659 | "execution_count": null, 660 | "metadata": { 661 | "button": false, 662 | "collapsed": true, 663 | "new_sheet": false, 664 | "run_control": { 665 | "read_only": false 666 | } 667 | }, 668 | "outputs": [], 669 | "source": [ 670 | "lapply(1:length(deepName), FUN = function(z){\n", 671 | " model <- architectures(architecture = deepName[z],\n", 672 | " input_shape = dim(xy.T$x.global)[-1],\n", 673 | " output_shape = dim(xy.T$y$Data)[2])\n", 674 | " downscaleTrain.keras(obj = xy.T,\n", 675 | " model = model,\n", 676 | " clear.session = TRUE,\n", 677 | " compile.args = list(\"loss\" = \"mse\",\n", 678 | " \"optimizer\" = optimizer_adam(lr = 0.0001)),\n", 679 | " fit.args = list(\"batch_size\" = 100,\n", 680 | " \"epochs\" = 1000,\n", 681 | " \"validation_split\" = 0.1,\n", 682 | " \"verbose\" = 1,\n", 683 | " \"callbacks\" = list(callback_early_stopping(patience = 30),\n", 684 | " callback_model_checkpoint(\n", 685 | " filepath=paste0('./models/temperature/',deepName[z],'.h5'),\n", 686 | " monitor='val_loss', save_best_only=TRUE))))\n", 687 | " pred <- downscalePredict.keras(newdata = xy.t,\n", 688 | " model = list(\"filepath\" = \n", 689 | " paste0(\"./models/temperature/\",deepName[z],\".h5\")),\n", 690 | " C4R.template = yT,\n", 691 | " clear.session = TRUE)\n", 692 | " save(pred,file = paste0(\"./Data/temperature/predictions_\",deepName[z],\".rda\"))\n", 693 | "})" 694 | ] 695 | }, 696 | { 697 | "cell_type": "markdown", 698 | "metadata": { 699 | "button": false, 700 | "new_sheet": false, 701 | "run_control": { 702 | "read_only": false 703 | } 704 | }, 705 | "source": [ 706 | "### 3.3 Validation of results\n", 707 | "In this code, we calculate the validation indices by using the library `climate4R.value` of `C4R`. 
In particular the indices used are: the Root Mean Squared Error (RMSE), the deseasonal perason correlation, the biases of the climatology and of the percentile 2th and 98th and the ratio of the standard deviations." 708 | ] 709 | }, 710 | { 711 | "cell_type": "code", 712 | "execution_count": null, 713 | "metadata": { 714 | "button": false, 715 | "collapsed": true, 716 | "new_sheet": false, 717 | "run_control": { 718 | "read_only": false 719 | } 720 | }, 721 | "outputs": [], 722 | "source": [ 723 | "models <- c(\"glm1\",\"glm4\",\n", 724 | " \"CNN-LM\",\"CNN1\",\"CNN10\",\n", 725 | " \"CNN-PR\",\"CNNdense\")\n", 726 | "measures <- c(\"ts.RMSE\",\"ts.rp\",\"ratio\",rep(\"bias\",6))\n", 727 | "index <- c(rep(NA,2),\"sd\",\"Mean\",\"P02\",\"P98\",\"AC1\",\n", 728 | " \"WarmAnnualMaxSpell\",\"ColdAnnualMaxSpell\")\n", 729 | "yt2 <- scaleGrid(yt,time.frame = \"daily\",window.width = 31) %>% redim(drop=TRUE)\n", 730 | "validation.list <- lapply(1:length(measures), FUN = function(z) {\n", 731 | " lapply(1:length(models), FUN = function(zz){\n", 732 | " args <- list()\n", 733 | " load(paste0(\"./Data/temperature/predictions_\",models[zz],\".rda\"))\n", 734 | " if (any(measures[z] == c(\"ts.rp\",\"ratio\"))) {\n", 735 | " pred2 <- scaleGrid(pred,time.frame = \"daily\",window.width = 31) %>% redim(drop=TRUE)\n", 736 | " args[[\"y\"]] <- yt2; args[[\"x\"]] <- pred2\n", 737 | " } else {\n", 738 | " args[[\"y\"]] <- yt; args[[\"x\"]] <- pred\n", 739 | " }\n", 740 | " args[[\"measure.code\"]] <- measures[z]\n", 741 | " if (!is.na(index[z])) args[[\"index.code\"]] <- index[z]\n", 742 | " do.call(\"valueMeasure\",args)$Measure\n", 743 | " }) %>% makeMultiGrid()\n", 744 | "})\n", 745 | "save(validation.list, file = \"./Data/temperature/validation.rda\")" 746 | ] 747 | }, 748 | { 749 | "cell_type": "markdown", 750 | "metadata": { 751 | "button": false, 752 | "new_sheet": false, 753 | "run_control": { 754 | "read_only": false 755 | } 756 | }, 757 | "source": [ 758 | "Once the validation indices are calculated, we represent the results in boxplots (Figure 4 of the manuscript)." 759 | ] 760 | }, 761 | { 762 | "cell_type": "code", 763 | "execution_count": null, 764 | "metadata": { 765 | "button": false, 766 | "collapsed": true, 767 | "new_sheet": false, 768 | "run_control": { 769 | "read_only": false 770 | } 771 | }, 772 | "outputs": [], 773 | "source": [ 774 | "ylabs <- c(\"RMSE (ºC)\",\"Cor. deseasonal\",\n", 775 | " \"Standard dev. 
ratio\",\"bias (ºC)\",\n", 776 | " \"bias P02 (ºC)\",\"bias P98 (ºC)\",\n", 777 | " \"bias AC1\", \"bias WAMS (days)\",\n", 778 | " \"bias CAMS (days)\")\n", 779 | "par(mfrow = c(3,3))\n", 780 | "lapply(1:length(validation.list), FUN = function(z) {\n", 781 | " if (z == 1) {ylim <- c(0.75,1.75)}\n", 782 | " if (z == 2) {ylim <- c(0.9,1)}\n", 783 | " if (z == 3) {ylim <- c(0.9,1.05)}\n", 784 | " if (z == 4) {ylim <- c(-0.2,0.5)}\n", 785 | " if (z == 5) {ylim <- c(-0.5,1.5)}\n", 786 | " if (z == 6) {ylim <- c(-1,1)}\n", 787 | " if (z == 7) {ylim <- c(-0.1,0.1)}\n", 788 | " if (any(z == c(8,9))) {ylim <- c(-3,3)}\n", 789 | " index <- (validation.list[[z]] %>% redim(drop = TRUE))$Data\n", 790 | " dim(index) <- c(nrow(index),prod(dim(index)[2:3]))\n", 791 | " indLand <- (!apply(index,MARGIN = 2,anyNA)) %>% which()\n", 792 | " index <- index[,indLand] %>% t()\n", 793 | " mglm4 <- median(index[,2],na.rm = TRUE)\n", 794 | " perc <- apply(index,MARGIN = 2,FUN = function(z) quantile(z,probs = c(0.1,0.9)))\n", 795 | " boxplot(index, outline = FALSE, ylim = ylim, range = 0.0001, ylab = ylabs[z], asp = 1)\n", 796 | " lines(c(0,8),c(mglm4,mglm4), col = \"red\")\n", 797 | " for (i in 1:ncol(index)) lines(c(i,i),perc[,i], lty = 2)\n", 798 | "})" 799 | ] 800 | }, 801 | { 802 | "cell_type": "markdown", 803 | "metadata": { 804 | "button": false, 805 | "new_sheet": false, 806 | "run_control": { 807 | "read_only": false 808 | } 809 | }, 810 | "source": [ 811 | "In order to obtain a spatial representation of the validation indices computed above we use the function `spatialPlot` of visualizeR. In particular, we plot the deseasonal correlation and the biases of the percentile 2th, 98th and the mean for the glm1, glm4 and CNN10 models (Figure 5 of the manuscript)." 812 | ] 813 | }, 814 | { 815 | "cell_type": "code", 816 | "execution_count": null, 817 | "metadata": { 818 | "button": false, 819 | "collapsed": true, 820 | "new_sheet": false, 821 | "run_control": { 822 | "read_only": false 823 | } 824 | }, 825 | "outputs": [], 826 | "source": [ 827 | "ylabs <- c(\"glm1\",\"glm4\",NA,NA,\"CNN10\")\n", 828 | "mains <- c(NA,\"Cor. 
deseasonal\",NA,\"bias (ºC)\",\"bias P02 (ºC)\",\"bias P98 (ºC)\")\n", 829 | "cb <- colorRampPalette(brewer.pal(9, \"OrRd\"))(80)\n", 830 | "colsindex <- rev(brewer.pal(n = 9, \"RdBu\"))\n", 831 | "cb2 <- colorRampPalette(colsindex)\n", 832 | "validation.plots <- lapply(c(2,4,5,6),FUN = function(z) {\n", 833 | " lapply(c(1,2,5),FUN = function(zz) {\n", 834 | " if (z == 2) {\n", 835 | " at <- seq(0.85, 1, 0.005); colorbar <- cb\n", 836 | " } else {\n", 837 | " at <- seq(-2, 2, 0.1); colorbar <- cb2\n", 838 | " }\n", 839 | " index <- subsetDimension(validation.list[[z]],dimension = \"var\",indices = zz) %>% \n", 840 | " redim(drop = TRUE)\n", 841 | " spatialPlot(index, backdrop.theme = \"coastline\",\n", 842 | " ylab = ylabs[zz],\n", 843 | " main = paste(mains[z],\n", 844 | " round(mean(abs(index$Data), na.rm = TRUE), digits = 2)),\n", 845 | " col.regions = colorbar,\n", 846 | " at = at,\n", 847 | " set.min = at[1], set.max = at[length(at)], colorkey = TRUE)\n", 848 | " })\n", 849 | "})\n", 850 | "lay = cbind(1:3,4:6,7:9,10:12)\n", 851 | "grid.arrange(grobs = unlist(validation.plots,recursive = FALSE), layout_matrix = lay)" 852 | ] 853 | }, 854 | { 855 | "cell_type": "markdown", 856 | "metadata": {}, 857 | "source": [ 858 | "To provide more insight into the extrapolation abilities of the statistical downscaling methods tested herein, we compute the bias of the frequency of days above the percentile 95th (over the train period) in the observed and predicted (GLM4 and CNN10) time series with respect to the same frequency in the observed train values." 859 | ] 860 | }, 861 | { 862 | "cell_type": "code", 863 | "execution_count": null, 864 | "metadata": {}, 865 | "outputs": [], 866 | "source": [ 867 | "grids <- c(\"yt\",\"predictions_glm1\",\"predictions_glm4\",\"predictions_CNN10\")\n", 868 | "gridsNames <- c(\"Test\",\"GLM1\",\"GLM4\",\"CNN10\")\n", 869 | "p.plots <- lapply(c(0.95,0.99), FUN = function(zzz){\n", 870 | " pX <- apply(yT$Data,MARGIN = c(2,3),FUN = function(z) quantile(z,probs = zzz,na.rm=TRUE))\n", 871 | " if (zzz == 0.95) at <- seq(-0.1, 0.1, length.out = 30)\n", 872 | " if (zzz == 0.99) at <- seq(-0.05, 0.05, length.out = 30)\n", 873 | " lapply(1:length(grids),FUN = function(zz) {\n", 874 | " grid <- get(load(paste0(\"./Data/temperature/\",grids[zz],\".rda\")))\n", 875 | " freq.t <- grid\n", 876 | " dimNames <- attr(grid$Data,\"dimensions\")\n", 877 | " days.t <- lapply(1:getShape(grid,\"time\"),FUN = function(z){\n", 878 | " grid$Data[z,,] > pX \n", 879 | " }) %>% abind::abind(along = 0)\n", 880 | " freq.t$Data <- apply(days.t,MARGIN = c(2,3),FUN = function(z) mean(z,na.rm = TRUE))\n", 881 | " attr(freq.t$Data,\"dimensions\") <- dimNames[-1]\n", 882 | " freq.t <- gridArithmetics(freq.t,1-zzz,operator = \"-\")\n", 883 | " spatialPlot(freq.t,\n", 884 | " backdrop.theme = \"coastline\",\n", 885 | " col.regions = rev(brewer.pal(n = 9, \"RdBu\")) %>% colorRampPalette(),\n", 886 | " main = gridsNames[zz],\n", 887 | " at = at,\n", 888 | " set.min = at[[1]], set.max = at[[length(at)]], \n", 889 | " colorkey = TRUE)\n", 890 | " })\n", 891 | "})\n", 892 | "# P95\n", 893 | "grid.arrange(grobs = p.plots[[1]], ncol = 4)\n", 894 | "# P99\n", 895 | "grid.arrange(grobs = p.plots[[2]], ncol = 4)" 896 | ] 897 | }, 898 | { 899 | "cell_type": "markdown", 900 | "metadata": { 901 | "button": false, 902 | "new_sheet": false, 903 | "run_control": { 904 | "read_only": false 905 | } 906 | }, 907 | "source": [ 908 | "## 4. 
Precipitation\n", 909 | "In this section we present the code needed to downscale precipitation. Though the steps taken are very similar to those of temperature there are some particularities that are good to mention. We start by loading the precipitation using `loadGridData` and to convert to binary the precipitation (values greater than 1mm/day of rain are set to 1 and the rest to 0)." 910 | ] 911 | }, 912 | { 913 | "cell_type": "code", 914 | "execution_count": null, 915 | "metadata": { 916 | "button": false, 917 | "collapsed": true, 918 | "new_sheet": false, 919 | "run_control": { 920 | "read_only": false 921 | } 922 | }, 923 | "outputs": [], 924 | "source": [ 925 | "y <- loadGridData(dataset = \"E-OBS_v14_0.50regular\",\n", 926 | " var = \"pr\",lonLim = c(-10,32),\n", 927 | " latLim = c(36,72), \n", 928 | " years = 1979:2008)\n", 929 | "\n", 930 | "yT <- subsetGrid(y,years = 1979:2002)\n", 931 | "yT_bin <- binaryGrid(yT,threshold = 1,condition = \"GT\")\n", 932 | "yt <- subsetGrid(y,years = 2003:2008)\n", 933 | "yt_bin <- binaryGrid(yt,threshold = 1,condition = \"GT\")" 934 | ] 935 | }, 936 | { 937 | "cell_type": "markdown", 938 | "metadata": { 939 | "button": false, 940 | "new_sheet": false, 941 | "run_control": { 942 | "read_only": false 943 | } 944 | }, 945 | "source": [ 946 | "We can take a look at the grid resolutions of ERA-Interim (2º) and E-OBS (0.5º) by plotting the specific humidity at 1000hPa and the observed precipitation (Figure 1 of the manuscript). " 947 | ] 948 | }, 949 | { 950 | "cell_type": "code", 951 | "execution_count": null, 952 | "metadata": { 953 | "button": false, 954 | "collapsed": true, 955 | "new_sheet": false, 956 | "run_control": { 957 | "read_only": false 958 | } 959 | }, 960 | "outputs": [], 961 | "source": [ 962 | "colsindex <- brewer.pal(n = 9, \"BrBG\")\n", 963 | "cb <- colorRampPalette(colsindex)\n", 964 | "coords_x <- expand.grid(xt$xyCoords$x,xt$xyCoords$y) ; names(coords_x) <- c(\"x\",\"y\")\n", 965 | "coords_y <- expand.grid(yt$xyCoords$x,yt$xyCoords$y) ; names(coords_y) <- c(\"x\",\"y\")\n", 966 | "\n", 967 | "pplot <- list()\n", 968 | "pplot[[1]] <- spatialPlot(climatology(subsetGrid(xt,var = \"hus@1000\")), backdrop.theme = \"coastline\",\n", 969 | " main = \"Q1000 (ERA-Interim)\",\n", 970 | " col.regions = cb,\n", 971 | " at = seq(0,0.01, 0.0001),\n", 972 | " set.min = 0, set.max = 0.01, colorkey = TRUE,\n", 973 | " sp.layout = list(list(SpatialPoints(coords_x), \n", 974 | " first = FALSE, col = \"black\", \n", 975 | " pch = 20, cex = 1)))\n", 976 | "pplot[[2]] <- spatialPlot(climatology(yt), backdrop.theme = \"coastline\", \n", 977 | " main = \"Precipitation (E-OBS)\",\n", 978 | " col.regions = cb,\n", 979 | " at = seq(0, 4, 0.1),\n", 980 | " set.min = 0, set.max = 4, colorkey = TRUE,\n", 981 | " sp.layout = list(list(SpatialPoints(coords_y), \n", 982 | " first = FALSE, col = \"black\", \n", 983 | " pch = 20, cex = 1)))\n", 984 | "\n", 985 | "lay = rbind(c(1,2))\n", 986 | "grid.arrange(grobs = pplot, layout_matrix = lay)" 987 | ] 988 | }, 989 | { 990 | "cell_type": "markdown", 991 | "metadata": { 992 | "button": false, 993 | "new_sheet": false, 994 | "run_control": { 995 | "read_only": false 996 | } 997 | }, 998 | "source": [ 999 | "We can visualize some statistics of the train and test distributions, such as the climatology, or the percentiles 2th and 98th in order to gain knowledge about the observed data (Figure 2 of the manuscript). To compute the statistics we use the library `climate4R.value`." 
1000 | ] 1001 | }, 1002 | { 1003 | "cell_type": "code", 1004 | "execution_count": null, 1005 | "metadata": { 1006 | "button": false, 1007 | "collapsed": true, 1008 | "new_sheet": false, 1009 | "run_control": { 1010 | "read_only": false 1011 | } 1012 | }, 1013 | "outputs": [], 1014 | "source": [ 1015 | "pplot <- at <- list()\n", 1016 | "n1 <- 0; n2 <- 3\n", 1017 | "indexNames <- c(\"Climatology\", \"Frequency of rain\", \"P98\")\n", 1018 | "for (indexName in indexNames) {\n", 1019 | " if (indexName == \"Climatology\") {\n", 1020 | " indexTrain <- valueIndex(yT,index.code = \"Mean\")$Index %>% redim() \n", 1021 | " indexTest <- valueIndex(yt,index.code = \"Mean\")$Index %>% redim()\n", 1022 | " at[[1]] <- seq(0, 4, 0.1); at[[2]] <- seq(-1, 1, 0.1)\n", 1023 | " }\n", 1024 | " if (indexName == \"Frequency of rain\") {\n", 1025 | " indexTrain <- valueIndex(yT_bin,index.code = \"Mean\")$Index %>% redim() \n", 1026 | " indexTest <- valueIndex(yt_bin,index.code = \"Mean\")$Index %>% redim()\n", 1027 | " at[[1]] <- seq(0, 0.5, 0.01); at[[2]] <- seq(-0.1, 0.1, 0.01)\n", 1028 | " }\n", 1029 | " if (indexName == \"P98\") {\n", 1030 | " indexTrain <- valueIndex(yT,index.code = \"P98\")$Index %>% redim() \n", 1031 | " indexTest <- valueIndex(yt,index.code = \"P98\")$Index %>% redim()\n", 1032 | " at[[1]] <- seq(10, 20, 0.25); at[[2]] <- seq(-5, 5, 0.2)\n", 1033 | " }\n", 1034 | " \n", 1035 | " for (i in 1:2) {\n", 1036 | " if (i == 1) {\n", 1037 | " dataset <- \"(train)\"; index <- indexTrain; n1 <- n1 + 1; n <- n1\n", 1038 | " }\n", 1039 | " if (i == 2) {\n", 1040 | " indexTest <- gridArithmetics(indexTest,indexTrain,operator = \"-\")\n", 1041 | " dataset <- \"(test bias)\"; index <- indexTest; n2 <- n2 + 1; n <- n2\n", 1042 | " }\n", 1043 | " pplot[[n]] <- spatialPlot(climatology(index), backdrop.theme = \"coastline\", \n", 1044 | " main = paste(indexName,paste0(dataset,\":\"),\n", 1045 | " round(mean(abs(index$Data), na.rm = TRUE),digits = 2)),\n", 1046 | " col.regions = cb,\n", 1047 | " at = at[[i]],\n", 1048 | " set.min = at[[i]][1], set.max = at[[i]][length(at[[i]])], \n", 1049 | " colorkey = TRUE)\n", 1050 | " }\n", 1051 | "}\n", 1052 | "\n", 1053 | "lay = rbind(c(1,2,3),\n", 1054 | " c(4,5,6))\n", 1055 | "grid.arrange(grobs = pplot, layout_matrix = lay)" 1056 | ] 1057 | }, 1058 | { 1059 | "cell_type": "markdown", 1060 | "metadata": { 1061 | "button": false, 1062 | "new_sheet": false, 1063 | "run_control": { 1064 | "read_only": false 1065 | } 1066 | }, 1067 | "source": [ 1068 | "### 4.1 Downscaling with generalized linear models (precipitation)" 1069 | ] 1070 | }, 1071 | { 1072 | "cell_type": "markdown", 1073 | "metadata": { 1074 | "button": false, 1075 | "new_sheet": false, 1076 | "run_control": { 1077 | "read_only": false 1078 | } 1079 | }, 1080 | "source": [ 1081 | "In the case of precipitation, there are 2 generalized linear models (GLMs): one to predict the occurrence of precipitation with binomial family and link logit and another to predict tha rainfall amount based on a gamma family and link logarithmic. We train the model and predict on the test set using the package `downscaleR`. In particular we train two configurations using as predictor data the closest neighbour (glm1) or the 4 closest neigbours (glm4). We obtain stochastic predictions by setting the parameter `simulate` to `TRUE`. 
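Conceptually, the occurrence/amount pair fitted at every gridpoint by `downscaleChunk` corresponds to two ordinary `glm` fits. A minimal single-gridpoint sketch in base R (with `pr` a hypothetical vector of observed precipitation and `hus850`, `ta850` hypothetical local predictor vectors; an illustration only, not the actual `downscaleR` internals):

```r
occ <- as.numeric(pr > 1)                               # occurrence: wet day if rain > 1 mm
fit.occ <- glm(occ ~ hus850 + ta850,
               family = binomial(link = "logit"))       # occurrence model
wet <- pr > 1
fit.amo <- glm(pr[wet] - 1 ~ hus850[wet] + ta850[wet],
               family = Gamma(link = "log"))            # amount model, wet days only
```

The stochastic (`simulate = TRUE`) predictions are, in essence, draws from these two fitted distributions rather than their expected values.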
We refer the reader to the [downscaleR wiki](https://github.com/SantanderMetGroup/downscaleR/wiki) for a detailed explanation about the working of the downscaleR functions" 1082 | ] 1083 | }, 1084 | { 1085 | "cell_type": "code", 1086 | "execution_count": null, 1087 | "metadata": { 1088 | "button": false, 1089 | "collapsed": true, 1090 | "new_sheet": false, 1091 | "run_control": { 1092 | "read_only": false 1093 | } 1094 | }, 1095 | "outputs": [], 1096 | "source": [ 1097 | "simulateName <- c(\"deterministic\",\"stochastic\")\n", 1098 | "glmName <- c(\"glm1\",\"glm4\")\n", 1099 | "neighs <- c(1,4)\n", 1100 | "y.ocu <- binaryGrid(yT,condition = \"GT\",threshold = 1)\n", 1101 | "y.rest <- gridArithmetics(yT,1,operator = \"-\")\n", 1102 | "simulateGLM <- c(FALSE,TRUE)\n", 1103 | "lapply(1:length(glmName), FUN = function(z){\n", 1104 | " lapply(1:length(simulateGLM),FUN = function(zz) {\n", 1105 | " pred <- downscaleChunk(x = xT, y = y.ocu, newdata = list(xt),\n", 1106 | " method = \"GLM\", \n", 1107 | " family = binomial(link = \"logit\"), \n", 1108 | " simulate = simulateGLM[zz],\n", 1109 | " prepareData.args = list(local.predictors = list(n=neighs[z],\n", 1110 | " vars = getVarNames(xT)))\n", 1111 | " )\n", 1112 | " pred_ocu_train <- pred[[1]] %>% redim(drop = TRUE)\n", 1113 | " pred_ocu <- pred[[2]] %>% redim(drop = TRUE)\n", 1114 | " rm(pred)\n", 1115 | " pred_amo <- downscaleChunk(x = xT, y = y.rest, newdata = list(xt),\n", 1116 | " method = \"GLM\", \n", 1117 | " family = Gamma(link = \"log\"), \n", 1118 | " simulate = simulateGLM[zz],\n", 1119 | " condition = \"GT\", threshold = 0,\n", 1120 | " prepareData.args = list(local.predictors = list(n=neighs[z], \n", 1121 | " vars = getVarNames(xT))))[[2]] %>% \n", 1122 | " gridArithmetics(1,operator = \"+\") %>% redim(drop = TRUE)\n", 1123 | " pred_bin <- binaryGrid(pred_ocu,ref.obs = yT_bin,ref.pred = pred_ocu_train); rm(pred_ocu_train)\n", 1124 | " save(pred_bin,pred_ocu,pred_amo,\n", 1125 | " file = paste0(\"./Data/precip/predictions_\",simulateName[zz],\"_\",glmName[z],\".rda\"))\n", 1126 | " })\n", 1127 | "})" 1128 | ] 1129 | }, 1130 | { 1131 | "cell_type": "markdown", 1132 | "metadata": { 1133 | "button": false, 1134 | "new_sheet": false, 1135 | "run_control": { 1136 | "read_only": false 1137 | } 1138 | }, 1139 | "source": [ 1140 | "### 4.2 Downscaling with deep neural networks (precipitation)\n", 1141 | "As done with temperature we define a function containing the topologies intercompared in the study. The main difference with respect to the downscaling of temperature is that for precipitation we infer the probability of rain and the shape and scale parameters of a *Gamma* distribution, and therefore there are three output parameters per predictand's gridpoint." 
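The loss minimised by these networks is thus the negative log-likelihood of a Bernoulli-Gamma model evaluated at every gridpoint. As a conceptual sketch only (plain R on vectors, not the actual tensor-based `bernouilliGamma.loss_function` implementation of `downscaleR.keras`):

```r
# y: observed precipitation, centred so that y > 0 denotes a wet day
# p, alpha, beta: probability of rain and Gamma shape/scale predicted for each day
bernoulliGammaNLL <- function(y, p, alpha, beta) {
  wet <- y > 0
  nll <- numeric(length(y))
  nll[!wet] <- -log(1 - p[!wet])                                       # dry days
  nll[wet]  <- -log(p[wet]) -
    dgamma(y[wet], shape = alpha[wet], scale = beta[wet], log = TRUE)  # wet days
  mean(nll)
}
```

For reference, the conditional mean of the fitted Gamma is `alpha * beta`, which (after adding back the 1 mm shift and combining with the occurrence series) is essentially what a deterministic prediction uses, whereas the stochastic version samples from the fitted distributions instead.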
1142 | ] 1143 | }, 1144 | { 1145 | "cell_type": "code", 1146 | "execution_count": null, 1147 | "metadata": { 1148 | "button": false, 1149 | "collapsed": true, 1150 | "new_sheet": false, 1151 | "run_control": { 1152 | "read_only": false 1153 | } 1154 | }, 1155 | "outputs": [], 1156 | "source": [ 1157 | "deepName <- c(\"CNN-LM\",\"CNN1\",\"CNN10\",\"CNN-PR\",\"CNNdense\")\n", 1158 | "architectures <- function(architecture,input_shape,output_shape) {\n", 1159 | " if (architecture == \"CNN-LM\") {\n", 1160 | " inputs <- layer_input(shape = input_shape)\n", 1161 | " x = inputs\n", 1162 | " l1 = layer_conv_2d(x ,filters = 50, kernel_size = c(3,3), activation = 'linear', padding = \"same\")\n", 1163 | " l2 = layer_conv_2d(l1,filters = 25, kernel_size = c(3,3), activation = 'linear', padding = \"same\")\n", 1164 | " l3 = layer_conv_2d(l2,filters = 1, kernel_size = c(3,3), activation = 'linear', padding = \"same\")\n", 1165 | " l4 = layer_flatten(l3)\n", 1166 | " parameter1 = layer_dense(l4,units = output_shape, activation = \"sigmoid\")\n", 1167 | " parameter2 = layer_dense(l4,units = output_shape)\n", 1168 | " parameter3 = layer_dense(l4,units = output_shape)\n", 1169 | " outputs = layer_concatenate(list(parameter1,parameter2,parameter3))\n", 1170 | " model <- keras_model(inputs = inputs, outputs = outputs)\n", 1171 | " }\n", 1172 | " \n", 1173 | " if (architecture == \"CNN1\") {\n", 1174 | " inputs <- layer_input(shape = input_shape)\n", 1175 | " x = inputs\n", 1176 | " l1 = layer_conv_2d(x ,filters = 50, kernel_size = c(3,3), activation = 'relu', padding = \"same\")\n", 1177 | " l2 = layer_conv_2d(l1,filters = 25, kernel_size = c(3,3), activation = 'relu', padding = \"same\")\n", 1178 | " l3 = layer_conv_2d(l2,filters = 1, kernel_size = c(3,3), activation = 'relu', padding = \"same\")\n", 1179 | " l4 = layer_flatten(l3)\n", 1180 | " parameter1 = layer_dense(l4,units = output_shape, activation = \"sigmoid\")\n", 1181 | " parameter2 = layer_dense(l4,units = output_shape)\n", 1182 | " parameter3 = layer_dense(l4,units = output_shape)\n", 1183 | " outputs = layer_concatenate(list(parameter1,parameter2,parameter3))\n", 1184 | " model <- keras_model(inputs = inputs, outputs = outputs)\n", 1185 | " }\n", 1186 | " \n", 1187 | " if (architecture == \"CNN10\") {\n", 1188 | " inputs <- layer_input(shape = input_shape)\n", 1189 | " x = inputs\n", 1190 | " l1 = layer_conv_2d(x ,filters = 50, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 1191 | " l2 = layer_conv_2d(l1,filters = 25, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 1192 | " l3 = layer_conv_2d(l2,filters = 10, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 1193 | " l4 = layer_flatten(l3)\n", 1194 | " parameter1 = layer_dense(l4,units = output_shape, activation = \"sigmoid\")\n", 1195 | " parameter2 = layer_dense(l4,units = output_shape)\n", 1196 | " parameter3 = layer_dense(l4,units = output_shape)\n", 1197 | " outputs = layer_concatenate(list(parameter1,parameter2,parameter3))\n", 1198 | " model <- keras_model(inputs = inputs, outputs = outputs)\n", 1199 | " }\n", 1200 | " \n", 1201 | " if (architecture == \"CNN-PR\") {\n", 1202 | " inputs <- layer_input(shape = input_shape)\n", 1203 | " x = inputs\n", 1204 | " l1 = layer_conv_2d(x ,filters = 10, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 1205 | " l2 = layer_conv_2d(l1,filters = 25, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 1206 | " l3 = layer_conv_2d(l2,filters = 50, kernel_size = 
c(3,3), activation = 'relu', padding = \"valid\")\n", 1207 | " l4 = layer_flatten(l3)\n", 1208 | " parameter1 = layer_dense(l4,units = output_shape, activation = \"sigmoid\")\n", 1209 | " parameter2 = layer_dense(l4,units = output_shape)\n", 1210 | " parameter3 = layer_dense(l4,units = output_shape)\n", 1211 | " outputs = layer_concatenate(list(parameter1,parameter2,parameter3))\n", 1212 | " model <- keras_model(inputs = inputs, outputs = outputs)\n", 1213 | " }\n", 1214 | " \n", 1215 | " if (architecture == \"CNNdense\") {\n", 1216 | " inputs <- layer_input(shape = input_shape)\n", 1217 | " x = inputs\n", 1218 | " l1 = layer_conv_2d(x ,filters = 50, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 1219 | " l2 = layer_conv_2d(l1,filters = 25, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 1220 | " l3 = layer_conv_2d(l2,filters = 10, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 1221 | " l4 = layer_flatten(l3)\n", 1222 | " l5 = layer_dense(l4,units = 50, activation = \"relu\")\n", 1223 | " l6 = layer_dense(l5,units = 50, activation = \"relu\")\n", 1224 | " parameter1 = layer_dense(l6,units = output_shape, activation = \"sigmoid\")\n", 1225 | " parameter2 = layer_dense(l6,units = output_shape)\n", 1226 | " parameter3 = layer_dense(l6,units = output_shape)\n", 1227 | " outputs = layer_concatenate(list(parameter1,parameter2,parameter3))\n", 1228 | " model <- keras_model(inputs = inputs, outputs = outputs)\n", 1229 | " }\n", 1230 | " \n", 1231 | " return(model)\n", 1232 | "}" 1233 | ] 1234 | }, 1235 | { 1236 | "cell_type": "markdown", 1237 | "metadata": { 1238 | "button": false, 1239 | "new_sheet": false, 1240 | "run_control": { 1241 | "read_only": false 1242 | } 1243 | }, 1244 | "source": [ 1245 | "We prepare the predictor and predictand datasets. We substract 1 to the precipitation (*yT*) to center the conditional Gamma distribution in 0 and convert to 0 the negative values with binaryGrid in order to avoid issues with logarithms when minimizing the loss function. This will be later added to the prediction output." 1246 | ] 1247 | }, 1248 | { 1249 | "cell_type": "code", 1250 | "execution_count": null, 1251 | "metadata": { 1252 | "button": false, 1253 | "collapsed": true, 1254 | "new_sheet": false, 1255 | "run_control": { 1256 | "read_only": false 1257 | } 1258 | }, 1259 | "outputs": [], 1260 | "source": [ 1261 | "xy.T <- prepareData.keras(xT,binaryGrid(gridArithmetics(yT,1,operator = \"-\"),\n", 1262 | " condition = \"GE\",\n", 1263 | " threshold = 0,\n", 1264 | " partial = TRUE),\n", 1265 | " first.connection = \"conv\",\n", 1266 | " last.connection = \"dense\",\n", 1267 | " channels = \"last\")\n", 1268 | "xy.tT <- prepareNewData.keras(xT,xy.T)\n", 1269 | "xy.t <- prepareNewData.keras(xt,xy.T)" 1270 | ] 1271 | }, 1272 | { 1273 | "cell_type": "markdown", 1274 | "metadata": { 1275 | "button": false, 1276 | "new_sheet": false, 1277 | "run_control": { 1278 | "read_only": false 1279 | } 1280 | }, 1281 | "source": [ 1282 | "We loop over the topologies to train the deep models and predict over the test set. Unlike GLMs where there was two models, one for the occurrence of rain and other for the amount of rain, with deep learning we minimize the negative log-likelihood of a *Bernouilli Gamma* distribution and therefore both the occurrence and quantity of rain for a given day are derived from these infered parameters. The custom loss function `bernouilliGamma.loss_function` is part of `downscaleR.keras`. 
In addition we use the function `bernouilliGamma.statistics` (also from `downscaleR.keras`) to compute the deterministic (i.e., expectance of the conditional distribution) or the stochastic (i.e., sample from the conditional distribution) prediction. " 1283 | ] 1284 | }, 1285 | { 1286 | "cell_type": "code", 1287 | "execution_count": null, 1288 | "metadata": { 1289 | "button": false, 1290 | "collapsed": true, 1291 | "new_sheet": false, 1292 | "run_control": { 1293 | "read_only": false 1294 | } 1295 | }, 1296 | "outputs": [], 1297 | "source": [ 1298 | "simulateName <- c(\"deterministic\",\"stochastic\")\n", 1299 | "simulateDeep <- c(FALSE,TRUE)\n", 1300 | "lapply(1:length(deepName), FUN = function(z){\n", 1301 | " model <- architectures(architecture = deepName[z],\n", 1302 | " input_shape = dim(xy.T$x.global)[-1],\n", 1303 | " output_shape = dim(xy.T$y$Data)[2])\n", 1304 | " downscaleTrain.keras(obj = xy.T,\n", 1305 | " model = model,\n", 1306 | " clear.session = TRUE,\n", 1307 | " compile.args = list(\"loss\" = bernouilliGamma.loss_function(last.connection = \"dense\"),\n", 1308 | " \"optimizer\" = optimizer_adam(lr = 0.0001)),\n", 1309 | " fit.args = list(\"batch_size\" = 100,\n", 1310 | " \"epochs\" = 1000,\n", 1311 | " \"validation_split\" = 0.1,\n", 1312 | " \"verbose\" = 1,\n", 1313 | " \"callbacks\" = list(callback_early_stopping(patience = 30),\n", 1314 | " callback_model_checkpoint(filepath=paste0('./models/precip/',deepName[z],'.h5'),\n", 1315 | " monitor='val_loss', save_best_only=TRUE))))\n", 1316 | " lapply(1:length(simulateDeep),FUN = function(zz) {\n", 1317 | " pred_ocu_train <- downscalePredict.keras(newdata = xy.tT,\n", 1318 | " model = list(\"filepath\" = \n", 1319 | " paste0(\"./models/precip/\",deepName[z],\".h5\"), \n", 1320 | " \"custom_objects\" = \n", 1321 | " c(\"custom_loss\" = \n", 1322 | " bernouilliGamma.loss_function(\n", 1323 | " last.connection = \"dense\"))),\n", 1324 | " C4R.template = yT,\n", 1325 | " clear.session = TRUE) %>% \n", 1326 | " subsetGrid(var = \"pr1\")\n", 1327 | " pred <- downscalePredict.keras(newdata = xy.t,\n", 1328 | " model = list(\"filepath\" = \n", 1329 | " paste0(\"./models/precip/\",deepName[z],\".h5\"), \n", 1330 | " \"custom_objects\" = \n", 1331 | " c(\"custom_loss\" = \n", 1332 | " bernouilliGamma.loss_function(last.connection = \"dense\"))),\n", 1333 | " C4R.template = yT,\n", 1334 | " clear.session = TRUE) \n", 1335 | " pred <- bernouilliGamma.statistics(p = subsetGrid(pred,var = \"pr1\"),\n", 1336 | " alpha = subsetGrid(pred,var = \"pr2\"),\n", 1337 | " beta = subsetGrid(pred,var = \"pr3\"),\n", 1338 | " simulate = simulateDeep[zz],\n", 1339 | " bias = 1)\n", 1340 | " pred_ocu <- subsetGrid(pred,var = \"probOfRain\") %>% redim(drop = TRUE)\n", 1341 | " pred_amo <- subsetGrid(pred,var = \"amountOfRain\") %>% redim(drop = TRUE)\n", 1342 | " pred_bin <- binaryGrid(pred_ocu,ref.obs = yT_bin,ref.pred = pred_ocu_train); rm(pred_ocu_train)\n", 1343 | " save(pred_bin,pred_ocu,pred_amo,file = \n", 1344 | " paste0(\"./Data/precip/predictions_\",simulateName[zz],\"_\",deepName[z],\".rda\"))\n", 1345 | " })\n", 1346 | "})" 1347 | ] 1348 | }, 1349 | { 1350 | "cell_type": "markdown", 1351 | "metadata": { 1352 | "button": false, 1353 | "new_sheet": false, 1354 | "run_control": { 1355 | "read_only": false 1356 | } 1357 | }, 1358 | "source": [ 1359 | "### 4.3 Validation of results (precipitation)\n", 1360 | "In this code, we calculate the validation indices by using the library `climate4R.value` of `C4R`. 
In particular the indices used are: the Root Mean Squared Error (RMSE), the deseasonal perason correlation, the biases of the climatology and of the percentile 2th and 98th and the ratio of the standard deviations." 1361 | ] 1362 | }, 1363 | { 1364 | "cell_type": "code", 1365 | "execution_count": null, 1366 | "metadata": { 1367 | "button": false, 1368 | "collapsed": true, 1369 | "new_sheet": false, 1370 | "run_control": { 1371 | "read_only": false 1372 | } 1373 | }, 1374 | "outputs": [], 1375 | "source": [ 1376 | "simulateName <- c(rep(\"deterministic\",5),\"stochastic\",rep(\"deterministic\",3))\n", 1377 | "models <- c(\"glm1\",\"glm4\",\n", 1378 | " \"CNN-LM\",\"CNN1\",\"CNN10\",\n", 1379 | " \"CNN-PR\",\"CNNdense\")\n", 1380 | "measures <- c(\"ts.rocss\",\"ts.RMSE\",\"ts.rs\",rep(\"biasRel\",3),rep(\"bias\",3))\n", 1381 | "index <- c(rep(NA,3),\"Mean\",rep(\"P98\",2),\"AnnualCycleRelAmp\",\n", 1382 | " \"WetAnnualMaxSpell\",\"DryAnnualMaxSpell\")\n", 1383 | "validation.list <- lapply(1:length(measures), FUN = function(z) {\n", 1384 | " lapply(1:length(models), FUN = function(zz){\n", 1385 | " args <- list()\n", 1386 | " load(paste0(\"./Data/precip/predictions_\",simulateName[z],\"_\",models[zz],\".rda\"))\n", 1387 | " if (simulateName[z] == \"deterministic\") {\n", 1388 | " pred <- gridArithmetics(pred_bin,pred_amo,operator = \"*\")\n", 1389 | " if (measures[z] == \"ts.rocss\") {\n", 1390 | " args[[\"y\"]] <- yt_bin; args[[\"x\"]] <- pred_ocu\n", 1391 | " } else if (measures[z] == \"ts.RMSE\") {\n", 1392 | " args[[\"y\"]] <- yt; args[[\"x\"]] <- pred_amo\n", 1393 | " args[[\"condition\"]] = \"GT\"; args[[\"threshold\"]] = 1; args[[\"which.wetdays\"]] = \"Observation\" \n", 1394 | " } else {\n", 1395 | " args[[\"y\"]] <- yt; args[[\"x\"]] <- pred\n", 1396 | " }\n", 1397 | " } else {\n", 1398 | " pred <- gridArithmetics(pred_ocu,pred_amo,operator = \"*\")\n", 1399 | " args[[\"y\"]] <- yt; args[[\"x\"]] <- pred\n", 1400 | " }\n", 1401 | " args[[\"measure.code\"]] <- measures[z]\n", 1402 | " if (!is.na(index[z])) args[[\"index.code\"]] <- index[z]\n", 1403 | " do.call(\"valueMeasure\",args)$Measure\n", 1404 | " }) %>% makeMultiGrid()\n", 1405 | "})\n", 1406 | "save(validation.list, file = \"./Data/precip/validation.rda\")" 1407 | ] 1408 | }, 1409 | { 1410 | "cell_type": "markdown", 1411 | "metadata": { 1412 | "button": false, 1413 | "new_sheet": false, 1414 | "run_control": { 1415 | "read_only": false 1416 | } 1417 | }, 1418 | "source": [ 1419 | "Once the validation indices are calculated, we represent the results in boxplots (Figure 6 of the manuscript)." 1420 | ] 1421 | }, 1422 | { 1423 | "cell_type": "code", 1424 | "execution_count": null, 1425 | "metadata": { 1426 | "button": false, 1427 | "collapsed": true, 1428 | "new_sheet": false, 1429 | "run_control": { 1430 | "read_only": false 1431 | } 1432 | }, 1433 | "outputs": [], 1434 | "source": [ 1435 | "par(mfrow = c(3,3)) \n", 1436 | "ylabs <- c(\"ROCSS\",\"RMSE (wet days, mm)\",\n", 1437 | " \"Spearman Corr.\",\"biasRel(%)\",\n", 1438 | " \"biasRel P98 (DET, %)\",\"biasRel P98 (STO, %)\",\n", 1439 | " \"Annual Cycle Rel. 
Amplitude\", \"biasRel WetAMS (days)\",\n", 1440 | " \"biasRel DryAMS (days)\")\n", 1441 | "lapply(1:length(validation.list), FUN = function(z) {\n", 1442 | " if (z == 1) {ylim <- c(0.65,0.9)}\n", 1443 | " if (z == 2) {ylim <- c(3,6.5)}\n", 1444 | " if (z == 3) {ylim <- c(0.5,0.8)}\n", 1445 | " if (z == 4) {ylim <- c(-0.2,0.2)}\n", 1446 | " if (z == 5) {ylim <- c(-0.4,0.0)}\n", 1447 | " if (z == 6) {ylim <- c(-0.2,0.2)}\n", 1448 | " if (z == 7) {ylim <- c(-1,1)}\n", 1449 | " if (any(z == c(8,9))) {ylim <- c(-1,1)}\n", 1450 | " index <- (validation.list[[z]] %>% redim(drop = TRUE))$Data\n", 1451 | " dim(index) <- c(nrow(index),prod(dim(index)[2:3]))\n", 1452 | " indLand <- (!apply(index,MARGIN = 2,anyNA)) %>% which()\n", 1453 | " index <- index[,indLand] %>% t()\n", 1454 | " mglm4 <- median(index[,2],na.rm = TRUE)\n", 1455 | " perc <- apply(index,MARGIN = 2,FUN = function(z) quantile(z,probs = c(0.1,0.9)))\n", 1456 | " boxplot(index, outline = FALSE, ylim = ylim, range = 0.0001, ylab = ylabs[z], asp = 1)\n", 1457 | " lines(c(0,8),c(mglm4,mglm4), col = \"red\")\n", 1458 | " for (i in 1:ncol(index)) lines(c(i,i),perc[,i], lty = 2)\n", 1459 | "})" 1460 | ] 1461 | }, 1462 | { 1463 | "cell_type": "markdown", 1464 | "metadata": { 1465 | "button": false, 1466 | "new_sheet": false, 1467 | "run_control": { 1468 | "read_only": false 1469 | } 1470 | }, 1471 | "source": [ 1472 | "In order to obtain a spatial representation of the validation indices computed above we use the function `spatialPlot` of `visualizeR`. In particular, we plot the ROCSS, the spearman correlation and the relative biases of the mean and the P98 for the glm1, glm4 and CNN1 models (Figure 7 of the manuscript)." 1473 | ] 1474 | }, 1475 | { 1476 | "cell_type": "code", 1477 | "execution_count": null, 1478 | "metadata": { 1479 | "button": false, 1480 | "collapsed": true, 1481 | "new_sheet": false, 1482 | "run_control": { 1483 | "read_only": false 1484 | } 1485 | }, 1486 | "outputs": [], 1487 | "source": [ 1488 | "ylabs <- c(\"glm1\",\"glm4\",NA,\"CNN1\")\n", 1489 | "mains <- c(\"ROCSS\",NA,\"Spearman Corr.\",\"biasRel\",\"biasRel P98\")\n", 1490 | "cb <- colorRampPalette(brewer.pal(9, \"OrRd\"))(80)\n", 1491 | "colsindex <- rev(brewer.pal(n = 9, \"RdBu\"))\n", 1492 | "cb2 <- colorRampPalette(colsindex)\n", 1493 | "validation.plots <- lapply(c(1,3,4,5),FUN = function(z) {\n", 1494 | " lapply(c(1,2,4),FUN = function(zz) {\n", 1495 | " if (z == 1) {\n", 1496 | " at <- seq(0.5, 1, 0.01); colorbar <- cb\n", 1497 | " } else if (z == 3) {\n", 1498 | " at <- seq(0.5, 1, 0.02); colorbar <- cb\n", 1499 | " } else {\n", 1500 | " at <- seq(-0.5, 0.5, 0.01); colorbar <- cb2\n", 1501 | " }\n", 1502 | " index <- subsetDimension(validation.list[[z]],dimension = \"var\",indices = zz) %>% redim(drop = TRUE)\n", 1503 | " spatialPlot(index, backdrop.theme = \"coastline\",\n", 1504 | " ylab = ylabs[zz],\n", 1505 | " main = paste(mains[z],\n", 1506 | " round(mean(abs(index$Data), na.rm = TRUE), digits = 2)),\n", 1507 | " col.regions = colorbar,\n", 1508 | " at = at,\n", 1509 | " set.min = at[1], set.max = at[length(at)], colorkey = TRUE)\n", 1510 | " })\n", 1511 | "})\n", 1512 | "lay = cbind(1:3,4:6,7:9,10:12)\n", 1513 | "grid.arrange(grobs = unlist(validation.plots,recursive = FALSE), layout_matrix = lay)" 1514 | ] 1515 | }, 1516 | { 1517 | "cell_type": "markdown", 1518 | "metadata": { 1519 | "button": false, 1520 | "new_sheet": false, 1521 | "run_control": { 1522 | "read_only": false 1523 | } 1524 | }, 1525 | "source": [ 1526 | "## 5. 
Technical aspects\n", 1527 | "To perform all the stages involved in this study we relied on the local machine described below.\n", 1528 | " \n", 1529 | "1. Local Machine (HP-ProDesk-600-G2-MT)\n", 1530 | " + Operating system: ubuntu 4.15.0-72-generic\n", 1531 | " + Memory: 15.6 GiB\n", 1532 | " + Processor: Intel® Core™ i7-6700 CPU @ 3.40GHz × 8 \n", 1533 | " + SO: 64 bits\n", 1534 | " + Disc: 235.1 GiB" 1535 | ] 1536 | } 1537 | ], 1538 | "metadata": { 1539 | "kernelspec": { 1540 | "display_name": "R", 1541 | "language": "R", 1542 | "name": "ir" 1543 | }, 1544 | "language_info": { 1545 | "codemirror_mode": "r", 1546 | "file_extension": ".r", 1547 | "mimetype": "text/x-r-source", 1548 | "name": "R", 1549 | "pygments_lexer": "r", 1550 | "version": "3.6.2" 1551 | } 1552 | }, 1553 | "nbformat": 4, 1554 | "nbformat_minor": 2 1555 | } 1556 | -------------------------------------------------------------------------------- /2022_Bano_GMD.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Downscaling Multi-Model Climate Projection Ensembles with Deep Learning (DeepESD): Contribution to CORDEX EUR-44\n", 8 | "### *Geoscientific Model Development*\n", 9 | "### J. Baño-Medina, R. Manzanas, E. Cimadevilla, J. Fernández, J. González-Abad, A.S. Cofiño, and J.M. Gutiérrez\n", 10 | "\n", 11 | "GitHub repository at https://github.com/SantanderMetGroup/DeepDownscaling" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "This notebook reproduces the results presented in **Downscaling Multi-Model Climate Projection Ensembles with Deep Learning (DeepESD): Contribution to CORDEX EUR-44***, submitted to *Geoscientific Model Development* by *J. Baño-Medina, R. Manzanas, E. Cimadevilla, J. Fernández, J. González-Abad, A.S. Cofiño and J.M. Gutiérrez*. \n", 19 | "This paper presents *DeepESD*, the first dataset of high-resolution (0.5º) climate change projections (up to 2100) of daily precipitation and temperature over Europe obtained with deep learning techniques (in particular convolutional neural networks) from an ensemble of eight global climate models from the Coupled Model Intercomparison Project version 5 (CMIP5). \n", 20 | "\n", 21 | "**Note:** The technical specifications of the machine used to run the code presented herein can be found at the end of the notebook. " 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "## 1. Preparing the R environment and working directories\n", 29 | "This notebook is written in the free programming language `R` (version 3.6.1) and builds on [`climate4R`](https://github.com/SantanderMetGroup/climate4R) (hereafter C4R), a suite of `R` packages developed by the [Santander Met Group](http://meteo.unican.es) for transparent climate data access, post processing (including bias correction and downscaling) and visualization. For details on climate4R (C4R hereafter), the interested reader is referred to [Iturbide et al. 
2019](https://www.sciencedirect.com/science/article/pii/S1364815218303049?via%3Dihub).\n", 30 | "\n", 31 | "In particular, the following C4R libraries are used along the notebook: [`loadeR`](https://github.com/SantanderMetGroup/loadeR) and [`loadeR.2nc`](https://github.com/SantanderMetGroup/loadeR.2nc) (data loading), [`transformeR`](https://github.com/SantanderMetGroup/transformeR) (data manipulation), [`downscaleR`](https://github.com/SantanderMetGroup/downscaleR) and [`downscaleR.keras`](https://github.com/SantanderMetGroup/downscaleR.keras) (downscaling with neural networks) and [`visualizeR`](https://github.com/SantanderMetGroup/visualizeR) (visualization). To install them you may use the devtools package (e.g., ``devtools::install_github(\"SantanderMetGroup/downscaleR.keras@v1.0.0\")`` to install `downscaleR.keras`). Alternatively, you may directly install the entire C4R framework through ``conda``, C4R version 1.5.0, following the instructions provided at the end of this [page](https://github.com/SantanderMetGroup/climate4R). The latter option is highly recommended. Note that even with C4R v1.5.0 installed via conda, you still need to upgrade libraries `climate4R.UDG` and `VALUE` with the `devtools` package by typing: ``devtools::install_github(c(\"SantanderMetGroup/climate4R.UDG@v0.2.2\",\"SantanderMetGroup/VALUE@v2.2.2\"))``." 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 3, 37 | "metadata": {}, 38 | "outputs": [], 39 | "source": [ 40 | "options(java.parameters = \"-Xmx8g\") # expanding Java memory\n", 41 | "\n", 42 | "# C4R libraries\n", 43 | "library(loadeR) # version v1.7.0\n", 44 | "library(loadeR.2nc) # version v0.1.1\n", 45 | "library(transformeR) # version v2.1.0\n", 46 | "library(downscaleR) # version v3.3.2\n", 47 | "library(visualizeR) # version v1.6.0\n", 48 | "library(downscaleR.keras) # version v1.0.0 that build on Keras version 2.3.0 and tensorflow version 2.2.0 \n", 49 | "library(climate4R.value) # version v0.0.2 and relies on VALUE version v2.2.2\n", 50 | "library(climate4R.UDG) # version v0.2.2\n", 51 | "\n", 52 | "# Other useful libraries\n", 53 | "library(magrittr) # to operate with '%>%' or '%<>%'\n", 54 | "\n", 55 | "# For visualization purposes\n", 56 | "library(RColorBrewer)\n", 57 | "library(gridExtra)\n", 58 | "library(ggplot2)" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": {}, 64 | "source": [ 65 | "The predictions and models generated along the notebook are saved in a well-organized set of directories. Please use the `dir.create` function to create two new folders (*Data* and *models*) in your working directory. Within each of these folders, create subsequently two more subfolders, named *temperature* and *precip*. Finally, we create also the *figures* directory directly in the working directory." 
66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 3, 71 | "metadata": {}, 72 | "outputs": [], 73 | "source": [ 74 | "## Uncomment to create directories\n", 75 | "# dir.create(\"./Data\")\n", 76 | "# dir.create(\"./Data/precip\")\n", 77 | "# dir.create(\"./Data/temperature\")\n", 78 | "# dir.create(\"./models\")\n", 79 | "# dir.create(\"./models/precip\")\n", 80 | "# dir.create(\"./models/temperature\")\n", 81 | "# dir.create(\"./figures\")" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": {}, 87 | "source": [ 88 | "We are now ready to load into our `R` environment all the data we are going to work with, which can be freely accessed through the [Climate Data Service](http://meteo.unican.es/cds) developed by the [Santander Met Group](http://meteo.unican.es) (non registered users need to register first [here](http://meteo.unican.es/udg-tap/signup)). Use the `loginUDG` function to log into the service with your own credentials." 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": 4, 94 | "metadata": {}, 95 | "outputs": [], 96 | "source": [ 97 | "loginUDG(username = \"youruser\", password = \"yourpassword\") # login into the Santander CDS" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "The following Table lists the [Santander Climate Data Service (CDS)](http://www.meteo.unican.es/en/cds) endpoint of the datasets used in this study (except for E-OBS whose observational records can be found in their [website](https://www.ecad.eu/download/ensembles/download.php)). For a wide variety of datasets, C4R uses labels to point to these endpoints. The available labels can be displayed by typing `UDG.datasets()` into an R terminal. Throughout the notebook we lean on the corresponding labels to load the data into our R environment.\n", 105 | "\n", 106 | "| Dataset | CDS endpoint: |\t\n", 107 | "|---|---|\n", 108 | "|ERA-Interim|https://data.meteo.unican.es/tds5/catalog/catalogs/interim/interim_DM_predictors.html?dataset=interim/daily/interim20_daily.ncml|\n", 109 | "|CMIP5|https://data.meteo.unican.es/tds5/catalog/catalogs/cmip5/cmip5Datasets.html|\n", 110 | "|CORDEX|https://data.meteo.unican.es/thredds/catalog/devel/c3s34d/catalog.html|\n", 111 | "|DeepESD|https://data.meteo.unican.es/thredds/catalog/esgcet/catalog.html|\n", 112 | "\n", 113 | "\n", 114 | "\n", 115 | "\n" 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "The following block of code allows for loading the ERA-Interim predictor variables, which are needed to train our neural networks, for the period 1979-2005 by using the `loadGridData` function. Subsequently, the `makeMultiGrid` creates a unique C4R object containing all this information." 
123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": null, 128 | "metadata": {}, 129 | "outputs": [], 130 | "source": [ 131 | "# Predictor variables considered (see -*-)\n", 132 | "vars <- c(\"psl\",\"z@500\",\"z@700\",\"z@850\", \n", 133 | " \"hus@500\",\"hus@700\",\"hus@850\",\n", 134 | " \"ta@500\",\"ta@700\",\"ta@850\",\n", 135 | " \"ua@500\",\"ua@700\",\"ua@850\",\n", 136 | " \"va@500\",\"va@700\",\"va@850\")\n", 137 | "# We loop over the variables and then use makeMultiGrid, to bind the variables in a single C4R object\n", 138 | "x <- lapply(vars, function(z) {\n", 139 | " loadGridData(dataset = \"ECMWF_ERA-Interim-ESD\", # \"ECMWF_ERA-Interim-ESD\" is the label that identifies the dataset of interest in the Santander CDS\n", 140 | " var = z,\n", 141 | " lonLim = c(-8,34), # domain of interest for the predictors\n", 142 | " latLim = c(34,76), # domain of interest for the predictors\n", 143 | " years = 1979:2005)\n", 144 | "}) %>% makeMultiGrid()" 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "metadata": {}, 150 | "source": [ 151 | "As predictands we use temperature and precipitation from E-OBS, which can be obtained as netCDF files [here](https://www.ecad.eu/download/ensembles/download.php). Once downloaded, these data can be imported in `R` with the `loadGridData` function. Subsequently, we upscale these E-OBS fields from their native 0.25º to the 0.5º regular grid our projections are delivered by using the `interpGrid` from `transformeR`. In the cell below, we illustrate how to load both precipitation and temperature with `loadGridData`, however, note that you should only load one at a time, or change the object name `y` to e.g., `y_rr` or `y_tg`." 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": 11, 157 | "metadata": {}, 158 | "outputs": [], 159 | "source": [ 160 | "grid05 = list(\"x\" = c(-9.75,30.25),\"y\" = c(34.25,74.25)) # boundaries of our projections' domain\n", 161 | "attr(grid05,\"resX\") <- attr(grid05,\"resY\") <- 0.5 # target spatial resolution for our projections\n", 162 | "\n", 163 | "## Please load only one predictand variable (either temperature or precipitation) \n", 164 | "## or give a different name to each variable (e.g. 'y_rr' and 'y_tg') ----------------------------------\n", 165 | "\n", 166 | "# To load E-OBS precipitation (previously downloaded as netCDF file)\n", 167 | "y <- loadGridData(dataset = \"rr_ens_mean_0.25deg_reg_v20.0e.nc\",\n", 168 | " var = \"rr\",\n", 169 | " lonLim = c(-10,30),\n", 170 | " latLim = c(34,74), \n", 171 | " years = 1979:2005) %>% interpGrid(new.coordinates = grid05, method = \"bilinear\")\n", 172 | "\n", 173 | "# To load E-OBS temperature (previously downloaded as netCDF file)\n", 174 | "y <- loadGridData(dataset = \"tg_ens_mean_0.25deg_reg_v20.0e.nc\",\n", 175 | " var = \"tg\",\n", 176 | " lonLim = c(-10,30),\n", 177 | " latLim = c(34,74), \n", 178 | " years = 1979:2005) %>% interpGrid(new.coordinates = grid05, method = \"bilinear\")" 179 | ] 180 | }, 181 | { 182 | "cell_type": "markdown", 183 | "metadata": {}, 184 | "source": [ 185 | "We recommend the user to save the predictor `x` and predictand `y` data into `.rda` objects since these loading steps can be quite time-consuming." 186 | ] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "metadata": {}, 191 | "source": [ 192 | "**Convolutional Neural Networks (CNNs)**\n", 193 | "\n", 194 | "To build *DeepESD* we rely on the convolutional neural networks (CNN) presented in [Baño-Medina et al. 
2020](https://gmd.copernicus.org/articles/13/2109/2020/); in particular, on the CNN -*- models, which were found to provide robust results for precipitation (temperature) both in ''perfect-prognosis'' conditions but also in the GCM space. The cell below shows how to build these CNN models based on `Keras`, and save them in a custom function called `modelCNN`. Note that precipitation and temperature CNN models are different, so please (un)comment the needed lines depending on your particular target variable of interest. \n", 195 | "\n", 196 | "**Note:** We refer the reader to [Baño-Medina et al. 2020](https://gmd.copernicus.org/articles/13/2109/2020/) for further details about the exact configuration of the CNNs used herein." 197 | ] 198 | }, 199 | { 200 | "cell_type": "code", 201 | "execution_count": null, 202 | "metadata": {}, 203 | "outputs": [], 204 | "source": [ 205 | "## Please select one ----------------------------------------------------------------------------------------------------------------\n", 206 | "## ----------------------------------------------------------------------------------------------------------------------------------\n", 207 | "## See https://gmd.copernicus.org/articles/13/2109/2020 for technical details\n", 208 | "## ----------------------------------------------------------------------------------------------------------------------------------\n", 209 | "\n", 210 | "# Precipitation model\n", 211 | "modelCNN <- function(inp) {\n", 212 | " # Input layer\n", 213 | " inputs <- layer_input(shape = dim(inp$x.global)[2:4])\n", 214 | " # Hidden layers\n", 215 | " l1 = layer_conv_2d(inputs,filters = 50, kernel_size = c(3,3), activation = 'relu', padding = \"same\")\n", 216 | " l2 = layer_conv_2d(l1,filters = 25, kernel_size = c(3,3), activation = 'relu', padding = \"same\")\n", 217 | " l3 = layer_conv_2d(l2,filters = 1, kernel_size = c(3,3), activation = 'relu', padding = \"same\")\n", 218 | " l4 = layer_flatten(l3)\n", 219 | " # Output layer\n", 220 | " l51 = layer_dense(l4,units = dim(inp$y$Data)[2], activation = 'sigmoid') \n", 221 | " l52 = layer_dense(l4,units = dim(inp$y$Data)[2], activation = 'linear') \n", 222 | " l53 = layer_dense(l4,units = dim(inp$y$Data)[2], activation = 'linear') \n", 223 | " outputs <- layer_concatenate(list(l51,l52,l53)) \n", 224 | " model <- keras_model(inputs = inputs, outputs = outputs) \n", 225 | "}\n", 226 | "\n", 227 | "## ----------------------------------------------------------------------------------------------------------------------------------\n", 228 | "\n", 229 | "# Temperature model\n", 230 | "modelCNN <- function(inp) {\n", 231 | " # Input layer \n", 232 | " inputs <- layer_input(shape = dim(inp$x.global)[2:4])\n", 233 | " # Hidden layers \n", 234 | " l1 = layer_conv_2d(inputs,filters = 50, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 235 | " l2 = layer_conv_2d(l1,filters = 25, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n", 236 | " l3 = layer_conv_2d(l2,filters = 10, kernel_size = c(3,3), activation = 'relu', padding = \"valid\") \n", 237 | " l4 = layer_flatten(l3)\n", 238 | " # Output layer \n", 239 | " l51 = layer_dense(l4,units = dim(inp$y$Data)[2], activation = 'linear') \n", 240 | " l52 = layer_dense(l4,units = dim(inp$y$Data)[2], activation = 'linear') \n", 241 | " outputs <- layer_concatenate(list(l51,l52)) \n", 242 | " model <- keras_model(inputs = inputs, outputs = outputs) \n", 243 | "}\n", 244 | "\n", 245 | "## 
----------------------------------------------------------------------------------------------------------------------------------" 246 | ] 247 | }, 248 | { 249 | "cell_type": "markdown", 250 | "metadata": {}, 251 | "source": [ 252 | "## 2. DeepESD\n", 253 | "In this section we 1) load the predictor variables of interest from the 8 GCM considered from the Santander CDS, 2) harmonize and standardize these predictor fields, 3) save these processed fields in `rda` objects to avoid repeating these steps in the future, 4) build the CNN models based on ERA-Interim predictors and E-OBS predictands and 5) apply these models to the GCM predictor variables to obtain the final high-resolution (downscaled at 0.5º) projections up to 2100.\n", 254 | "\n", 255 | "### 2.1 Preparing the predictor datasets\n", 256 | "The `dh` and `df` objects below contain the labels that identify the 8 GCMs considered in this work in the Santander CDS, for the historical and RCP.8.5 scenario, respectively. These lables are used when calling the `loadGridData` function for data loading." 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": 3, 262 | "metadata": {}, 263 | "outputs": [], 264 | "source": [ 265 | "## Use UDG.datasets() to obtain the labels of the desired GCMs\n", 266 | "\n", 267 | "# Historical scenario\n", 268 | "dh <- c(\"CMIP5-subset_CanESM2_r1i1p1_historical\",\n", 269 | " \"CMIP5-subset_CNRM-CM5_r1i1p1_historical\",\n", 270 | " \"CMIP5-subset_MPI-ESM-MR_r1i1p1_historical\",\n", 271 | " \"CMIP5-subset_MPI-ESM-LR_r1i1p1_historical\",\n", 272 | " \"CMIP5-subset_NorESM1-M_r1i1p1_historical\", \n", 273 | " \"CMIP5-subset_GFDL-ESM2M_r1i1p1_historical\",\n", 274 | " \"CMIP5-subset_EC-EARTH_r12i1p1_historical\",\n", 275 | " \"CMIP5-subset_IPSL-CM5A-MR_r1i1p1_historical\")\n", 276 | "\n", 277 | "# RCP8.5 scenario\n", 278 | "df <- c(\"CMIP5-subset_CanESM2_r1i1p1_rcp85\",\n", 279 | " \"CMIP5-subset_CNRM-CM5_r1i1p1_rcp85\", \n", 280 | " \"CMIP5-subset_MPI-ESM-MR_r1i1p1_rcp85\",\n", 281 | " \"CMIP5-subset_MPI-ESM-LR_r1i1p1_rcp85\",\n", 282 | " \"CMIP5-subset_NorESM1-M_r1i1p1_rcp85\",\n", 283 | " \"CMIP5-subset_GFDL-ESM2M_r1i1p1_rcp85\",\n", 284 | " \"CMIP5-subset_EC-EARTH_r12i1p1_rcp85\",\n", 285 | " \"CMIP5-subset_IPSL-CM5A-MR_r1i1p1_rcp85\")\n", 286 | "\n", 287 | "# Labels used to identify the 8 GCMs\n", 288 | "dName <- c(\"CanESM2\",\"CNRM-CM5\",\"MPI-ESM-MR\",\"MPI-ESM-LR\",\"NorESM1\",\"GFDL\",\"EC-Earth\",\"IPSL\")" 289 | ] 290 | }, 291 | { 292 | "cell_type": "markdown", 293 | "metadata": {}, 294 | "source": [ 295 | "The following loop allows us to load the predictors from the above GCMs over our target domain for the reference (1975-2005) and future (early-future: 2006-2040, mid-future: 2041-2070, far-future: 2071-2100) periods of interest. Note that the historical (RCP8.5) scenario is used for the reference (future) periods. Note also that, within the loop, all the GCMs are interpolated to the spatial resolution of the ERA-Interim predictors which were used to fit the CNNs. Once loaded, the GCM predictors are saved as `.rda` files." 
296 | ] 297 | }, 298 | { 299 | "cell_type": "code", 300 | "execution_count": null, 301 | "metadata": {}, 302 | "outputs": [], 303 | "source": [ 304 | "lapply(c(\"h\",\"ef\",\"mf\",\"ff\"), FUN = function(sc) {\n", 305 | " if (sc == \"h\") years <- 1975:2005 # historical period of interest\n", 306 | " if (sc == \"ef\") years <- 2006:2040 # early-future\n", 307 | " if (sc == \"mf\") years <- 2041:2070 # mid-future\n", 308 | " if (sc == \"ff\") years <- 2071:2100 # far-future\n", 309 | " if (sc == \"h\"){d <- dh} else {d <- df}\n", 310 | " \n", 311 | " # We loop over the GCMs\n", 312 | " lapply(1:length(d), FUN = function(zz) { \n", 313 | " x <- lapply(vars, function(z) loadGridData(dataset = d[zz],var = z,\n", 314 | " lonLim = c(-8,34),latLim = c(34,76),\n", 315 | " years = years) %>% interpGrid(new.coordinates = getGrid(x))) %>% makeMultiGrid() \n", 316 | " \n", 317 | " # Since the IPSL contains NA values in the 850hPa level at certain gridpoints, we replace these NA values with the numeric values of their closest neighbours.\n", 318 | " if (dName[zz] == \"IPSL\") {\n", 319 | " ind850 <- grepl(\"@850\",x$Variable$varName,fixed = TRUE) %>% which()\n", 320 | " indGP <- apply(x$Data[ind850[1],1,,,], MARGIN = c(2,3), anyNA) %>% which(arr.ind = TRUE)\n", 321 | " for (i in 1:nrow(indGP)) {\n", 322 | " indTime <- is.na(x$Data[ind850[1],1,,indGP[i,1],indGP[i,2]]) %>% which()\n", 323 | " x$Data[ind850,1,indTime,indGP[i,1],indGP[i,2]] <- x$Data[ind850,1,indTime,indGP[i,1],indGP[i,2]-1]\n", 324 | " }\n", 325 | " }\n", 326 | " \n", 327 | " # We save the predictor fields as .rda files \n", 328 | " if (sc == \"h\") {\n", 329 | " xh <- x\n", 330 | " save(xh, file = paste0(\"./Data/xh_\",dName[zz],\".rda\"))\n", 331 | " rm(x,xh)\n", 332 | " } else {\n", 333 | " xf <- x\n", 334 | " save(xf, file = paste0(\"./Data/x\",sc,\"_\",dName[zz],\".rda\"))\n", 335 | " rm(x,xf)\n", 336 | " } \n", 337 | " }) \n", 338 | "}) " 339 | ] 340 | }, 341 | { 342 | "cell_type": "markdown", 343 | "metadata": {}, 344 | "source": [ 345 | "The following loop allows us to harmonize and standardize the GCM predictors loaded in the previous step to assure they reasonable resemble the ERA-Interim variables used to train the CNN models (note this is one of the key assumptions that are done in ''perfect-prognosis'' downscaling). For this harmonization+standardization step, which is different depending on the particular scenario of interest (the reader is referred again to [Baño-Medina et al. 2020](https://gmd.copernicus.org/articles/13/2109/2020/) for details about this process), the `scaleGrid` function from `transformeR` is used. The so-processed GCM predictors, wich will be used as inputs to the CNN models, are saved as `rda` files. 
" 346 | ] 347 | }, 348 | { 349 | "cell_type": "code", 350 | "execution_count": null, 351 | "metadata": {}, 352 | "outputs": [], 353 | "source": [ 354 | "# We loop over the GCMs\n", 355 | "lapply(1:length(dName), FUN = function(zz) {\n", 356 | " load(paste0(\"./Data/xh_\",dName[zz],\".rda\")) \n", 357 | " # We loop over the temporal periods \n", 358 | " lapply(c(\"h\",\"ef\",\"mf\",\"ff\"), FUN = function(sc) {\n", 359 | " \n", 360 | " # We harmonize and standardize the historical scenario \n", 361 | " if (sc == \"h\") {\n", 362 | " xh <- scaleGrid(xh, base = subsetGrid(xh, years = 1979:2005), ref = x, type = \"standardize\", spatial.frame = \"gridbox\", time.frame = \"monthly\") # harmonization \n", 363 | " xn <- scaleGrid(xh, base = subsetGrid(xh, years = 1979:2005), type = \"standardize\") # standardization\n", 364 | " \n", 365 | " # We harmonize and standardize the RCP8.5 scenario \n", 366 | " } else {\n", 367 | " load(paste0(\"./Data/x\",sc,\"_\",dName[zz],\".rda\")) \n", 368 | " xf <- scaleGrid(xf, base = subsetGrid(xh, years = 1979:2005), ref = x, type = \"standardize\", spatial.frame = \"gridbox\", time.frame = \"monthly\") # harmonization \n", 369 | " xh <- scaleGrid(xh, base = subsetGrid(xh, years = 1979:2005), ref = x, type = \"standardize\", spatial.frame = \"gridbox\", time.frame = \"monthly\") # harmonization \n", 370 | " xn <- xf %>% scaleGrid(base = subsetGrid(xh, years = 1979:2005), type = \"standardize\") # standardization \n", 371 | " }\n", 372 | " \n", 373 | " # We save the standardized predictor fields as `rda` objects \n", 374 | " save(xn, file = paste0(\"./Data/xn\",sc,\"_\",dName[zz],\".rda\")) \n", 375 | " })\n", 376 | "})" 377 | ] 378 | }, 379 | { 380 | "cell_type": "markdown", 381 | "metadata": {}, 382 | "source": [ 383 | "### 2.2 Precipitation downscaling\n", 384 | "This section shows how to fit the CNN model which links the large-scale predictors from ERA-Interim with the high-resolution E-OBS precipitation at surface. The steps to take would be the following:\n", 385 | "- Prepare the predictor and predictand tensors with the `prepareData.keras` function from `downscaleR.keras`.\n", 386 | "- Standardize the ERA-Interim predictors with the `scaleGrid` function from `transformeR`. \n", 387 | "- For a better fit of the Gamma distribution, 0.99 is substracted from observed precipitation and negative values are ignored (note that this step implies that rainy days are defined as those receiving 1 or more mm of precipitation). To do this, the `gridArithmetics` and `binaryGrid` functions from `transformeR` are used.\n", 388 | "- Train the CNN model encapsuled in the `modelCNN` (which has been previously defined) with the `downscaleTrain.keras` function from `downscaleR.keras`. To optimize the negative log-likelihood of the Bernoulli-Gamma distribution, we employ the custom loss function `bernouilliGammaLoss` from `downscaleR.keras`. The network is fitted using the adam optimizer and a learning rate of 1e-4. Early-stopping with a patience of 30 epochs is applied and the best model (epoch) is saved in the working directory as a `.h5` file." 
389 | ] 390 | }, 391 | { 392 | "cell_type": "code", 393 | "execution_count": null, 394 | "metadata": {}, 395 | "outputs": [], 396 | "source": [ 397 | "# NOTE: Running this cell takes about 1 hour \n", 398 | "\n", 399 | "# Preparing predictor and predictand data for downscaling with downscaleR.keras\n", 400 | "xyT <- prepareData.keras(x = scaleGrid(x,type = \"standardize\"), y = binaryGrid(gridArithmetics(y,0.99,operator = \"-\"),condition = \"GE\",threshold = 0,partial = TRUE),\n", 401 | " first.connection = \"conv\",last.connection = \"dense\",channels = \"last\") \n", 402 | "\n", 403 | "# Training the CNN model to downscale precipitation\n", 404 | "downscaleTrain.keras(obj = xyT,model = modelCNN(xyT),clear.session = TRUE,\n", 405 | " compile.args = list(\"loss\" = bernouilliGammaLoss(last.connection = \"dense\"),\"optimizer\" = optimizer_adam(lr = 0.0001)),\n", 406 | " fit.args = list(\"batch_size\" = 100,\"epochs\" = 10000,\"validation_split\" = 0.1,\"verbose\" = 1,\n", 407 | " \"callbacks\" = list(callback_early_stopping(patience = 30),callback_model_checkpoint(filepath='./models/precip/CNN1.h5',monitor='val_loss', save_best_only=TRUE))))\n", 408 | "\n" 409 | ] 410 | }, 411 | { 412 | "cell_type": "markdown", 413 | "metadata": {}, 414 | "source": [ 415 | "Once the model is trained, we use it to predict in both the train (training period using ERA-Interim variables) and the GCM spaces. As for the former, we are interested in the estimation of the parameter `p` (probability of rain), since it is needed to later adjust the frequency of rain in the high-resolution projections obtained from the GCM (see the manuscript for details). To compute `p` in the train period the following is done:\n", 416 | "- Prepare the predictors which will serve as inputs for the CNN model with the `prepareNewData.keras` function. Subsequently, use them to predict in the train set with the `downscalePredict.keras` function. The `model` argument indicates the path where the CNN model was previously stored, and `C4R.template` is a C4R object used as a template for the predictions, which provides the proper metadata. Since `downscalePredict.keras` outputs 3 parameters (the probability of rain, `p`, and the logarithm of the shape and scale parameters of the Gamma distribution, `log_alpha` and `log_beta`), the `subsetGrid` function is applied in order to keep only `p`." 417 | ] 418 | }, 419 | { 420 | "cell_type": "code", 421 | "execution_count": null, 422 | "metadata": {}, 423 | "outputs": [], 424 | "source": [ 425 | "# Preparing predictor data\n", 426 | "xyt <- prepareNewData.keras(scaleGrid(x,type = \"standardize\"), xyT) \n", 427 | "pred_ocu_train <- downscalePredict.keras(newdata = xyt,C4R.template = y,clear.session = TRUE,loss = \"bernouilliGammaLoss\",\n", 428 | " model = list(\"filepath\" = './models/precip/CNN1.h5',\"custom_objects\" = c(\"custom_loss\" = bernouilliGammaLoss(last.connection = \"dense\")))) %>% subsetGrid(var = \"p\") \n", 429 | "rm(xyt) # to save memory" 430 | ] 431 | }, 432 | { 433 | "cell_type": "markdown", 434 | "metadata": {}, 435 | "source": [ 436 | "At this point, the trained CNN model is used to generate the high-resolution projections building on the 8 GCMs considered in this work. To do so, we perform a loop over the distinct GCMs in which the corresponding predictors (which had been previously saved) are loaded and conveniently transformed using the `prepareNewData.keras` function.
Finally, the `log_alpha`, `log_beta` and `p` parameters, which are obtained with the `downscalePredict.keras` function, are saved in the `pred` object.\n", 437 | "- On the one hand, `log_alpha` and `log_beta` are used to obtain the rainfall amount with the `computeRainfall` function from `downscaleR.keras`. The argument `simulate` allows us to specify whether a stochastic or a deterministic outcome is wanted. The argument `bias` is used to re-center the Gamma distribution to 1 mm/day. \n", 438 | "- On the other hand, we use the `p` parameter to derive the binary event occurrence/non-occurrence through the `binaryGrid` function.\n", 439 | "- Finally, both series (binary and continuous) are multiplied to produce the complete precipitation time series.\n", 440 | "\n", 441 | "In the following block of code we compute the (deterministic) frequency and (stochastic) amount of rain. The generated projections are saved in `.nc` format with the `grid2nc` function.\n", 442 | "\n", 443 | "***Note:*** If a purely deterministic or a purely stochastic version of the projections is wanted, please uncomment the corresponding lines." 444 | ] 445 | }, 446 | { 447 | "cell_type": "code", 448 | "execution_count": null, 449 | "metadata": {}, 450 | "outputs": [], 451 | "source": [ 452 | "# NOTE: Running this cell takes about 4 hours\n", 453 | "\n", 454 | "# We loop over the GCMs\n", 455 | "lapply(1:length(dName), FUN = function(zz) {\n", 456 | " \n", 457 | " # We loop over the temporal periods \n", 458 | " lapply(c(\"h\",\"ef\",\"mf\",\"ff\"), FUN = function(sc) {\n", 459 | " load(paste0(\"./Data/xn\",sc,\"_\",dName[zz],\".rda\"))\n", 460 | " xyt <- prepareNewData.keras(xn,xyT) \n", 461 | " pred <- downscalePredict.keras(newdata = xyt,C4R.template = y,clear.session = TRUE,loss = \"bernouilliGammaLoss\",\n", 462 | " model = list(\"filepath\" = './models/precip/CNN1.h5',\"custom_objects\" = c(\"custom_loss\" = bernouilliGammaLoss(last.connection = \"dense\")))) \n", 463 | " \n", 464 | " ## Frequency (deterministic) and amount of rain (stochastic) ------------------------------------------\n", 465 | " pred2 <- computeRainfall(log_alpha = subsetGrid(pred,var = \"log_alpha\"),log_beta = subsetGrid(pred,var = \"log_beta\"),bias = 1,simulate = TRUE) %>% gridArithmetics(binaryGrid(subsetGrid(pred,var = \"p\"),ref.obs = binaryGrid(y,threshold = 1, condition = \"GE\"),ref.pred = pred_ocu_train)) \n", 466 | " grid2nc(pred2,NetCDFOutFile = paste0(\"./Data/precip/CNN_\",sc,\"_\",dName[zz],\".nc4\")) \n", 467 | " }) \n", 468 | "})" 469 | ] 470 | }, 471 | { 472 | "cell_type": "markdown", 473 | "metadata": {}, 474 | "source": [ 475 | "### 2.3 Temperature downscaling\n", 476 | "This section shows how to train the CNN model which links the large-scale predictors from ERA-Interim with the high-resolution E-OBS temperature at surface. As for precipitation, the steps to take are the following:\n", 477 | "\n", 478 | "- Prepare the predictor and predictand tensors with the `prepareData.keras` function from `downscaleR.keras`.\n", 479 | "- Standardize the ERA-Interim predictors with the `scaleGrid` function from `transformeR`. \n", 480 | "- Train the CNN model encapsulated in `modelCNN` (which has been previously defined) with the `downscaleTrain.keras` function from `downscaleR.keras`. To optimize the negative log-likelihood of the Gaussian distribution, we use the custom loss function `gaussianLoss` from `downscaleR.keras`. The network is fitted using the Adam optimizer and a learning rate of 1e-4.
Early-stopping with a patience of 30 epochs is applied and the best model (epoch) is saved in the working directory as a `.h5` file." 481 | ] 482 | }, 483 | { 484 | "cell_type": "code", 485 | "execution_count": null, 486 | "metadata": {}, 487 | "outputs": [], 488 | "source": [ 489 | "# NOTE: Running this cell takes about 1 hour\n", 490 | "\n", 491 | "# Preparing predictor and predictand data for downscaling with downscaleR.keras\n", 492 | "xyT <- prepareData.keras(x = scaleGrid(x,type = \"standardize\"),y = y,\n", 493 | " first.connection = \"conv\",last.connection = \"dense\",channels = \"last\") \n", 494 | "\n", 495 | "# Training the CNN model to downscale temperature\n", 496 | "downscaleTrain.keras(obj = xyT,model = modelCNN(xyT),clear.session = TRUE,\n", 497 | " compile.args = list(\"loss\" = gaussianLoss(last.connection = \"dense\"),\"optimizer\" = optimizer_adam(lr = 0.0001)),\n", 498 | " fit.args = list(\"batch_size\" = 100,\"epochs\" = 10000,\"validation_split\" = 0.1,\"verbose\" = 1,\n", 499 | " \"callbacks\" = list(callback_early_stopping(patience = 30),callback_model_checkpoint(filepath='./models/temperature/CNN10.h5',monitor='val_loss', save_best_only=TRUE))))" 500 | ] 501 | }, 502 | { 503 | "cell_type": "markdown", 504 | "metadata": {}, 505 | "source": [ 506 | "Once trained, the CNN model is used to generate the high-resolution projections of temperature. As for precipitation, we apply the `prepareNewData.keras` and `downscalePredict.keras` functions for this task, saving the `mean` and `log_var` parameters in the `pred` object. The code below allows for obtaining deterministic projections (note that only the `mean` parameter is needed in this case) and saving them in `.nc` format with the `grid2nc` function.\n", 507 | "\n", 508 | "***Note:*** If stochastic projections are wanted, please uncomment the corresponding lines.\n", 509 | "In both cases the projections are saved as `.nc` files using the `grid2nc` function." 510 | ] 511 | }, 512 | { 513 | "cell_type": "code", 514 | "execution_count": null, 515 | "metadata": {}, 516 | "outputs": [], 517 | "source": [ 518 | "# NOTE: Running this cell takes about 1 hour and 20 minutes\n", 519 | "\n", 520 | "lapply(1:length(dName), FUN = function(zz) {\n", 521 | " lapply(c(\"h\",\"ef\",\"mf\",\"ff\"), FUN = function(sc) {\n", 522 | " load(paste0(\"./Data/xn\",sc,\"_\",dName[zz],\".rda\"))\n", 523 | " xyt <- prepareNewData.keras(xn,xyT)\n", 524 | " pred <- downscalePredict.keras(newdata = xyt,C4R.template = y,clear.session = TRUE,loss = \"gaussianLoss\",\n", 525 | " model = list(\"filepath\" = './models/temperature/CNN10.h5',\"custom_objects\" = c(\"custom_loss\" = gaussianLoss(last.connection = \"dense\")))) \n", 526 | " \n", 527 | " ## Deterministic version \n", 528 | " pred <- subsetGrid(pred, var = \"mean\")\n", 529 | " pred$Variable$varName <- \"tas\"\n", 530 | " grid2nc(pred,NetCDFOutFile = paste0(\"./Data/temperature/CNN_\",sc,\"_\",dName[zz],\".nc4\")) \n", 531 | " \n", 532 | " k_clear_session() \n", 533 | " }) \n", 534 | "})" 535 | ] 536 | }, 537 | { 538 | "cell_type": "markdown", 539 | "metadata": {}, 540 | "source": [ 541 | "## 3. Dynamical climate models\n", 542 | "To assess the credibility of DeepESD, it is compared against two different ensembles of dynamical models (see the *Technical Validation* section), the first formed by Global Climate Models (GCMs) and the second by Regional Climate Models (RCMs). Since DeepESD covers only land, we start by creating a 0.5º land-sea mask which will be later applied to eliminate sea points from both GCMs and RCMs."
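As a side note (ours, not part of the original notebook), the arithmetic used in the next cell to build the mask is easy to picture on a plain vector: since E-OBS is only defined over land, multiplying the observations by 0 and adding 1 yields 1 over land grid boxes and `NA` over the sea, and `climatology` then collapses the time dimension into a single 2D field.

```r
# Toy illustration with hypothetical values at three grid boxes (land, sea, land):
v <- c(2.3, NA, 0.7)
v * 0 + 1   # returns 1 NA 1, i.e. the land-sea mask pattern
```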
543 | ] 544 | }, 545 | { 546 | "cell_type": "code", 547 | "execution_count": null, 548 | "metadata": {}, 549 | "outputs": [], 550 | "source": [ 551 | "mask <- gridArithmetics(subsetGrid(y,year = 1990),0) %>% gridArithmetics(1,operator = \"+\") %>% climatology()\n", 552 | "grid2nc(mask, NetCDFOutFile = \"./Data/mask.nc4\") " 553 | ] 554 | }, 555 | { 556 | "cell_type": "markdown", 557 | "metadata": {}, 558 | "source": [ 559 | "### 3.1 Ensemble of Global Climate Models (GCMs)" 560 | ] 561 | }, 562 | { 563 | "cell_type": "markdown", 564 | "metadata": {}, 565 | "source": [ 566 | "We perform a loop over the temporal periods of interest (1975-2005 for the historical scenario plus 2006-2040, 2041-2070 and 2071-2100 for RCP8.5) and save the GCM ensemble as netCDF files (`grid2nc` function) in a multi-member C4R object. All GCMs are interpolated to our target 0.5º resolution (E-OBS grid), using conservative remapping. To do this interpolation, we rely on the `cdo` library and use function `system` to invoke the OS command. Please note that you can install the `cdo` library with conda, by typing `conda install cdo` in a terminal. Finally, sea points are removed by applying the land-sea mask we have previously created." 567 | ] 568 | }, 569 | { 570 | "cell_type": "code", 571 | "execution_count": null, 572 | "metadata": {}, 573 | "outputs": [], 574 | "source": [ 575 | "## Please select one: -----------------------------------------------------------------------\n", 576 | "# Parameter Setting for precipitation\n", 577 | "variable <- \"precip\"\n", 578 | "var <- \"pr\"\n", 579 | "# Parameter Setting for temperature\n", 580 | "variable <- \"temperature\"\n", 581 | "var <- \"tas\"\n", 582 | "## --------------------------------------------------------------------------------------------\n", 583 | "\n", 584 | "# We loop over the temporal periods of interest\n", 585 | "lapply(c(\"h\",\"ef\",\"mf\",\"ff\"), FUN = function(z) {\n", 586 | " if (z == \"h\") years <- 1975:2005 # historical\n", 587 | " if (z == \"ef\") years <- 2006:2040 # RCP8.5\n", 588 | " if (z == \"mf\") years <- 2041:2070 # RCP8.5\n", 589 | " if (z == \"ff\") years <- 2071:2100 # RCP8.5\n", 590 | " if (z == \"h\") {d <- dh} else {d <- df} \n", 591 | " \n", 592 | " # We loop over the GCM labels \n", 593 | " lapply(1:length(d), FUN = function(zzz) { \n", 594 | " # Load the data and interpolate to the target resolution with interpGrid \n", 595 | " yy <- loadGridData(dataset = d[zzz],var = var,years = years,lonLim = c(-10,30),latLim = c(34,74))\n", 596 | " grid2nc(yy, NetCDFOutFile = \"./aux.nc4\") # we save the GCM cropped to the European domain, and save as .nc in an auxiliary variable \n", 597 | " system(paste0(\"cdo remapcon,\", \"./Data/mask.nc4\", \" \", \"./aux.nc4\", \" \", \"./aux2.nc4\")) # We use system function to call the cdo library and interpolate the grid using conservative remapping\n", 598 | " yy <- loadGridData(\"./aux2.nc4\", var = var) # we load the interpolated GCM field\n", 599 | " file.remove(c(\"./aux.nc4\",\"./aux2.nc4\")) \n", 600 | " # Apply land-sea mask \n", 601 | " yy <- lapply(1:getShape(yy,\"time\"), FUN = function(z) gridArithmetics(subsetDimension(yy, dimension = \"time\", indices = z),mask)) %>% bindGrid(dimension = \"time\") \n", 602 | " # Save the GCM members \n", 603 | " grid2nc(yy,NetCDFOutFile = paste0(\"./Data/\",variable,\"/y_\",z,\"_\",dName[zzz],\".nc4\"))\n", 604 | " }) \n", 605 | "})" 606 | ] 607 | }, 608 | { 609 | "cell_type": "markdown", 610 | "metadata": {}, 611 | "source": [ 612 | "### 3.2 
Ensemble of Regional Climate Models (RCMs) \n", 613 | "In this section we form an ensemble of EURO-CORDEX RCMs which can be easily loaded from the Santander CDS by using the appropriate labels (see the block below)." 614 | ] 615 | }, 616 | { 617 | "cell_type": "code", 618 | "execution_count": null, 619 | "metadata": {}, 620 | "outputs": [], 621 | "source": [ 622 | "# Labels for the historical scenario\n", 623 | "dh <- c(\"CORDEX-EUR-44_CCCma-CanESM2_historical_r1i1p1_SMHI-RCA4_v1\",\n", 624 | " \"CORDEX-EUR-44_CNRM-CERFACS-CNRM-CM5_historical_r1i1p1_ETH-CLMcom-CCLM5-0-6_v1\",\n", 625 | " \"CORDEX-EUR-44_CNRM-CERFACS-CNRM-CM5_historical_r1i1p1_SMHI-RCA4_v1\",\n", 626 | " \"CORDEX-EUR-44_MPI-M-MPI-ESM-LR_historical_r1i1p1_CLMcom-CCLM4-8-17_v1\",\n", 627 | " \"CORDEX-EUR-44_MPI-M-MPI-ESM-LR_historical_r1i1p1_MPI-CSC-REMO2009_v1\",\n", 628 | " \"CORDEX-EUR-44_NCC-NorESM1-M_historical_r1i1p1_SMHI-RCA4_v1\",\n", 629 | " \"CORDEX-EUR-44_NOAA-GFDL-GFDL-ESM2M_historical_r1i1p1_SMHI-RCA4_v1\",\n", 630 | " \"CORDEX-EUR-44_ICHEC-EC-EARTH_historical_r12i1p1_SMHI-RCA4_v1\",\n", 631 | " \"CORDEX-EUR-44_ICHEC-EC-EARTH_historical_r12i1p1_ETH-CLMcom-CCLM5-0-6_v1\",\n", 632 | " \"CORDEX-EUR-44_IPSL-IPSL-CM5A-MR_historical_r1i1p1_SMHI-RCA4_v1\",\n", 633 | " \"CORDEX-EUR-44_IPSL-IPSL-CM5A-MR_historical_r1i1p1_IPSL-INERIS-WRF331F_v1\")\n", 634 | "# Labels for the RCP8.5 scenario\n", 635 | "df <- c(\"CORDEX-EUR-44_CCCma-CanESM2_rcp85_r1i1p1_SMHI-RCA4_v1\",\n", 636 | " \"CORDEX-EUR-44_CNRM-CERFACS-CNRM-CM5_rcp85_r1i1p1_ETH-CLMcom-CCLM5-0-6_v1\",\n", 637 | " \"CORDEX-EUR-44_CNRM-CERFACS-CNRM-CM5_rcp85_r1i1p1_SMHI-RCA4_v1\",\n", 638 | " \"CORDEX-EUR-44_MPI-M-MPI-ESM-LR_rcp85_r1i1p1_CLMcom-CCLM4-8-17_v1\",\n", 639 | " \"CORDEX-EUR-44_MPI-M-MPI-ESM-LR_rcp85_r1i1p1_MPI-CSC-REMO2009_v1\",\n", 640 | " \"CORDEX-EUR-44_NCC-NorESM1-M_rcp85_r1i1p1_SMHI-RCA4_v1\",\n", 641 | " \"CORDEX-EUR-44_NOAA-GFDL-GFDL-ESM2M_rcp85_r1i1p1_SMHI-RCA4_v1\",\n", 642 | " \"CORDEX-EUR-44_ICHEC-EC-EARTH_rcp85_r12i1p1_SMHI-RCA4_v1\",\n", 643 | " \"CORDEX-EUR-44_ICHEC-EC-EARTH_rcp85_r12i1p1_ETH-CLMcom-CCLM5-0-6_v1\",\n", 644 | " \"CORDEX-EUR-44_IPSL-IPSL-CM5A-MR_rcp85_r1i1p1_SMHI-RCA4_v1\",\n", 645 | " \"CORDEX-EUR-44_IPSL-IPSL-CM5A-MR_rcp85_r1i1p1_IPSL-INERIS-WRF331F_v1\")" 646 | ] 647 | }, 648 | { 649 | "cell_type": "markdown", 650 | "metadata": {}, 651 | "source": [ 652 | "We perform a loop over the temporal periods of interest (1975-2005 for the historical scenario plus 2006-2040, 2041-2070 and 2071-2100 for RCP8.5) and save the RCM ensemble as netCDF files (`grid2nc` function) in a multi-member C4R object. All RCMs are interpolated to our target 0.5º resolution (E-OBS grid) and sea points are removed by applying the land-sea mask we have previously created."
653 | ] 654 | }, 655 | { 656 | "cell_type": "code", 657 | "execution_count": null, 658 | "metadata": {}, 659 | "outputs": [], 660 | "source": [ 661 | "## Please select one: -----------------------------------------------------------------------\n", 662 | "# Precipitation\n", 663 | "variable <- \"precip\"\n", 664 | "var <- \"pr\"\n", 665 | "## Temperature\n", 666 | "variable <- \"temperature\"\n", 667 | "var <- \"tas\"\n", 668 | "## --------------------------------------------------------------------------------------------\n", 669 | "\n", 670 | "# We loop over the temporal periods of interest\n", 671 | "lapply(c(\"h\",\"ef\",\"mf\",\"ff\"), FUN = function(z) {\n", 672 | " if (z == \"h\") years <- 1975:2005 # historical\n", 673 | " if (z == \"ef\") years <- 2006:2040 # RCP8.5\n", 674 | " if (z == \"mf\") years <- 2041:2070 # RCP8.5\n", 675 | " if (z == \"ff\") years <- 2071:2099 # RCP8.5\n", 676 | " if (z == \"h\") {d <- dh} else {d <- df}\n", 677 | " \n", 678 | " # We loop over the RCM labels \n", 679 | " yy <- lapply(1:length(d), FUN = function(zzz) { \n", 680 | " # Load the data and interpolate to the target resolution with interpGrid\n", 681 | " yy <- loadGridData(dataset = d[zzz],var = var,years = years,lonLim = c(-10,30),latLim = c(34,74)) %>% interpGrid(getGrid(y))\n", 682 | " # Apply land-sea mask\n", 683 | " yy <- lapply(1:getShape(yy,\"time\"), FUN = function(z) gridArithmetics(subsetDimension(yy, dimension = \"time\", indices = z),mask)) %>% bindGrid(dimension = \"time\") \n", 684 | " # Save the RCM members\n", 685 | " grid2nc(yy,NetCDFOutFile = paste0(\"./Data/\",variable,\"/yRCM_\",z,\"_member\",zzz,\".nc4\"))\n", 686 | " })\n", 687 | "})" 688 | ] 689 | }, 690 | { 691 | "cell_type": "markdown", 692 | "metadata": {}, 693 | "source": [ 694 | "## 4. Results\n", 695 | "This section provides the code needed to reproduce the figures presented in the manuscript. Note we mostly rely on the `visualizeR` package for plotting since it supports both spatial maps and temporal series.\n", 696 | "\n" 697 | ] 698 | }, 699 | { 700 | "cell_type": "markdown", 701 | "metadata": {}, 702 | "source": [ 703 | "### 4.1 Ensemble mean and bias with respect to E-OBS\n", 704 | "Figure 1 in the manuscript shows the climatology of the different ensembles built (GCM, RCM and DeepESD), along with the corresponding mean error (bias) with respect to the observed pattern in the historical period. We start thus by computing the climatology of the different contributing members forming each ensemble and saving them as netCDF files in the working directory." 
705 | ] 706 | }, 707 | { 708 | "cell_type": "code", 709 | "execution_count": null, 710 | "metadata": {}, 711 | "outputs": [], 712 | "source": [ 713 | "## Please select one: -----------------------------------------------------------------------\n", 714 | "# Precipitation\n", 715 | "variable <- \"precip\"\n", 716 | "var <- \"pr\"\n", 717 | "# Temperature\n", 718 | "variable <- \"temperature\"\n", 719 | "var <- \"tas\"\n", 720 | "## --------------------------------------------------------------------------------------------\n", 721 | "\n", 722 | "dates <- list(start = \"2006-01-01 12:00:00 GMT\", end = \"2041-01-01 12:00:00 GMT\") # because a member of the RCM ensemble misses a value on the 01-Jan-2006, so to preserve temporal consistency in the ensemble we add this date to the ensemble mean metadata\n", 723 | "dName1 <- c(\"CanESM2\",\"CNRM-CM5\",\"MPI-ESM-MR\",\"MPI-ESM-LR\",\"NorESM1\",\"GFDL\",\"EC-Earth\",\"IPSL\")\n", 724 | "dName2 <- 1:11\n", 725 | "nn <- c(\"GCM\",\"RCM\",\"CNN\")\n", 726 | "lapply(1:length(nn), FUN = function(zz) {\n", 727 | " if (nn[zz] == \"RCM\") {dName <- dName2} else {dName <- dName1} \n", 728 | " lapply(c(\"h\",\"ef\",\"mf\",\"ff\"), FUN = function(z) { \n", 729 | " pred <- lapply(1:length(dName), FUN = function(zzz) { \n", 730 | " if (zz == 1) path <- paste0(\"./Data/\",variable,\"/y_\",z,\"_\",dName[zzz],\".nc4\") # GCM ensemble\n", 731 | " if (zz == 2) path <- paste0(\"./Data/\",variable,\"/yRCM_\",z,\"_member\",zzz,\".nc4\") # RCM ensemble \n", 732 | " if (zz == 3) path <- paste0(\"./Data/\",variable,\"/CNN_\",z,\"_\",dName[zzz],\".nc4\") # DeepESD ensemble \n", 733 | " grid <- loadGridData(dataset = path,var = var)\n", 734 | " grid <- valueIndex(grid, index.code = \"Mean\")$Index # computing the mean of each member \n", 735 | " if (z == \"ef\") grid$Dates <- dates \n", 736 | " return(grid) \n", 737 | " }) %>% bindGrid(dimension = \"member\") # bind the member means in a single C4R object along the `member` dimension\n", 738 | " \n", 739 | " pred$InitializationDates <- NULL\n", 740 | " # Saving the ensemble mean in netCDF format \n", 741 | " grid2nc(pred,NetCDFOutFile = paste0(\"./Data/\",variable,\"/\",nn[zz],\"_\",z,\"_ensemble.nc4\"))\n", 742 | " pred <- NULL \n", 743 | " })\n", 744 | "}) " 745 | ] 746 | }, 747 | { 748 | "cell_type": "markdown", 749 | "metadata": {}, 750 | "source": [ 751 | "Next, the mean climatology for each ensemble is obtained with the `aggregateGrid` function (note the aggregation is done along the member dimension) from `transformeR`. Afterwards, we can already use `spatialPlot` to plot the corresponding spatial pattern. The resulting figure is saved in `pdf` format in the path indicated in the `pdfOutput` object.\n", 752 | "\n", 753 | "***Note:*** Depending on the target variable of interest the user should comment/uncomment the appropriate lines at the beginning of the cell, which define the plotting parameters better suited for precipitation and temperature. This applies to the rest of the notebook from now on." 
754 | ] 755 | }, 756 | { 757 | "cell_type": "code", 758 | "execution_count": null, 759 | "metadata": {}, 760 | "outputs": [], 761 | "source": [ 762 | "## Please select one: -----------------------------------------------------------------------\n", 763 | "# Precipitation\n", 764 | "cb <- c(\"#FFFFFF\",brewer.pal(n = 9, \"BuPu\"))\n", 765 | "cb <- cb %>% colorRampPalette()\n", 766 | "at <- seq(0,8,0.5) \n", 767 | "units <- \"mm/day\"\n", 768 | "pdfOutput <- \"./figures/ensembleMean_pr.pdf\" \n", 769 | "var <- \"precip\"\n", 770 | "# Temperature\n", 771 | "cb <- c(\"#FFFFFF\",brewer.pal(n = 9, \"OrRd\"))\n", 772 | "cb <- cb %>% colorRampPalette()\n", 773 | "at <- seq(-5, 20,2.5) \n", 774 | "units <- \"ºC\"\n", 775 | "pdfOutput <- \"./figures/ensembleMean_tas.pdf\" \n", 776 | "var <- \"temperature\"\n", 777 | "## --------------------------------------------------------------------------------------------\n", 778 | "\n", 779 | "nn <- c(\"GCM\",\"RCM\",\"CNN\")\n", 780 | "figs <- lapply(1:length(nn), FUN = function(z) {\n", 781 | " # We store in `grid` object the ensemble of climatologies \n", 782 | " grid <- loadGridData(paste0(\"./Data/\",var,\"/\",nn[z],\"_h_ensemble.nc4\"), var = \"Mean\")\n", 783 | " # Compute the ensemble mean \n", 784 | " gridMean <- aggregateGrid(grid,aggr.mem = list(FUN = \"mean\", na.rm = TRUE)) \n", 785 | " # We depict the ensemble mean with spatialPlot function \n", 786 | " spatialPlot(gridMean,\n", 787 | " backdrop.theme = \"coastline\",\n", 788 | " main = paste0(\"Ensemble Mean (\",units,\") - \",nn[z]),\n", 789 | " ylab = \"1975-2005\",\n", 790 | " col.regions = cb,\n", 791 | " at = at,\n", 792 | " set.min = at[1], set.max = at[length(at)])\n", 793 | "}) \n", 794 | "pdf(pdfOutput, width = 15, height = 10) \n", 795 | "grid.arrange(grobs = figs, ncol = 3) \n", 796 | "dev.off()" 797 | ] 798 | }, 799 | { 800 | "cell_type": "markdown", 801 | "metadata": {}, 802 | "source": [ 803 | "Now we plot the bias with respect to the observed (i.e. E-OBS) climatology. Again, we rely on `spatialPlot` to depict the spatial fields, and on `aggregateGrid` and `gridArithmetics` to compute the ensemble mean and its bias, respectively. The resulting figures are saved in `pdf` format in the path indicated by `pdfOutput`."
804 | ] 805 | }, 806 | { 807 | "cell_type": "code", 808 | "execution_count": null, 809 | "metadata": {}, 810 | "outputs": [], 811 | "source": [ 812 | "## Please select one: -----------------------------------------------------------------------\n", 813 | "# Precipitation\n", 814 | "cb <- brewer.pal(n = 11, \"BrBG\")\n", 815 | "cb[6] <- \"#FFFFFF\"; cb <- cb %>% colorRampPalette()\n", 816 | "at <- c(seq(-2, -0.5,0.5),-0.25,0.25,seq(0.5, 2,0.5)) \n", 817 | "units <- \"mm/day\"\n", 818 | "pdfOutput <- \"./figures/bias_pr.pdf\" \n", 819 | "var <- \"precip\"\n", 820 | "# Temperature\n", 821 | "cb <- rev(brewer.pal(n = 11, \"RdBu\"))\n", 822 | "cb[6] <- \"#FFFFFF\"; cb <- cb %>% colorRampPalette()\n", 823 | "at <- c(seq(-2, -0.5,0.5),-0.25,0.25, seq(0.5,2,0.5)) \n", 824 | "units <- \"ºC\"\n", 825 | "pdfOutput <- \"./figures/bias_tas.pdf\"\n", 826 | "var <- \"temperature\"\n", 827 | "## --------------------------------------------------------------------------------------------\n", 828 | "\n", 829 | "nn <- c(\"GCM\",\"RCM\",\"CNN\")\n", 830 | "figs <- lapply(1:length(nn), FUN = function(z) {\n", 831 | " # Compute the ensemble mean \n", 832 | " grid <- loadGridData(paste0(\"./Data/\",var,\"/\",nn[z],\"_h_ensemble.nc4\"), var = \"Mean\") %>% aggregateGrid(aggr.mem = list(FUN = \"mean\", na.rm = TRUE))\n", 833 | " # Compute the bias with respect to the observed temporal climatology for the same period\n", 834 | " grid %<>% gridArithmetics(valueIndex(y, index.code = \"Mean\")$Index,operator = \"-\")\n", 835 | " # Depict the bias of the ensemble mean\n", 836 | " spatialPlot(grid,\n", 837 | " backdrop.theme = \"coastline\",\n", 838 | " main = paste0(\"Bias Ensemble Mean (\",units,\") - \",nn[z]),\n", 839 | " ylab = \"1975-2005\",\n", 840 | " col.regions = cb,\n", 841 | " at = at,\n", 842 | " set.min = at[1], set.max = at[length(at)]) \n", 843 | "}) \n", 844 | "pdf(pdfOutput, width = 15, height = 10) \n", 845 | "grid.arrange(grobs = figs, ncol = 3) \n", 846 | "dev.off()" 847 | ] 848 | }, 849 | { 850 | "cell_type": "markdown", 851 | "metadata": {}, 852 | "source": [ 853 | "### 4.2 Climate change signals\n", 854 | "To produce Figure 2 in the manuscript we perform a loop, for each ensemble (GCM, RCM and DeepESD), over the different RCP8.5 periods of interest and sequentially compute the difference between the future climatology and the historical one. These climate change signals are then averaged along the member dimension using the `aggregateGrid` function. The resulting figures are saved in `pdf` format in the path indicated by `pdfOutput`." 
855 | ] 856 | }, 857 | { 858 | "cell_type": "code", 859 | "execution_count": null, 860 | "metadata": {}, 861 | "outputs": [], 862 | "source": [ 863 | "## Please select one: -----------------------------------------------------------------------\n", 864 | "# Precipitation\n", 865 | "cb <- brewer.pal(n = 11, \"BrBG\")\n", 866 | "cb[6] <- \"#FFFFFF\"; cb <- cb %>% colorRampPalette()\n", 867 | "at <- c(seq(-1, -0.25,0.25),-0.125,0.125,seq(0.25, 1,0.25)) \n", 868 | "pdfOutput <- \"./figures/deltas_pr.pdf\" \n", 869 | "var <- \"precip\"\n", 870 | "# Temperature\n", 871 | "cb <- c(\"#FFFFFF\",brewer.pal(n = 11, \"OrRd\"))\n", 872 | "at <- c(seq(0,4,0.5),5,6)\n", 873 | "pdfOutput <- \"./figures/deltas_tas.pdf\"\n", 874 | "var <- \"temperature\"\n", 875 | "## --------------------------------------------------------------------------------------------\n", 876 | "\n", 877 | "nn <- c(\"GCM\",\"RCM\",\"CNN\")\n", 878 | "figs <- lapply(c(\"ef\",\"mf\",\"ff\"), FUN = function(z) { \n", 879 | " lapply(1:length(nn), FUN = function(zz) {\n", 880 | " gridh <- loadGridData(paste0(\"./Data/\",var,\"/\",nn[zz],\"_h_ensemble.nc4\"), var = \"Mean\") \n", 881 | " gridf <- loadGridData(paste0(\"./Data/\",var,\"/\",nn[zz],\"_\",z,\"_ensemble.nc4\"), var = \"Mean\") \n", 882 | " grid <- gridArithmetics(gridf,gridh,operator = \"-\")\n", 883 | " gridMean <- aggregateGrid(grid,aggr.mem = list(FUN = \"mean\", na.rm = TRUE)) \n", 884 | "\n", 885 | " if (z == \"ef\") period <- c(\"2006-2040\")\n", 886 | " if (z == \"mf\") period <- c(\"2041-2070\")\n", 887 | " if (z == \"ff\") period <- c(\"2071-2100\")\n", 888 | " spatialPlot(gridMean,\n", 889 | " backdrop.theme = \"coastline\",\n", 890 | " main = paste(\"CC. signal wrt 1975-2005\"),\n", 891 | " ylab = period,\n", 892 | " col.regions = cb,\n", 893 | " at = at,\n", 894 | " set.min = at[1], set.max = at[length(at)]) \n", 895 | " }) \n", 896 | "}) %>% unlist(recursive = FALSE) \n", 897 | "pdf(pdfOutput, width = 15, height = 10) \n", 898 | "grid.arrange(grobs = figs, ncol = 3) \n", 899 | "dev.off()" 900 | ] 901 | }, 902 | { 903 | "cell_type": "markdown", 904 | "metadata": {}, 905 | "source": [ 906 | "### 4.3 Time-series\n", 907 | "The following block of code allows for plotting the time-series of the climate change signals. For precipitation (temperature), we perform a loop over the validation metrics of interest: R01, SDII (Mean). For details about these metrics please see the manuscript or type `show.indices()` in a new cell. At each iteration of the loop we define a `doCall.args` list which contains the `aggr.y` arguments needed for the `aggregateGrid` function (note the validation is done on an annual basis), which is finally passed to `do.call`. At the end of the loop, the resulting figures are saved in `pdf` format."
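As a plain-`R` reminder of what the precipitation metrics measure (a toy sketch with made-up numbers, not the VALUE/climate4R implementation invoked below through `aggregateGrid`): R01 is the relative frequency of wet days (>= 1 mm) and SDII is the mean precipitation amount on those wet days.

```r
# Hypothetical daily precipitation series (mm), for illustration only
pr   <- c(0, 0.2, 3.4, 0, 12.1, 0.8, 5.0)
r01  <- mean(pr >= 1)       # R01: relative frequency of wet days (>= 1 mm)
sdii <- mean(pr[pr >= 1])   # SDII: mean intensity on wet days
```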
908 | ] 909 | }, 910 | { 911 | "cell_type": "code", 912 | "execution_count": null, 913 | "metadata": {}, 914 | "outputs": [], 915 | "source": [ 916 | "## Please select one: -----------------------------------------------------------------------\n", 917 | "# Precipitation\n", 918 | "indices <- c(\"R01\",\"SDII\")\n", 919 | "variable <- \"precip\"\n", 920 | "var <- \"pr\"\n", 921 | "# Temperature\n", 922 | "indices <- c(\"Mean\")\n", 923 | "variable <- \"temperature\"\n", 924 | "var <- \"tas\"\n", 925 | "\n", 926 | "## --------------------------------------------------------------------------------------------\n", 927 | "figs <- lapply(indices, FUN = function(zz) {\n", 928 | " doCall.args <- list() \n", 929 | " doCall.args[[\"aggr.y\"]] <- list()\n", 930 | " \n", 931 | " # The R01 do.call arguments \n", 932 | " if (zz == \"R01\") {\n", 933 | " doCall.args[[\"aggr.y\"]][[\"FUN\"]] <- \"index.freq\"\n", 934 | " doCall.args[[\"aggr.y\"]][[\"freq.type\"]] <- \"rel\"\n", 935 | " doCall.args[[\"aggr.y\"]][[\"condition\"]] <- \"GE\"\n", 936 | " doCall.args[[\"aggr.y\"]][[\"threshold\"]] <- 1\n", 937 | " ylim <- c(0.24,0.54) \n", 938 | " }\n", 939 | " # The SDII do.call arguments \n", 940 | " if (zz == \"SDII\"){\n", 941 | " doCall.args[[\"aggr.y\"]][[\"FUN\"]] <- \"index.meanGE\"\n", 942 | " doCall.args[[\"aggr.y\"]][[\"threshold\"]] <- 1\n", 943 | " ylim <- c(2,9) \n", 944 | " } \n", 945 | " # The Mean do.call arguments \n", 946 | " if (zz == \"Mean\"){\n", 947 | " doCall.args[[\"aggr.y\"]][[\"FUN\"]] <- \"mean\"\n", 948 | " doCall.args[[\"aggr.y\"]][[\"na.rm\"]] <- TRUE\n", 949 | " ylim <- c(0,18) \n", 950 | " } \n", 951 | " \n", 952 | " # We compute the index for the GCM ensemble. To do this, we loop over the temporal periods and then bind the serie along the time dimension with bindGrid function \n", 953 | " pred1 <- lapply(c(\"h\",\"ef\",\"mf\",\"ff\"), FUN = function(z) { \n", 954 | " lapply(1:length(dName), FUN = function(zzz) { \n", 955 | " doCall.args[[\"grid\"]] <- loadGridData(dataset = paste0(\"./Data/\",variable,\"/y_\",z,\"_\",dName[zzz],\".nc4\"),var = var)\n", 956 | " do.call(\"aggregateGrid\",doCall.args)\n", 957 | " }) %>% bindGrid(dimension = \"member\")\n", 958 | " }) %>% bindGrid(dimension = \"time\") \n", 959 | " \n", 960 | " # We compute the index for the RCM ensemble. To do this, we loop over the temporal periods and then bind the serie along the time dimension with bindGrid function\n", 961 | " pred3 <- lapply(c(\"h\",\"ef\",\"mf\",\"ff\"), FUN = function(z) { \n", 962 | " lapply(1:11, FUN = function(zzz) { \n", 963 | " doCall.args[[\"grid\"]] <- loadGridData(dataset = paste0(\"./Data/\",variable,\"/yRCM_\",z,\"_member\",zzz,\".nc4\"),var = var)\n", 964 | " do.call(\"aggregateGrid\",doCall.args)\n", 965 | " }) %>% bindGrid(dimension = \"member\")\n", 966 | " }) %>% bindGrid(dimension = \"time\")\n", 967 | " \n", 968 | " # We compute the index for the DeepESD ensemble. 
To do this, we loop over the temporal periods and then bind the serie along the time dimension with bindGrid function \n", 969 | " pred2 <- lapply(c(\"h\",\"ef\",\"mf\",\"ff\"), FUN = function(z) { \n", 970 | " lapply(1:length(dName), FUN = function(zzz) { \n", 971 | " doCall.args[[\"grid\"]] <- loadGridData(dataset = paste0(\"./Data/\",variable,\"/CNN_\",z,\"_\",dName[zzz],\".nc4\"),var = var)\n", 972 | " do.call(\"aggregateGrid\",doCall.args)\n", 973 | " }) %>% bindGrid(dimension = \"member\")\n", 974 | " }) %>% bindGrid(dimension = \"time\")\n", 975 | " \n", 976 | " # We compute the index for the observed temporal serie \n", 977 | " doCall.args[[\"grid\"]] <- y\n", 978 | " y <- do.call(\"aggregateGrid\",doCall.args)\n", 979 | " \n", 980 | " # We call temporalPlot to plot the times-series \n", 981 | " temporalPlot(\"OBS\" = y,\"GCM\" = pred1,\"RCM\" = pred3,\"CNN\" = pred2, cols = c(\"black\",\"red\",\"blue\",\"green\"),xyplot.custom = list(ylim = ylim)) \n", 982 | "})\n", 983 | "\n", 984 | "# Saving the resulting figures in .pdf format\n", 985 | "pdf(paste0(\"./figures/serie_\",var,\".pdf\"), width = 15, height = 4)\n", 986 | "grid.arrange(grobs = figs, ncol = 3) \n", 987 | "dev.off()" 988 | ] 989 | }, 990 | { 991 | "cell_type": "markdown", 992 | "metadata": {}, 993 | "source": [ 994 | "## Technical specifications\n", 995 | "Please note this notebook was run on a machine with the following technical specifications:\n", 996 | "\n", 997 | "- Operating system: Ubuntu 18.04.3 LTS (64 bits)\n", 998 | "- Memory: 60 GiB\n", 999 | "- Processor: 2x Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (16 cores, 32 threads)" 1000 | ] 1001 | } 1002 | ], 1003 | "metadata": { 1004 | "kernelspec": { 1005 | "display_name": "R", 1006 | "language": "R", 1007 | "name": "ir" 1008 | }, 1009 | "language_info": { 1010 | "codemirror_mode": "r", 1011 | "file_extension": ".r", 1012 | "mimetype": "text/x-r-source", 1013 | "name": "R", 1014 | "pygments_lexer": "r", 1015 | "version": "3.6.1" 1016 | }, 1017 | "latex_envs": { 1018 | "LaTeX_envs_menu_present": true, 1019 | "autoclose": false, 1020 | "autocomplete": true, 1021 | "bibliofile": "biblio.bib", 1022 | "cite_by": "apalike", 1023 | "current_citInitial": 1, 1024 | "eqLabelWithNumbers": true, 1025 | "eqNumInitial": 1, 1026 | "hotkeys": { 1027 | "equation": "Ctrl-E", 1028 | "itemize": "Ctrl-I" 1029 | }, 1030 | "labels_anchors": false, 1031 | "latex_user_defs": false, 1032 | "report_style_numbering": false, 1033 | "user_envs_cfg": false 1034 | }, 1035 | "toc": { 1036 | "base_numbering": 1, 1037 | "nav_menu": {}, 1038 | "number_sections": true, 1039 | "sideBar": true, 1040 | "skip_h1_title": false, 1041 | "title_cell": "Table of Contents", 1042 | "title_sidebar": "Contents", 1043 | "toc_cell": false, 1044 | "toc_position": {}, 1045 | "toc_section_display": true, 1046 | "toc_window_display": false 1047 | } 1048 | }, 1049 | "nbformat": 4, 1050 | "nbformat_minor": 4 1051 | } 1052 | -------------------------------------------------------------------------------- /DeepDownscaling.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | -------------------------------------------------------------------------------- /README.md: 
-------------------------------------------------------------------------------- 1 | [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3461087.svg)](https://doi.org/10.5281/zenodo.3461087) 2 | 3 | # DeepDownscaling 4 | ### Deep learning approaches for statistical downscaling in climate 5 | 6 | Transparency and reproducibility are key ingredients to develop top-quality science. For this reason, this repository is aimed at hosting and maintaining updated versions of the code and notebooks needed to (partly or fully) reproduce the results of the papers developed in the Santander MetGroup dealing with the application of deep learning techniques for statistical downscaling in climate. 7 | 8 | These works build on [climate4R](https://github.com/SantanderMetGroup/climate4R), a bundle of `R` packages developed for transparent climate data access, post processing (including bias correction and downscaling), visualization and model validation. A battery of Jupyter notebooks with worked examples explaining how to use the main functionalities of the core climate4R packages (including [downscaleR](https://github.com/SantanderMetGroup/downscaleR) for standard statistical downscaling methods) can be found at the [notebooks repository](https://github.com/SantanderMetGroup/notebooks). 9 | For deep learning implementations we use [`keras`](https://cran.r-project.org/web/packages/keras/index.html), an `R` library which provides an interface to [Keras](https://keras.io), a high-level neural networks API which supports arbitrary network architectures and is seamlessly integrated with [TensorFlow](https://www.tensorflow.org), together with [downscaleR.keras](https://github.com/SantanderMetGroup/downscaleR.keras), a wrapper which integrates this package into downscaleR. 10 | 11 | The table below lists the documents (Jupyter notebooks, scripts, etc.) contained in this repository along with information on the corresponding published (or submitted) papers. 12 | 13 | | Notebook files | Article Title | Journal | Paper files 14 | |---|---|---|--- 15 | | 2022_Bano_GMD.ipynb | Downscaling Multi-Model Ensembles of Climate Change Projections with Deep Learning (DeepESD): Contribution to CORDEX EUR-44 | Geoscientific Model Development | - 16 | | 2020_Bano_CD.ipynb | On the suitability of deep convolutional neural networks for continental-wide downscaling of climate change projections | Climate Dynamics | https://doi.org/10.1007/s00382-021-05847-0 17 | | 2020_Bano_CI.ipynb | Understanding Deep Learning Decisions in Statistical Downscaling Models | International Conference Proceedings Series (ICPS) | https://doi.org/10.1145/3429309.3429321 2020_Bano_CI.pdf 18 | | 2020_Bano_GMD.ipynb 2020_Bano_GMD_FULL.ipynb | Configuration and Intercomparison of Deep Learning Neural Models for Statistical Downscaling | Geoscientific Model Development | https://doi.org/10.5194/gmd-2019-278 19 | | 2019_Bano_CI.ipynb | The Importance of Inductive Bias in Convolutional Models for Statistical Downscaling | Proceedings of the 9th International Workshop on Climate Informatics (CI2019) | http://dx.doi.org/10.5065/y82j-f154 2019_Bano_CI.pdf 20 | | 2018_Bano_CI.ipynb | Deep Convolutional Networks for Feature Selection in Statistical Downscaling | Proceedings of the 8th International Workshop on Climate Informatics (CI2018) | http://dx.doi.org/10.5065/D6BZ64XQ 2018_Bano_CI.pdf 21 | | | | | 22 | --------------------------------------------------------------------------------