├── air.toml ├── .github ├── .gitignore ├── ISSUE_TEMPLATE │ ├── config.yml │ ├── feature_request.yml │ ├── question.yml │ └── bug_report.yml ├── workflows │ ├── pkgdown.yaml │ └── R-CMD-check.yaml ├── CONTRIBUTING.md └── CODE_OF_CONDUCT.md ├── src ├── .gitignore ├── RcppExports.cpp └── metSimCpp.cpp ├── LICENSE ├── vignettes ├── articles │ ├── .gitignore │ └── deweather.Rmd └── .gitignore ├── data └── aqroadside.rda ├── man ├── figures │ ├── logo.png │ └── feature-banner.png ├── aqroadside.Rd ├── deweather-package.Rd ├── predict_dw.Rd ├── plot_dw_importance.Rd ├── append_dw_vars.Rd ├── getters-dw.Rd ├── simulate_dw_met.Rd ├── plot_dw_partial_1d.Rd ├── build_dw_model.Rd ├── plot_dw_partial_2d.Rd └── tune_dw_model.Rd ├── .vscode ├── extensions.json └── settings.json ├── pkgdown └── favicon │ ├── favicon.ico │ ├── favicon-96x96.png │ ├── apple-touch-icon.png │ ├── web-app-manifest-192x192.png │ ├── web-app-manifest-512x512.png │ ├── site.webmanifest │ └── favicon.svg ├── .gitignore ├── R ├── RcppExports.R ├── deweather-package.R ├── aqroadside.R ├── predict_dw.R ├── deweather-generics.R ├── plot_dw_importance.R ├── get_dw.R ├── append_dw_vars.R ├── build_dw_model.R ├── simulate_dw_met.R ├── tune_dw_model.R ├── plot_dw_partial_2d.R └── plot_dw_partial_1d.R ├── .Rbuildignore ├── tests ├── testthat.R └── testthat │ ├── test-append_dw_vars.R │ ├── test-tune_dw_model.R │ └── test-build_dw_model.R ├── deweather.Rproj ├── data-raw └── aqroadside.R ├── NAMESPACE ├── DESCRIPTION ├── _pkgdown.yml ├── NEWS.md ├── README.md └── LICENSE.md /air.toml: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /.github/.gitignore: -------------------------------------------------------------------------------- 1 | *.html 2 | -------------------------------------------------------------------------------- /src/.gitignore: 
-------------------------------------------------------------------------------- 1 | *.o 2 | *.so 3 | *.dll 4 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | YEAR: 2025 2 | COPYRIGHT HOLDER: deweather authors 3 | -------------------------------------------------------------------------------- /vignettes/articles/.gitignore: -------------------------------------------------------------------------------- 1 | *.html 2 | *.R 3 | *_files 4 | -------------------------------------------------------------------------------- /vignettes/.gitignore: -------------------------------------------------------------------------------- 1 | *.html 2 | *.R 3 | 4 | /.quarto/ 5 | *_files 6 | 7 | **/*.quarto_ipynb 8 | -------------------------------------------------------------------------------- /data/aqroadside.rda: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openair-project/deweather/HEAD/data/aqroadside.rda -------------------------------------------------------------------------------- /man/figures/logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openair-project/deweather/HEAD/man/figures/logo.png -------------------------------------------------------------------------------- /.vscode/extensions.json: -------------------------------------------------------------------------------- 1 | { 2 | "recommendations": [ 3 | "Posit.air-vscode" 4 | ] 5 | } 6 | -------------------------------------------------------------------------------- /pkgdown/favicon/favicon.ico: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openair-project/deweather/HEAD/pkgdown/favicon/favicon.ico -------------------------------------------------------------------------------- 
/man/figures/feature-banner.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openair-project/deweather/HEAD/man/figures/feature-banner.png -------------------------------------------------------------------------------- /pkgdown/favicon/favicon-96x96.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openair-project/deweather/HEAD/pkgdown/favicon/favicon-96x96.png -------------------------------------------------------------------------------- /pkgdown/favicon/apple-touch-icon.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openair-project/deweather/HEAD/pkgdown/favicon/apple-touch-icon.png -------------------------------------------------------------------------------- /pkgdown/favicon/web-app-manifest-192x192.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openair-project/deweather/HEAD/pkgdown/favicon/web-app-manifest-192x192.png -------------------------------------------------------------------------------- /pkgdown/favicon/web-app-manifest-512x512.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openair-project/deweather/HEAD/pkgdown/favicon/web-app-manifest-512x512.png -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | docs 3 | test.R 4 | .Rhistory 5 | .Rdata 6 | .httr-oauth 7 | .DS_Store 8 | .quarto 9 | inst/doc 10 | /doc/ 11 | /Meta/ 12 | **/.quarto/ 13 | .RData 14 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/config.yml: -------------------------------------------------------------------------------- 1 | 
blank_issues_enabled: true 2 | contact_links: 3 | - name: openair book 4 | url: https://openair-project.github.io/book/ 5 | about: Before you submit your issue, check to see if your answer is already in the openair book! 6 | -------------------------------------------------------------------------------- /.vscode/settings.json: -------------------------------------------------------------------------------- 1 | { 2 | "[r]": { 3 | "editor.formatOnSave": true, 4 | "editor.defaultFormatter": "Posit.air-vscode" 5 | }, 6 | "[quarto]": { 7 | "editor.formatOnSave": true, 8 | "editor.defaultFormatter": "quarto.quarto" 9 | } 10 | } 11 | -------------------------------------------------------------------------------- /R/RcppExports.R: -------------------------------------------------------------------------------- 1 | # Generated by using Rcpp::compileAttributes() -> do not edit by hand 2 | # Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393 3 | 4 | get_constrained_indices_cpp <- function(doy, hod, day_win, hour_win) { 5 | .Call(`_deweather_get_constrained_indices_cpp`, doy, hod, day_win, hour_win) 6 | } 7 | 8 | -------------------------------------------------------------------------------- /.Rbuildignore: -------------------------------------------------------------------------------- 1 | ^deweather\.Rproj$ 2 | ^\.Rproj\.user$ 3 | ^LICENSE\.md$ 4 | ^_pkgdown\.yml$ 5 | ^docs$ 6 | ^pkgdown$ 7 | test.R 8 | ^doc$ 9 | ^Meta$ 10 | ^vignettes/articles$ 11 | ^vignettes/\.quarto$ 12 | ^vignettes/.*_files$ 13 | ^vignettes/articles/\.quarto$ 14 | ^vignettes/articles/.*_files$ 15 | ^data-raw$ 16 | ^\.github$ 17 | ^[.]?air[.]toml$ 18 | ^\.vscode$ 19 | -------------------------------------------------------------------------------- /R/deweather-package.R: -------------------------------------------------------------------------------- 1 | #' @keywords internal 2 | "_PACKAGE" 3 | 4 | ## usethis namespace: start 5 | #' @importFrom parsnip contr_one_hot 6 | #' @importFrom Rcpp 
sourceCpp
7 | #' @importFrom rlang %||% 8 | #' @importFrom rlang .data 9 | #' @importFrom rlang := 10 | #' @importFrom utils head 11 | #' @importFrom utils tail 12 | #' @useDynLib deweather, .registration = TRUE 13 | ## usethis namespace: end 14 | NULL 15 | -------------------------------------------------------------------------------- /tests/testthat.R: -------------------------------------------------------------------------------- 1 | # This file is part of the standard setup for testthat. 2 | # It is recommended that you do not modify it. 3 | # 4 | # Where should you do additional test configuration? 5 | # Learn more about the roles of various files in: 6 | # * https://r-pkgs.org/testing-design.html#sec-tests-files-overview 7 | # * https://testthat.r-lib.org/articles/special-files.html 8 | 9 | library(testthat) 10 | library(deweather) 11 | 12 | test_check("deweather") 13 | -------------------------------------------------------------------------------- /deweather.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | ProjectId: 3b1018c9-1201-4f35-aa0f-8ecd1b2ae411 3 | 4 | RestoreWorkspace: Default 5 | SaveWorkspace: Default 6 | AlwaysSaveHistory: Default 7 | 8 | EnableCodeIndexing: Yes 9 | UseSpacesForTab: Yes 10 | NumSpacesForTab: 2 11 | Encoding: UTF-8 12 | 13 | RnwWeave: knitr 14 | LaTeX: XeLaTeX 15 | 16 | BuildType: Package 17 | PackageUseDevtools: Yes 18 | PackageInstallArgs: --no-multiarch --with-keep.source 19 | PackageBuildArgs: --resave-data 20 | PackageRoxygenize: rd,collate,namespace 21 | -------------------------------------------------------------------------------- /pkgdown/favicon/site.webmanifest: -------------------------------------------------------------------------------- 1 | { 2 | "name": "", 3 | "short_name": "", 4 | "icons": [ 5 | { 6 | "src": "/web-app-manifest-192x192.png", 7 | "sizes": "192x192", 8 | "type": "image/png", 9 | "purpose": "maskable" 10 | }, 11 | { 12 | "src": 
"/web-app-manifest-512x512.png", 13 | "sizes": "512x512", 14 | "type": "image/png", 15 | "purpose": "maskable" 16 | } 17 | ], 18 | "theme_color": "#ffffff", 19 | "background_color": "#ffffff", 20 | "display": "standalone" 21 | } -------------------------------------------------------------------------------- /R/aqroadside.R: -------------------------------------------------------------------------------- 1 | #' Example air quality monitoring data for openair 2 | #' 3 | #' `aqroadside` represents a subset of long-running hourly data from the 4 | #' Marylebone Road AURN roadside monitoring station in the UK, bound with 5 | #' meteorological data from the nearby Heathrow airport met station. Five 6 | #' pollutants (NOx, NO2, Ethane, Isoprene and Benzene) and five meteorlogical 7 | #' variables (wind speed, wind direction, air temperature, relative humidity, 8 | #' and cloud height) are provided. 9 | #' 10 | #' @examples 11 | #' # basic structure 12 | #' head(aqroadside) 13 | "aqroadside" 14 | -------------------------------------------------------------------------------- /tests/testthat/test-append_dw_vars.R: -------------------------------------------------------------------------------- 1 | test_that("appending vars works", { 2 | vars <- c("hour", "weekday", "trend", "yday", "week", "month") 3 | 4 | appended <- append_dw_vars( 5 | aqroadside, 6 | vars = vars 7 | ) 8 | 9 | expect_true( 10 | all(vars %in% names(appended)) 11 | ) 12 | 13 | expect_s3_class(appended$weekday, "factor") 14 | expect_s3_class(appended$month, "factor") 15 | expect_type(appended$trend, "double") 16 | expect_type(appended$hour, "integer") 17 | expect_type(appended$yday, "integer") 18 | expect_type(appended$week, "integer") 19 | 20 | expect_error(append_dw_vars(aqroadside, .date = "DATETIME")) 21 | 22 | dummy <- aqroadside 23 | dummy$date <- as.character(dummy$date) 24 | expect_error(append_dw_vars(dummy)) 25 | }) 26 | 
-------------------------------------------------------------------------------- /data-raw/aqroadside.R: -------------------------------------------------------------------------------- 1 | ## code to prepare `aqroadside` dataset goes here 2 | 3 | library(openair) 4 | library(worldmet) 5 | library(dplyr) 6 | library(mirai) 7 | 8 | daemons(4) 9 | 10 | # import AQ data 11 | aqroadside <- importUKAQ( 12 | site = "my1", 13 | year = 2000:2016, 14 | hc = TRUE 15 | ) 16 | 17 | # import met data 18 | met <- importNOAA(year = 2000:2016, source = "fwf") 19 | 20 | # join together but ignore met data in aqroadside because it is modelled 21 | aqroadside <- 22 | left_join(select(aqroadside, -ws, -wd, -air_temp), met, by = "date") 23 | 24 | aqroadside <- select( 25 | aqroadside, 26 | date, 27 | nox, 28 | no2, 29 | ethane, 30 | isoprene, 31 | benzene, 32 | ws, 33 | wd, 34 | air_temp, 35 | rh = RH, 36 | cl 37 | ) 38 | 39 | aqroadside <- tibble(aqroadside) 40 | 41 | usethis::use_data(aqroadside, overwrite = TRUE) 42 | -------------------------------------------------------------------------------- /tests/testthat/test-tune_dw_model.R: -------------------------------------------------------------------------------- 1 | test_that("tuning works", { 2 | tunedata <- head(deweather::aqroadside, n = 100) 3 | 4 | expect_error(tune_dw_model(tunedata, "no2")) 5 | 6 | tuned <- 7 | with( 8 | mirai::daemons(4), 9 | tune_dw_model( 10 | tunedata, 11 | "no2", 12 | tree_depth = c(1, 5), 13 | trees = c(150, 250), 14 | grid_levels = 2 15 | ) 16 | ) 17 | 18 | expect_named(tuned) 19 | 20 | expect_equal(names(tuned), c("best_params", "final_fit")) 21 | 22 | expect_equal(names(tuned$best_params), c("trees", "tree_depth")) 23 | 24 | expect_equal(names(tuned$final_fit), c("predictions", "metrics", "plot")) 25 | 26 | expect_s3_class(tuned$final_fit$predictions, "data.frame") 27 | expect_s3_class(tuned$final_fit$metrics, "data.frame") 28 | expect_s3_class(tuned$final_fit$plot, "gg") 29 | }) 30 | 
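The tuning test passes two-value ranges for `tree_depth` and `trees` together with `grid_levels = 2`. Assuming `tune_dw_model()` expands these ranges into a regular grid in the spirit of `dials::grid_regular()` (the implementation is not shown here, so this is an assumption), two levels means only the range endpoints are tried, giving 2 × 2 = 4 candidate fits. The C++ sketch below illustrates only that combinatorics. `ParamRange` and `regular_grid()` are hypothetical names, and the candidate ordering need not match dials.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of one tunable hyperparameter range.
struct ParamRange {
  std::string name;
  double lower;
  double upper;
  bool integer;  // round values, e.g. for trees and tree_depth
};

using Candidate = std::map<std::string, double>;

// Evenly spaced values from lower to upper (inclusive).
std::vector<double> param_levels(const ParamRange& p, int levels) {
  assert(p.lower <= p.upper);
  std::vector<double> out;
  if (levels < 2) {
    out.push_back(p.lower);  // a single level: just use the lower bound
    return out;
  }
  const double step = (p.upper - p.lower) / (levels - 1);
  for (int i = 0; i < levels; ++i) {
    const double v = p.lower + step * i;
    out.push_back(p.integer ? std::round(v) : v);
  }
  return out;
}

// Cartesian product of every parameter's levels: levels^n_params candidates.
std::vector<Candidate> regular_grid(const std::vector<ParamRange>& params, int levels) {
  std::vector<Candidate> grid(1);  // start with one empty candidate
  for (const ParamRange& p : params) {
    std::vector<Candidate> next;
    for (const Candidate& partial : grid) {
      for (double v : param_levels(p, levels)) {
        Candidate c = partial;
        c[p.name] = v;
        next.push_back(c);
      }
    }
    grid = std::move(next);
  }
  return grid;
}
```

With three levels the same ranges would yield 3 × 3 = 9 candidates, adding the midpoints `trees = 200` and `tree_depth = 3`, so the number of fits grows as the level count raised to the number of tuned parameters.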
-------------------------------------------------------------------------------- /NAMESPACE: -------------------------------------------------------------------------------- 1 | # Generated by roxygen2: do not edit by hand 2 | 3 | S3method(head,Deweather) 4 | S3method(plot,Deweather) 5 | S3method(print,Deweather) 6 | S3method(summary,Deweather) 7 | S3method(tail,Deweather) 8 | export(append_dw_vars) 9 | export(build_dw_model) 10 | export(get_dw_engine) 11 | export(get_dw_importance) 12 | export(get_dw_input_data) 13 | export(get_dw_model) 14 | export(get_dw_params) 15 | export(get_dw_pollutant) 16 | export(get_dw_vars) 17 | export(plot_dw_importance) 18 | export(plot_dw_partial_1d) 19 | export(plot_dw_partial_2d) 20 | export(predict_dw) 21 | export(simulate_dw_met) 22 | export(tune_dw_model) 23 | importFrom(Rcpp,sourceCpp) 24 | importFrom(parsnip,contr_one_hot) 25 | importFrom(rlang,"%||%") 26 | importFrom(rlang,":=") 27 | importFrom(rlang,.data) 28 | importFrom(utils,head) 29 | importFrom(utils,tail) 30 | useDynLib(deweather, .registration = TRUE) 31 | -------------------------------------------------------------------------------- /man/aqroadside.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/aqroadside.R 3 | \docType{data} 4 | \name{aqroadside} 5 | \alias{aqroadside} 6 | \title{Example air quality monitoring data for openair} 7 | \format{ 8 | An object of class \code{tbl_df} (inherits from \code{tbl}, \code{data.frame}) with 149040 rows and 11 columns. 9 | } 10 | \usage{ 11 | aqroadside 12 | } 13 | \description{ 14 | \code{aqroadside} represents a subset of long-running hourly data from the 15 | Marylebone Road AURN roadside monitoring station in the UK, bound with 16 | meteorological data from the nearby Heathrow airport met station. 
Five 17 | pollutants (NOx, NO2, Ethane, Isoprene and Benzene) and five meteorological 18 | variables (wind speed, wind direction, air temperature, relative humidity, 19 | and cloud height) are provided. 20 | } 21 | \examples{ 22 | # basic structure 23 | head(aqroadside) 24 | } 25 | \keyword{datasets} 26 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/feature_request.yml: -------------------------------------------------------------------------------- 1 | name: Feature request 2 | description: Make a case for a new feature 3 | labels: ["enhancement"] 4 | 5 | body: 6 | - type: markdown 7 | attributes: 8 | value: | 9 | Thank you for helping with the development of the openair project by requesting a new feature. 10 | 11 | Before you submit your issue, it may be useful to check through the [openair book](https://openair-project.github.io/book/) just to make sure your suggestion can't already be achieved! 12 | 13 | - type: textarea 14 | attributes: 15 | label: Feature request 16 | description: | 17 | Please provide a brief description of your feature request. 18 | placeholder: | 19 | If you would like to write pseudo-code, you can wrap it using the below syntax: 20 | ```r 21 | mySuggestedFunction() 22 | ``` 23 | 24 | - type: markdown 25 | attributes: 26 | value: "_Thank you for submitting this feature request!_" 27 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/question.yml: -------------------------------------------------------------------------------- 1 | name: Question 2 | description: Ask a question about openair plots or functions 3 | labels: ["question"] 4 | 5 | body: 6 | - type: markdown 7 | attributes: 8 | value: | 9 | This is a place to ask a question about packages in the openair project. This could be about how to properly use functions, interpret plots, or something else entirely.
10 | 11 | Before you submit your issue, it may be useful to check through the [openair book](https://openair-project.github.io/book/) just to make sure your question hasn't already been answered! 12 | 13 | - type: textarea 14 | attributes: 15 | label: Question 16 | description: | 17 | Please ask your question below. 18 | placeholder: | 19 | Please write your question here. When providing code, please contain it within code chunks using the below formatting: 20 | 21 | ```r 22 | # code goes here! 23 | openair::polarPlot(openair::mydata) 24 | ``` 25 | 26 | - type: markdown 27 | attributes: 28 | value: "_Thank you for your question!_" 29 | -------------------------------------------------------------------------------- /man/deweather-package.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/deweather-package.R 3 | \docType{package} 4 | \name{deweather-package} 5 | \alias{deweather} 6 | \alias{deweather-package} 7 | \title{deweather: Remove the influence of weather on air quality data} 8 | \description{ 9 | \if{html}{\figure{logo.png}{options: style='float: right' alt='logo' width='120'}} 10 | 11 | Model and account for (or remove) the effect of meteorology on atmospheric composition data. The technique uses boosted regression trees via the tidymodels framework. 
12 | } 13 | \seealso{ 14 | Useful links: 15 | \itemize{ 16 | \item \url{https://openair-project.github.io/deweather/} 17 | \item \url{https://github.com/openair-project/deweather} 18 | \item Report bugs at \url{https://github.com/openair-project/deweather/issues} 19 | } 20 | 21 | } 22 | \author{ 23 | \strong{Maintainer}: David Carslaw \email{david.carslaw@york.ac.uk} (\href{https://orcid.org/0000-0003-0991-950X}{ORCID}) 24 | 25 | Authors: 26 | \itemize{ 27 | \item Jack Davison \email{jack.davison@ricardo.com} (\href{https://orcid.org/0000-0003-2653-6615}{ORCID}) 28 | } 29 | 30 | } 31 | \keyword{internal} 32 | -------------------------------------------------------------------------------- /man/predict_dw.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/predict_dw.R 3 | \name{predict_dw} 4 | \alias{predict_dw} 5 | \title{Use a deweather model to predict with a new dataset} 6 | \usage{ 7 | predict_dw( 8 | dw, 9 | newdata = deweather::get_dw_input_data(dw), 10 | name = deweather::get_dw_pollutant(dw), 11 | column_bind = FALSE 12 | ) 13 | } 14 | \arguments{ 15 | \item{dw}{A deweather model created with \code{\link[=build_dw_model]{build_dw_model()}}.} 16 | 17 | \item{newdata}{Data set to which to apply the model. If missing, the data used 18 | to build the model in the first place will be used.} 19 | 20 | \item{name}{The name of the new column.} 21 | 22 | \item{column_bind}{If \code{TRUE}, this function will return \code{newdata} with an 23 | additional prediction column bound to it. If \code{FALSE}, it returns a 24 | single-column data frame.} 25 | } 26 | \value{ 27 | a \link[tibble:tibble-package]{tibble} 28 | } 29 | \description{ 30 | This function is a convenient wrapper around \code{\link[parsnip:predict.model_fit]{parsnip::predict.model_fit()}} 31 | to use a deweather model for prediction. 
This automatically extracts relevant
This automatically extracts relevant 32 | parts of the deweather object and creates variables within \code{newdata} using 33 | \code{\link[=append_dw_vars]{append_dw_vars()}} if required. 34 | } 35 | \author{ 36 | Jack Davison 37 | } 38 | -------------------------------------------------------------------------------- /man/plot_dw_importance.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/plot_dw_importance.R 3 | \name{plot_dw_importance} 4 | \alias{plot_dw_importance} 5 | \title{Visualise deweather model feature importance} 6 | \usage{ 7 | plot_dw_importance(dw, aggregate_factors = FALSE, sort = TRUE, cols = "tol") 8 | } 9 | \arguments{ 10 | \item{dw}{A deweather model created with \code{\link[=build_dw_model]{build_dw_model()}}.} 11 | 12 | \item{aggregate_factors}{Defaults to \code{FALSE}. If \code{TRUE}, the importance of 13 | factor inputs (e.g., Weekday) will be summed into a single variable. This 14 | only applies to certain engines which report factor importance as 15 | disaggregate features.} 16 | 17 | \item{sort}{If \code{TRUE}, the default, features will be sorted by their 18 | importance. If \code{FALSE}, they will be sorted alphabetically. In 19 | \code{\link[=plot_dw_importance]{plot_dw_importance()}} this will change the ordering of the y-axis, whereas 20 | in \code{\link[=get_dw_importance]{get_dw_importance()}} it will change whether \code{var} is returned as a 21 | factor or character data type.} 22 | 23 | \item{cols}{Colours to use for plotting. See \code{\link[openair:openColours]{openair::openColours()}}.} 24 | } 25 | \value{ 26 | a \link[ggplot2:ggplot2-package]{ggplot2} figure 27 | } 28 | \description{ 29 | Visualise the feature importance (\% Gain for boosted tree models) for each 30 | variable of a deweather model, with some customisation. 
31 | } 32 | -------------------------------------------------------------------------------- /.github/workflows/pkgdown.yaml: -------------------------------------------------------------------------------- 1 | # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples 2 | # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help 3 | on: 4 | push: 5 | branches: [main, master] 6 | pull_request: 7 | release: 8 | types: [published] 9 | workflow_dispatch: 10 | 11 | name: pkgdown.yaml 12 | 13 | permissions: read-all 14 | 15 | jobs: 16 | pkgdown: 17 | runs-on: ubuntu-latest 18 | # Only restrict concurrency for non-PR jobs 19 | concurrency: 20 | group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }} 21 | env: 22 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 23 | permissions: 24 | contents: write 25 | steps: 26 | - uses: actions/checkout@v4 27 | 28 | - uses: r-lib/actions/setup-pandoc@v2 29 | 30 | - uses: r-lib/actions/setup-r@v2 31 | with: 32 | use-public-rspm: true 33 | 34 | - uses: r-lib/actions/setup-r-dependencies@v2 35 | with: 36 | extra-packages: any::pkgdown, local::. 
37 | needs: website 38 | 39 | - name: Build site 40 | run: pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE) 41 | shell: Rscript {0} 42 | 43 | - name: Deploy to GitHub pages 🚀 44 | if: github.event_name != 'pull_request' 45 | uses: JamesIves/github-pages-deploy-action@v4.5.0 46 | with: 47 | clean: false 48 | branch: gh-pages 49 | folder: docs 50 | -------------------------------------------------------------------------------- /src/RcppExports.cpp: -------------------------------------------------------------------------------- 1 | // Generated by using Rcpp::compileAttributes() -> do not edit by hand 2 | // Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393 3 | 4 | #include <Rcpp.h> 5 | 6 | using namespace Rcpp; 7 | 8 | #ifdef RCPP_USE_GLOBAL_ROSTREAM 9 | Rcpp::Rostream& Rcpp::Rcout = Rcpp::Rcpp_cout_get(); 10 | Rcpp::Rostream& Rcpp::Rcerr = Rcpp::Rcpp_cerr_get(); 11 | #endif 12 | 13 | // get_constrained_indices_cpp 14 | IntegerVector get_constrained_indices_cpp(IntegerVector doy, IntegerVector hod, int day_win, int hour_win); 15 | RcppExport SEXP _deweather_get_constrained_indices_cpp(SEXP doySEXP, SEXP hodSEXP, SEXP day_winSEXP, SEXP hour_winSEXP) { 16 | BEGIN_RCPP 17 | Rcpp::RObject rcpp_result_gen; 18 | Rcpp::RNGScope rcpp_rngScope_gen; 19 | Rcpp::traits::input_parameter< IntegerVector >::type doy(doySEXP); 20 | Rcpp::traits::input_parameter< IntegerVector >::type hod(hodSEXP); 21 | Rcpp::traits::input_parameter< int >::type day_win(day_winSEXP); 22 | Rcpp::traits::input_parameter< int >::type hour_win(hour_winSEXP); 23 | rcpp_result_gen = Rcpp::wrap(get_constrained_indices_cpp(doy, hod, day_win, hour_win)); 24 | return rcpp_result_gen; 25 | END_RCPP 26 | } 27 | 28 | static const R_CallMethodDef CallEntries[] = { 29 | {"_deweather_get_constrained_indices_cpp", (DL_FUNC) &_deweather_get_constrained_indices_cpp, 4}, 30 | {NULL, NULL, 0} 31 | }; 32 | 33 | RcppExport void R_init_deweather(DllInfo *dll) { 34 | 
R_registerRoutines(dll, NULL,
CallEntries, NULL, NULL); 35 | R_useDynamicSymbols(dll, FALSE); 36 | } 37 | -------------------------------------------------------------------------------- /.github/workflows/R-CMD-check.yaml: -------------------------------------------------------------------------------- 1 | # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples 2 | # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help 3 | on: 4 | push: 5 | branches: [main, master] 6 | pull_request: 7 | 8 | name: R-CMD-check.yaml 9 | 10 | permissions: read-all 11 | 12 | jobs: 13 | R-CMD-check: 14 | runs-on: ${{ matrix.config.os }} 15 | 16 | name: ${{ matrix.config.os }} (${{ matrix.config.r }}) 17 | 18 | strategy: 19 | fail-fast: false 20 | matrix: 21 | config: 22 | - {os: macos-latest, r: 'release'} 23 | - {os: windows-latest, r: 'release'} 24 | - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'} 25 | - {os: ubuntu-latest, r: 'release'} 26 | - {os: ubuntu-latest, r: 'oldrel-1'} 27 | 28 | env: 29 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 30 | R_KEEP_PKG_SOURCE: yes 31 | 32 | steps: 33 | - uses: actions/checkout@v4 34 | 35 | - uses: r-lib/actions/setup-pandoc@v2 36 | 37 | - uses: r-lib/actions/setup-r@v2 38 | with: 39 | r-version: ${{ matrix.config.r }} 40 | http-user-agent: ${{ matrix.config.http-user-agent }} 41 | use-public-rspm: true 42 | 43 | - uses: r-lib/actions/setup-r-dependencies@v2 44 | with: 45 | extra-packages: any::rcmdcheck 46 | needs: check 47 | 48 | - uses: r-lib/actions/check-r-package@v2 49 | with: 50 | upload-snapshots: true 51 | build_args: 'c("--no-manual","--compact-vignettes=gs+qpdf")' 52 | -------------------------------------------------------------------------------- /DESCRIPTION: -------------------------------------------------------------------------------- 1 | Type: Package 2 | Package: deweather 3 | Title: Remove the influence of weather on air quality data 4 | Version: 1.0.0 5 | Authors@R: c( 6 | 
person("David", "Carslaw", , "david.carslaw@york.ac.uk", role = c("aut", "cre"), 7 | comment = c(ORCID = "0000-0003-0991-950X")), 8 | person("Jack", "Davison", , "jack.davison@ricardo.com", role = "aut", 9 | comment = c(ORCID = "0000-0003-2653-6615")) 10 | ) 11 | Description: Model and account for (or remove) the effect of meteorology 12 | on atmospheric composition data. The technique uses boosted regression 13 | trees via the tidymodels framework. 14 | License: MIT + file LICENSE 15 | URL: https://openair-project.github.io/deweather/, 16 | https://github.com/openair-project/deweather 17 | BugReports: https://github.com/openair-project/deweather/issues 18 | Depends: 19 | parsnip, 20 | R (>= 4.1.0) 21 | Imports: 22 | cli, 23 | DALEX, 24 | DALEXtra, 25 | dials, 26 | dplyr, 27 | ggplot2, 28 | ingredients, 29 | lubridate, 30 | mgcv, 31 | openair, 32 | patchwork, 33 | purrr, 34 | Rcpp, 35 | rlang, 36 | rsample, 37 | scales, 38 | stats, 39 | tidyr, 40 | tune, 41 | utils, 42 | vip, 43 | workflows 44 | Suggests: 45 | bonsai, 46 | carrier, 47 | knitr, 48 | lightgbm, 49 | mirai, 50 | quarto, 51 | rmarkdown, 52 | testthat (>= 3.0.0), 53 | xgboost 54 | LinkingTo: 55 | Rcpp 56 | VignetteBuilder: 57 | knitr, 58 | quarto 59 | Config/Needs/website: rmarkdown, openair-project/openairpkgdown 60 | Config/testthat/edition: 3 61 | Encoding: UTF-8 62 | Language: en-GB 63 | LazyData: true 64 | Roxygen: list(markdown = TRUE) 65 | RoxygenNote: 7.3.3 66 | -------------------------------------------------------------------------------- /man/append_dw_vars.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/append_dw_vars.R 3 | \name{append_dw_vars} 4 | \alias{append_dw_vars} 5 | \title{Conveniently append common 'deweathering' variables to an air quality time 6 | series} 7 | \usage{ 8 | append_dw_vars( 9 | data, 10 | vars = c("trend", "hour", "weekday", "weekend", "yday", 
"week", "month"), 11 | abbr = TRUE, 12 | ..., 13 | .date = "date" 14 | ) 15 | } 16 | \arguments{ 17 | \item{data}{An input \code{data.frame} with at least one date(time) column.} 18 | 19 | \item{vars}{A character vector of variables of interest. Possible options 20 | include: 21 | - \code{"trend"}: a numeric expression of the overall time series 22 | - \code{"hour"}: the hour of the day (0-23) 23 | - \code{"weekday"}: the day of the week (Sunday through Saturday) 24 | - \code{"weekend"}: whether it is a weekend (Saturday, Sunday) or weekday 25 | - \code{"yday"}: the day of the year 26 | - \code{"week"}: the week of the year 27 | - \code{"month"}: the month of the year} 28 | 29 | \item{abbr}{Abbreviate weekday and month strings? Defaults to \code{TRUE}, which 30 | tends to look better in plots.} 31 | 32 | \item{...}{Not used} 33 | 34 | \item{.date}{The name of the 'date' column to use for manipulation.} 35 | } 36 | \description{ 37 | This function conveniently manipulates a datetime ('POSIXct') column (by 38 | default named 'date') into a series of columns which are useful features in 39 | deweather models. Used internally by \code{\link[=build_dw_model]{build_dw_model()}} and 40 | \code{\link[=tune_dw_model]{tune_dw_model()}}, but can be used directly by users if desired. 41 | } 42 | \seealso{ 43 | \code{\link[openair:cutData]{openair::cutData()}} for more flexible time series data conditioning. 
44 | } 45 | -------------------------------------------------------------------------------- /_pkgdown.yml: -------------------------------------------------------------------------------- 1 | url: https://openair-project.github.io/deweather/ 2 | template: 3 | package: openairpkgdown 4 | 5 | navbar: 6 | components: 7 | casestudy: 8 | text: Case Studies (External) 9 | menu: 10 | - text: (2019) Using meteorological normalisation to detect interventions in air quality time series 11 | href: https://www.sciencedirect.com/science/article/pii/S004896971834244X 12 | - text: (2012) A short-term intervention study — Impact of airport closure due to the eruption of Eyjafjallajökull on near-field air quality 13 | href: https://www.sciencedirect.com/science/article/abs/pii/S1352231012001355 14 | - text: (2009) Analysis of air pollution data at a mixed source location using boosted regression trees 15 | href: https://www.sciencedirect.com/science/article/abs/pii/S1352231009003069 16 | 17 | reference: 18 | - title: Data 19 | desc: > 20 | Example datasets included with the package, used to demonstrate and test 21 | deweathering functions. 22 | contents: aqroadside 23 | 24 | - title: Build 25 | desc: > 26 | Core functions for tuning and fitting deweathering models, including 27 | parameter tuning, model construction and adding derived variables. 28 | contents: 29 | - tune_dw_model 30 | - build_dw_model 31 | - append_dw_vars 32 | 33 | - title: Examine 34 | desc: > 35 | Methods to examine a deweathering model; currently 'getters' to extract 36 | specific features of a built model. 37 | contents: 38 | - get_dw_pollutant 39 | 40 | - title: Visualise 41 | desc: > 42 | Functions for visualising model components and relationships, including 43 | variable importance and partial dependence plots. 44 | contents: 45 | - plot_dw_importance 46 | - plot_dw_partial_1d 47 | - plot_dw_partial_2d 48 | 49 | - title: Predict 50 | desc: > 51 | Functions to apply a deweathering model for prediction.
52 | contents: 53 | - predict_dw 54 | - simulate_dw_met 55 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/bug_report.yml: -------------------------------------------------------------------------------- 1 | name: Bug report 2 | description: Report an error or unexpected behaviour 3 | labels: ["bug"] 4 | 5 | body: 6 | - type: markdown 7 | attributes: 8 | value: | 9 | Thank you for helping with the development of the openair project by reporting a bug. 10 | 11 | If you haven't already, you might find it useful to read our [getting help guide](https://openair-project.github.io/book/sections/appendices/appendix-gethelp.html), which details how you can best help us help you! 12 | 13 | - type: textarea 14 | attributes: 15 | label: Bug description 16 | description: | 17 | Description of the bug. 18 | Please clearly explain the difference between what you would *expect* to happen, and what is *actually* happening. 19 | placeholder: Please describe the bug here. 20 | 21 | - type: textarea 22 | attributes: 23 | label: Steps to reproduce 24 | description: | 25 | Tell us how to reproduce this bug. 26 | Please include a [minimal, fully reproducible example](https://openair-project.github.io/book/sections/appendices/appendix-gethelp.html#sec-reprex), ideally including code. 27 | If you need to attach data, you can do so in a reply to your issue after it has been submitted. 28 | placeholder: | 29 | When providing code, please contain it within code chunks using the below formatting: 30 | 31 | ```r 32 | # code goes here! 33 | openair::polarPlot(openair::mydata) 34 | ``` 35 | 36 | - type: textarea 37 | attributes: 38 | label: Package version 39 | description: | 40 | Please provide the output of running `packageVersion("deweather")` in your console. 
41 | placeholder: | 42 | Provide the output of: 43 | ```r 44 | packageVersion("deweather") 45 | ``` 46 | 47 | - type: markdown 48 | attributes: 49 | value: "_Thank you for submitting this bug report!_" 50 | 51 | -------------------------------------------------------------------------------- /R/predict_dw.R: -------------------------------------------------------------------------------- 1 | #' Use a deweather model to predict with a new dataset 2 | #' 3 | #' This function is a convenient wrapper around [parsnip::predict.model_fit()] 4 | #' to use a deweather model for prediction. It automatically extracts relevant 5 | #' parts of the deweather object and creates variables within `newdata` using 6 | #' [append_dw_vars()] if required. 7 | #' 8 | #' @param dw A deweather model created with [build_dw_model()]. 9 | #' 10 | #' @param newdata Data set to which to apply the model. If missing, the data 11 | #' used to build the model will be used. 12 | #' 13 | #' @param name The name of the prediction column; defaults to the pollutant. 14 | #' 15 | #' @param column_bind If `TRUE`, this function will return `newdata` with an 16 | #' additional prediction column bound to it. If `FALSE`, only the prediction 17 | #' is returned, as a single-column data frame.
18 | #' 19 | #' @export 20 | #' 21 | #' @return a [tibble][tibble::tibble-package] 22 | #' 23 | #' @author Jack Davison 24 | predict_dw <- function( 25 | dw, 26 | newdata = deweather::get_dw_input_data(dw), 27 | name = deweather::get_dw_pollutant(dw), 28 | column_bind = FALSE 29 | ) { 30 | check_deweather(dw) 31 | 32 | # get model components 33 | mod <- get_dw_model(dw) 34 | vars <- get_dw_vars(dw) 35 | 36 | # if any of the vars given aren't in data, they can be appended by the 37 | # append_dw_vars function 38 | if (any(!vars %in% names(newdata))) { 39 | vars_to_add <- vars[!vars %in% names(newdata)] 40 | newdata <- append_dw_vars(newdata, vars = vars_to_add, abbr = TRUE) 41 | } 42 | 43 | # don't allow overwriting columns 44 | if (name %in% names(newdata) && column_bind) { 45 | cli::cli_abort( 46 | "'{name}' already present in {.arg newdata}; change {.arg name} or set {.arg column_bind} to {FALSE}." 47 | ) 48 | } 49 | 50 | # predict 51 | prediction <- parsnip::predict.model_fit( 52 | mod, 53 | new_data = newdata, 54 | type = "numeric" 55 | ) |> 56 | stats::setNames(name) 57 | 58 | if (column_bind) { 59 | prediction <- dplyr::bind_cols(newdata, prediction) 60 | } 61 | 62 | return(prediction) 63 | } 64 | -------------------------------------------------------------------------------- /src/metSimCpp.cpp: -------------------------------------------------------------------------------- 1 | #include <Rcpp.h> 2 | using namespace Rcpp; 3 | 4 | // [[Rcpp::export]] 5 | IntegerVector get_constrained_indices_cpp(IntegerVector doy, IntegerVector hod, int day_win, int hour_win) { 6 | int n = doy.size(); 7 | int max_doy = 366; 8 | int max_hour = 24; 9 | 10 | // 1. Create the Grid (3D Vector) 11 | // grid[day][hour] = {list of indices} 12 | std::vector<std::vector<std::vector<int>>> grid(max_doy + 1, std::vector<std::vector<int>>(max_hour)); 13 | 14 | for(int i = 0; i < n; ++i) { 15 | int d = doy[i]; 16 | int h = hod[i]; 17 | grid[d][h].push_back(i + 1); // Store 1-based index 18 | } 19 | 20 | // 2.
Iterate through each observation 21 | IntegerVector result(n); 22 | 23 | for(int i = 0; i < n; ++i) { 24 | int current_d = doy[i]; 25 | int current_h = hod[i]; 26 | 27 | std::vector<int> candidates; 28 | 29 | // Loop dynamically based on the provided window arguments 30 | for(int d_offset = -day_win; d_offset <= day_win; ++d_offset) { 31 | for(int h_offset = -hour_win; h_offset <= hour_win; ++h_offset) { 32 | 33 | // --- Handle Circular Wrapping --- 34 | 35 | // Day Wrap (1 to 366) 36 | int search_d = current_d + d_offset; 37 | // Handle multiple wraps if window > 366 (rare but safer logic) 38 | while (search_d > 366) search_d -= 366; 39 | while (search_d < 1) search_d += 366; 40 | 41 | // Hour Wrap (0 to 23) 42 | int search_h = current_h + h_offset; 43 | while (search_h >= 24) search_h -= 24; 44 | while (search_h < 0) search_h += 24; 45 | 46 | // --- Collect Indices --- 47 | const std::vector<int>& bucket = grid[search_d][search_h]; 48 | if (!bucket.empty()) { 49 | candidates.insert(candidates.end(), bucket.begin(), bucket.end()); 50 | } 51 | } 52 | } 53 | 54 | // 3. Sample one randomly 55 | if (candidates.size() > 0) { 56 | int rand_pos = floor(R::runif(0, candidates.size())); 57 | result[i] = candidates[rand_pos]; 58 | } else { 59 | result[i] = NA_INTEGER; 60 | } 61 | } 62 | 63 | return result; 64 | } 65 | -------------------------------------------------------------------------------- /NEWS.md: -------------------------------------------------------------------------------- 1 | # deweather 1.0.0 2 | 3 | ## Major Version Changes 4 | 5 | Version 1.0.0 of deweather is a complete rewrite of the `deweather` package. This new version: 6 | 7 | - Uses the `tidymodels` framework, allowing for more flexibility in modelling engines. `deweather` 1.0.0 launches with both `xgboost` and `lightgbm` engines available. 8 | 9 | - Provides much more flexible partial dependence (PD) calculations, including grouped PDs. 10 | 11 | - Uses the flexible `{mirai}` package to support parallelisation.
12 | 13 | - Uses a consistent function and object naming scheme for easier use. 14 | 15 | The main reason prompting this change was the retirement of the `gbm` R package, and the slow development of `gbm3`. `xgboost` and `lightgbm` are modern, fast, popular, and highly capable implementations of gradient boosted machine learning. 16 | 17 | The original version of deweather (including its `NEWS.md`) is archived at for users still interested in the old API. 18 | 19 | ## New Features 20 | 21 | - Model building functions: 22 | 23 | - `build_dw_model()` fits a deweather model, which is used by the rest of the package. 24 | 25 | - `tune_dw_model()` allows for different modelling parameters to be tweaked and experimented with. 26 | 27 | - `append_dw_vars()` attaches a variety of time-based modelling variables, and is used automatically within the above two functions. 28 | 29 | - The `get_dw_pollutant()` family allows for specific features of deweather models to be extracted consistently. 30 | 31 | - Visualisation functions: 32 | 33 | - `plot_dw_importance()` provides a quick plot of variable importance of a deweather model. 34 | 35 | - `plot_dw_partial_1d()` calculates and visualises partial dependence for any subset of model variables. 36 | 37 | - `plot_dw_partial_2d()` calculates and visualises two-way partial dependence. 38 | 39 | - Modelling functions: 40 | 41 | - `predict_dw()` allows for the use of a deweather model for predictions. 42 | 43 | - `simulate_dw_met()` repeatedly resamples selected meteorological variables and averages the resulting predictions, effectively helping 'remove' the influence of meteorology.
44 | -------------------------------------------------------------------------------- /tests/testthat/test-build_dw_model.R: -------------------------------------------------------------------------------- 1 | test_that("boosted tree models work", { 2 | small_data <- head(aqroadside, n = 1000) 3 | 4 | model <- build_dw_model(small_data, "no2") 5 | 6 | expect_no_error(print(model)) 7 | expect_no_error(head(model)) 8 | expect_no_error(tail(model)) 9 | expect_no_error(plot(model)) 10 | expect_no_error(summary(model)) 11 | 12 | expect_equal(get_dw_engine(model), "xgboost") 13 | expect_equal(get_dw_pollutant(model), "no2") 14 | expect_equal( 15 | get_dw_vars(model), 16 | c("trend", "ws", "wd", "hour", "weekday", "air_temp") 17 | ) 18 | expect_equal( 19 | names(get_dw_input_data(model)), 20 | c("no2", "trend", "ws", "wd", "hour", "weekday", "air_temp") 21 | ) 22 | expect_equal( 23 | get_dw_params(model), 24 | list( 25 | tree_depth = 5, 26 | trees = 200L, 27 | learn_rate = 0.1, 28 | mtry = NULL, 29 | min_n = 10L, 30 | loss_reduction = 0, 31 | sample_size = 1L, 32 | stop_iter = 190L 33 | ) 34 | ) 35 | expect_equal(get_dw_params(model, "tree_depth"), 5) 36 | 37 | imp <- get_dw_importance(model) 38 | expect_type(imp$importance, "double") 39 | expect_s3_class(imp$var, "factor") 40 | expect_equal(nrow(imp), 12) 41 | 42 | imp2 <- get_dw_importance(model, aggregate_factors = TRUE) 43 | expect_equal(nrow(imp2), length(get_dw_vars(model))) 44 | 45 | imp3 <- get_dw_importance(model, sort = FALSE) 46 | expect_type(imp3$var, "character") 47 | 48 | expect_no_error(plot_dw_importance(model)) 49 | expect_no_error(plot_dw_importance(model, aggregate_factors = TRUE)) 50 | 51 | expect_s3_class(plot_dw_partial_1d(model, n = 10), "gg") 52 | expect_s3_class(plot_dw_partial_1d(model, "hour", n = 10), "gg") 53 | expect_s3_class(plot_dw_partial_1d(model, c("hour", "ws"), n = 10), "gg") 54 | 55 | expect_s3_class(plot_dw_partial_2d(model, "hour", "ws", n = 10), "gg") 56 | expect_s3_class( 57 | 
plot_dw_partial_2d(model, "hour", "ws", n = 10, contour = "lines"), 58 | "gg" 59 | ) 60 | expect_s3_class( 61 | plot_dw_partial_2d(model, "hour", "ws", n = 10, contour = "fill"), 62 | "gg" 63 | ) 64 | expect_s3_class( 65 | plot_dw_partial_2d(model, "hour", "ws", n = 10, show_conf_int = TRUE), 66 | "gg" 67 | ) 68 | }) 69 | -------------------------------------------------------------------------------- /man/getters-dw.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/get_dw.R 3 | \name{get_dw_pollutant} 4 | \alias{get_dw_pollutant} 5 | \alias{get_dw_vars} 6 | \alias{get_dw_params} 7 | \alias{get_dw_input_data} 8 | \alias{get_dw_model} 9 | \alias{get_dw_engine} 10 | \alias{get_dw_importance} 11 | \title{Getters for various deweather model features} 12 | \usage{ 13 | get_dw_pollutant(dw) 14 | 15 | get_dw_vars(dw) 16 | 17 | get_dw_params(dw, param = NULL) 18 | 19 | get_dw_input_data(dw) 20 | 21 | get_dw_model(dw) 22 | 23 | get_dw_engine(dw) 24 | 25 | get_dw_importance(dw, aggregate_factors = FALSE, sort = TRUE) 26 | } 27 | \arguments{ 28 | \item{dw}{A deweather model created with \code{\link[=build_dw_model]{build_dw_model()}}.} 29 | 30 | \item{param}{For \code{\link[=get_dw_params]{get_dw_params()}}. The default (\code{NULL}) returns a list of 31 | model parameters. Setting \code{param} returns the value of that single 32 | parameter.} 33 | 34 | \item{aggregate_factors}{Defaults to \code{FALSE}. If \code{TRUE}, the importance of 35 | factor inputs (e.g., Weekday) will be summed into a single variable. This 36 | only applies to certain engines which report factor importance as 37 | disaggregate features.} 38 | 39 | \item{sort}{If \code{TRUE}, the default, features will be sorted by their 40 | importance. If \code{FALSE}, they will be sorted alphabetically.
In 41 | \code{\link[=plot_dw_importance]{plot_dw_importance()}} this will change the ordering of the y-axis, whereas 42 | in \code{\link[=get_dw_importance]{get_dw_importance()}} it will change whether \code{var} is returned as a 43 | factor or character data type.} 44 | } 45 | \value{ 46 | Typically a character vector, except: 47 | \itemize{ 48 | \item \code{\link[=get_dw_params]{get_dw_params()}}: a list, unless \code{param} is set. 49 | \item \code{\link[=get_dw_importance]{get_dw_importance()}} and \code{\link[=get_dw_input_data]{get_dw_input_data()}}: a \code{data.frame} 50 | \item \code{\link[=get_dw_model]{get_dw_model()}}: A \link[parsnip:model_fit]{parsnip::model_fit} object 51 | } 52 | } 53 | \description{ 54 | \code{deweather} provides multiple 'getter' functions for extracting relevant 55 | model features from a deweather model. These are a useful convenience, 56 | particularly in conjunction with R's \link[=pipeOp]{pipe} operator (\verb{|>}). 57 | } 58 | \concept{Object 'Getter' Functions} 59 | -------------------------------------------------------------------------------- /R/deweather-generics.R: -------------------------------------------------------------------------------- 1 | # Deweather --------------------------------------------------------------- 2 | 3 | #' @method print Deweather 4 | #' @export 5 | #' @author Jack Davison 6 | print.Deweather <- function(x, ...) { 7 | labs <- 8 | get_dw_importance(x, aggregate_factors = TRUE, sort = TRUE) |> 9 | dplyr::arrange(dplyr::desc(.data$importance)) |> 10 | dplyr::mutate( 11 | importance = paste0(round(.data$importance * 100, 1), "%"), 12 | lab = paste0(.data$var, " (", .data$importance, ")") 13 | ) |> 14 | dplyr::pull("lab") 15 | 16 | str <- c( 17 | "*" = "A model for predicting {.strong {get_dw_pollutant(x)}} using {.field {labs}}."
18 | ) 19 | 20 | cli::cli_h1("Deweather Model") 21 | cli::cli_inform(str) 22 | 23 | cli::cli_h2("Model Parameters") 24 | 25 | params <- x$params 26 | 27 | cli::cli_ul() 28 | for (i in names(params)) { 29 | cli::cli_li("{.field {i}}: {params[i]}") 30 | } 31 | cli::cli_end() 32 | } 33 | 34 | #' @method plot Deweather 35 | #' @export 36 | plot.Deweather <- function(x, ...) { 37 | plot_dw_importance(x, ...) 38 | } 39 | 40 | #' @method summary Deweather 41 | #' @export 42 | summary.Deweather <- function(object, ...) { 43 | dw_map(object$data, summary, ...) 44 | } 45 | 46 | #' @method head Deweather 47 | #' @export 48 | head.Deweather <- function(x, ...) { 49 | dw_map(x$data, utils::head, ...) 50 | } 51 | 52 | #' @method tail Deweather 53 | #' @export 54 | tail.Deweather <- function(x, ...) { 55 | dw_map(x$data, utils::tail, ...) 56 | } 57 | 58 | # Utilities --------------------------------------------------------------- 59 | 60 | #' mapping helper to perform functions on each dataframe element of a DW model 61 | #' @noRd 62 | #' @author Jack Davison 63 | dw_map <- function(x, FUN, ...) { 64 | dat <- names(x) 65 | 66 | out <- list() 67 | for (i in dat) { 68 | args <- list(x[[i]], ...) 69 | proc <- do.call(FUN, args = args) 70 | cli::cli_par(id = i) 71 | cli::cli_inform(paste0("{.field $", i, "}")) 72 | print(proc) 73 | cli::cli_end(id = i) 74 | out <- append(out, list(proc)) 75 | } 76 | 77 | names(out) <- dat 78 | return(invisible(out)) 79 | } 80 | 81 | #' Check an input is a deweather model 82 | #' @noRd 83 | check_deweather <- function(dw) { 84 | if (!inherits(dw, "Deweather")) { 85 | cli::cli_abort( 86 | "{.arg dw} must be a 'Deweather' object created using {.fun deweather::build_dw_model}." 
87 | ) 88 | } 89 | } 90 | -------------------------------------------------------------------------------- /.github/CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to deweather 2 | 3 | This outlines how to propose a change to `{deweather}`. 4 | 5 | ## Fixing typos 6 | 7 | You can fix typos, spelling mistakes, or grammatical errors in the documentation directly using the GitHub web interface, as long as the changes are made in the _source_ file. 8 | This generally means you'll need to edit [roxygen2 comments](https://roxygen2.r-lib.org/articles/roxygen2.html) in an `.R`, not a `.Rd` file. 9 | You can find the `.R` file that generates the `.Rd` by reading the comment in the first line. 10 | 11 | ## Bigger changes 12 | 13 | If you want to make a bigger change, it's a good idea to first file an issue and make sure someone from the team agrees that it’s needed. 14 | If you’ve found a bug, please file an issue that illustrates the bug with a minimal 15 | [reprex](https://www.tidyverse.org/help/#reprex). 16 | 17 | ### Pull request process 18 | 19 | * Fork the package and clone onto your computer. If you haven't done this before, we recommend using `usethis::create_from_github("openair-project/deweather", fork = TRUE)`. 20 | 21 | * Install all development dependencies with `devtools::install_dev_deps()`, and then make sure the package passes R CMD check by running `devtools::check()`. 22 | If R CMD check doesn't pass cleanly, it's a good idea to ask for help before continuing. 23 | * Create a Git branch for your pull request (PR). We recommend using `usethis::pr_init("brief-description-of-change")`. 24 | 25 | * Make your changes, commit to git, and then create a PR by running `usethis::pr_push()`, and following the prompts in your browser. 26 | The title of your PR should briefly describe the change. 27 | The body of your PR should contain `Fixes #issue-number`. 
28 | 29 | * For user-facing changes, add a bullet to the top of `NEWS.md` (i.e. just below the first header). 30 | 31 | ### Code style 32 | 33 | * Care should be taken such that new code follows a style similar to the rest of the `{openair}` family. The most user-facing example of this is that exported functions should be written in "lowerCamelCase" (i.e., `polarPlot()` rather than `polar_plot()`). 34 | 35 | * We use [roxygen2](https://cran.r-project.org/package=roxygen2), with [Markdown syntax](https://cran.r-project.org/web/packages/roxygen2/vignettes/rd-formatting.html), for documentation. 36 | 37 | ## Code of Conduct 38 | 39 | Please note that the `{openair}` project is released with a 40 | [Contributor Code of Conduct](CODE_OF_CONDUCT.md). By contributing to this 41 | project you agree to abide by its terms. 42 | -------------------------------------------------------------------------------- /man/simulate_dw_met.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/simulate_dw_met.R 3 | \name{simulate_dw_met} 4 | \alias{simulate_dw_met} 5 | \title{Function to run random meteorological simulations on a deweather model} 6 | \usage{ 7 | simulate_dw_met( 8 | dw, 9 | newdata = deweather::get_dw_input_data(dw), 10 | vars = c("ws", "wd", "air_temp"), 11 | resampling = c("constrained", "all"), 12 | window_day = 30, 13 | window_hour = 2, 14 | n = 200, 15 | aggregate = TRUE, 16 | progress = rlang::is_interactive() 17 | ) 18 | } 19 | \arguments{ 20 | \item{dw}{A deweather model created with \code{\link[=build_dw_model]{build_dw_model()}}.} 21 | 22 | \item{newdata}{Data set to which to apply the model. If missing the data used 23 | to build the model in the first place will be used.} 24 | 25 | \item{vars}{The variables that should be randomly varied. 
Note that these 26 | should typically be meteorological variables (e.g., \code{"ws"}, \code{"wd"}, 27 | \code{"air_temp"}) and not temporal emission proxies (e.g., \code{"hour"}, 28 | \code{"weekday"}, \code{"week"}).} 29 | 30 | \item{resampling}{The resampling strategy. One of: 31 | \itemize{ 32 | \item \code{"constrained"} (default), meaning that only days of the year close to 33 | the target date are sampled. This option is used in conjunction with 34 | \code{window_day} and \code{window_hour}. For example, a \code{window_day} of \code{30} will 35 | sample +/-30 days of the date. 36 | \item \code{"all"}, meaning all dates are shuffled. 37 | } 38 | 39 | The argument for using constrained resampling is that it resamples 40 | conditions for a similar time of year and / or hour of the day to minimise 41 | the resampling of implausible conditions e.g. very warm temperatures during 42 | winter.} 43 | 44 | \item{window_day, window_hour}{The day of year (\code{window_day}) and hour of day 45 | (\code{window_hour}) windows to sample within when \code{resampling = "constrained"}. 46 | For example, \code{window_day = 30} samples within +/-30 days of any given date.} 47 | 48 | \item{n}{The number of simulations to use.} 49 | 50 | \item{aggregate}{By default, all of the simulations will be aggregated into a 51 | single time series. When \code{aggregate = FALSE}, all simulations will be 52 | returned in a single data frame with an \code{.id} column distinguishing between 53 | them.} 54 | 55 | \item{progress}{Show a progress bar? Defaults to \code{TRUE} in interactive 56 | sessions.} 57 | } 58 | \value{ 59 | a \link[tibble:tibble-package]{tibble} 60 | } 61 | \description{ 62 | This function performs random simulations to help isolate the effect of 63 | emissions changes from meteorological variability in air quality data. 
It 64 | works by repeatedly shuffling meteorological variables (like wind and air 65 | temperature) while keeping temporal patterns intact, then predicting 66 | pollutant concentrations using a trained deweather model. 67 | } 68 | \author{ 69 | David Carslaw 70 | } 71 | -------------------------------------------------------------------------------- /R/plot_dw_importance.R: -------------------------------------------------------------------------------- 1 | #' Visualise deweather model feature importance 2 | #' 3 | #' Visualise the feature importance (% Gain for boosted tree models) for each 4 | #' variable of a deweather model, with some customisation. 5 | #' 6 | #' @inheritParams get_dw_importance 7 | #' 8 | #' @param cols Colours to use for plotting. See [openair::openColours()]. 9 | #' 10 | #' @return a [ggplot2][ggplot2::ggplot2-package] figure 11 | #' 12 | #' @export 13 | plot_dw_importance <- 14 | function(dw, aggregate_factors = FALSE, sort = TRUE, cols = "tol") { 15 | check_deweather(dw) 16 | importance <- 17 | get_dw_importance(dw, aggregate_factors = aggregate_factors, sort = sort) 18 | 19 | ggplot2::ggplot( 20 | importance, 21 | ggplot2::aes(x = .data[["importance"]], y = .data[["var"]]) 22 | ) + 23 | ggplot2::geom_col(fill = openair::openColours(cols, n = 1L)) + 24 | ggplot2::scale_x_continuous( 25 | expand = ggplot2::expansion(c(0, .1)), 26 | labels = function(x) { 27 | paste0(x * 100, "%") 28 | } 29 | ) + 30 | ggplot2::labs(y = NULL, x = "Importance") + 31 | ggplot2::theme_bw() 32 | } 33 | 34 | #' Take the importance data frame of a deweather model and combine factor 35 | #' variables into a single feature 36 | #' @param dw A deweather model created with [build_dw_model()]. 37 | #' @noRd 38 | aggregate_importance_factors <- function(dw) { 39 | importance <- get_dw_importance(dw, aggregate_factors = FALSE) 40 | vars <- get_dw_vars(dw) 41 | data <- get_dw_input_data(dw) 42 | 43 | # if nrow(importance) is the same as length of vars, there's nothing to 44 | # aggregate 45 | if
(nrow(importance) == length(vars)) { 46 | return(importance) 47 | } 48 | 49 | # get the types of each variable 50 | vartypes <- 51 | purrr::map_vec(vars, function(x) { 52 | class(data[[x]]) 53 | }) 54 | 55 | # get the factor variables 56 | factor_vars <- vars[vartypes == "factor"] 57 | 58 | # create a dictionary of non-factors (newFeature is the same) 59 | dict <- data.frame( 60 | newFeature = vars[vartypes != "factor"], 61 | var = vars[vartypes != "factor"] 62 | ) 63 | 64 | # if there are any factor vars, append these to the dictionary 65 | if (length(factor_vars) > 0L) { 66 | dict <- 67 | dplyr::bind_rows( 68 | dict, 69 | purrr::map( 70 | factor_vars, 71 | function(x) { 72 | data.frame( 73 | newFeature = x, 74 | var = paste0(x, levels(data[[x]])) 75 | ) 76 | } 77 | ) |> 78 | dplyr::bind_rows() 79 | ) 80 | } 81 | 82 | # summarise per new Feature 83 | importance <- 84 | dplyr::left_join(importance, dict, by = dplyr::join_by("var")) |> 85 | dplyr::summarise(importance = sum(.data$importance), .by = "newFeature") |> 86 | dplyr::rename(var = "newFeature") |> 87 | dplyr::arrange(dplyr::desc(.data$importance)) 88 | 89 | # restore correct factor order 90 | importance$var <- 91 | factor(importance$var, rev(importance$var)) 92 | 93 | return(importance) 94 | } 95 | -------------------------------------------------------------------------------- /R/get_dw.R: -------------------------------------------------------------------------------- 1 | #' Getters for various deweather model features 2 | #' 3 | #' @description 4 | #' 5 | #' `deweather` provides multiple 'getter' functions for extracting relevant 6 | #' model features from a deweather model. These are a useful convenience, 7 | #' particularly in conjunction with R's [pipe][pipeOp] operator (`|>`). 8 | #' 9 | #' @param dw A deweather model created with [build_dw_model()]. 10 | #' 11 | #' @param param For [get_dw_params()]. The default (`NULL`) returns a list of 12 | #' model parameters. 
Setting `param` returns the value of that single 13 | #' parameter. 14 | #' 15 | #' @param aggregate_factors Defaults to `FALSE`. If `TRUE`, the importance of 16 | #' factor inputs (e.g., Weekday) will be summed into a single variable. This 17 | #' only applies to certain engines which report factor importance as 18 | #' disaggregate features. 19 | #' 20 | #' @param sort If `TRUE`, the default, features will be sorted by their 21 | #' importance. If `FALSE`, they will be sorted alphabetically. In 22 | #' [plot_dw_importance()] this will change the ordering of the y-axis, whereas 23 | #' in [get_dw_importance()] it will change whether `var` is returned as a 24 | #' factor or character data type. 25 | #' 26 | #' @return Typically a character vector, except: 27 | #' - [get_dw_params()]: a list, unless `param` is set. 28 | #' - [get_dw_importance()] and [get_dw_input_data()]: a `data.frame` 29 | #' - [get_dw_model()]: A [parsnip::model_fit] object 30 | #' 31 | #' @family Object 'Getter' Functions 32 | #' 33 | #' @rdname getters-dw 34 | #' @order 1 35 | #' @export 36 | get_dw_pollutant <- function(dw) { 37 | check_deweather(dw) 38 | dw$pollutant 39 | } 40 | 41 | #' @rdname getters-dw 42 | #' @order 2 43 | #' @export 44 | get_dw_vars <- function(dw) { 45 | check_deweather(dw) 46 | dw$vars$names 47 | } 48 | 49 | #' @rdname getters-dw 50 | #' @order 3 51 | #' @export 52 | get_dw_params <- function(dw, param = NULL) { 53 | check_deweather(dw) 54 | params <- dw$params 55 | if (is.null(param)) { 56 | return(params) 57 | } else { 58 | param <- rlang::arg_match(param, names(params)) 59 | params[[param]] 60 | } 61 | } 62 | 63 | #' @rdname getters-dw 64 | #' @order 4 65 | #' @export 66 | get_dw_input_data <- function(dw) { 67 | check_deweather(dw) 68 | dw$data$input 69 | } 70 | 71 | #' @rdname getters-dw 72 | #' @order 5 73 | #' @export 74 | get_dw_model <- function(dw) { 75 | check_deweather(dw) 76 | dw$model 77 | } 78 | 79 | #' @rdname getters-dw 80 | #' @order 6 81 | #' @export 82 | get_dw_engine
<- function(dw) { 83 | check_deweather(dw) 84 | dw$engine 85 | } 86 | 87 | 88 | #' @rdname getters-dw 89 | #' @export 90 | #' @order 7 91 | get_dw_importance <- 92 | function(dw, aggregate_factors = FALSE, sort = TRUE) { 93 | check_deweather(dw) 94 | if (aggregate_factors) { 95 | importance <- aggregate_importance_factors(dw) 96 | } else { 97 | importance <- dw$data$importance 98 | } 99 | 100 | if (!sort) { 101 | importance$var <- as.character(importance$var) 102 | } 103 | 104 | return(importance) 105 | } 106 | -------------------------------------------------------------------------------- /man/plot_dw_partial_1d.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/plot_dw_partial_1d.R 3 | \name{plot_dw_partial_1d} 4 | \alias{plot_dw_partial_1d} 5 | \title{Create partial dependence plots for deweather models} 6 | \usage{ 7 | plot_dw_partial_1d( 8 | dw, 9 | vars = NULL, 10 | intervals = 40L, 11 | group = NULL, 12 | group_intervals = 3L, 13 | show_conf_int = TRUE, 14 | n = NULL, 15 | prop = 0.1, 16 | cols = "tol", 17 | radial_wd = TRUE, 18 | ncol = NULL, 19 | nrow = NULL, 20 | plot = TRUE, 21 | progress = rlang::is_interactive() 22 | ) 23 | } 24 | \arguments{ 25 | \item{dw}{A deweather model created with \code{\link[=build_dw_model]{build_dw_model()}}.} 26 | 27 | \item{vars}{Character. The name of the variable(s) to plot. Must be one of 28 | the variables used in the model. If \code{NULL}, all variables will be plotted 29 | in order of importance.} 30 | 31 | \item{intervals}{The number of points for the partial dependence profile.} 32 | 33 | \item{group}{Optional grouping variable to show separate profiles for 34 | different levels of another predictor. Must be one of the variables used in 35 | the model. 
Default is \code{NULL} (no grouping).} 36 | 37 | \item{group_intervals}{The number of bins when the \code{group} variable is 38 | numeric.} 39 | 40 | \item{show_conf_int}{Should the bootstrapped 95\% confidence interval be 41 | shown? In \code{\link[=plot_dw_partial_1d]{plot_dw_partial_1d()}} these are shown using transparent ribbons 42 | (for numeric variables) and rectangles (for categorical variables).} 43 | 44 | \item{n}{The number of observations to use for calculating the partial 45 | dependence profile. If \code{NULL} (default), uses \code{prop} to determine the 46 | sample size.} 47 | 48 | \item{prop}{The proportion of input data to use for calculating the partial 49 | dependence profile, between 0 and 1. Default is \code{0.1} (10\% of data). 50 | Ignored if \code{n} is specified.} 51 | 52 | \item{cols}{Colours to use for plotting. See \code{\link[openair:openColours]{openair::openColours()}}.} 53 | 54 | \item{radial_wd}{Should the \code{"wd"} (wind direction) variable be plotted on a 55 | radial axis? This can enhance interpretability, but makes it inconsistent 56 | with other variables which are plotted on cartesian coordinates. Defaults 57 | to \code{TRUE}.} 58 | 59 | \item{ncol, nrow}{When more than one \code{vars} is defined, \code{ncol} and \code{nrow} 60 | define the dimensions of the grid to create. Setting both to be \code{NULL} 61 | creates a roughly square grid.} 62 | 63 | \item{plot}{When \code{FALSE}, return a list of plot data instead of a plot.} 64 | 65 | \item{progress}{Show a progress bar? Defaults to \code{TRUE} in interactive 66 | sessions.} 67 | } 68 | \value{ 69 | A \code{ggplot2} object showing the partial dependence plot. If multiple 70 | \code{vars} are specified, a \code{patchwork} assembly of plots will be returned. If 71 | \code{plot = FALSE}, a named list of plot data will be returned instead. 
72 | } 73 | \description{ 74 | Generates partial dependence plots to visualize the relationship between 75 | predictor variables and model predictions. These plots show how the predicted 76 | pollutant concentration changes as a function of one variable while averaging 77 | over the effects of all other variables. 78 | } 79 | -------------------------------------------------------------------------------- /man/build_dw_model.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/build_dw_model.R 3 | \name{build_dw_model} 4 | \alias{build_dw_model} 5 | \title{Build a Deweather Model} 6 | \usage{ 7 | build_dw_model( 8 | data, 9 | pollutant, 10 | vars = c("trend", "ws", "wd", "hour", "weekday", "air_temp"), 11 | tree_depth = 5, 12 | trees = 200L, 13 | learn_rate = 0.1, 14 | mtry = NULL, 15 | min_n = 10L, 16 | loss_reduction = 0, 17 | sample_size = 1L, 18 | stop_iter = 190L, 19 | engine = c("xgboost", "lightgbm"), 20 | ..., 21 | .date = "date" 22 | ) 23 | } 24 | \arguments{ 25 | \item{data}{An input \code{data.frame} containing one pollutant column (defined 26 | using \code{pollutant}) and a collection of feature columns (defined using 27 | \code{vars}).} 28 | 29 | \item{pollutant}{The name of the column (likely a pollutant) in \code{data} to 30 | predict.} 31 | 32 | \item{vars}{The name of the columns in \code{data} to use as model features - 33 | i.e., to predict the values in the \code{pollutant} column. Any character 34 | columns will be coerced to factors. \code{"hour"}, \code{"weekday"}, \code{"trend"}, 35 | \code{"yday"}, \code{"week"}, and \code{"month"} are special terms and will be passed to 36 | \code{\link[=append_dw_vars]{append_dw_vars()}} if not present in \code{names(data)}.} 37 | 38 | \item{tree_depth}{An integer for the maximum depth of the tree (i.e. 
number 39 | of splits) (specific engines only).} 40 | 41 | \item{trees}{An integer for the number of trees contained in 42 | the ensemble.} 43 | 44 | \item{learn_rate}{A number for the rate at which the boosting algorithm adapts 45 | from iteration-to-iteration (specific engines only). This is sometimes referred to 46 | as the shrinkage parameter.} 47 | 48 | \item{mtry}{A number for the number (or proportion) of predictors that will 49 | be randomly sampled at each split when creating the tree models 50 | (specific engines only).} 51 | 52 | \item{min_n}{An integer for the minimum number of data points 53 | in a node that is required for the node to be split further.} 54 | 55 | \item{loss_reduction}{A number for the reduction in the loss function required 56 | to split further (specific engines only).} 57 | 58 | \item{sample_size}{A number for the number (or proportion) of data that is 59 | exposed to the fitting routine. For \code{xgboost}, the sampling is done at 60 | each iteration while \code{C5.0} samples once during training.} 61 | 62 | \item{stop_iter}{The number of iterations without improvement before 63 | stopping (specific engines only).} 64 | 65 | \item{engine}{A single character string specifying what computational engine 66 | to use for fitting.} 67 | 68 | \item{...}{Not currently used.} 69 | 70 | \item{.date}{The name of the 'date' column which defines the air quality 71 | timeseries. Passed to \code{\link[=append_dw_vars]{append_dw_vars()}} if needed. Also used to extract 72 | the time zone of the data for later restoration if \code{trend} is used as a 73 | variable.} 74 | } 75 | \value{ 76 | a 'Deweather' object for further analysis 77 | } 78 | \description{ 79 | This function builds a boosted decision tree machine learning model with 80 | useful methods for interrogating it in an air quality and meteorological 81 | context. Currently, only the \link[xgboost:xgboost]{xgboost} engine is 82 | supported.
83 | } 84 | -------------------------------------------------------------------------------- /man/plot_dw_partial_2d.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/plot_dw_partial_2d.R 3 | \name{plot_dw_partial_2d} 4 | \alias{plot_dw_partial_2d} 5 | \title{Create a 2-way partial dependence plot for deweather models} 6 | \usage{ 7 | plot_dw_partial_2d( 8 | dw, 9 | var_x = NULL, 10 | var_y = NULL, 11 | intervals = 40L, 12 | contour = c("none", "lines", "fill"), 13 | contour_bins = 8, 14 | exclude_distance = 0.05, 15 | show_conf_int = FALSE, 16 | n = NULL, 17 | prop = 0.05, 18 | cols = "viridis", 19 | radial_wd = FALSE, 20 | plot = TRUE, 21 | progress = rlang::is_interactive() 22 | ) 23 | } 24 | \arguments{ 25 | \item{dw}{A deweather model created with \code{\link[=build_dw_model]{build_dw_model()}}.} 26 | 27 | \item{var_x, var_y}{The name of the two variables to plot. Must be one of the 28 | variables used in the model. If both are missing, the top two most 29 | individually important numeric variables will be selected automatically.} 30 | 31 | \item{intervals}{The number of points for the partial dependence profile.} 32 | 33 | \item{contour}{Show contour lines on the plot? Can be one of \code{"none"} (the 34 | default, no contour lines), \code{"lines"} (draws lines) or \code{"fill"} (draws 35 | filled contours using a binned colour scale).} 36 | 37 | \item{contour_bins}{How many bins should be drawn if \code{contour != "none"}?} 38 | 39 | \item{exclude_distance}{A 2-way partial dependence plot uses 40 | \code{\link[mgcv:exclude.too.far]{mgcv::exclude.too.far()}} to ensure the plotted surface is within range of 41 | the original input data. \code{exclude_distance} defines how far away from the 42 | original data is too far to plot. 
This should be in the range \code{0} to \code{1}, 43 | where higher values are more permissive; \code{1} will retain all data.} 44 | 45 | \item{show_conf_int}{Should the bootstrapped 95\% confidence interval be 46 | shown? In \code{\link[=plot_dw_partial_2d]{plot_dw_partial_2d()}} this creates separate facets for the lower 47 | and upper confidence bounds. It may be easiest to see the difference by 48 | using \code{contour = "fill"}.} 49 | 50 | \item{n}{The number of observations to use for calculating the partial 51 | dependence profile. If \code{NULL} (default), uses \code{prop} to determine the 52 | sample size.} 53 | 54 | \item{prop}{The proportion of input data to use for calculating the partial 55 | dependence profile, between 0 and 1. Default is \code{0.05} (5\% of data). 56 | Ignored if \code{n} is specified.} 57 | 58 | \item{cols}{Colours to use for plotting. See \code{\link[openair:openColours]{openair::openColours()}}.} 59 | 60 | \item{radial_wd}{Should the \code{"wd"} (wind direction) variable be plotted on a 61 | radial axis? This can enhance interpretability, but makes it inconsistent 62 | with other variables which are plotted on Cartesian coordinates. Defaults 63 | to \code{FALSE}.} 64 | 65 | \item{plot}{When \code{FALSE}, return a list of plot data instead of a plot.} 66 | 67 | \item{progress}{Show a progress bar? Defaults to \code{TRUE} in interactive 68 | sessions.} 69 | } 70 | \value{ 71 | A \code{ggplot2} object showing the partial dependence plot. If \code{plot = FALSE}, a named list of plot data will be returned instead. 72 | } 73 | \description{ 74 | Generates a 2-way partial dependence plot to visualize the relationship between 75 | two predictor variables and model predictions. These plots show how the 76 | predicted pollutant concentration changes as a function of two variables 77 | while averaging over the effects of all other variables.
78 | } 79 | -------------------------------------------------------------------------------- /R/append_dw_vars.R: -------------------------------------------------------------------------------- 1 | #' Conveniently append common 'deweathering' variables to an air quality time 2 | #' series 3 | #' 4 | #' This function conveniently manipulates a datetime ('POSIXct') column (by 5 | #' default named 'date') into a series of columns which are useful features in 6 | #' deweather models. Used internally by [build_dw_model()] and 7 | #' [tune_dw_model()], but can be used directly by users if desired. 8 | #' 9 | #' @param data An input `data.frame` with at least one date(time) column. 10 | #' 11 | #' @param vars A character vector of variables of interest. Possible options 12 | #' include: 13 | #' - `"trend"`: a numeric expression of the overall time series 14 | #' - `"hour"`: the hour of the day (0-23) 15 | #' - `"weekday"`: the day of the week (Sunday through Saturday) 16 | #' - `"weekend"`: whether it is a weekend (Saturday, Sunday) or weekday 17 | #' - `"yday"`: the day of the year 18 | #' - `"week"`: the week of the year 19 | #' - `"month"`: the month of the year 20 | #' 21 | #' @param abbr Abbreviate weekday and month strings? Defaults to `TRUE`, which 22 | #' tends to look better in plots. 23 | #' 24 | #' @param ... Not used 25 | #' 26 | #' @param .date The name of the 'date' column to use for manipulation. 27 | #' 28 | #' @seealso [openair::cutData()] for more flexible time series data conditioning. 
29 | #' 30 | #' @export 31 | append_dw_vars <- function( 32 | data, 33 | vars = c( 34 | "trend", 35 | "hour", 36 | "weekday", 37 | "weekend", 38 | "yday", 39 | "week", 40 | "month" 41 | ), 42 | abbr = TRUE, 43 | ..., 44 | .date = "date" 45 | ) { 46 | rlang::check_dots_empty() 47 | vars <- rlang::arg_match(vars, dwVars, multiple = TRUE) 48 | 49 | if (!.date %in% names(data)) { 50 | cli::cli_abort( 51 | c( 52 | "x" = "There is no column called '{(.date)}' in {.field data}.", 53 | "i" = "Names in {.field data}: {names(data)}" 54 | ) 55 | ) 56 | } 57 | 58 | if (!lubridate::is.POSIXct(data[[.date]])) { 59 | cli::cli_abort( 60 | c( 61 | "x" = "The column '{(.date)}' in {.field data} is not {.code POSIXct}.", 62 | "i" = "Class of {.field data${(.date)}}: {.code {class(data[[.date]])}}" 63 | ) 64 | ) 65 | } 66 | 67 | if ("trend" %in% vars) { 68 | data$trend <- as.numeric(data[[.date]]) 69 | } 70 | 71 | if ("hour" %in% vars) { 72 | data$hour <- as.integer(lubridate::hour(data[[.date]])) 73 | } 74 | 75 | if ("weekday" %in% vars) { 76 | data$weekday <- 77 | lubridate::wday(data[[.date]], label = TRUE, abbr = abbr) |> 78 | factor(ordered = FALSE) 79 | } 80 | 81 | if ("weekend" %in% vars) { 82 | data$weekend <- lubridate::wday( 83 | data[[.date]], 84 | label = FALSE, 85 | week_start = 1L 86 | ) 87 | data$weekend <- ifelse(data$weekend %in% 6:7, "weekend", "weekday") 88 | data$weekend <- factor(data$weekend, c("weekday", "weekend")) 89 | } 90 | 91 | if ("yday" %in% vars) { 92 | data$yday <- as.integer(lubridate::yday(data[[.date]])) 93 | } 94 | 95 | if ("week" %in% vars) { 96 | data$week <- as.integer(lubridate::week(data[[.date]])) 97 | } 98 | 99 | if ("month" %in% vars) { 100 | data$month <- 101 | lubridate::month(data[[.date]], label = TRUE, abbr = abbr) |> 102 | factor(ordered = FALSE) 103 | } 104 | 105 | return(data) 106 | } 107 | 108 | # variables which are reserved by deweather 109 | dwVars <- c( 110 | "hour", 111 | "weekday", 112 | "weekend", 113 | "trend", 114 | "yday", 
115 | "week", 116 | "month" 117 | ) 118 | -------------------------------------------------------------------------------- /man/tune_dw_model.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/tune_dw_model.R 3 | \name{tune_dw_model} 4 | \alias{tune_dw_model} 5 | \title{Tune a deweather model} 6 | \usage{ 7 | tune_dw_model( 8 | data, 9 | pollutant, 10 | vars = c("trend", "ws", "wd", "hour", "weekday", "air_temp"), 11 | tree_depth = 5, 12 | trees = 200L, 13 | learn_rate = 0.1, 14 | mtry = NULL, 15 | min_n = 10L, 16 | loss_reduction = 0, 17 | sample_size = 1L, 18 | stop_iter = 190L, 19 | engine = c("xgboost", "lightgbm"), 20 | split_prop = 3/4, 21 | grid_levels = 5, 22 | v_partitions = 10 23 | ) 24 | } 25 | \arguments{ 26 | \item{data}{An input \code{data.frame} containing one pollutant column (defined 27 | using \code{pollutant}) and a collection of feature columns (defined using 28 | \code{vars}).} 29 | 30 | \item{pollutant}{The name of the column (likely a pollutant) in \code{data} to 31 | predict.} 32 | 33 | \item{vars}{The name of the columns in \code{data} to use as model features - 34 | i.e., to predict the values in the \code{pollutant} column. Any character 35 | columns will be coerced to factors. \code{"hour"}, \code{"weekday"}, \code{"trend"}, 36 | \code{"yday"}, \code{"week"}, and \code{"month"} are special terms and will be passed to 37 | \code{\link[=append_dw_vars]{append_dw_vars()}} if not present in \code{names(data)}.} 38 | 39 | \item{tree_depth, trees, learn_rate, mtry, min_n, loss_reduction, sample_size, stop_iter}{If length 1, these parameters will be fixed. If length \code{2}, the parameter 40 | will be tuned within the range defined between the first and last value. 
For 41 | example, if \code{tree_depth = c(1, 5)} and \code{grid_levels = 3}, tree depths of \code{1}, 42 | \code{3}, and \code{5} will be tested.} 43 | 44 | \item{engine}{A single character string specifying what computational engine 45 | to use for fitting.} 46 | 47 | \item{split_prop}{The proportion of data to be retained for 48 | modeling/analysis. Passed to the \code{prop} argument of 49 | \code{\link[rsample:initial_split]{rsample::initial_split()}}.} 50 | 51 | \item{grid_levels}{An integer for the number of values of each parameter to 52 | use to make the regular grid. Passed to the \code{levels} argument of 53 | \code{\link[dials:grid_regular]{dials::grid_regular()}}.} 54 | 55 | \item{v_partitions}{The number of partitions of the data set to use for 56 | v-fold cross-validation. Passed to the \code{v} argument of 57 | \code{\link[rsample:vfold_cv]{rsample::vfold_cv()}}.} 58 | } 59 | \description{ 60 | This function performs hyperparameter tuning for a gradient boosting model 61 | used in deweathering air pollution data. It uses cross-validation to find 62 | optimal hyperparameters and returns the best performing model along with 63 | performance metrics and visualizations. Parallel processing (e.g., through 64 | the \code{mirai} package) is recommended to speed up tuning - see 65 | \url{https://tune.tidymodels.org/articles/extras/optimizations.html#parallel-processing}. 
66 | } 67 | \details{ 68 | The function performs the following steps: 69 | \itemize{ 70 | \item Removes rows with missing values in the pollutant or predictor variables 71 | \item Splits data into training and testing sets 72 | \item Creates a tuning grid for any parameters specified as ranges 73 | \item Performs grid search with cross-validation to find optimal hyperparameters 74 | \item Fits a final model using the best hyperparameters 75 | \item Generates predictions and performance metrics 76 | } 77 | At least one hyperparameter must be specified as a range (vector of length 78 | 2) for tuning to occur. Single values are treated as fixed parameters. 79 | 80 | } 81 | \author{ 82 | Jack Davison 83 | } 84 | -------------------------------------------------------------------------------- /R/build_dw_model.R: -------------------------------------------------------------------------------- 1 | #' Build a Deweather Model 2 | #' 3 | #' This function builds a boosted decision tree machine learning model with 4 | #' useful methods for interrogating it in an air quality and meteorological 5 | #' context. Currently, only the [xgboost][xgboost::xgboost()] engine is 6 | #' supported. 7 | #' 8 | #' @param data An input `data.frame` containing one pollutant column (defined 9 | #' using `pollutant`) and a collection of feature columns (defined using 10 | #' `vars`). 11 | #' 12 | #' @param pollutant The name of the column (likely a pollutant) in `data` to 13 | #' predict. 14 | #' 15 | #' @param vars The name of the columns in `data` to use as model features - 16 | #' i.e., to predict the values in the `pollutant` column. Any character 17 | #' columns will be coerced to factors. `"hour"`, `"weekday"`, `"trend"`, 18 | #' `"yday"`, `"week"`, and `"month"` are special terms and will be passed to 19 | #' [append_dw_vars()] if not present in `names(data)`. 20 | #' 21 | #' @param ... Not currently used.
22 | #' 23 | #' @param .date The name of the 'date' column which defines the air quality 24 | #' timeseries. Passed to [append_dw_vars()] if needed. Also used to extract 25 | #' the time zone of the data for later restoration if `trend` is used as a 26 | #' variable. 27 | #' 28 | #' @inheritParams parsnip::boost_tree 29 | #' 30 | #' @return a 'Deweather' object for further analysis 31 | #' 32 | #' @export 33 | build_dw_model <- function( 34 | data, 35 | pollutant, 36 | vars = c("trend", "ws", "wd", "hour", "weekday", "air_temp"), 37 | tree_depth = 5, 38 | trees = 200L, 39 | learn_rate = 0.1, 40 | mtry = NULL, 41 | min_n = 10L, 42 | loss_reduction = 0, 43 | sample_size = 1L, 44 | stop_iter = 190L, 45 | engine = c("xgboost", "lightgbm"), 46 | ..., 47 | .date = "date" 48 | ) { 49 | # check inputs 50 | rlang::check_dots_empty() 51 | engine <- rlang::arg_match(engine, multiple = FALSE) 52 | vars <- rlang::arg_match( 53 | vars, 54 | unique(c(dwVars, names(data))), 55 | multiple = TRUE 56 | ) 57 | 58 | # get timezone 59 | tz <- lubridate::tz(data[[.date]]) 60 | 61 | # if any of the vars given aren't in data, they can be appended by the 62 | # append_dw_vars function 63 | if (any(!vars %in% names(data))) { 64 | vars_to_add <- vars[!vars %in% names(data)] 65 | data <- append_dw_vars(data, vars = vars_to_add, abbr = TRUE, .date = .date) 66 | } 67 | 68 | # if lightgbm, also need bonsai 69 | if (engine == "lightgbm") { 70 | rlang::check_installed(c("lightgbm", "bonsai")) 71 | } 72 | 73 | # drop all missing values 74 | data <- data |> 75 | dplyr::select(dplyr::all_of(c(pollutant, vars))) |> 76 | dplyr::filter(dplyr::if_all(dplyr::everything(), ~ !is.na(.))) 77 | 78 | # change characters into factors, and ensure factors are unordered 79 | # (xgboost naming seems to get confused with ordered factors) 80 | data <- dplyr::mutate( 81 | data, 82 | dplyr::across(dplyr::where(is.character), factor), 83 | dplyr::across(dplyr::where(is.ordered), function(x) { 84 | factor(x, ordered = 
FALSE) 85 | }) 86 | ) 87 | 88 | # define model spec 89 | model_spec <- 90 | parsnip::boost_tree( 91 | mode = "regression", 92 | engine = engine, 93 | tree_depth = !!tree_depth, 94 | trees = !!trees, 95 | learn_rate = !!learn_rate, 96 | mtry = !!mtry, 97 | min_n = !!min_n, 98 | loss_reduction = !!loss_reduction, 99 | sample_size = !!sample_size, 100 | stop_iter = !!stop_iter 101 | ) 102 | 103 | # build a formula object from pollutant & vars 104 | formula <- stats::reformulate(vars, pollutant) 105 | 106 | # fit the model 107 | model <- parsnip::fit(model_spec, formula, data = data) 108 | 109 | # get importance 110 | importance <- vip::vi(model$fit) |> 111 | stats::setNames(c("var", "importance")) 112 | 113 | # reverse the factor levels (for plotting mainly) 114 | importance$var <- factor(importance$var, rev(importance$var)) 115 | 116 | # deweather object 117 | out <- list( 118 | pollutant = pollutant, 119 | vars = list( 120 | names = vars, 121 | types = as.character(purrr::map(data, class)[vars]) 122 | ), 123 | params = list( 124 | tree_depth = tree_depth, 125 | trees = trees, 126 | learn_rate = learn_rate, 127 | mtry = mtry, 128 | min_n = min_n, 129 | loss_reduction = loss_reduction, 130 | sample_size = sample_size, 131 | stop_iter = stop_iter 132 | ), 133 | data = list( 134 | input = data, 135 | importance = dplyr::tibble(importance) 136 | ), 137 | model = model, 138 | engine = engine, 139 | tz = tz 140 | ) 141 | 142 | class(out) <- "Deweather" 143 | 144 | return(out) 145 | } 146 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 |
3 | 4 | 5 | 6 | ## **deweather** 7 | ### open source tools to remove meteorological variation from air quality data 8 | 9 | 10 | [![R-CMD-check](https://github.com/openair-project/deweather/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/openair-project/deweather/actions/workflows/R-CMD-check.yaml) 11 | [![CRAN status](https://www.r-pkg.org/badges/version/deweather)](https://CRAN.R-project.org/package=deweather) 12 |
13 | [![github](https://img.shields.io/badge/CODE-github-black?logo=github)](https://github.com/openair-project/deweather) 14 | [![website](https://img.shields.io/badge/DOCS-website-black)](https://openair-project.github.io/deweather/) 15 | [![book](https://img.shields.io/badge/DOCS-book-black)](https://openair-project.github.io/book/) 16 | 17 | 18 |
19 | 20 | **deweather** is an R package developed to "remove" the influence of meteorology from air quality time series data. The package uses a *boosted regression tree* approach for modelling air quality data. These and similar techniques provide powerful tools for building statistical models of air quality data, as they can account for complex interactions between variables as well as non-linear relationships. 21 | 22 |
23 | 24 | *Part of the openair toolkit* 25 | 26 | [![openair](https://img.shields.io/badge/openair_core-06D6A0?style=flat-square)](https://openair-project.github.io/openair/) | 27 | [![worldmet](https://img.shields.io/badge/worldmet-26547C?style=flat-square)](https://openair-project.github.io/worldmet/) | 28 | [![openairmaps](https://img.shields.io/badge/openairmaps-FFD166?style=flat-square)](https://openair-project.github.io/openairmaps/) | 29 | [![deweather](https://img.shields.io/badge/deweather-EF476F?style=flat-square)](https://openair-project.github.io/deweather/) 30 | 31 |
32 | 33 |
34 | 35 | ## 💡 Core Features 36 | 37 | **deweather** makes it straightforward to test, build, and evaluate models in R. 38 | 39 | - **Test and build meteorological normalisation models** flexibly using `tune_dw_model()` and `build_dw_model()`. 40 | 41 | - **Plot and examine models** in a myriad of ways, including visualising partial dependencies, using functions like `plot_dw_importance()`, `plot_dw_partial_1d()` and `plot_dw_partial_2d()`. 42 | 43 | - **Apply meteorological averaging** using `simulate_dw_met()` to obtain a meteorologically normalised air quality timeseries. 44 | 45 | Modelling can be computationally intensive, so **deweather** can make use of parallel processing (e.g., through the `mirai` package), which should work on Windows, Linux and macOS. 46 | 47 |
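To give a feel for what the `"constrained"` resampling in `simulate_dw_met()` does, the base-R sketch below draws, for every hour in a time series, a random "donor" hour whose day of year is within `window_day` days and whose hour of day is within `window_hour` hours. It is illustrative only. `constrained_indices()` is a hypothetical helper written for this example, not the package's C++ implementation. The way distances wrap around at year and day boundaries is also an assumption, and the package may handle those edges differently.

```r
# Hypothetical helper (not part of deweather): for each row i, pick a random
# row j whose day of year is within +/- day_win days and whose hour of day is
# within +/- hour_win hours of row i. Distances wrap around the year (using a
# 366-day period, an approximation) and around the day (24 hours).
constrained_indices <- function(doy, hod, day_win = 30, hour_win = 2) {
  stopifnot(length(doy) == length(hod))

  # circular distance between a vector `a` and a single value `b`
  circ_dist <- function(a, b, period) {
    d <- abs(a - b) %% period
    pmin(d, period - d)
  }

  vapply(
    seq_along(doy),
    function(i) {
      candidates <- which(
        circ_dist(doy, doy[i], 366) <= day_win &
          circ_dist(hod, hod[i], 24) <= hour_win
      )
      # row i always qualifies, so `candidates` is never empty; indexing via
      # sample.int() avoids sample()'s surprising behaviour for length-1 input
      candidates[sample.int(length(candidates), 1L)]
    },
    integer(1)
  )
}

# Example: 90 days of hourly data with a made-up wind speed column
set.seed(42)
dates <- seq(
  as.POSIXct("2020-01-01 00:00", tz = "UTC"),
  by = "hour",
  length.out = 24 * 90
)
met <- data.frame(date = dates, ws = runif(length(dates), 0, 10))

id <- constrained_indices(
  doy = as.integer(format(dates, "%j")),
  hod = as.integer(format(dates, "%H")),
  day_win = 30,
  hour_win = 2
)

# One "simulation": met variables are shuffled, the date is kept intact
met$ws_resampled <- met$ws[id]
```

In the package itself, each simulation then re-predicts the pollutant from the shuffled meteorology using the fitted model. When `aggregate = TRUE`, the `n` simulations are averaged for each date.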
48 | 49 |
50 | 51 |
52 | 53 | ## ⌛ Pre-1.0.0 deweather 54 | 55 | **deweather** was overhauled in its 1.0.0 update. We believe this update makes `deweather` more modern and flexible, but we appreciate that some users may need access to, or prefer, the older version. 56 | 57 | For this reason, the older, `gbm`-powered version of `deweather` can be accessed at . 58 | 59 | Note that the above repository is provided for archival purposes only, and is unlikely to receive any future feature updates. 60 | 61 |
62 | 63 | ## 📖 Documentation 64 | 65 | All **deweather** functions are fully documented; access documentation using R in your IDE of choice. 66 | 67 | ```r 68 | ?deweather::build_dw_model 69 | ``` 70 | 71 | Documentation is also hosted online on the **package website**. 72 | 73 | [![website](https://img.shields.io/badge/website-documentation-blue)](https://openair-project.github.io/deweather/) 74 | 75 | A guide to the openair toolkit can be found in the **online book**, which contains lots of code snippets, demonstrations of functionality, and ideas for the application of **openair**'s various functions. 76 | 77 | [![book](https://img.shields.io/badge/book-code_demos_and_ideas-blue)](https://openair-project.github.io/book/) 78 | 79 |
80 | 81 | ## 🗃️ Installation 82 | 83 | **deweather** is not yet on **CRAN**. 84 | 85 | The development version of **deweather** can be installed from GitHub using `{pak}`: 86 | 87 | ``` r 88 | # install.packages("pak") 89 | pak::pak("openair-project/deweather") 90 | ``` 91 | 92 |
93 | 94 | 🏛️ **deweather** is primarily maintained by [David Carslaw](https://github.com/davidcarslaw). 95 | 96 | 📃 **deweather** is licensed under the [MIT License](https://openair-project.github.io/deweather/LICENSE.html). 97 | 98 | 🧑‍💻 Contributions are welcome from the wider community. See the [contributing guide](https://openair-project.github.io/deweather/CONTRIBUTING.html) and [code of conduct](https://openair-project.github.io/deweather/CODE_OF_CONDUCT.html) for more information. 99 | -------------------------------------------------------------------------------- /.github/CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | We as members, contributors, and leaders pledge to make participation in our 6 | community a harassment-free experience for everyone, regardless of age, body 7 | size, visible or invisible disability, ethnicity, sex characteristics, gender 8 | identity and expression, level of experience, education, socio-economic status, 9 | nationality, personal appearance, race, caste, colour, religion, or sexual 10 | identity and orientation. 11 | 12 | We pledge to act and interact in ways that contribute to an open, welcoming, 13 | diverse, inclusive, and healthy community. 
14 | 15 | ## Our Standards 16 | 17 | Examples of behaviour that contributes to a positive environment for our 18 | community include: 19 | 20 | * Demonstrating empathy and kindness toward other people 21 | * Being respectful of differing opinions, viewpoints, and experiences 22 | * Giving and gracefully accepting constructive feedback 23 | * Accepting responsibility and apologizing to those affected by our mistakes, 24 | and learning from the experience 25 | * Focusing on what is best not just for us as individuals, but for the overall 26 | community 27 | 28 | Examples of unacceptable behaviour include: 29 | 30 | * The use of sexualized language or imagery, and sexual attention or advances of 31 | any kind 32 | * Trolling, insulting or derogatory comments, and personal or political attacks 33 | * Public or private harassment 34 | * Publishing others' private information, such as a physical or email address, 35 | without their explicit permission 36 | * Other conduct which could reasonably be considered inappropriate in a 37 | professional setting 38 | 39 | ## Enforcement Responsibilities 40 | 41 | Community leaders are responsible for clarifying and enforcing our standards of 42 | acceptable behavior and will take appropriate and fair corrective action in 43 | response to any behavior that they deem inappropriate, threatening, offensive, 44 | or harmful. 45 | 46 | Community leaders have the right and responsibility to remove, edit, or reject 47 | comments, commits, code, wiki edits, issues, and other contributions that are 48 | not aligned to this Code of Conduct, and will communicate reasons for moderation 49 | decisions when appropriate. 50 | 51 | ## Scope 52 | 53 | This Code of Conduct applies within all community spaces, and also applies when 54 | an individual is officially representing the community in public spaces. 
55 | Examples of representing our community include using an official e-mail address, 56 | posting via an official social media account, or acting as an appointed 57 | representative at an online or offline event. 58 | 59 | ## Enforcement 60 | 61 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 62 | reported to the community leaders responsible via the GitHub platform. 63 | All complaints will be reviewed and investigated promptly and fairly. 64 | 65 | All community leaders are obligated to respect the privacy and security of the 66 | reporter of any incident. 67 | 68 | ## Enforcement Guidelines 69 | 70 | Community leaders will follow these Community Impact Guidelines in determining 71 | the consequences for any action they deem in violation of this Code of Conduct: 72 | 73 | ### 1. Correction 74 | 75 | **Community Impact**: Use of inappropriate language or other behaviour deemed 76 | unprofessional or unwelcome in the community. 77 | 78 | **Consequence**: A private, written warning from community leaders, providing 79 | clarity around the nature of the violation and an explanation of why the 80 | behaviour was inappropriate. A public apology may be requested. 81 | 82 | ### 2. Warning 83 | 84 | **Community Impact**: A violation through a single incident or series of 85 | actions. 86 | 87 | **Consequence**: A warning with consequences for continued behaviour. No 88 | interaction with the people involved, including unsolicited interaction with 89 | those enforcing the Code of Conduct, for a specified period of time. This 90 | includes avoiding interactions in community spaces as well as external channels 91 | like social media. Violating these terms may lead to a temporary or permanent 92 | ban. 93 | 94 | ### 3. Temporary Ban 95 | 96 | **Community Impact**: A serious violation of community standards, including 97 | sustained inappropriate behaviour. 
98 | 99 | **Consequence**: A temporary ban from any sort of interaction or public 100 | communication with the community for a specified period of time. No public or 101 | private interaction with the people involved, including unsolicited interaction 102 | with those enforcing the Code of Conduct, is allowed during this period. 103 | Violating these terms may lead to a permanent ban. 104 | 105 | ### 4. Permanent Ban 106 | 107 | **Community Impact**: Demonstrating a pattern of violation of community 108 | standards, including sustained inappropriate behavior, harassment of an 109 | individual, or aggression toward or disparagement of classes of individuals. 110 | 111 | **Consequence**: A permanent ban from any sort of public interaction within the 112 | community. 113 | 114 | ## Attribution 115 | 116 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], 117 | version 2.1, available at 118 | . 119 | 120 | Community Impact Guidelines were inspired by 121 | [Mozilla's code of conduct enforcement ladder](https://github.com/mozilla/inclusion). 122 | 123 | For answers to common questions about this code of conduct, see the FAQ at 124 | . Translations are available at . 125 | 126 | [homepage]: https://www.contributor-covenant.org 127 | -------------------------------------------------------------------------------- /R/simulate_dw_met.R: -------------------------------------------------------------------------------- 1 | #' Function to run random meteorological simulations on a deweather model 2 | #' 3 | #' This function performs random simulations to help isolate the effect of 4 | #' emissions changes from meteorological variability in air quality data. It 5 | #' works by repeatedly shuffling meteorological variables (like wind and air 6 | #' temperature) while keeping temporal patterns intact, then predicting 7 | #' pollutant concentrations using a trained deweather model. 8 | #' 9 | #' @param dw A deweather model created with [build_dw_model()]. 
10 | #' 11 | #' @param newdata Data set to which to apply the model. If missing, the data 12 | #' used to build the model will be used. 13 | #' 14 | #' @param vars The variables that should be randomly varied. Note that these 15 | #' should typically be meteorological variables (e.g., `"ws"`, `"wd"`, 16 | #' `"air_temp"`) and not temporal emission proxies (e.g., `"hour"`, 17 | #' `"weekday"`, `"week"`). 18 | #' 19 | #' @param resampling The resampling strategy. One of: 20 | #' 21 | #' - `"constrained"` (default), meaning that only times close to the target 22 | #' date's day of year and hour of day are sampled. This option is used in 23 | #' conjunction with `window_day` and `window_hour`. For example, a `window_day` 24 | #' of `30` will sample +/-30 days of the date. 25 | #' 26 | #' - `"all"`, meaning all dates are shuffled. 27 | #' 28 | #' The argument for using constrained resampling is that it resamples 29 | #' conditions for a similar time of year and hour of day to minimise the 30 | #' resampling of implausible conditions, e.g., very warm temperatures during 31 | #' winter. 32 | #' 33 | #' @param window_day,window_hour The day of year (`window_day`) and hour of day 34 | #' (`window_hour`) windows to sample within when `resampling = "constrained"`. 35 | #' For example, `window_day = 30` samples within +/-30 days of any given date. 36 | #' 37 | #' @param n The number of simulations to use. 38 | #' 39 | #' @param aggregate By default, all of the simulations will be aggregated into a 40 | #' single time series. When `aggregate = FALSE`, all simulations will be 41 | #' returned in a single data frame with an `.id` column distinguishing between 42 | #' them. 43 | #' 44 | #' @param progress Show a progress bar? Defaults to `TRUE` in interactive 45 | #' sessions.
46 | #' 47 | #' @export 48 | #' 49 | #' @return a [tibble][tibble::tibble-package] 50 | #' 51 | #' @author David Carslaw 52 | simulate_dw_met <- 53 | function( 54 | dw, 55 | newdata = deweather::get_dw_input_data(dw), 56 | vars = c("ws", "wd", "air_temp"), 57 | resampling = c("constrained", "all"), 58 | window_day = 30, 59 | window_hour = 2, 60 | n = 200, 61 | aggregate = TRUE, 62 | progress = rlang::is_interactive() 63 | ) { 64 | check_deweather(dw) 65 | resampling <- rlang::arg_match(resampling, c("constrained", "all")) 66 | 67 | # extract model components 68 | model <- get_dw_model(dw) 69 | model_vars <- get_dw_vars(dw) 70 | pollutant <- get_dw_pollutant(dw) 71 | tz <- dw$tz 72 | 73 | if (!"trend" %in% model_vars) { 74 | cli::cli_abort( 75 | "{.arg dw} must have a trend component as one of the explanatory variables." 76 | ) 77 | } 78 | 79 | # if daemons are set, need to load packages in the workers, otherwise don't 80 | if (mirai::daemons_set()) { 81 | prediction <- 82 | purrr::map( 83 | .x = 1:n, 84 | .f = purrr::in_parallel( 85 | \(x) { 86 | library(deweather) 87 | contr_one_hot <- parsnip::contr_one_hot 88 | doPred( 89 | mydata = newdata, 90 | mod = model, 91 | vars = vars, 92 | resampling = resampling, 93 | window_day = window_day, 94 | window_hour = window_hour, 95 | sample_fun = get_constrained_indices_cpp, 96 | tz = tz 97 | ) 98 | }, 99 | doPred = doPred, 100 | newdata = newdata, 101 | model = model, 102 | vars = vars, 103 | resampling = resampling, 104 | window_day = window_day, 105 | window_hour = window_hour, 106 | tz = tz, 107 | get_constrained_indices_cpp = get_constrained_indices_cpp 108 | ), 109 | .progress = progress 110 | ) |> 111 | purrr::list_rbind() 112 | } else { 113 | prediction <- 114 | purrr::map( 115 | .x = 1:n, 116 | .f = purrr::in_parallel( 117 | \(x) { 118 | doPred( 119 | mydata = newdata, 120 | mod = model, 121 | vars = vars, 122 | resampling = resampling, 123 | window_day = window_day, 124 | window_hour = window_hour, 125 | 
                sample_fun = get_constrained_indices_cpp,
                tz = tz
              )
            },
            doPred = doPred,
            newdata = newdata,
            model = model,
            vars = vars,
            resampling = resampling,
            window_day = window_day,
            window_hour = window_hour,
            tz = tz,
            get_constrained_indices_cpp = get_constrained_indices_cpp
          ),
          .progress = progress
        ) |>
        purrr::list_rbind()
    }

    # use pollutant name
    names(prediction)[2] <- pollutant

    # Aggregate results
    if (aggregate) {
      prediction <-
        dplyr::summarise(
          prediction,
          {{ pollutant }} := mean(.data[[pollutant]]),
          .by = "date"
        ) |>
        dplyr::tibble()
    } else {
      prediction <-
        dplyr::mutate(
          prediction,
          .id = dplyr::row_number(),
          .by = "date",
          .before = 0
        ) |>
        dplyr::tibble()
    }

    return(prediction)
  }

# get random samples and predict
doPred <- function(
  mydata,
  mod,
  vars,
  resampling,
  window_day,
  window_hour,
  sample_fun,
  tz
) {
  n <- nrow(mydata)

  if (resampling == "all") {
    id <- sample(1:n, n, replace = FALSE)
  }

  if (resampling == "constrained") {
    # Extract features
    dates <- as.POSIXct(mydata$trend, tz = tz)
    doy <- lubridate::yday(dates)
    hod <- lubridate::hour(dates)

    # Call C++ with the window arguments
    id <- sample_fun(
      doy = doy,
      hod = hod,
      day_win = window_day,
      hour_win = window_hour
    )
  }

  # new data with random samples
  mydata[vars] <- lapply(mydata[vars], \(x) x[id])

  # predict
  prediction <- parsnip::predict.model_fit(mod, new_data = mydata)

  # return data
  prediction <- dplyr::tibble(
    date = as.POSIXct(mydata$trend, tz = tz),
    pred = prediction$.pred
  )

  return(prediction)
}
--------------------------------------------------------------------------------
/R/tune_dw_model.R:
--------------------------------------------------------------------------------
#' Tune a deweather model
#'
#' This function performs hyperparameter tuning for a gradient boosting model
#' used in deweathering air pollution data. It uses cross-validation to find
#' optimal hyperparameters and returns the best-performing hyperparameters
#' along with performance metrics and a scatter plot of the final model's
#' predictions. Parallel processing (e.g., through the `mirai` package) is
#' recommended to speed up tuning - see
#' .
#'
#' @inheritParams build_dw_model
#'
#' @param
#' tree_depth,trees,learn_rate,mtry,min_n,loss_reduction,sample_size,stop_iter
#' If length `1`, these parameters will be fixed. If length `2`, the parameter
#' will be tuned within the range defined between the first and last value. For
#' example, if `tree_depth = c(1, 5)` and `grid_levels = 3`, tree depths of `1`,
#' `3`, and `5` will be tested.
#'
#' @param split_prop The proportion of data to be retained for
#' modeling/analysis. Passed to the `prop` argument of
#' [rsample::initial_split()].
#'
#' @param grid_levels An integer for the number of values of each parameter to
#' use to make the regular grid. Passed to the `levels` argument of
#' [dials::grid_regular()].
#'
#' @param v_partitions The number of partitions of the data set to use for
#' v-fold cross-validation. Passed to the `v` argument of
#' [rsample::vfold_cv()].
#'
#' @details The function performs the following steps:
#'
#' - Removes rows with missing values in the pollutant or predictor variables
#'
#' - Splits data into training and testing sets
#'
#' - Creates a tuning grid for any parameters specified as ranges
#'
#' - Performs grid search with cross-validation to find optimal hyperparameters
#'
#' - Fits a final model using the best hyperparameters
#'
#' - Generates predictions and performance metrics
#'
#' At least one hyperparameter must be specified as a range (vector of length
#' 2) for tuning to occur. Single values are treated as fixed parameters.
#'
#' @author Jack Davison
#' @export
tune_dw_model <- function(
  data,
  pollutant,
  vars = c("trend", "ws", "wd", "hour", "weekday", "air_temp"),
  tree_depth = 5,
  trees = 200L,
  learn_rate = 0.1,
  mtry = NULL,
  min_n = 10L,
  loss_reduction = 0,
  sample_size = 1L,
  stop_iter = 190L,
  engine = c("xgboost", "lightgbm"),
  split_prop = 3 / 4,
  grid_levels = 5,
  v_partitions = 10
) {
  # check inputs
  engine <- rlang::arg_match(engine, multiple = FALSE)
  vars <- rlang::arg_match(
    vars,
    unique(c(dwVars, names(data))),
    multiple = TRUE
  )

  # if any of the vars given aren't in data, they can be appended by the
  # append_dw_vars function
  if (any(!vars %in% names(data))) {
    vars_to_add <- vars[!vars %in% names(data)]
    data <- append_dw_vars(data, vars = vars_to_add, abbr = TRUE)
  }

  if (engine == "lightgbm") {
    rlang::check_installed(c("lightgbm", "bonsai"))
  }

  # drop all missing values
  data <- data |>
    dplyr::select(dplyr::all_of(c(pollutant, vars))) |>
    dplyr::filter(dplyr::if_all(dplyr::everything(), ~ !is.na(.)))

  # change characters into factors, and ensure factors are unordered
  # (xgboost
  # naming seems to get confused with ordered factors)
  data <- dplyr::mutate(
    data,
    dplyr::across(dplyr::where(is.character), factor),
    dplyr::across(dplyr::where(is.ordered), function(x) {
      factor(x, ordered = FALSE)
    })
  )

  # get testing-training splits
  data_split <- rsample::initial_split(data, prop = split_prop)
  data_train <- rsample::training(data_split)

  # build tuning grid
  grid <- list()

  tree_depth_spec <- tree_depth
  trees_spec <- trees
  learn_rate_spec <- learn_rate
  mtry_spec <- mtry
  min_n_spec <- min_n
  loss_reduction_spec <- loss_reduction
  sample_size_spec <- sample_size
  stop_iter_spec <- stop_iter

  if (length(tree_depth) > 1) {
    grid <- append(grid, list(dials::tree_depth(range = tree_depth)))
    tree_depth_spec <- parsnip::tune()
  }

  if (length(trees) > 1) {
    grid <- append(grid, list(dials::trees(range = trees)))
    trees_spec <- parsnip::tune()
  }

  if (length(learn_rate) > 1) {
    # dials::learn_rate() reads `range` on a log10 scale by default; use
    # `trans = NULL` so the range is read on the natural scale, as documented
    grid <- append(
      grid,
      list(dials::learn_rate(range = learn_rate, trans = NULL))
    )
    learn_rate_spec <- parsnip::tune()
  }

  if (length(mtry) > 1) {
    grid <- append(grid, list(dials::mtry(range = mtry)))
    mtry_spec <- parsnip::tune()
  }

  if (length(min_n) > 1) {
    grid <- append(grid, list(dials::min_n(range = min_n)))
    min_n_spec <- parsnip::tune()
  }

  if (length(loss_reduction) > 1) {
    # as for learn_rate, dials::loss_reduction() defaults to a log10 scale
    grid <- append(
      grid,
      list(dials::loss_reduction(range = loss_reduction, trans = NULL))
    )
    loss_reduction_spec <- parsnip::tune()
  }

  if (length(sample_size) > 1) {
    grid <- append(grid, list(dials::sample_size(range = sample_size)))
    sample_size_spec <- parsnip::tune()
  }

  if (length(stop_iter) > 1) {
    grid <- append(grid, list(dials::stop_iter(range = stop_iter)))
    stop_iter_spec <- parsnip::tune()
  }

  if (length(grid) == 0) {
    cli::cli_abort(
      "At least one parameter (e.g., {.arg tree_depth}) must be given as a range of two values."
    )
  }

  grid <- dials::grid_regular(x = grid, levels = grid_levels)

  # get tuning spec
  tune_spec <-
    parsnip::boost_tree(
      mode = "regression",
      engine = engine,
      tree_depth = !!tree_depth_spec,
      trees = !!trees_spec,
      learn_rate = !!learn_rate_spec,
      mtry = !!mtry_spec,
      min_n = !!min_n_spec,
      loss_reduction = !!loss_reduction_spec,
      sample_size = !!sample_size_spec,
      stop_iter = !!stop_iter_spec
    )

  # get training folds
  folds <- rsample::vfold_cv(data_train, v = v_partitions)

  # build a formula object from poll & vars
  formula <- stats::reformulate(vars, pollutant)

  # create tuning workflow
  wf <-
    workflows::workflow() |>
    workflows::add_model(tune_spec) |>
    workflows::add_formula(formula)

  # get results from grid
  results <- tune::tune_grid(
    wf,
    resamples = folds,
    grid = grid,
    control = tune::control_grid(
      verbose = TRUE,
      allow_par = TRUE,
      parallel_over = "everything"
    )
  )

  # get best models
  five_best_models <- tune::show_best(results, metric = "rmse")

  # get the best overall model
  best_params <- tune::select_best(results, metric = "rmse") |>
    dplyr::select(-".config")

  # finalise workflow
  wf <- tune::finalize_workflow(wf, best_params)

  # one last fit using splits
  final_fit <- tune::last_fit(wf, data_split)

  # final predictions
  final_predictions <-
    tune::collect_predictions(final_fit) |>
    dplyr::select(
      "obs" = !!pollutant,
      "mod" = ".pred"
    ) |>
    dplyr::mutate(
      pollutant = pollutant,
      .before = 0
    )

  # bind to testing dataset for better
  # comparisons
  final_predictions <-
    rsample::testing(final_fit$splits[[1]]) |>
    dplyr::select(-dplyr::any_of(pollutant)) |>
    dplyr::bind_cols(
      final_predictions
    )

  # get model stats
  final_metrics <-
    final_predictions |>
    openair::modStats(type = "pollutant") |>
    dplyr::rename_with(tolower)

  # plot a scatter plot
  axisrange <- range(c(0, final_predictions$obs, final_predictions$mod))
  plot <-
    final_predictions |>
    ggplot2::ggplot(
      ggplot2::aes(x = .data$obs, y = .data$mod)
    ) +
    ggplot2::geom_abline(
      color = "#9E0142FF",
      alpha = 0.5,
      lty = 5,
      slope = 0.5
    ) +
    ggplot2::geom_abline(color = "#9E0142FF", alpha = 0.5, lty = 5, slope = 2) +
    ggplot2::geom_abline(color = "#9E0142FF", lwd = 1.5) +
    ggplot2::geom_point() +
    ggplot2::theme_bw() +
    ggplot2::scale_x_continuous(
      limits = axisrange,
      expand = ggplot2::expansion(c(0, .1))
    ) +
    ggplot2::scale_y_continuous(
      limits = axisrange,
      expand = ggplot2::expansion(c(0, .1))
    ) +
    ggplot2::coord_cartesian(ratio = 1L) +
    ggplot2::labs(
      x = openair::quickText(paste("Observed", pollutant)),
      y = openair::quickText(paste("Modelled", pollutant))
    )

  # return params
  list(
    best_params = as.list(best_params),
    final_fit = list(
      predictions = final_predictions,
      metrics = final_metrics,
      plot = plot
    )
  )
}

--------------------------------------------------------------------------------
/R/plot_dw_partial_2d.R:
--------------------------------------------------------------------------------
#' Create a 2-way partial dependence plot for deweather models
#'
#' Generates a 2-way partial dependence plot to visualize the relationship
#' between two predictor variables and model predictions.
#' These plots show how the predicted pollutant concentration changes as a
#' function of two variables while averaging over the effects of all other
#' variables.
#'
#' @inheritParams plot_dw_partial_1d
#'
#' @param var_x,var_y The names of the two variables to plot. Each must be one
#' of the variables used in the model. If both are `NULL` (the default), the
#' two most individually important numeric variables will be selected
#' automatically.
#'
#' @param contour Show contour lines on the plot? Can be one of `"none"` (the
#' default, no contour lines), `"lines"` (draws lines) or `"fill"` (draws
#' filled contours using a binned colour scale).
#'
#' @param contour_bins How many bins should be drawn if `contour != "none"`?
#'
#' @param show_conf_int Should the bootstrapped 95% confidence interval be
#' shown? In [plot_dw_partial_2d()] this creates separate facets for the lower
#' bound, the mean and the upper bound of the interval. It may be easiest to
#' see the difference by using `contour = "fill"`.
#'
#' @param exclude_distance A 2-way partial dependence plot uses
#' [mgcv::exclude.too.far()] to ensure the plotted surface is within range of
#' the original input data. `exclude_distance` defines how far away from the
#' original data is too far to plot. This should be in the range `0` to `1`,
#' where higher values are more permissive; `1` will retain all data.
#'
#' @param prop The proportion of input data to use for calculating the partial
#' dependence profile, between 0 and 1. Default is `0.05` (5% of data).
#' Ignored if `n` is specified.
#'
#' @param radial_wd Should the `"wd"` (wind direction) variable be plotted on a
#' radial axis? This can enhance interpretability, but makes it inconsistent
#' with other variables which are plotted on cartesian coordinates. Defaults
#' to `FALSE`.
#'
#' @return A `ggplot2` object showing the partial dependence plot. If `plot =
#' FALSE`, a data frame of plot data will be returned instead.
#'
#' @export
plot_dw_partial_2d <- function(
  dw,
  var_x = NULL,
  var_y = NULL,
  intervals = 40L,
  contour = c("none", "lines", "fill"),
  contour_bins = 8,
  exclude_distance = 0.05,
  show_conf_int = FALSE,
  n = NULL,
  prop = 0.05,
  cols = "viridis",
  radial_wd = FALSE,
  plot = TRUE,
  progress = rlang::is_interactive()
) {
  check_deweather(dw)

  if (exclude_distance < 0 || exclude_distance > 1) {
    cli::cli_abort("{.arg exclude_distance} must be between {0} and {1}.")
  }

  # get model features
  model <- get_dw_model(dw)
  vars <- get_dw_vars(dw)
  input_data <- get_dw_input_data(dw)
  pollutant <- get_dw_pollutant(dw)
  importance <- get_dw_importance(dw, aggregate_factors = TRUE)

  # if vars are missing, pick the two most important numeric variables
  if (is.null(var_x) && is.null(var_y)) {
    num_vars <- dw$vars$names[dw$vars$types %in% c("numeric", "integer")]
    most_important <-
      importance |>
      dplyr::filter(.data$var %in% num_vars) |>
      dplyr::slice_head(n = 2L) |>
      dplyr::pull(.data$var) |>
      as.character()
    var_x <- most_important[1]
    var_y <- most_important[2]
  }

  # make sure vars are model vars
  var_x <- rlang::arg_match(var_x, vars)
  var_y <- rlang::arg_match(var_y, vars)
  contour <- rlang::arg_match(
    contour,
    c("none", "lines", "fill"),
    multiple = FALSE
  )

  # need to switch around the variables if radial_wd is desired
  if (radial_wd && (var_x == "wd" || var_y == "wd")) {
    if (var_y == "wd") {
      var_y <- var_x
      var_x <- "wd"
    }
  }

  # create DALEX explainer
  explainer <-
    DALEXtra::explain_tidymodels(
      model = model,
      data = dplyr::select(input_data, -dplyr::all_of(pollutant)),
      y = input_data[[pollutant]],
      verbose = FALSE
    )

  # rows to use in data
  rows <-
    sample(
      x = nrow(input_data),
      size = n %||% (prop * nrow(input_data)),
      replace = FALSE
    )

  # get 2d CP
  cp2d <- purrr::map(
    .x = rows,
    .f = \(row) {
      ingredients::ceteris_paribus_2d(
        explainer = explainer,
        observation = input_data[row, ],
        grid_points = intervals,
        variables = c(var_x, var_y)
      ) |>
        dplyr::tibble()
    },
    .progress = progress
  ) |>
    dplyr::bind_rows()

  # calculate mean
  plotdata <-
    dplyr::reframe(
      cp2d,
      openair::bootMeanDF(.data$y_hat, B = 100),
      .by = dplyr::all_of(c(var_x, var_y, "vname1", "vname2", "label"))
    )

  # exclude too far
  id <- mgcv::exclude.too.far(
    d1 = input_data[[var_x]],
    d2 = input_data[[var_y]],
    g1 = plotdata[[var_x]],
    g2 = plotdata[[var_y]],
    dist = exclude_distance
  )
  plotdata <- plotdata[!id, ]

  # if not plotting, just return the data
  if (!plot) {
    return(plotdata)
  }

  # if plotting confidence interval, need to reshape data and define a faceting
  # strategy
  if (show_conf_int) {
    plotdata <-
      plotdata |>
      tidyr::pivot_longer(
        cols = c("mean", "min", "max"),
        values_to = "mean",
        names_to = "stat"
      ) |>
      dplyr::mutate(
        stat = factor(
          .data$stat,
          levels = c("min", "mean", "max"),
          labels = c("Lower 95% CI", "Mean", "Upper 95% CI")
        )
      )
    facet <- ggplot2::facet_wrap(
      ggplot2::vars(.data$stat),
      nrow = 1L,
      axes = "all"
    )
  } else {
    facet <- NULL
  }

  scale_x <- NULL
  scale_y <- NULL
  if (var_x == "wd") {
    scale_x <- wd_scale("x")
  }
  if (var_y == "wd") {
    scale_y <- wd_scale("y")
  }
  if (var_x == "hour") {
    scale_x <- hour_scale("x")
  }
  if (var_y == "hour") {
    scale_y <- hour_scale("y")
  }

  # make plot
  plot <- ggplot2::ggplot(
    plotdata,
    ggplot2::aes(x = .data[[var_x]], y = .data[[var_y]])
  ) +
    ggplot2::labs(
      x = openair::quickText(var_x),
      y = openair::quickText(var_y),
      fill = openair::quickText(pollutant)
    ) +
    ggplot2::theme_bw() +
    ggplot2::theme(
      strip.background = ggplot2::element_blank(),
      strip.text.x.top = ggplot2::element_text(hjust = 0)
    ) +
    facet +
    scale_x +
    scale_y

  if (contour %in% c("none", "lines")) {
    plot <-
      plot +
      ggplot2::geom_tile(ggplot2::aes(fill = .data$mean)) +
      ggplot2::scale_fill_gradientn(
        colours = openair::openColours(cols)
      )

    if (contour == "lines") {
      plot <-
        plot +
        ggplot2::geom_contour(
          mapping = ggplot2::aes(z = .data$mean),
          colour = "black",
          bins = contour_bins
        )
    }
  }

  if (contour == "fill") {
    plot <- plot +
      ggplot2::geom_contour_filled(
        mapping = ggplot2::aes(z = .data$mean),
        colour = "black",
        bins = contour_bins
      ) +
      ggplot2::scale_fill_manual(
        values = openair::openColours(cols, n = contour_bins),
        aesthetics = "fill"
      )
  }

  if (radial_wd && (var_x == "wd" || var_y == "wd")) {
    plot <-
      plot +
      ggplot2::coord_radial(inner.radius = 0.1) +
      ggplot2::theme(
        panel.border = ggplot2::element_blank(),
        axis.line.theta = ggplot2::element_line(linewidth = 0.25)
      ) +
      ggplot2::scale_x_continuous(
        limits = c(0, 360),
        oob = scales::oob_keep,
        breaks = seq(0, 270, 90),
        expand = ggplot2::expansion(),
        labels = c("N", "E", "S", "W")
      ) +
      ggplot2::scale_y_continuous(expand = ggplot2::expansion())
  } else {
    plot <-
      plot +
      ggplot2::coord_cartesian(default = FALSE, expand = FALSE)
  }

  return(plot)
}

hour_scale <- function(which = c("x", "y")) {
  fun <- if (which == "x") {
    ggplot2::scale_x_continuous
  } else {
    ggplot2::scale_y_continuous
  }

  fun(
    breaks = seq(0, 24, 4),
    limits = c(0, 23),
    oob = scales::oob_keep
  )
}

wd_scale <- function(which = c("x", "y")) {
  if (which == "x") {
    fun <- ggplot2::scale_x_continuous
    sep <- "\n"
  } else {
    fun <- ggplot2::scale_y_continuous
    sep <- " "
  }

  fun(
    breaks = seq(0, 360, 90),
    labels = c(
      paste("0", "(N)", sep = sep),
      paste("90", "(E)", sep = sep),
      paste("180", "(S)", sep = sep),
      paste("270", "(W)", sep = sep),
      paste("360", "(N)", sep = sep)
    ),
    limits = c(0, 360),
    oob = scales::oob_keep
  )
}

--------------------------------------------------------------------------------
/R/plot_dw_partial_1d.R:
--------------------------------------------------------------------------------
#' Create partial dependence plots for deweather models
#'
#' Generates partial dependence plots to visualize the relationship between
#' predictor variables and model predictions. These plots show how the predicted
#' pollutant concentration changes as a function of one variable while averaging
#' over the effects of all other variables.
#'
#' @param dw A deweather model created with [build_dw_model()].
#'
#' @param vars Character. The name of the variable(s) to plot. Must be one of
#' the variables used in the model. If `NULL`, all variables will be plotted
#' in order of importance.
#'
#' @param intervals The number of points for the partial dependence profile.
#'
#' @param group Optional grouping variable to show separate profiles for
#' different levels of another predictor. Must be one of the variables used in
#' the model. Default is `NULL` (no grouping).
#'
#' @param group_intervals The number of bins when the `group` variable is
#' numeric.
#'
#' @param show_conf_int Should the bootstrapped 95% confidence interval be
#' shown? In [plot_dw_partial_1d()] these are shown using transparent ribbons
#' (for numeric variables) and rectangles (for categorical variables).
#'
#' @param n The number of observations to use for calculating the partial
#' dependence profile. If `NULL` (default), uses `prop` to determine the
#' sample size.
#'
#' @param prop The proportion of input data to use for calculating the partial
#' dependence profile, between 0 and 1. Default is `0.1` (10% of data).
#' Ignored if `n` is specified.
#'
#' @param cols Colours to use for plotting. See [openair::openColours()].
#'
#' @param radial_wd Should the `"wd"` (wind direction) variable be plotted on a
#' radial axis? This can enhance interpretability, but makes it inconsistent
#' with other variables which are plotted on cartesian coordinates. Defaults
#' to `TRUE`.
#'
#' @param ncol,nrow When more than one `vars` is defined, `ncol` and `nrow`
#' define the dimensions of the grid to create. Setting both to be `NULL`
#' creates a roughly square grid.
#'
#' @param plot When `FALSE`, return a list of plot data instead of a plot.
#'
#' @param progress Show a progress bar? Defaults to `TRUE` in interactive
#' sessions.
#'
#' @return A `ggplot2` object showing the partial dependence plot. If multiple
#' `vars` are specified, a `patchwork` assembly of plots will be returned. If
#' `plot = FALSE`, a named list of plot data will be returned instead.
#'
#' @export
plot_dw_partial_1d <- function(
  dw,
  vars = NULL,
  intervals = 40L,
  group = NULL,
  group_intervals = 3L,
  show_conf_int = TRUE,
  n = NULL,
  prop = 0.1,
  cols = "tol",
  radial_wd = TRUE,
  ncol = NULL,
  nrow = NULL,
  plot = TRUE,
  progress = rlang::is_interactive()
) {
  check_deweather(dw)

  model <- get_dw_model(dw)
  input_data <- get_dw_input_data(dw)
  pollutant <- get_dw_pollutant(dw)
  importance <- get_dw_importance(dw, aggregate_factors = TRUE)

  # check inputs against model variables
  model_vars <- get_dw_vars(dw)
  vars <- vars %||% rev(levels(importance$var))
  vars <- rlang::arg_match(vars, model_vars, multiple = TRUE)
  if (!is.null(group)) {
    rlang::arg_match(group, model_vars, multiple = FALSE)
  }

  # ensure `group` is not trend
  if (!is.null(group)) {
    if (group == "trend") {
      cli::cli_abort("{.arg group} cannot be 'trend'.")
    }
  }
  # check other inputs
  if (!is.null(prop) && (prop > 1 || prop < 0)) {
    cli::cli_abort("{.arg prop} must be between `0` and `1`.")
  }

  # create DALEX explainer
  explainer <-
    DALEXtra::explain_tidymodels(
      model = model,
      data = dplyr::select(input_data, -dplyr::all_of(pollutant)),
      y = input_data[[pollutant]],
      verbose = FALSE
    )

  # get breaks for a numeric group
  if (!is.null(group) && is.numeric(input_data[[group]])) {
    group_breaks <- stats::quantile(
      input_data[[group]],
      probs = seq(0, 1, length.out = group_intervals + 1L),
      na.rm = TRUE
    )
  }

  var_types <- purrr::map_vec(vars, \(x) {
    dw$vars$types[which(dw$vars$names == x)]
  })
  var_types <- ifelse(
    var_types %in% c("numeric", "integer"),
    "numerical",
    "categorical"
  )

  pd_data <- list()

  if (any(var_types == "categorical")) {
    profile <- DALEX::model_profile(
      explainer = explainer,
      variables = vars[var_types == "categorical"],
      groups = group,
      N = n %||% round(nrow(input_data) * prop),
      grid_points = intervals,
      variable_type = "categorical"
    )
    cp <- dplyr::tibble(profile$cp_profiles)
    pd_data <- append(pd_data, list(cp))
  }

  if (any(var_types == "numerical")) {
    profile <- DALEX::model_profile(
      explainer = explainer,
      variables = vars[var_types == "numerical"],
      groups = group,
      N = n %||% round(nrow(input_data) * prop),
      grid_points = intervals,
      variable_type = "numerical"
    )
    cp <- dplyr::tibble(profile$cp_profiles)
    pd_data <- append(pd_data, list(cp))
  }

  pd_data <- dplyr::bind_rows(pd_data)

  if (is.null(group)) {
    group <- "(all)"
    pd_data$group_var <- "(all)"
  } else {
    if (is.numeric(pd_data[[group]])) {
      pd_data$group_var <- ggplot2::cut_number(
        pd_data[[group]],
        n = group_intervals
      )
    } else {
      pd_data$group_var <- pd_data[[group]]
    }
  }

  # summarise for plot
  plotdata <-
    split(pd_data, pd_data$`_vname_`) |>
    purrr::imap(
      \(df, i) {
        dplyr::summarise(
          df,
          openair::bootMeanDF(.data$`_yhat_`, B = 100),
          .by = c(i, "_vname_", "group_var")
        ) |>
          dplyr::select(-"n")
      }
    )

  # function that plots one variable
  plot_single_pd <- function(df) {
    var <- df$`_vname_`[1]

    # find colours
    colours <- openair::openColours(
      scheme = cols,
      n = dplyr::n_distinct(df$group_var)
    )

    # if the variable is "trend", convert it to a datetime and drop the far
    # reaches of it
    if (var == "trend") {
      df <-
        df |>
        dplyr::tibble() |>
        dplyr::filter(!.data$trend %in% range(.data$trend)) |>
        dplyr::mutate(
          trend =
            as.POSIXct(.data$trend, tz = dw$tz)
        )
    }

    # create plot
    plot <-
      df |>
      ggplot2::ggplot(
        ggplot2::aes(
          x = .data[[var]],
          y = .data$mean,
          ymax = .data$max,
          ymin = .data$min
        )
      )

    # geometries - different if a variable is numeric vs categorical
    if (!is.numeric(df[[var]]) && !lubridate::is.POSIXct(df[[var]])) {
      if (show_conf_int) {
        plot <-
          plot +
          ggplot2::geom_crossbar(
            ggplot2::aes(fill = .data$group_var),
            alpha = 0.3,
            color = NA,
            key_glyph = ggplot2::draw_key_polygon
          )
      }
      plot <-
        plot +
        ggplot2::geom_crossbar(
          ggplot2::aes(
            ymin = .data$mean,
            ymax = .data$mean,
            color = .data$group_var
          ),
          key_glyph = ggplot2::draw_key_path
        )
    } else {
      if (show_conf_int) {
        plot <-
          plot +
          ggplot2::geom_ribbon(
            ggplot2::aes(
              fill = factor(.data$group_var)
            ),
            alpha = 0.3
          )
      }
      plot <-
        plot +
        ggplot2::geom_line(
          ggplot2::aes(
            color = factor(.data$group_var)
          )
        )
    }

    # add themes
    plot <-
      plot +
      ggplot2::theme_bw() +
      ggplot2::theme(
        plot.title = ggplot2::element_text(face = "bold")
      ) +
      ggplot2::scale_color_manual(
        values = colours,
        aesthetics = c("fill", "color"),
        name = openair::quickText(group)
      ) +
      ggplot2::labs(
        y = openair::quickText(pollutant),
        x = openair::quickText(var)
      )

    # make wind direction radial
    if (var == "wd") {
      if (radial_wd) {
        plot <-
          plot +
          ggplot2::scale_x_continuous(
            breaks = seq(0, 270, 90),
            labels = c(
              "N",
              "E",
              "S",
              "W"
            ),
            limits = c(0, 360),
            expand = ggplot2::expansion()
          ) +
          ggplot2::coord_radial(r.axis.inside = 315) +
          ggplot2::theme(
            panel.border = ggplot2::element_blank(),
            axis.line.theta = ggplot2::element_line(linewidth = 0.25)
          )

        plot <- patchwork::free(plot)
      } else {
        plot <-
          plot +
          ggplot2::scale_x_continuous(
            breaks = seq(0, 360, 90),
            labels = c(
              "0\n(N)",
              "90\n(E)",
              "180\n(S)",
              "270\n(W)",
              "360\n(N)"
            ),
            limits = c(0, 360)
          )
      }
    }

    if (var == "hour") {
      plot <-
        plot +
        ggplot2::scale_x_continuous(
          breaks = seq(0, 24, 4),
          limits = c(0, 23)
        )
    }

    if (length(colours) == 1L) {
      plot <- plot +
        ggplot2::guides(
          color = ggplot2::guide_none(),
          fill = ggplot2::guide_none()
        )
    }

    # add title
    gain <- scales::label_percent(0.1)(importance$importance[
      importance$var == var
    ])
    plot <- plot +
      ggplot2::labs(
        title = paste0(var, " (", gain, ")")
      )

    return(plot)
  }

  # ensure plot data is in order of variables
  plotdata <- plotdata[vars]

  if (!plot) {
    return(plotdata)
  }

  plots <-
    purrr::map(
      plotdata,
      plot_single_pd,
      .progress = progress
    ) |>
    stats::setNames(vars)

  if (length(plots) > 1) {
    for (i in seq_along(plots)[-1]) {
      plots[[i]] <- plots[[i]] + ggplot2::theme(legend.position = "none")
    }

    plots <-
      patchwork::wrap_plots(plots) +
      patchwork::plot_layout(
        widths = 1,
        heights = 1,
        guides = "collect",
        ncol = ncol,
        nrow = nrow
      )
  } else {
    plots <- plots[[1]]
  }

  return(plots)
}

--------------------------------------------------------------------------------
/vignettes/articles/deweather.Rmd:
--------------------------------------------------------------------------------
---
title:
  "Meteorological Normalisation with {deweather}"
description: >
  Get started with deweather
format: html
knitr:
  opts_chunk:
    collapse: true
    comment: "#>"
---

```{r}
#| include: false
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

Meteorology plays a central role in determining the concentrations of pollutants in the atmosphere. When considering trends in air pollutants, it can be very difficult to know whether a change in concentration is due to emissions or meteorology.

The **deweather** package uses a powerful statistical technique based on *boosted regression trees*, built on the `{tidymodels}` framework. This allows a variety of engines to be used, with `{xgboost}` as the default. Statistical models are developed to explain concentrations using meteorological and other variables. These models can be tested on randomly withheld data with the aim of developing the most appropriate model.

Many **deweather** functions support parallel processing via the `{mirai}` package, so along with loading the library we'll set some daemons and a random seed.

```{r}
#| label: setup
#| include: false
library(deweather)
mirai::daemons(5)
set.seed(321)
```

## Example data set

The **deweather** package comes with a comprehensive data set of air quality and meteorological data. The air quality data is from Marylebone Road in central London (obtained from the `{openair}` package) and the meteorological data from Heathrow Airport (obtained from the `{worldmet}` package). The `aqroadside` data frame contains various pollutants such as NO~x~, NO~2~, ethane and isoprene, as well as meteorological data including wind speed, wind direction, relative humidity, ambient temperature and cloud cover.
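
Before turning to `aqroadside`, the core idea of meteorological normalisation can be seen on a toy problem. The chunk below is a minimal, hypothetical sketch in base R and is *not* how **deweather** is implemented: it simulates two years of hourly data in which a declining emission trend is masked by wind speed, fits a linear model with `lm()` (standing in for the boosted regression trees **deweather** uses), then repeatedly replaces each hour's wind speed with one drawn from within +/-30 days of the same day of year and averages the predictions. The helper names `constrained_pools()`, `resample_constrained()` and `normalise()` and all of the data are invented for illustration. The day-of-year window loosely mirrors the `"constrained"` resampling of `simulate_dw_met()`, but this sketch ignores the hour-of-day window.

```r
# Hypothetical sketch of meteorological normalisation; not deweather's code.
# Assumptions: synthetic hourly data, lm() in place of boosted regression
# trees, and a day-of-year window only (no hour-of-day window).
set.seed(123)

# synthetic data: an emission trend masked by wind-speed "weather"
date <- seq(
  as.POSIXct("2021-01-01", tz = "UTC"),
  by = "hour",
  length.out = 2 * 365 * 24
)
doy <- as.integer(format(date, "%j"))
trend <- as.numeric(date)

# wind speed: a seasonal cycle plus random hour-to-hour variability
ws <- pmax(0, 4 + 1.5 * cos(2 * pi * doy / 365) + rnorm(length(date), sd = 1))
# true emissions fall linearly from 60 to 50 over the two years
emis <- 60 - 10 * (trend - min(trend)) / diff(range(trend))
# observed concentration: emissions minus a wind dilution effect, plus noise
conc <- emis - 5 * ws + rnorm(length(date), sd = 2)
dat <- data.frame(trend, doy, ws, conc)

# model: lm() stands in for a boosted regression tree
fit <- lm(conc ~ trend + ws, data = dat)

# For each day of year d, find the rows whose day of year lies within
# +/- window_day of d, measured around the calendar (31 Dec is next to 1 Jan).
constrained_pools <- function(doy, window_day) {
  lapply(seq_len(366), function(d) {
    gap <- abs(doy - d)
    which(pmin(gap, 365 - gap) <= window_day)
  })
}

# Draw one replacement row for every row, from the pool of its day of year.
resample_constrained <- function(rows_by_day, pools, n_rows) {
  id <- integer(n_rows)
  for (d in names(rows_by_day)) {
    rows <- rows_by_day[[d]]
    pool <- pools[[as.integer(d)]]
    # index through sample.int() to avoid sample()'s special case for length 1
    id[rows] <- pool[sample.int(length(pool), length(rows), replace = TRUE)]
  }
  id
}

# Replace the met variables with resampled values n_sim times and average the
# model predictions: the result is the "deweathered" series.
normalise <- function(fit, data, met_vars, n_sim = 20, window_day = 30) {
  pools <- constrained_pools(data$doy, window_day)
  rows_by_day <- split(seq_len(nrow(data)), data$doy)
  sims <- vapply(
    seq_len(n_sim),
    function(i) {
      id <- resample_constrained(rows_by_day, pools, nrow(data))
      newdata <- data
      newdata[met_vars] <- lapply(data[met_vars], function(x) x[id])
      predict(fit, newdata = newdata)
    },
    numeric(nrow(data))
  )
  rowMeans(sims)
}

dew <- normalise(fit, dat, met_vars = "ws")

# the deweathered series should track the emission trend more closely
c(raw = cor(dat$conc, emis), deweathered = cor(dew, emis))
```

In this toy example the deweathered series still carries a seasonal signal, because winter hours are only ever paired with winter-like wind speeds. That is the intended trade-off of constrained resampling: it avoids pairing a date with implausible conditions, at the cost of leaving some seasonal meteorology in the result.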
39 | 40 | ```{r} 41 | #| label: showData 42 | #| eval: TRUE 43 | head(aqroadside) 44 | ``` 45 | 46 | ## Prepare for Model Building 47 | 48 | A straightforward way to get started is to use the `append_dw_vars()` function to attach a load of useful model features to your air quality `data.frame`. Variables such as the hour of the day (`"hour"`) and day of the week (`"weekday"`) are used as features to explain some of the variation. `"hour"` for example very usefully acts as a proxy for the diurnal variation in emissions. These temporal emission proxies are also important to include to help the model differentiate between emission versus weather-related changes. For example, emissions tend to change throughout a day and so do variables such as wind speed and ambient temperature (here already present in the data as `"ws"` and `"air_temp"` respectively). 49 | 50 | Note that, strictly, `append_dw_vars()` does not need to be run directly; if `build_dw_model()` or `tune_dw_model()` are passed any of `"hour"`, `"weekday"`, `"trend"`, `"yday"`, `"week"`, or `"month"` to `vars`, the parent function will use `append_dw_vars()` itself in the background if those columns do not exist in the input dataframe. 51 | 52 | ```{r} 53 | #| label: model_data 54 | append_dw_vars(aqroadside) |> 55 | dplyr::glimpse() 56 | ``` 57 | 58 | ## Build a Model 59 | 60 | ### Tuning a Model 61 | 62 | There are many input parameters to `build_dw_model()` and, while sensible defaults have been set, you may be interested in seeing how tweaking them can influence how well the model behaves. This is called "tuning", and the `tune_dw_model()` function provides an interface for this. Any of the boosted decision tree parameters (e.g., `tree_depth` and `trees`) can be set as a range which is then combined with `grid_levels` to create a regular grid of all parameter combinations. 63 | 64 | Note that this process can take a while, especially if many parameters are being tuned. 
One way to speed this up is to invoke parallel processing through the `{mirai}` package (see the `{mirai}` documentation for more information). In this example, we'll also significantly trim down the data to speed things up. 65 | 66 | ```{r} 67 | #| label: tune 68 | tuned_results <- 69 | aqroadside |> 70 | # sample by weekday to get a good spread of the categorical data 71 | append_dw_vars("weekday") |> 72 | dplyr::slice_sample(n = 250, by = weekday) |> 73 | # tune model 74 | tune_dw_model( 75 | pollutant = "no2", 76 | tree_depth = c(1, 5), 77 | trees = c(150, 250), 78 | grid_levels = 3L 79 | ) 80 | ``` 81 | 82 | This output has a few useful features. First, we're informed that the best value for `trees` is `r tuned_results$best_params$trees` and for `tree_depth` is `r tuned_results$best_params$tree_depth`. 83 | 84 | ```{r} 85 | #| label: bestParams 86 | tuned_results$best_params 87 | ``` 88 | 89 | We can also see how the tuned model behaved on some reserved testing data. A table of predictions is returned, as well as a table of statistics and a scatter plot of modelled vs measured values. 90 | 91 | ```{r} 92 | #| label: metrics 93 | dplyr::glimpse(tuned_results$final_fit$metrics) 94 | ``` 95 | 96 | ```{r} 97 | #| label: finalplot 98 | tuned_results$final_fit$plot 99 | ``` 100 | 101 | ### Finalising a Model 102 | 103 | Assuming that a good model can be developed, it can now be explored in more detail. `deweather` has useful defaults for many of the model parameters, but these can be adjusted by the user if required. 104 | 105 | ```{r} 106 | #| label: buildMod 107 | no2_model <- 108 | build_dw_model( 109 | data = aqroadside, 110 | pollutant = "no2", 111 | vars = c("trend", "ws", "wd", "hour", "weekday", "air_temp"), 112 | engine = "xgboost" 113 | ) 114 | ``` 115 | 116 | This function returns a "deweathering model" object. We can see a quick summary of it by simply printing it.
117 | 118 | ```{r} 119 | #| label: printnox 120 | no2_model 121 | ``` 122 | 123 | We can also pull out specific features of the model using the `get_dw_*()` functions. `{deweather}` has many such 'getter' functions, and they form a consistent API for accessing relevant features of the different objects created in the package. 124 | 125 | ```{r} 126 | #| label: get 127 | get_dw_pollutant(no2_model) 128 | get_dw_vars(no2_model) 129 | get_dw_input_data(no2_model) 130 | ``` 131 | 132 | One feature we immediately have access to is the "feature importance" score. This can be obtained as a `data.frame` using a similar approach to the above. What the score represents depends on the chosen `engine`; for boosted trees it is 'Gain', the fractional contribution of each feature to the model, based on the total gain of that feature's splits. A higher percentage means a more important predictive feature. 133 | 134 | ```{r} 135 | #| label: getimport 136 | get_dw_importance(no2_model) 137 | ``` 138 | 139 | The Gain can be plotted automatically as a bar chart using the `plot_dw_importance()` function. Wind direction is the most predictive feature, while the individual weekday levels (Monday to Friday) are the least predictive. 140 | 141 | ```{r} 142 | #| label: fig-plotimport 143 | #| fig-cap: The feature importance of our model. 144 | plot_dw_importance(no2_model) 145 | ``` 146 | 147 | You'll notice that our 'character' variable, the day of the week, has been split out into multiple levels. This is because `{xgboost}` requires numeric inputs, so behind the scenes the day of the week is expanded into seven indicator (0/1) variables, one per day. This is informative: for example, we can see here that the weekend days are useful features. If you would prefer to see each factor as a single feature, use the `aggregate_factors` argument.
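As an illustration of the encoding described above, the base-R sketch below expands a seven-level weekday factor into indicator columns with `model.matrix()`, converts raw gains into fractional importance, and sums the indicator columns back into a single `weekday` feature. The gain values are made up for illustration, and we assume that `aggregate_factors` works by summing the gain of a factor's indicator columns; the package's own implementation may differ.

```r
# Toy data (not aqroadside): one row per day of the week
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
toy <- data.frame(weekday = factor(days, levels = days))

# One 0/1 indicator column per level; "- 1" drops the intercept so every
# level gets its own column (columns are named e.g. "weekdayMonday")
dummies <- model.matrix(~ weekday - 1, data = toy)

# Made-up raw gains for each encoded model column (illustration only)
gain <- c(
  ws = 45, hour = 35,
  weekdaySaturday = 8, weekdaySunday = 6, weekdayMonday = 2,
  weekdayTuesday = 1, weekdayWednesday = 1, weekdayThursday = 1,
  weekdayFriday = 1
)

# Fractional importance: each column's share of the total gain
importance <- gain / sum(gain)

# Aggregate (assumed approach): map every indicator column back to its
# parent feature, then sum the shares within each parent
parent <- sub("^weekday.*", "weekday", names(importance))
aggregated <- tapply(importance, parent, sum)
print(aggregated)
```

With these made-up numbers the seven indicator columns together account for 20% of the total gain, which is the single value an aggregated `weekday` bar would show.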
148 | 149 | ```{r} 150 | #| label: fig-plotimportagg 151 | #| fig-cap: The feature importance of our model, with factors aggregated. 152 | plot_dw_importance(no2_model, aggregate_factors = TRUE) 153 | ``` 154 | 155 | ## Examine the partial dependencies 156 | 157 | ### Basic PD Plots 158 | 159 | One of the benefits of the boosted regression tree approach is that the *partial dependencies* can be explored. In simple terms, a partial dependence shows the relationship between the pollutant of interest and one covariate used in the model, averaging the model's predictions over the observed values of the other covariates. 160 | 161 | Let's plot a partial dependence for the `hour` variable. We'll set `n` to `100` for speed; this draws 100 random samples of our original data to construct the plot. We can see that `no2` is highest during the day and lowest overnight, all else being equal. 162 | 163 | ```{r} 164 | #| label: pd1d 165 | plot_dw_partial_1d(no2_model, "hour", n = 100) 166 | ``` 167 | 168 | A categorical variable like `weekday` looks slightly different, but achieves a similar result. `no2` is lowest on weekends, all else being equal. 169 | 170 | ```{r} 171 | #| label: pd1d2 172 | plot_dw_partial_1d(no2_model, "weekday", n = 100) 173 | ``` 174 | 175 | If a variable isn't given, all variables will be plotted in a `patchwork` assembly in order of importance. 176 | 177 | ```{r} 178 | #| label: pd1d3 179 | plot_dw_partial_1d(no2_model, n = 100) 180 | ``` 181 | 182 | ### Grouped PD 183 | 184 | Sometimes, when examining partial dependences, it is useful to consider *grouped* partial dependences. This means that, before PDs are calculated, the data are split into groups and a PD is calculated for each subset. Let's give this a go now.
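To make the idea concrete, here is a conceptual sketch of how a partial dependence, and a grouped partial dependence, can be computed for any model with a `predict()` method. It uses simulated data and `lm()` as a stand-in for the boosted-tree model. The helper `partial_dependence()` is hypothetical, and this is not how deweather computes PDs internally.

```r
set.seed(42)
n <- 500

# Simulated toy data (not aqroadside): a diurnal cycle peaking at midday,
# a wind-speed effect and lower concentrations at weekends
toy <- data.frame(
  hour = sample(0:23, n, replace = TRUE),
  ws = runif(n, 0, 10),
  daytype = sample(c("weekday", "weekend"), n, replace = TRUE)
)
toy$conc <- 40 + 10 * sin(pi * (toy$hour - 6) / 12) - 2 * toy$ws -
  8 * (toy$daytype == "weekend") + rnorm(n, sd = 3)

# lm() stands in for the boosted-tree model; any predict() method works
fit <- lm(conc ~ poly(hour, 3) + ws + daytype, data = toy)

# Hypothetical helper: for each grid value, overwrite `var` in every row,
# predict, and average over the observed values of the other covariates
partial_dependence <- function(model, data, var, grid) {
  vapply(grid, function(value) {
    data[[var]] <- value
    mean(predict(model, newdata = data))
  }, numeric(1))
}

hours <- 0:23
pd_all <- partial_dependence(fit, toy, "hour", hours)

# Grouped PD: split the data first, then compute one PD curve per subset
pd_grouped <- lapply(
  split(toy, toy$daytype),
  partial_dependence,
  model = fit, var = "hour", grid = hours
)
```

Because this toy model is additive, the weekend curve is simply shifted below the weekday curve. A boosted-tree model can produce differently shaped curves for each group, which is what makes grouped PDs informative.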
185 | 186 | ```{r} 187 | #| label: calcPDWkday 188 | plot_dw_partial_1d(no2_model, "hour", group = "weekday", n = 1000) 189 | ``` 190 | 191 | Note that `group` need not be categorical; you may provide a continuous variable, which will be binned into the number of intervals given by `group_intervals`. Here we group by `"air_temp"` and split it into 3 equally sized bins. 192 | 193 | ```{r} 194 | #| label: calcPDWkday2 195 | plot_dw_partial_1d( 196 | no2_model, 197 | "hour", 198 | group = "air_temp", 199 | group_intervals = 3L, 200 | n = 1000 201 | ) 202 | ``` 203 | 204 | ### Two-Way Interactions 205 | 206 | In the grouped example above, we needed to treat one of our continuous features (`air_temp`) as a factor by binning it. Two-way interaction plots instead show two continuous features together. 207 | 208 | It can be very useful to plot important two-way interactions. In this example the interaction between `"ws"` and `"air_temp"` is considered. The plot shows that `no2` tends to be high when both the wind speed and the temperature are low, i.e., during stable atmospheric conditions. `no2` also tends to be high when the temperature is high, which is most likely due to more O~3~ being available to convert NO to NO~2~. In fact, background O~3~ would probably be a useful covariate to add to the model. 209 | 210 | ```{r} 211 | #| label: pd2d1 212 | plot_dw_partial_2d(no2_model, "ws", "air_temp", n = 200) 213 | ``` 214 | 215 | It can be easier to see some of these relationships using a binned scale. `contour = "lines"` overlays contour lines on the continuous colour surface, while `contour = "fill"` bins the entire colour scale. The number of bins is controlled using the `contour_bins` argument. 
216 | 217 | ```{r} 218 | #| label: pd2d2 219 | plot_dw_partial_2d( 220 | no2_model, 221 | "ws", 222 | "air_temp", 223 | n = 200, 224 | contour = "fill", 225 | contour_bins = 10, 226 | cols = "turbo" 227 | ) 228 | ``` 229 | 230 | ## Customisation 231 | 232 | These plots can be customised like any other `{ggplot2}` object. 233 | 234 | ```{r} 235 | #| label: fig-ggplot 236 | #| fig-cap: A thoroughly customised partial dependence plot.
237 | library(ggplot2) 238 | 239 | plot_dw_partial_1d(no2_model, "hour", group = "weekday") + 240 | theme_light(14) + 241 | theme( 242 | title = element_text(face = "bold"), 243 | plot.subtitle = element_text(face = "plain", size = 11), 244 | legend.position = "none" 245 | ) + 246 | scale_fill_manual( 247 | values = c( 248 | "orange", 249 | "cadetblue", 250 | "lightblue4", 251 | "lightblue3", 252 | "lightblue2", 253 | "lightblue1", 254 | "orange3" 255 | ), 256 | aesthetics = c("fill", "color") 257 | ) + 258 | labs( 259 | x = "Hour of the Day", 260 | y = "NO2 (ug/m3)", 261 | fill = "Weekday", 262 | color = "Weekday", 263 | subtitle = "Weekends (orange) have lower NO2 concentrations than weekdays (blue)." 264 | ) + 265 | scale_y_continuous(limits = c(0, NA)) + 266 | scale_x_continuous(breaks = seq(0, 24, 4), expand = expansion()) 267 | ``` 268 | 269 | ## Apply meteorological averaging 270 | 271 | An *indication* of the meteorologically-averaged trend is given by the `plot_dw_partial_1d()` function above. 272 | 273 | ```{r} 274 | #| label: "pd-trend" 275 | #| fig.width: 7 276 | #| fig.height: 3.5 277 | plot_dw_partial_1d(no2_model, "trend", n = 100, intervals = 100) 278 | ``` 279 | 280 | A much better indication is obtained by using the model to predict many times with random sampling of **meteorological** conditions. This sampling is carried out by the `simulate_dw_met()` function. 281 | 282 | Note that you'd typically want `n` to be a higher value than what's used in this example; it has been set to `50` for speed. Recall also that some `mirai::daemons()` are set to allow for parallelism. 283 | 284 | ```{r} 285 | #| label: "metSim" 286 | #| eval: TRUE 287 | demet <- simulate_dw_met( 288 | no2_model, 289 | vars = c("ws", "wd", "air_temp"), 290 | n = 50 291 | ) 292 | ``` 293 | 294 | Now it is possible to plot the resulting trend.
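The resampling idea behind `simulate_dw_met()` can be sketched in a few lines of base R. Keep the non-meteorological features (here a time trend) fixed, replace the meteorological variables in every row with values drawn from random rows of the data, predict, and average the predictions over many repetitions. The helper `simulate_met_sketch()` and the toy data are hypothetical, `lm()` stands in for the boosted-tree model, and deweather's own sampling scheme may differ.

```r
set.seed(1)
n <- 300

# Toy hourly series (not aqroadside): a rising trend plus strong
# wind-speed and temperature effects
toy <- data.frame(
  t = seq_len(n),
  ws = rexp(n, rate = 0.3),
  air_temp = rnorm(n, mean = 10, sd = 5)
)
toy$conc <- 30 + 0.05 * toy$t - 3 * toy$ws + 0.5 * toy$air_temp +
  rnorm(n, sd = 2)

# lm() stands in for the boosted-tree model
fit <- lm(conc ~ t + ws + air_temp, data = toy)

# Hypothetical helper: resample the met variables jointly from random rows,
# predict, and average the predictions across n_sims repetitions
simulate_met_sketch <- function(model, data, met_vars, n_sims = 50) {
  sims <- replicate(n_sims, {
    resampled <- data
    rows <- sample(nrow(data), replace = TRUE)
    resampled[met_vars] <- data[rows, met_vars]
    predict(model, newdata = resampled)
  })
  # replicate() returns an nrow(data) x n_sims matrix
  rowMeans(sims)
}

toy$deweathered <- simulate_met_sketch(fit, toy, c("ws", "air_temp"))
```

In this toy example the averaged predictions follow the underlying trend much more closely than the raw concentrations, because the wind-speed and temperature effects are averaged out. Increasing `n_sims` reduces the remaining sampling noise, which mirrors the advice above to use a larger `n`.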
295 | 296 | ```{r} 297 | #| label: "plotTrend" 298 | #| eval: TRUE 299 | #| fig.width: 7 300 | #| fig.height: 3.5 301 | #| fig.cap: "A deweathered nitrogen dioxide trend." 302 | #| fig.alt: "A line chart with date on the x-axis and deweathered NO2 on 303 | #| the y-axis. The trend is very noisy, but shows an increase in 304 | #| concentrations in 2003." 305 | library(ggplot2) 306 | 307 | ggplot(demet, aes(x = date, y = no2)) + 308 | geom_line(linewidth = 0.01) + 309 | theme_bw() + 310 | labs( 311 | y = openair::quickText("NO2"), 312 | x = NULL 313 | ) 314 | ``` 315 | 316 | The plot shows the trend in NO~2~ after controlling for the main weather variables. It now reveals the strong diurnal and weekly cycles in NO~2~ that are driven by variations in the sources of NO~2~ (NO~x~) rather than by meteorology, i.e., road traffic, which has strong hourly and daily variations throughout the year. Simply averaging the results can provide a better indication of the overall trend. For example: 317 | 318 | ```{r} 319 | #| label: "plotTrendAve" 320 | #| eval: TRUE 321 | #| fig.width: 7 322 | #| fig.height: 3.5 323 | #| fig.cap: "A time-averaged deweathered nitrogen dioxide trend." 324 | #| fig.alt: "A line chart with date on the x-axis and deweathered NO2 on 325 | #| the y-axis. The trend has been time averaged to show monthly mean 326 | #| concentrations, clearly illustrating a sharp increase in 2003." 327 | demet |> 328 | openair::timeAverage("month") |> 329 | ggplot(aes(x = date, y = no2)) + 330 | geom_line() + 331 | theme_bw() + 332 | labs( 333 | y = openair::quickText("NO2"), 334 | x = NULL 335 | ) 336 | ``` 337 | 338 | With some simple data manipulation, we can compare the meteorological simulation, the partial dependence profile, and the input data. It is clear that the input data has much more noise. The PD profile and the simulated data are similar, but the simulated data are higher overall.
This highlights that relying purely on the PD profiles (which isolate the trend from even temporal variables such as the day of the week) may underestimate pollutant concentrations. 339 | 340 | ```{r} 341 | # PD - set intervals to something high for a good profile 342 | pd_trend <- plot_dw_partial_1d( 343 | no2_model, 344 | "trend", 345 | plot = FALSE, 346 | n = 50, 347 | intervals = 200 348 | ) |> 349 | purrr::pluck("trend") |> 350 | dplyr::transmute( 351 | date = as.POSIXct(trend), 352 | no2 = mean 353 | ) 354 | 355 | # input data 356 | input_data <- get_dw_input_data(no2_model) |> 357 | dplyr::transmute( 358 | date = as.POSIXct(trend), 359 | no2 = no2 360 | ) 361 | 362 | # combine with demet and plot 363 | dplyr::bind_rows( 364 | "Simulated" = demet, 365 | "Input" = input_data, 366 | "Partial Dep" = pd_trend, 367 | .id = "source" 368 | ) |> 369 | openair::timeAverage("month", type = "source") |> 370 | ggplot(aes(x = date, y = no2, color = source)) + 371 | geom_line() + 372 | theme_bw() + 373 | labs( 374 | y = openair::quickText("NO2"), 375 | x = NULL 376 | ) + 377 | scale_color_manual( 378 | values = c( 379 | "Input" = "grey70", 380 | "Partial Dep" = "orange", 381 | "Simulated" = "royalblue" 382 | ) 383 | ) 384 | ``` 385 | 386 | Note that the `predict_dw()` function is also provided for more general prediction with a deweather model. 387 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | GNU General Public License 2 | ========================== 3 | 4 | _Version 2, June 1991_ 5 | _Copyright © 1989, 1991 Free Software Foundation, Inc.,_ 6 | _51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA_ 7 | 8 | Everyone is permitted to copy and distribute verbatim copies 9 | of this license document, but changing it is not allowed.
10 | 11 | ### Preamble 12 | 13 | The licenses for most software are designed to take away your 14 | freedom to share and change it. By contrast, the GNU General Public 15 | License is intended to guarantee your freedom to share and change free 16 | software--to make sure the software is free for all its users. This 17 | General Public License applies to most of the Free Software 18 | Foundation's software and to any other program whose authors commit to 19 | using it. (Some other Free Software Foundation software is covered by 20 | the GNU Lesser General Public License instead.) You can apply it to 21 | your programs, too. 22 | 23 | When we speak of free software, we are referring to freedom, not 24 | price. Our General Public Licenses are designed to make sure that you 25 | have the freedom to distribute copies of free software (and charge for 26 | this service if you wish), that you receive source code or can get it 27 | if you want it, that you can change the software or use pieces of it 28 | in new free programs; and that you know you can do these things. 29 | 30 | To protect your rights, we need to make restrictions that forbid 31 | anyone to deny you these rights or to ask you to surrender the rights. 32 | These restrictions translate to certain responsibilities for you if you 33 | distribute copies of the software, or if you modify it. 34 | 35 | For example, if you distribute copies of such a program, whether 36 | gratis or for a fee, you must give the recipients all the rights that 37 | you have. You must make sure that they, too, receive or can get the 38 | source code. And you must show them these terms so they know their 39 | rights. 40 | 41 | We protect your rights with two steps: **(1)** copyright the software, and 42 | **(2)** offer you this license which gives you legal permission to copy, 43 | distribute and/or modify the software. 
44 | 45 | Also, for each author's protection and ours, we want to make certain 46 | that everyone understands that there is no warranty for this free 47 | software. If the software is modified by someone else and passed on, we 48 | want its recipients to know that what they have is not the original, so 49 | that any problems introduced by others will not reflect on the original 50 | authors' reputations. 51 | 52 | Finally, any free program is threatened constantly by software 53 | patents. We wish to avoid the danger that redistributors of a free 54 | program will individually obtain patent licenses, in effect making the 55 | program proprietary. To prevent this, we have made it clear that any 56 | patent must be licensed for everyone's free use or not licensed at all. 57 | 58 | The precise terms and conditions for copying, distribution and 59 | modification follow. 60 | 61 | ### TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 62 | 63 | **0.** This License applies to any program or other work which contains 64 | a notice placed by the copyright holder saying it may be distributed 65 | under the terms of this General Public License. The “Program”, below, 66 | refers to any such program or work, and a “work based on the Program” 67 | means either the Program or any derivative work under copyright law: 68 | that is to say, a work containing the Program or a portion of it, 69 | either verbatim or with modifications and/or translated into another 70 | language. (Hereinafter, translation is included without limitation in 71 | the term “modification”.) Each licensee is addressed as “you”. 72 | 73 | Activities other than copying, distribution and modification are not 74 | covered by this License; they are outside its scope. The act of 75 | running the Program is not restricted, and the output from the Program 76 | is covered only if its contents constitute a work based on the 77 | Program (independent of having been made by running the Program). 
78 | Whether that is true depends on what the Program does. 79 | 80 | **1.** You may copy and distribute verbatim copies of the Program's 81 | source code as you receive it, in any medium, provided that you 82 | conspicuously and appropriately publish on each copy an appropriate 83 | copyright notice and disclaimer of warranty; keep intact all the 84 | notices that refer to this License and to the absence of any warranty; 85 | and give any other recipients of the Program a copy of this License 86 | along with the Program. 87 | 88 | You may charge a fee for the physical act of transferring a copy, and 89 | you may at your option offer warranty protection in exchange for a fee. 90 | 91 | **2.** You may modify your copy or copies of the Program or any portion 92 | of it, thus forming a work based on the Program, and copy and 93 | distribute such modifications or work under the terms of Section 1 94 | above, provided that you also meet all of these conditions: 95 | 96 | * **a)** You must cause the modified files to carry prominent notices 97 | stating that you changed the files and the date of any change. 98 | * **b)** You must cause any work that you distribute or publish, that in 99 | whole or in part contains or is derived from the Program or any 100 | part thereof, to be licensed as a whole at no charge to all third 101 | parties under the terms of this License. 102 | * **c)** If the modified program normally reads commands interactively 103 | when run, you must cause it, when started running for such 104 | interactive use in the most ordinary way, to print or display an 105 | announcement including an appropriate copyright notice and a 106 | notice that there is no warranty (or else, saying that you provide 107 | a warranty) and that users may redistribute the program under 108 | these conditions, and telling the user how to view a copy of this 109 | License. 
(Exception: if the Program itself is interactive but 110 | does not normally print such an announcement, your work based on 111 | the Program is not required to print an announcement.) 112 | 113 | These requirements apply to the modified work as a whole. If 114 | identifiable sections of that work are not derived from the Program, 115 | and can be reasonably considered independent and separate works in 116 | themselves, then this License, and its terms, do not apply to those 117 | sections when you distribute them as separate works. But when you 118 | distribute the same sections as part of a whole which is a work based 119 | on the Program, the distribution of the whole must be on the terms of 120 | this License, whose permissions for other licensees extend to the 121 | entire whole, and thus to each and every part regardless of who wrote it. 122 | 123 | Thus, it is not the intent of this section to claim rights or contest 124 | your rights to work written entirely by you; rather, the intent is to 125 | exercise the right to control the distribution of derivative or 126 | collective works based on the Program. 127 | 128 | In addition, mere aggregation of another work not based on the Program 129 | with the Program (or with a work based on the Program) on a volume of 130 | a storage or distribution medium does not bring the other work under 131 | the scope of this License. 
132 | 133 | **3.** You may copy and distribute the Program (or a work based on it, 134 | under Section 2) in object code or executable form under the terms of 135 | Sections 1 and 2 above provided that you also do one of the following: 136 | 137 | * **a)** Accompany it with the complete corresponding machine-readable 138 | source code, which must be distributed under the terms of Sections 139 | 1 and 2 above on a medium customarily used for software interchange; or, 140 | * **b)** Accompany it with a written offer, valid for at least three 141 | years, to give any third party, for a charge no more than your 142 | cost of physically performing source distribution, a complete 143 | machine-readable copy of the corresponding source code, to be 144 | distributed under the terms of Sections 1 and 2 above on a medium 145 | customarily used for software interchange; or, 146 | * **c)** Accompany it with the information you received as to the offer 147 | to distribute corresponding source code. (This alternative is 148 | allowed only for noncommercial distribution and only if you 149 | received the program in object code or executable form with such 150 | an offer, in accord with Subsection b above.) 151 | 152 | The source code for a work means the preferred form of the work for 153 | making modifications to it. For an executable work, complete source 154 | code means all the source code for all modules it contains, plus any 155 | associated interface definition files, plus the scripts used to 156 | control compilation and installation of the executable. However, as a 157 | special exception, the source code distributed need not include 158 | anything that is normally distributed (in either source or binary 159 | form) with the major components (compiler, kernel, and so on) of the 160 | operating system on which the executable runs, unless that component 161 | itself accompanies the executable. 
162 | 163 | If distribution of executable or object code is made by offering 164 | access to copy from a designated place, then offering equivalent 165 | access to copy the source code from the same place counts as 166 | distribution of the source code, even though third parties are not 167 | compelled to copy the source along with the object code. 168 | 169 | **4.** You may not copy, modify, sublicense, or distribute the Program 170 | except as expressly provided under this License. Any attempt 171 | otherwise to copy, modify, sublicense or distribute the Program is 172 | void, and will automatically terminate your rights under this License. 173 | However, parties who have received copies, or rights, from you under 174 | this License will not have their licenses terminated so long as such 175 | parties remain in full compliance. 176 | 177 | **5.** You are not required to accept this License, since you have not 178 | signed it. However, nothing else grants you permission to modify or 179 | distribute the Program or its derivative works. These actions are 180 | prohibited by law if you do not accept this License. Therefore, by 181 | modifying or distributing the Program (or any work based on the 182 | Program), you indicate your acceptance of this License to do so, and 183 | all its terms and conditions for copying, distributing or modifying 184 | the Program or works based on it. 185 | 186 | **6.** Each time you redistribute the Program (or any work based on the 187 | Program), the recipient automatically receives a license from the 188 | original licensor to copy, distribute or modify the Program subject to 189 | these terms and conditions. You may not impose any further 190 | restrictions on the recipients' exercise of the rights granted herein. 191 | You are not responsible for enforcing compliance by third parties to 192 | this License. 
193 | 194 | **7.** If, as a consequence of a court judgment or allegation of patent 195 | infringement or for any other reason (not limited to patent issues), 196 | conditions are imposed on you (whether by court order, agreement or 197 | otherwise) that contradict the conditions of this License, they do not 198 | excuse you from the conditions of this License. If you cannot 199 | distribute so as to satisfy simultaneously your obligations under this 200 | License and any other pertinent obligations, then as a consequence you 201 | may not distribute the Program at all. For example, if a patent 202 | license would not permit royalty-free redistribution of the Program by 203 | all those who receive copies directly or indirectly through you, then 204 | the only way you could satisfy both it and this License would be to 205 | refrain entirely from distribution of the Program. 206 | 207 | If any portion of this section is held invalid or unenforceable under 208 | any particular circumstance, the balance of the section is intended to 209 | apply and the section as a whole is intended to apply in other 210 | circumstances. 211 | 212 | It is not the purpose of this section to induce you to infringe any 213 | patents or other property right claims or to contest validity of any 214 | such claims; this section has the sole purpose of protecting the 215 | integrity of the free software distribution system, which is 216 | implemented by public license practices. Many people have made 217 | generous contributions to the wide range of software distributed 218 | through that system in reliance on consistent application of that 219 | system; it is up to the author/donor to decide if he or she is willing 220 | to distribute software through any other system and a licensee cannot 221 | impose that choice. 222 | 223 | This section is intended to make thoroughly clear what is believed to 224 | be a consequence of the rest of this License. 
225 | 226 | **8.** If the distribution and/or use of the Program is restricted in 227 | certain countries either by patents or by copyrighted interfaces, the 228 | original copyright holder who places the Program under this License 229 | may add an explicit geographical distribution limitation excluding 230 | those countries, so that distribution is permitted only in or among 231 | countries not thus excluded. In such case, this License incorporates 232 | the limitation as if written in the body of this License. 233 | 234 | **9.** The Free Software Foundation may publish revised and/or new versions 235 | of the General Public License from time to time. Such new versions will 236 | be similar in spirit to the present version, but may differ in detail to 237 | address new problems or concerns. 238 | 239 | Each version is given a distinguishing version number. If the Program 240 | specifies a version number of this License which applies to it and “any 241 | later version”, you have the option of following the terms and conditions 242 | either of that version or of any later version published by the Free 243 | Software Foundation. If the Program does not specify a version number of 244 | this License, you may choose any version ever published by the Free Software 245 | Foundation. 246 | 247 | **10.** If you wish to incorporate parts of the Program into other free 248 | programs whose distribution conditions are different, write to the author 249 | to ask for permission. For software which is copyrighted by the Free 250 | Software Foundation, write to the Free Software Foundation; we sometimes 251 | make exceptions for this. Our decision will be guided by the two goals 252 | of preserving the free status of all derivatives of our free software and 253 | of promoting the sharing and reuse of software generally. 
254 | 255 | ### NO WARRANTY 256 | 257 | **11.** BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY 258 | FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN 259 | OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES 260 | PROVIDE THE PROGRAM “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED 261 | OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF 262 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS 263 | TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE 264 | PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, 265 | REPAIR OR CORRECTION. 266 | 267 | **12.** IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING 268 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR 269 | REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, 270 | INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING 271 | OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED 272 | TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY 273 | YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER 274 | PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE 275 | POSSIBILITY OF SUCH DAMAGES. 276 | 277 | END OF TERMS AND CONDITIONS 278 | 279 | ### How to Apply These Terms to Your New Programs 280 | 281 | If you develop a new program, and you want it to be of the greatest 282 | possible use to the public, the best way to achieve this is to make it 283 | free software which everyone can redistribute and change under these terms. 284 | 285 | To do so, attach the following notices to the program. 
It is safest 286 | to attach them to the start of each source file to most effectively 287 | convey the exclusion of warranty; and each file should have at least 288 | the “copyright” line and a pointer to where the full notice is found. 289 | 290 | 291 | Copyright (C) 292 | 293 | This program is free software; you can redistribute it and/or modify 294 | it under the terms of the GNU General Public License as published by 295 | the Free Software Foundation; either version 2 of the License, or 296 | (at your option) any later version. 297 | 298 | This program is distributed in the hope that it will be useful, 299 | but WITHOUT ANY WARRANTY; without even the implied warranty of 300 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 301 | GNU General Public License for more details. 302 | 303 | You should have received a copy of the GNU General Public License along 304 | with this program; if not, write to the Free Software Foundation, Inc., 305 | 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 306 | 307 | Also add information on how to contact you by electronic and paper mail. 308 | 309 | If the program is interactive, make it output a short notice like this 310 | when it starts in an interactive mode: 311 | 312 | Gnomovision version 69, Copyright (C) year name of author 313 | Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 314 | This is free software, and you are welcome to redistribute it 315 | under certain conditions; type `show c' for details. 316 | 317 | The hypothetical commands `show w` and `show c` should show the appropriate 318 | parts of the General Public License. Of course, the commands you use may 319 | be called something other than `show w` and `show c`; they could even be 320 | mouse-clicks or menu items--whatever suits your program. 
321 | 322 | You should also get your employer (if you work as a programmer) or your 323 | school, if any, to sign a “copyright disclaimer” for the program, if 324 | necessary. Here is a sample; alter the names: 325 | 326 | Yoyodyne, Inc., hereby disclaims all copyright interest in the program 327 | `Gnomovision' (which makes passes at compilers) written by James Hacker. 328 | 329 | , 1 April 1989 330 | Ty Coon, President of Vice 331 | 332 | This General Public License does not permit incorporating your program into 333 | proprietary programs. If your program is a subroutine library, you may 334 | consider it more useful to permit linking proprietary applications with the 335 | library. If this is what you want to do, use the GNU Lesser General 336 | Public License instead of this License. -------------------------------------------------------------------------------- /pkgdown/favicon/favicon.svg: -------------------------------------------------------------------------------- 1 | --------------------------------------------------------------------------------