├── .Rbuildignore ├── .gitignore ├── CRAN-RELEASE ├── DESCRIPTION ├── NAMESPACE ├── NEWS.Rmd ├── NEWS.md ├── R ├── _Rhistory ├── bank_td.R ├── dataprepmodelplots.R ├── modelplotr.R ├── parametersmodelplots.R └── plottingmodelplots.R ├── README.Rmd ├── README.md ├── cran-comments.Rmd ├── cran-comments.md ├── data └── bank_td.rda ├── man ├── aggregate_over_ntiles.Rd ├── bank_td.Rd ├── build_input_yourself.Rd ├── customize_plot_text.Rd ├── modelplotr.Rd ├── plot_costsrevs.Rd ├── plot_cumgains.Rd ├── plot_cumlift.Rd ├── plot_cumresponse.Rd ├── plot_multiplot.Rd ├── plot_profit.Rd ├── plot_response.Rd ├── plot_roi.Rd ├── plotting_scope.Rd ├── prepare_scores_and_ntiles.Rd └── prepare_scores_and_ntiles_keras.Rd ├── modelplotr.Rcheck └── 00check.log ├── modelplotr.Rproj ├── tests ├── testthat.R └── testthat │ └── test-prepare_scores_and_deciles.R └── vignettes ├── .gitignore ├── cumgains.png ├── modelplotr.Rmd └── plot123.png /.Rbuildignore: -------------------------------------------------------------------------------- 1 | ^Meta$ 2 | ^doc$ 3 | ^README\.Rmd$ 4 | ^NEWS\.Rmd$ 5 | ^cran-comments.Rmd$ 6 | ^cran-comments.md$ 7 | ^.*\.Rproj$ 8 | ^\.Rproj\.user$ 9 | ^CRAN-RELEASE$ 10 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | Meta 2 | doc 3 | inst/doc 4 | .Rproj.user 5 | .Rhistory 6 | .RData 7 | .Ruserdata 8 | ^NEWS.Rmd 9 | ^README.Rmd 10 | ^cran-comments.Rmd 11 | .keras 12 | -------------------------------------------------------------------------------- /CRAN-RELEASE: -------------------------------------------------------------------------------- 1 | This package was submitted to CRAN on 2019-04-16. 2 | Once it is accepted, delete this file and tag the release (commit e1e6a48217). 
3 | -------------------------------------------------------------------------------- /DESCRIPTION: -------------------------------------------------------------------------------- 1 | Package: modelplotr 2 | Type: Package 3 | Title: Plots to Evaluate the Business Performance of Predictive Models 4 | Version: 1.1.0 5 | Authors@R: c( 6 | person("Jurriaan","Nagelkerke",email="jurriaan.nagelkerke@gmail.com",role=c("aut","cre")), 7 | person("Pieter","Marcus",email="pieter.marcus@persgroep.net",role="aut")) 8 | URL: https://github.com/jurrr/modelplotr 9 | BugReports: https://github.com/jurrr/modelplotr/issues 10 | Description: Plots to assess the quality of predictive models from a business perspective. 11 | Using these plots, it can be shown how implementation of the model will impact business 12 | targets like response on a campaign or return on investment. Different scopes can be selected: 13 | compare models, compare datasets or compare target class values and various plot customization 14 | and highlighting options are available. 
18 | Depends: R (>= 3.1.0) 19 | License: GPL-3 20 | Encoding: UTF-8 21 | LazyData: true 22 | Imports: 23 | ggplot2 (>= 2.2.1), 24 | gridExtra (>= 2.3.0), 25 | magrittr (>= 1.5.0), 26 | dplyr (>= 0.7.7), 27 | RColorBrewer (>= 1.1.2), 28 | ggfittext (>= 0.6.0), 29 | scales (>= 1.0.0), 30 | rlang (>= 0.3.1) 31 | RoxygenNote: 7.1.1 32 | Suggests: 33 | mlr (>= 2.12.1), 34 | caret (>= 6.0), 35 | randomForest (>= 4.6.14), 36 | nnet(>= 7.3-12), 37 | e1071, 38 | h2o, 39 | keras, 40 | knitr, 41 | rmarkdown, 42 | testthat, 43 | xgboost, 44 | stringr, 45 | kableExtra, 46 | lattice, 47 | ranger, 48 | glmnet 49 | VignetteBuilder: knitr 50 | -------------------------------------------------------------------------------- /NAMESPACE: -------------------------------------------------------------------------------- 1 | # Generated by roxygen2: do not edit by hand 2 | 3 | export(aggregate_over_ntiles) 4 | export(customize_plot_text) 5 | export(plot_costsrevs) 6 | export(plot_cumgains) 7 | export(plot_cumlift) 8 | export(plot_cumresponse) 9 | export(plot_multiplot) 10 | export(plot_profit) 11 | export(plot_response) 12 | export(plot_roi) 13 | export(plotting_scope) 14 | export(prepare_scores_and_ntiles) 15 | export(prepare_scores_and_ntiles_keras) 16 | importFrom(magrittr,"%>%") 17 | importFrom(rlang,.data) 18 | -------------------------------------------------------------------------------- /NEWS.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "NEWS-modelplotr" 3 | output: github_document 4 | --- 5 | 6 | 7 | 8 | 9 | ```{r setup, include = FALSE} 10 | knitr::opts_chunk$set( 11 | collapse = TRUE, 12 | comment = "#>", 13 | fig.path = "NEWS-" 14 | ) 15 | ``` 16 | 17 | ```{r, echo = FALSE} 18 | cat(paste0("Last Update: ",Sys.time())) 19 | ``` 20 | 21 | 22 | ## v1.0.0 23 | 2019-04-13 24 | * Initial CRAN release. 
25 | 26 | ## v1.1.0 27 | 2020-10-11 28 | * prepare_scores_and_ntiles_keras() function added to better support keras models 29 | 30 | -------------------------------------------------------------------------------- /NEWS.md: -------------------------------------------------------------------------------- 1 | NEWS-modelplotr 2 | ================ 3 | 4 | 5 | 6 | #> Last Update: 2019-04-13 13:17:21 7 | 8 | ## v1.0.0 9 | 10 | 2019-04-13 \* Initial CRAN release. 11 | -------------------------------------------------------------------------------- /R/_Rhistory: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jurrr/modelplotr/9ca4c5dc319eae91c854038f51fb76396fe82371/R/_Rhistory -------------------------------------------------------------------------------- /R/bank_td.R: -------------------------------------------------------------------------------- 1 | #' Bank clients that have/have not subscribed to a term deposit. 2 | #' 3 | #' A dataset containing some customer characteristics for clients of a bank that have/have not subscribed to a term deposit. 4 | #' 5 | #' @format A data frame with 2000 rows and 7 variables: 6 | #' \describe{ 7 | #' \item{has_td}{has the client subscribed to a term deposit? Values: "term.deposit", "no.term.deposit". 8 | #' This variable is used as the binary target variable in examples for the modelplotr package.} 9 | #' \item{td_type}{what type of term deposit did the client subscribe to? Values: "no.td", "td.type.A", "td.type.B", "td.type.C". 
10 | #' This variable is used as the multinomial target variable in examples for the modelplotr package.} 11 | #' \item{duration}{last contact duration, in seconds (numeric)} 12 | #' \item{campaign}{number of contacts performed during this campaign and for this client} 13 | #' \item{pdays}{number of days that passed by after the client was last contacted from a previous campaign} 14 | #' \item{previous}{number of contacts performed before this campaign and for this client (numeric)} 15 | #' \item{euribor3m}{euribor 3 month rate} 16 | #' } 17 | #' @source This dataset is a subset of the dataset made available by the University of California, Irvine. 18 | #' The complete dataset is available here: \url{https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank-additional.zip} 19 | "bank_td" 20 | 21 | # # SCRIPT TO CREATE bank_td 22 | # library(dplyr) 23 | # #zipname = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank-additional.zip' 24 | # # we encountered that the source at uci.edu is not always available, therefore we made a copy to our repos. 
25 | # zipname = 'https://modelplot.github.io/img/bank-additional.zip' 26 | # csvname = 'bank-additional/bank-additional-full.csv' 27 | # temp <- tempfile() 28 | # download.file(zipname,temp, mode="wb") 29 | # bank <- read.table(unzip(temp,csvname),sep=";", stringsAsFactors=FALSE,header = TRUE) 30 | # unlink(temp) 31 | # 32 | # summary(bank) 33 | # bank <- bank %>% select('y','duration','campaign','pdays','previous','euribor3m') %>% sample_n(2000) 34 | # 35 | # # definition of y_cat, created to show modelplotr for multinomial targets 36 | # bank <- bank %>% 37 | # mutate(has_td = case_when(y =='yes' ~ 'term.deposit', 38 | # y =='no' ~ 'no.term.deposit'), 39 | # has_td = factor(has_td,levels=c('term.deposit','no.term.deposit')), 40 | # td_value = rexp(nrow(bank),1/5000)+ bank$duration * 20 + 41 | # 500*(bank$campaign*rnorm(nrow(bank),1,1))+bank$euribor3m*(100+rnorm(nrow(bank),200,100)), 42 | # td_type = case_when(y == 'no' ~ 'no.td', 43 | # td_value < 10000 ~ 'td.type.A', 44 | # td_value <= 25000 ~ 'td.type.B', 45 | # TRUE ~ 'td.type.C'), 46 | # td_type = factor(td_type,levels = c('no.td','td.type.A','td.type.B','td.type.C'))) %>% 47 | # select(has_td,td_type,duration,campaign,pdays,previous,euribor3m) 48 | # 49 | # bank_td <- bank 50 | # usethis::use_data(bank_td,overwrite = TRUE) 51 | 52 | -------------------------------------------------------------------------------- /R/modelplotr.R: -------------------------------------------------------------------------------- 1 | .onAttach <- function(libname, pkgname) { 2 | packageStartupMessage("Package modelplotr loaded! Happy model plotting!") 3 | } 4 | 5 | 6 | #' modelplotr: Plots to Evaluate the Business Performance of Predictive Models. 7 | #' 8 | #' Plots to evaluate the business performance of predictive models in R. 9 | #' A number of widely used plots to assess the quality of a predictive model from a business perspective 10 | #' can easily be created. 
Using these plots, it can be shown how implementation of the model will impact 11 | #' business targets like response on a campaign or return on investment. It's very easy to apply modelplotr 12 | #' to predictive models that are developed in caret, mlr, h2o or keras. For other models, even those built 13 | #' outside of R, an instruction is included. 14 | #' The modelplotr package provides three categories of important functions: 15 | #' datapreparation, plot parameterization and plotting. 16 | #' 17 | #' @author Jurriaan Nagelkerke [aut, cre] 18 | #' @author Pieter Marcus [aut] 19 | #' 20 | #' @section Datapreparation functions: 21 | #' The datapreparation functions are: 22 | #' \describe{ 23 | #' \item{\code{\link{prepare_scores_and_ntiles}}}{Function that builds a dataframe 24 | #' that contains actuals and predictions on the target variable for each dataset in \code{datasets} and each model in \code{models}. 25 | #' As inputs, it takes dataframes to score and model objects created with \strong{caret}, \strong{mlr}, \strong{h2o} or \strong{keras}. 26 | #' Specifically for keras models, built with keras_model_sequential() or with the keras functional API, there is the 27 | #' \code{\link{prepare_scores_and_ntiles_keras}} function. 28 | #' To use modelplotr on top of models created otherwise, even models built outside r, see \code{\link{aggregate_over_ntiles}}} 29 | #' \item{\code{\link{plotting_scope}}}{Function that creates a dataframe in the required format for all 30 | #' modelplotr plots, relevant to the selected scope of evaluation. Each record in this dataframe represents 31 | #' a unique combination of datasets, models, target classes and ntiles. As an input, plotting_scope can handle 32 | #' both a dataframe created with \code{aggregate_over_ntiles} as well as a dataframe created with 33 | #' \code{prepare_scores_and_ntiles} (or with \code{prepare_scores_and_ntiles_keras} or created otherwise, with similar layout). 
} 34 | #' \item{\code{\link{aggregate_over_ntiles}}}{Function that aggregates the output of \code{prepare_scores_and_ntiles} 35 | #' to create a dataframe with aggregated actuals and predictions. Each record in this dataframe represents 36 | #' a unique combination of datasets, models, target classes and ntiles. In most cases, you do not need to call this function yourself, 37 | #' since the \code{plotting_scope} function calls it automatically. }} 38 | #' @section Parameterization functions: 39 | #' Most parameterization functions are internal functions. However, one is available for customization: 40 | #' \describe{ 41 | #' \item{\code{\link{customize_plot_text}}}{Function that returns a list that contains all textual elements for 42 | #' all plots that modelplotr can create. By changing the elements in this list - simply by overwriting values - 43 | #' and then including this list with the \code{custom_plot_text} parameter in plot functions, plot texts can easily be customized 44 | #' to meet your (language) preferences}} 45 | #' @section Plotting functions: 46 | #' The plotting functions are: 47 | #' \describe{ 48 | #' \item{\code{\link{plot_cumgains}}}{Generates the cumulative gains plot. This plot, often referred to as the gains chart, 49 | #' helps answer the question: \strong{\emph{When we apply the model and select the best X ntiles, 50 | #' what percentage of the actual target class observations can we expect to target?}} } 51 | #' \item{\code{\link{plot_cumlift}}}{Generates the cumulative lift plot. This plot, often referred to as the lift plot or index plot, 52 | #' helps you answer the question: \strong{\emph{When we apply the model and select the best X ntiles, 53 | #' how many times better is that than using no model at all?}}} 54 | #' \item{\code{\link{plot_response}}}{Generates the response plot. It plots the percentage of target class observations 55 | #' per ntile. 
It can be used to answer the following business question: \strong{\emph{When we apply 56 | #' the model and select ntile X, what is the expected percentage of target class observations 57 | #' in that ntile?}}} 58 | #' \item{\code{\link{plot_cumresponse}}}{Generates the cumulative response plot. It plots the cumulative percentage of 59 | #' target class observations up until that ntile. It helps answering the question: 60 | #' \strong{\emph{When we apply the model and select up until ntile X, what is the expected percentage of 61 | #' target class observations in the selection? }}} 62 | #' \item{\code{\link{plot_multiplot}}}{Generates a canvas with all four evaluation plots - cumulative gains, cumulative lift, 63 | #' response and cumulative response - combined on one canvas} 64 | #' \item{\code{\link{plot_costsrevs}}}{It plots the cumulative costs and revenues up until that ntile when the model 65 | #' is used for campaign selection. It can be used to answer the following business question: 66 | #' \strong{\emph{When we apply the model and select up until ntile X, what are the expected costs and 67 | #' revenues of the campaign?}}} 68 | #' \item{\code{\link{plot_profit}}}{Generates the Profit plot. It plots the cumulative profit up until that ntile when the 69 | #' model is used for campaign selection. It can be used to answer the following business question: 70 | #' \strong{\emph{When we apply the model and select up until ntile X, what is the expected profit of the campaign?}}} 71 | #' \item{\code{\link{plot_roi}}}{Generates the Return on Investment plot. It plots the cumulative revenues as a percentage 72 | #' of investments up until that ntile when the model is used for campaign selection. 
It can be used to answer the following 73 | #' business question: \strong{\emph{When we apply the model and select up until ntile X, what is the expected % return on 74 | #' investment of the campaign?}}} 75 | #' } 76 | #' 77 | #' @seealso \code{vignette('modelplotr')} 78 | #' @seealso \url{https://github.com/modelplot/modelplotr} for details on the package 79 | #' @seealso \url{https://modelplot.github.io/} for our blog posts on using modelplotr 80 | #' @examples 81 | #' \dontrun{ 82 | #' # load example data (Bank clients with/without a term deposit - see ?bank_td for details) 83 | #' data("bank_td") 84 | #' 85 | #' # prepare data for training model for binomial target has_td and train models 86 | #' train_index = sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE) 87 | #' train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 88 | #' test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 89 | #' 90 | #' #train models using caret... (or use mlr or H2o or keras ... see ?prepare_scores_and_ntiles) 91 | #' # setting caret cross validation, here tuned for speed (not accuracy!) 92 | #' fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE) 93 | #' # random forest using ranger package, here tuned for speed (not accuracy!) 
94 | #' rf = caret::train(has_td ~.,data = train, method = "ranger",trControl = fitControl, 95 | #' tuneGrid = expand.grid(.mtry = 2,.splitrule = "gini",.min.node.size=10)) 96 | #' # mnl model using glmnet package 97 | #' mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl) 98 | #' 99 | #' # load modelplotr 100 | #' library(modelplotr) 101 | #' 102 | #' # transform datasets and model objects to input for modelplotr 103 | #' scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"), 104 | #' dataset_labels = list("train data","test data"), 105 | #' models = list("rf","mnl"), 106 | #' model_labels = list("random forest","multinomial logit"), 107 | #' target_column="has_td", 108 | #' ntiles=100) 109 | #' 110 | #' # set scope for analysis (default: no comparison) 111 | #' plot_input <- plotting_scope(prepared_input = scores_and_ntiles) 112 | #' head(plot_input) 113 | #' 114 | #' # ALL PLOTS, with defaults 115 | #' plot_cumgains(data=plot_input) 116 | #' plot_cumlift(data=plot_input) 117 | #' plot_response(data=plot_input) 118 | #' plot_cumresponse(data=plot_input) 119 | #' plot_multiplot(data=plot_input) 120 | #' # financial plots - these need some financial parameters 121 | #' plot_costsrevs(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50) 122 | #' plot_profit(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50) 123 | #' plot_roi(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50) 124 | #' 125 | #' # CHANGING THE SCOPE OF ANALYSIS 126 | #' # changing the scope - compare models: 127 | #' plot_input <- plotting_scope(prepared_input = scores_and_ntiles,scope="compare_models") 128 | #' plot_cumgains(data=plot_input) 129 | #' # changing the scope - compare datasets: 130 | #' plot_input <- plotting_scope(prepared_input = scores_and_ntiles,scope="compare_datasets") 131 | #' plot_roi(data = 
plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50) 132 | #' # changing the scope - compare target classes: 133 | #' plot_input <- plotting_scope(prepared_input = scores_and_ntiles,scope="compare_targetclasses") 134 | #' plot_response(data=plot_input) 135 | # ' 136 | #' # HIGHLIGHTING OPTIONS 137 | #' plot_input <- plotting_scope(prepared_input = scores_and_ntiles, 138 | #' scope = 'compare_datasets',select_model_label = 'random forest') 139 | #' plot_cumgains(data=plot_input,highlight_ntile=20) 140 | #' plot_cumlift(data=plot_input,highlight_ntile=20,highlight_how = 'plot') 141 | #' plot_response(data=plot_input,highlight_ntile=20,highlight_how = 'text') 142 | #' plot_cumresponse(data=plot_input,highlight_ntile=20,highlight_how = 'plot_text') 143 | #' plot_costsrevs(data=plot_input,fixed_costs = 1000,variable_costs_per_unit = 10, 144 | #' profit_per_unit = 50,highlight_ntile='max_roi') 145 | #' plot_profit(data=plot_input,fixed_costs = 1500,variable_costs_per_unit = 10,profit_per_unit = 50) 146 | #' plot_roi(data=plot_input,fixed_costs = 1500,variable_costs_per_unit = 10,profit_per_unit = 50) 147 | #' 148 | #' # OTHER PLOT CUSTOMIZATIONS 149 | #' # customize line colors 150 | #' plot_input <- plotting_scope(prepared_input = scores_and_ntiles,scope = 'compare_models') 151 | #' plot_cumgains(data=plot_input,custom_line_colors = c('pink','navyblue')) 152 | #' # customize all textual elements of plots 153 | #' plot_input <- plotting_scope(prepared_input = scores_and_ntiles) 154 | #' mytexts <- customize_plot_text(plot_input = plot_input) 155 | #' mytexts$cumresponse$plottitle <- 'Expected conversion rate for Campaign XYZ' 156 | #' mytexts$cumresponse$plotsubtitle <- 'proposed selection: best 15 percentiles according to our model' 157 | #' mytexts$cumresponse$y_axis_label <- '% Conversion' 158 | #' mytexts$cumresponse$x_axis_label <- 'percentiles (percentile = 1% of customers)' 159 | #' mytexts$cumresponse$annotationtext <- 160 | #' "Selecting up 
until the &NTL percentile with model &MDL has an expected conversion rate of &VALUE" 161 | #' plot_cumresponse(data=plot_input,custom_plot_text = mytexts,highlight_ntile = 15) 162 | #' } 163 | #' @docType package 164 | #' @name modelplotr 165 | NULL 166 | -------------------------------------------------------------------------------- /R/parametersmodelplots.R: -------------------------------------------------------------------------------- 1 | ##@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@## 2 | #### customize_plot_text() #### 3 | ##@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@## 4 | 5 | #' Customize textual elements of the plots 6 | #' 7 | #' Function to override the default textual elements in the plots, like title, subtitle, 8 | #' axis labels and the annotation texts shown when the highlighting parameter \code{highlight_ntile} 9 | #' is specified. 10 | #' 11 | #' @section How to customize textual elements of plots: 12 | #' All textual parts of the plots can be customized, for instance to translate 13 | #' textual elements to another language or to change the annotation text that is added with the 14 | #' \code{highlight_ntile} parameter. Once you have created the \code{plot_input} dataframe 15 | #' using \code{plotting_scope}, you can run this \code{customize_plot_text()} function. 16 | #' It returns a list, containing all textual elements of the plots, including annotation texts. 17 | #' For instance, run \cr\cr 18 | #' \code{my_plot_text <- customize_plot_text(plot_input = plot_input)} \cr\cr 19 | #' The list contains plot-specific elements (e.g. \code{...$cumgains$...}). \cr 20 | #' Now, you can change the textual elements by overriding the element(s) you want to customize. 
21 | #' For instance, if you want to change the textual elements of the gains plot to Dutch:\cr\cr 22 | #' \code{my_plot_text$cumgains$plottitle <- 'Cumulatieve Gains grafiek'}\cr 23 | #' \code{my_plot_text$cumgains$x_axis_label <- 'Deciel'}\cr 24 | #' \code{my_plot_text$cumgains$y_axis_label <- 'cumulatieve gains'}\cr 25 | #' \code{my_plot_text$cumgains$optimal_gains_label <- 'maximale gains'}\cr 26 | #' \code{my_plot_text$cumgains$minimal_gains_label <- 'minimale gains'}\cr 27 | #' \code{plot_cumgains(custom_plot_text = my_plot_text)}\cr\cr 28 | #' To change the annotation text, use the placeholders starting with '&' to dynamically include: 29 | #' \tabular{ll}{ 30 | #' \bold{placeholder} \tab \bold{placeholder value}\cr 31 | #' \code{&NTL} \tab ntile specified with parameter \code{highlight_ntile}.\cr 32 | #' \code{&PCTNTL} \tab Total percentage of dataset selected up until specified ntile.\cr 33 | #' \code{&MDL} \tab Selected model label(s).\cr 34 | #' \code{&DS} \tab Selected dataset label(s).\cr 35 | #' \code{&YVAL} \tab Selected target class (Y-value).\cr 36 | #' \code{&VALUE} \tab The plot specific value at specified ntile. 37 | #' E.g. Cumulative gains, Cumulative lift, Response, Cumulative response, Profit, ROI or Revenue.\cr 38 | #' } 39 | #' For instance, to translate the lift plot annotation text to Dutch:\cr 40 | #' \code{my_plot_text$cumlift$annotationtext <- "Door &PCTNTL met de hoogste modelkans volgens model &MDL 41 | #' in &DS te selecteren is deze selectie van &YVAL observaties &VALUE keer beter dan een random selectie."}\cr 42 | #' \code{plot_cumlift(highlight_ntile=3,custom_plot_text=my_plot_text)} 43 | #' 44 | #' @param plot_input Dataframe. Dataframe needs to be created with 45 | #' \code{\link{plotting_scope}} or else meet required input format. 46 | #' @return List with default values for all textual elements of the plots. 
47 | #' @examples 48 | #' # load example data (Bank clients with/without a term deposit - see ?bank_td for details) 49 | #' data("bank_td") 50 | #' 51 | #' # prepare data for training model for binomial target has_td and train models 52 | #' train_index = sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE) 53 | #' train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 54 | #' test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 55 | #' 56 | #' #train models using caret... (or use mlr or H2o or keras ... see ?prepare_scores_and_ntiles) 57 | #' # setting caret cross validation, here tuned for speed (not accuracy!) 58 | #' fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE) 59 | #' # random forest using ranger package, here tuned for speed (not accuracy!) 60 | #' rf = caret::train(has_td ~.,data = train, method = "ranger",trControl = fitControl, 61 | #' tuneGrid = expand.grid(.mtry = 2,.splitrule = "gini",.min.node.size=10)) 62 | #' # mnl model using glmnet package 63 | #' mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl) 64 | #' 65 | #' # load modelplotr 66 | #' library(modelplotr) 67 | #' 68 | #' # transform datasets and model objects to input for modelplotr 69 | #' scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"), 70 | #' dataset_labels = list("train data","test data"), 71 | #' models = list("rf","mnl"), 72 | #' model_labels = list("random forest","multinomial logit"), 73 | #' target_column="has_td", 74 | #' ntiles=100) 75 | #' 76 | #' # set scope for analysis (default: no comparison) 77 | #' plot_input <- plotting_scope(prepared_input = scores_and_ntiles) 78 | #' 79 | #' # customize all textual elements of plots 80 | #' mytexts <- customize_plot_text(plot_input = plot_input) 81 | #' mytexts$cumresponse$plottitle <- 'Expected conversion rate for Campaign XYZ' 82 | #' 
mytexts$cumresponse$plotsubtitle <- 'proposed selection: best 15 percentiles according to our model' 83 | #' mytexts$cumresponse$y_axis_label <- '% Conversion' 84 | #' mytexts$cumresponse$x_axis_label <- 'percentiles (percentile = 1% of customers)' 85 | #' mytexts$cumresponse$annotationtext <- 86 | #' "Selecting up until the &NTL percentile with model &MDL has an expected conversion rate of &VALUE" 87 | #' plot_cumresponse(data=plot_input,custom_plot_text = mytexts,highlight_ntile = 15) 88 | #' @export 89 | #' @importFrom magrittr %>% 90 | #' @seealso \code{\link{modelplotr}} for generic info on the package \code{modelplotr} 91 | #' @seealso \code{vignette('modelplotr')} 92 | #' @seealso \url{https://github.com/modelplot/modelplotr} for details on the package 93 | #' @seealso \url{https://modelplot.github.io/} for our blog on the value of the model plots 94 | customize_plot_text <- function(plot_input=plot_input){ 95 | 96 | 97 | # create empty list for plot_text 98 | plot_text <- list() 99 | 100 | # add generic characteristics (derived from plot_input) 101 | plot_text$scope$scope <- max(as.character(plot_input$scope)) 102 | plot_text$scope$sel_model <- max(as.character(plot_input$model_label)) 103 | plot_text$scope$sel_dataset <- max(as.character(plot_input$dataset_label)) 104 | plot_text$scope$sel_target_class <- max(as.character(plot_input$target_class)) 105 | plot_text$scope$ntiles = max(plot_input$ntile) 106 | plot_text$scope$plotsubtitle <- 107 | ifelse(plot_text$scope$scope=="compare_datasets", 108 | paste0('scope: comparing datasets & model: ',plot_text$scope$sel_model, 109 | ' & target class: ' ,plot_text$scope$sel_target_class), 110 | ifelse(plot_text$scope$scope=="compare_models", 111 | paste0('scope: comparing models & dataset: ',plot_text$scope$sel_dataset, 112 | ' & target class: ',plot_text$scope$sel_target_class), 113 | ifelse(plot_text$scope$scope=="compare_targetclasses", 114 | paste0('scope: comparing target classes & dataset: 
',plot_text$scope$sel_dataset, 115 | ' & model: ',plot_text$scope$sel_model), 116 | paste0('model: ',plot_text$scope$sel_model, 117 | ' & dataset: ',plot_text$scope$sel_dataset, 118 | ' & target class: ',plot_text$scope$sel_target_class)))) 119 | plot_text$scope$x_axis_label <- 120 | ifelse(plot_text$scope$ntiles==10,'decile', 121 | ifelse(plot_text$scope$ntiles==4,'quartile', 122 | ifelse(plot_text$scope$ntiles==5,'quintile', 123 | ifelse(plot_text$scope$ntiles==20,'ventile', 124 | ifelse(plot_text$scope$ntiles==100,'percentile', 125 | 'ntile'))))) 126 | 127 | # default values for textual plot elements 128 | 129 | # CUMGAINS 130 | plot_text$cumgains$plottitle <- "Cumulative gains" 131 | plot_text$cumgains$plotsubtitle <- plot_text$scope$plotsubtitle 132 | plot_text$cumgains$x_axis_label <- plot_text$scope$x_axis_label 133 | plot_text$cumgains$y_axis_label <- "cumulative gains" 134 | plot_text$cumgains$optimal_gains_label <- 'optimal gains' 135 | plot_text$cumgains$minimal_gains_label <- 'minimal gains' 136 | plot_text$cumgains$annotationtext <- "When we select &PCTNTL with the highest probability according to model &MDL, this selection holds &VALUE of all &YVAL cases in &DS." 137 | 138 | 139 | # CUMLIFT 140 | plot_text$cumlift$plottitle <- "Cumulative lift" 141 | plot_text$cumlift$plotsubtitle <- plot_text$scope$plotsubtitle 142 | plot_text$cumlift$x_axis_label <- plot_text$scope$x_axis_label 143 | plot_text$cumlift$y_axis_label <- "cumulative lift" 144 | plot_text$cumlift$lift_refline_label <- 'no lift' 145 | plot_text$cumlift$annotationtext <- "When we select &PCTNTL with the highest probability according to model &MDL in &DS, this selection for &YVAL cases is &VALUE times better than selecting without a model." 
146 | 147 | 148 | # RESPONSE 149 | plot_text$response$plottitle <- "Response" 150 | plot_text$response$plotsubtitle <- plot_text$scope$plotsubtitle 151 | plot_text$response$x_axis_label <- plot_text$scope$x_axis_label 152 | plot_text$response$y_axis_label <- "response" 153 | plot_text$response$response_refline_label <- 'overall response' 154 | plot_text$response$annotationtext <- "When we select ntile &NTL according to model &MDL in dataset &DS the %% of &YVAL cases in the selection is &VALUE." 155 | 156 | 157 | # CUMRESPONSE 158 | plot_text$cumresponse$plottitle <- "Cumulative response" 159 | plot_text$cumresponse$plotsubtitle <- plot_text$scope$plotsubtitle 160 | plot_text$cumresponse$x_axis_label <- plot_text$scope$x_axis_label 161 | plot_text$cumresponse$y_axis_label <- "cumulative response" 162 | plot_text$cumresponse$response_refline_label <- 'overall response' 163 | plot_text$cumresponse$annotationtext <- "When we select ntiles 1 until &NTL according to model &MDL in dataset &DS the %% of &YVAL cases in the selection is &VALUE." 
164 | 165 | 166 | # MULTIPLOT 167 | plot_text$multiplot$plottitle <- 168 | ifelse(plot_text$scope$scope=="compare_datasets", 169 | paste0('scope: comparing datasets & model: ',plot_text$scope$sel_model, 170 | ' & target class: ' ,plot_text$scope$sel_target_class), 171 | ifelse(plot_text$scope$scope=="compare_models", 172 | paste0('scope: comparing models & dataset: ',plot_text$scope$sel_dataset, 173 | ' & target class: ',plot_text$scope$sel_target_class), 174 | ifelse(plot_text$scope$scope=="compare_targetclasses", 175 | paste0('scope: comparing target classes & dataset: ',plot_text$scope$sel_dataset, 176 | ' & model: ',plot_text$scope$sel_model), 177 | paste0('model: ',plot_text$scope$sel_model, 178 | ' & dataset: ',plot_text$scope$sel_dataset, 179 | ' & target class: ',plot_text$scope$sel_target_class)))) 180 | plot_text$multiplot$plotsubtitle <- plot_text$scope$plotsubtitle 181 | plot_text$multiplot$x_axis_label <- plot_text$scope$x_axis_label 182 | plot_text$multiplot$annotationtext <- NA 183 | 184 | # PROFIT 185 | plot_text$profit$plottitle <- "Profit" 186 | plot_text$profit$plotsubtitle <- plot_text$scope$plotsubtitle 187 | plot_text$profit$x_axis_label <- plot_text$scope$x_axis_label 188 | plot_text$profit$y_axis_label <- "Profit" 189 | plot_text$profit$profit_breakeven_refline_label <- 'break-even' 190 | plot_text$profit$profit_overall_refline_label <- 'overall profit' 191 | plot_text$profit$annotationtext <- "When we select ntiles 1 until &NTL in dataset &DS using model &MDL to target &YVAL cases the expected profit is &VALUE" 192 | 193 | # ROI 194 | plot_text$roi$plottitle <- "Return on Investment (ROI)" 195 | plot_text$roi$plotsubtitle <- plot_text$scope$plotsubtitle 196 | plot_text$roi$x_axis_label <- plot_text$scope$x_axis_label 197 | plot_text$roi$y_axis_label <- "% ROI" 198 | plot_text$roi$roi_breakeven_refline_label <- 'break-even' 199 | plot_text$roi$roi_overall_refline_label <- 'overall roi' 200 | plot_text$roi$annotationtext <- "When we select 
ntiles 1 until &NTL in dataset &DS using model &MDL to target &YVAL cases the expected return on investment is &VALUE." 201 | 202 | 203 | # COSTSREVS 204 | plot_text$costsrevs$plottitle <- "Costs and Revenues" 205 | plot_text$costsrevs$plotsubtitle <- plot_text$scope$plotsubtitle 206 | plot_text$costsrevs$x_axis_label <- plot_text$scope$x_axis_label 207 | plot_text$costsrevs$y_axis_label <- "costs / revenues" 208 | plot_text$costsrevs$costs_label <- "total costs" 209 | plot_text$costsrevs$revenues_label <- "revenues" 210 | plot_text$costsrevs$annotationtext <- "When we select ntiles 1 until &NTL in dataset &DS using model &MDL to target &YVAL cases the revenues are &VALUE." 211 | 212 | 213 | 214 | 215 | message('List with default values for all textual plot elements is created. 216 | To customize titles, axis labels and annotation text, modify specific list elements. 217 | E.g., when the list is named \'mylist\', to change the lift plot title to \'Cumulatieve Lift grafiek\', use: 218 | mylist$cumlift$plottitle <- \'Cumulatieve Lift grafiek\' 219 | plot_cumlift(custom_plot_text = mylist)' ) 220 | 221 | return(plot_text) 222 | 223 | } 224 | 225 | 226 | ##@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@## 227 | #### setplotparams() #### 228 | ##@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@## 229 | # internal function to combine textual elements (default or customized), specified scope and line customization 230 | # to generate list with all plot parameters (pp) 231 | 232 | setplotparams <- function(plot_input,plottype,custom_line_colors,plot_text) { 233 | 234 | # plottype <- "costsrevs" 235 | # custom_line_colors <- NA 236 | 237 | # get textual elements and put them in pp (plot_params) list 238 | pp <- plot_text 239 | 240 | # ALL PLOTS 241 | pp$scope$plottype <- plottype 242 | pp$scope$levels <- unique(as.character(plot_input$legend)) 243 | pp$scope$nlevels <- length(pp$scope$levels) 244 | pp$scope$randcols <- RColorBrewer::brewer.pal(n = 8, name = "Set1") 245 | pp$scope$levelcols <- 
pp$scope$randcols[1:pp$scope$nlevels] 246 | pp$scope$xlabper <- ifelse(pp$scope$ntiles<=20,1,ifelse(pp$scope$ntiles<=40,2,5)) 247 | pp$scope$ntile0 <- ifelse(pp$scope$plottype=="cumgains",1,0) 248 | if (length(custom_line_colors)==1 & is.na(custom_line_colors[1])){ 249 | pp$scope$levelcols <- pp$scope$randcols[1:pp$scope$nlevels] 250 | } else if(length(custom_line_colors)==pp$scope$nlevels) { 251 | pp$scope$levelcols <- custom_line_colors 252 | } else if (length(custom_line_colors)pp$scope$nlevels) { 258 | message('specified custom_line_colors vector greater than required length! 259 | It is cropped to match required length\n') 260 | pp$scope$levelcols <- custom_line_colors[1:pp$scope$nlevels] 261 | } else { 262 | pp$scope$levelcols <- pp$scope$randcols[1:pp$scope$nlevels] 263 | } 264 | 265 | pp$scope$plottitle <- get('plottitle',get(pp$scope$plottype,plot_text)) 266 | pp$scope$plotsubtitle <- get('plotsubtitle',get(pp$scope$plottype,plot_text)) 267 | pp$scope$annotationtext <- get('annotationtext',get(pp$scope$plottype,plot_text)) 268 | 269 | # GAINS 270 | if (pp$scope$scope=='compare_models') { 271 | pp$cumgains$reflevels <- paste0(pp$cumgains$optimal_gains_label,' (',unique(plot_input$dataset_label),')') 272 | } else { 273 | pp$cumgains$reflevels <- paste0(pp$cumgains$optimal_gains_label,' (',pp$scope$levels,')') 274 | } 275 | pp$cumgains$nreflevels <- ifelse(pp$scope$scope=='compare_models',1,pp$scope$nlevels) 276 | if (pp$scope$scope=='compare_models') { 277 | pp$cumgains$reflevelcols <- 'gray' 278 | } else { pp$cumgains$reflevelcols <- pp$scope$levelcols} 279 | pp$cumgains$levels <- c(pp$scope$levels,pp$cumgains$minimal_gains_label,pp$cumgains$reflevels) 280 | pp$cumgains$nlevels <- length(pp$cumgains$levels) 281 | pp$cumgains$legendcolumns <- ifelse(pp$cumgains$nlevels>6,2,1) 282 | pp$cumgains$linetypes <- c(rep('solid',pp$scope$nlevels),'dashed',rep('dotted',pp$cumgains$nreflevels)) 283 | pp$cumgains$alphas <- 
c(rep(1,pp$scope$nlevels),1,rep(1,pp$cumgains$nreflevels)) 284 | pp$cumgains$linecols <- c(pp$scope$levelcols,'gray',pp$cumgains$reflevelcols) 285 | pp$cumgains$linesizes <- c(rep(1,pp$scope$nlevels),0.5,rep(1.2,pp$cumgains$nreflevels)) 286 | pp$cumgains$annolabelfmt <- 'scales::percent_format(accuracy=1)' 287 | 288 | # LIFT 289 | pp$cumlift$levels <- c(pp$scope$levels,pp$cumlift$lift_refline_label) 290 | pp$cumlift$nlevels <- length(pp$cumlift$levels) 291 | pp$cumlift$legendcolumns <- ifelse(pp$cumlift$nlevels>6,2,1) 292 | pp$cumlift$linetypes <- c(rep('solid',pp$scope$nlevels),'dashed') 293 | pp$cumlift$alphas <- c(rep(1,pp$scope$nlevels),1) 294 | pp$cumlift$linecols <- c(pp$scope$levelcols,'gray') 295 | pp$cumlift$linesizes <- c(rep(1,pp$scope$nlevels),0.5) 296 | pp$cumlift$annolabelfmt <- 'scales::comma_format(accuracy=0.1)' 297 | 298 | # RESPONSE 299 | if (pp$scope$scope=='compare_models') { 300 | pp$response$reflevels <- paste0(pp$response$response_refline_label,' (',unique(plot_input$dataset_label),')') 301 | } else { 302 | pp$response$reflevels <- paste0(pp$response$response_refline_label,' (',pp$scope$levels,')') 303 | } 304 | pp$response$nreflevels <- ifelse(pp$scope$scope=='compare_models',1,pp$scope$nlevels) 305 | if (pp$scope$scope=='compare_models') pp$response$reflevelcols <- 'gray' else pp$response$reflevelcols <- pp$scope$levelcols 306 | pp$response$levels <- c(pp$scope$levels,pp$response$reflevels) 307 | pp$response$nlevels <- length(pp$response$levels) 308 | pp$response$legendcolumns <- ifelse(pp$response$nlevels>6,2,1) 309 | pp$response$linetypes <- c(rep('solid',pp$scope$nlevels),rep('dashed',pp$response$nreflevels)) 310 | pp$response$alphas <- c(rep(1,pp$scope$nlevels),rep(1,pp$response$nreflevels)) 311 | pp$response$linecols <- c(pp$scope$levelcols,pp$response$reflevelcols) 312 | pp$response$linesizes <- c(rep(1,pp$scope$nlevels),rep(0.8,pp$response$nreflevels)) 313 | pp$response$annolabelfmt <- 'scales::percent_format(accuracy=0.1)' 314 | 
315 | # CUMRESPONSE 316 | if (pp$scope$scope=='compare_models') { 317 | pp$cumresponse$reflevels <- paste0(pp$cumresponse$response_refline_label,' (',unique(plot_input$dataset_label),')') 318 | } else { 319 | pp$cumresponse$reflevels <- paste0(pp$cumresponse$response_refline_label,' (',pp$scope$levels,')') 320 | } 321 | pp$cumresponse$nreflevels <- ifelse(pp$scope$scope=='compare_models',1,pp$scope$nlevels) 322 | if (pp$scope$scope=='compare_models') pp$cumresponse$reflevelcols <- 'gray' else pp$cumresponse$reflevelcols <- pp$scope$levelcols 323 | pp$cumresponse$levels <- c(pp$scope$levels,pp$cumresponse$reflevels) 324 | pp$cumresponse$nlevels <- length(pp$cumresponse$levels) 325 | pp$cumresponse$legendcolumns <- ifelse(pp$cumresponse$nlevels>6,2,1) 326 | pp$cumresponse$linetypes <- c(rep('solid',pp$scope$nlevels),rep('dashed',pp$cumresponse$nreflevels)) 327 | pp$cumresponse$alphas <- c(rep(1,pp$scope$nlevels),rep(1,pp$cumresponse$nreflevels)) 328 | pp$cumresponse$linecols <- c(pp$scope$levelcols,pp$cumresponse$reflevelcols) 329 | pp$cumresponse$linesizes <- c(rep(1,pp$scope$nlevels),rep(0.8,pp$cumresponse$nreflevels)) 330 | pp$cumresponse$annolabelfmt <- 'scales::percent_format(accuracy=0.1)' 331 | 332 | # MULTIPLOT 333 | pp$multiplot$annolabelfmt <- '' 334 | 335 | # PROFIT 336 | if (pp$scope$scope=='compare_models') { 337 | pp$profit$reflevels <- paste0(pp$profit$profit_overall_refline_label,' (',unique(plot_input$dataset_label),')') 338 | } else { 339 | pp$profit$reflevels <- paste0(pp$profit$profit_overall_refline_label,' (',pp$scope$levels,')') 340 | } 341 | pp$profit$nreflevels <- ifelse(pp$scope$scope=='compare_models',1,pp$scope$nlevels) 342 | if (pp$scope$scope=='compare_models') { 343 | pp$profit$reflevelcols <- 'gray' 344 | } else { pp$profit$reflevelcols <- pp$scope$levelcols} 345 | pp$profit$levels <- c(pp$scope$levels,pp$profit$profit_breakeven_refline_label,pp$profit$reflevels) 346 | pp$profit$nlevels <- length(pp$profit$levels) 347 | 
pp$profit$legendcolumns <- ifelse(pp$profit$nlevels>6,2,1) 348 | pp$profit$linetypes <- c(rep('solid',pp$scope$nlevels),'dashed',rep('dotted',pp$profit$nreflevels)) 349 | pp$profit$alphas <- c(rep(1,pp$scope$nlevels),1,rep(1,pp$profit$nreflevels)) 350 | pp$profit$linecols <- c(pp$scope$levelcols,'gray',pp$profit$reflevelcols) 351 | pp$profit$linesizes <- c(rep(1,pp$scope$nlevels),0.8,rep(1.2,pp$profit$nreflevels)) 352 | pp$profit$annolabelfmt <- 'scales::dollar_format(prefix = "\u20ac", suffix = "")' #euro symbol 353 | 354 | # ROI 355 | if (pp$scope$scope=='compare_models') { 356 | pp$roi$reflevels <- paste0(pp$roi$roi_overall_refline_label,' (',unique(plot_input$dataset_label),')') 357 | } else { 358 | pp$roi$reflevels <- paste0(pp$roi$roi_overall_refline_label,' (',pp$scope$levels,')') 359 | } 360 | pp$roi$nreflevels <- ifelse(pp$scope$scope=='compare_models',1,pp$scope$nlevels) 361 | if (pp$scope$scope=='compare_models') { 362 | pp$roi$reflevelcols <- 'gray' 363 | } else { pp$roi$reflevelcols <- pp$scope$levelcols} 364 | pp$roi$levels <- c(pp$scope$levels,pp$roi$roi_breakeven_refline_label,pp$roi$reflevels) 365 | pp$roi$nlevels <- length(pp$roi$levels) 366 | pp$roi$legendcolumns <- ifelse(pp$roi$nlevels>6,2,1) 367 | pp$roi$linetypes <- c(rep('solid',pp$scope$nlevels),'dashed',rep('dotted',pp$roi$nreflevels)) 368 | pp$roi$alphas <- c(rep(1,pp$scope$nlevels),1,rep(1,pp$roi$nreflevels)) 369 | pp$roi$linecols <- c(pp$scope$levelcols,'gray',pp$roi$reflevelcols) 370 | pp$roi$linesizes <- c(rep(1,pp$scope$nlevels),1.2,rep(0.7,pp$roi$nreflevels)) 371 | pp$roi$annolabelfmt <- 'scales::percent_format(accuracy=1)' 372 | 373 | # COSTSREVS 374 | if (pp$scope$scope=='compare_models') { 375 | pp$costsrevs$costlevels <- paste0(pp$costsrevs$costs_label,' (',unique(plot_input$dataset_label),')') 376 | } else { 377 | pp$costsrevs$costlevels <- paste0(pp$costsrevs$costs_label,' (',pp$scope$levels,')') 378 | } 379 | pp$costsrevs$nreflevels <- 
ifelse(pp$scope$scope=='compare_models',1,pp$scope$nlevels) 380 | if (pp$scope$scope=='compare_models') { 381 | pp$costsrevs$reflevelcols <- 'gray' 382 | } else { pp$costsrevs$reflevelcols <- pp$scope$levelcols} 383 | pp$costsrevs$levels <- paste0(pp$costsrevs$revenues_label,' (',pp$scope$levels,')') 384 | pp$costsrevs$levels <- c(pp$costsrevs$levels,pp$costsrevs$costlevels) 385 | pp$costsrevs$nlevels <- length(pp$costsrevs$levels) 386 | pp$costsrevs$legendcolumns <- ifelse(pp$costsrevs$nlevels>6,2,1) 387 | pp$costsrevs$linetypes <- c(rep('solid',pp$scope$nlevels),rep('dashed',pp$costsrevs$nreflevels)) 388 | pp$costsrevs$alphas <- c(rep(1,pp$scope$nlevels),rep(1,pp$costsrevs$nreflevels)) 389 | pp$costsrevs$linecols <- c(pp$scope$levelcols,pp$costsrevs$reflevelcols) 390 | pp$costsrevs$linesizes <- c(rep(1,pp$scope$nlevels),rep(1,pp$costsrevs$nreflevels)) 391 | pp$costsrevs$annolabelfmt <- 'scales::dollar_format(prefix = "\u20ac", suffix = "")' #euro symbol 392 | 393 | pp$scope$annolabelfmt = get('annolabelfmt',get(pp$scope$plottype,pp)) 394 | 395 | return(pp) 396 | } 397 | 398 | ##@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@## 399 | #### annotate_plot() #### 400 | ##@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@## 401 | 402 | utils::globalVariables(c("plot_input_prepared")) 403 | 404 | #highlight_ntile='max' 405 | annotate_plot <- function(plot=plot,highlight_input=plot_input_prepared, 406 | highlight_ntile=highlight_ntile,highlight_how=highlight_how,pp=pp){ 407 | 408 | if(!is.na(highlight_ntile)) { 409 | 410 | # check that highlight_ntile is valid: an integer in [1:ntiles], 'max_profit' or 'max_roi' 411 | if (highlight_ntile<1 | (highlight_ntile>pp$scope$ntiles & highlight_ntile!='max_roi'& highlight_ntile!='max_profit')| 412 | ifelse(is.numeric(highlight_ntile),highlight_ntile %% 1 > 0,FALSE)) { 413 | stop(paste0("Value for highlight_ntile not valid! 
Choose ntile (integer) value to highlight in range [1:",pp$scope$ntiles,"] 414 | or use highlight_ntile='max_profit' or highlight_ntile='max_roi' for maximum value highlighting")) 415 | } 416 | if(!highlight_how %in% c('plot','text','plot_text')){ 417 | message("no valid value for highlight_how specified; default value (plot_text) is chosen 418 | -> choose 'plot_text' to both highlight the plot and add explanatory text below the plot 419 | -> choose 'plot' to only highlight the plot - no explanatory text is added below the plot 420 | -> choose 'text' to only add explanatory text below the plot - the chosen ntile is not highlighted in the plot \n") 421 | highlight_how <- 'plot_text' 422 | } 423 | 424 | # prepare input for highlighting 425 | # when maximum value for financial plots is requested: get ntiles with maximum 426 | if(highlight_ntile == 'max_profit'){ 427 | highlight_input = highlight_input %>% dplyr::group_by(legend) %>% 428 | dplyr::filter(refline == 0 & max_profit == 1) %>% 429 | dplyr::top_n(1,wt = plotvalue) %>% 430 | dplyr::ungroup() 431 | highlight_ntile_num = highlight_input %>% dplyr::distinct(ntile) %>% dplyr::pull() # ntile(s) with max profit; can be more than one when multiple lines are plotted 432 | } else if(highlight_ntile == 'max_roi'){ 433 | highlight_input = highlight_input %>% dplyr::group_by(legend) %>% 434 | dplyr::filter(refline == 0 & max_roi == 1) %>% 435 | dplyr::top_n(1,wt = plotvalue) %>% 436 | dplyr::ungroup() 437 | highlight_ntile_num = highlight_input %>% dplyr::distinct(ntile) %>% dplyr::pull() # ntile(s) with max roi; can be more than one when multiple lines are plotted 438 | } else { 439 | highlight_input = highlight_input %>% dplyr::filter(ntile==highlight_ntile & refline==0) 440 | highlight_ntile_num = highlight_ntile 441 | } 442 | 443 | if(highlight_how %in% c('plot','plot_text')){ 444 | # check ggplot version (clip=off is available in version 3.0 and later) 445 | if(utils::packageVersion("ggplot2") < 3.0) { 446 | warning(paste0('You are using 
ggplot2 version ',utils::packageVersion("ggplot2"),'. ggplot2 >= 3.0.0 is required for nicer annotated plots!'), 447 | call. = FALSE) 448 | } 449 | 450 | # add highlighting 451 | plot <- plot + 452 | # add highlighting circle(s) to plot at ntile value 453 | ggplot2::geom_point(data = highlight_input, 454 | ggplot2::aes(x=ntile,y=plotvalue,color=legend),shape=1,size=5,show.legend = FALSE)+ 455 | # add line(s) from annotated point(s) to Y axis 456 | ggplot2::geom_segment(data = highlight_input, 457 | ggplot2::aes(x=-Inf,y=plotvalue,xend=ntile+0.5,yend=plotvalue,colour=legend), 458 | linetype="dotted",size=0.5,show.legend = FALSE)+ 459 | # add line(s) from annotated point(s) to X axis 460 | ggplot2::geom_segment(data = highlight_input, 461 | ggplot2::aes(x=ntile,y=-Inf,xend=ntile,yend=plotvalue+0.05,colour=legend), 462 | linetype="dotted",size=1,show.legend = FALSE) + 463 | # add value labels for annotated points to Y axis 464 | ggplot2::geom_label(data=highlight_input, 465 | ggplot2::aes(x=-Inf,y=plotvalue,label = eval(parse(text=paste0(pp$scope$annolabelfmt,"(plotvalue)"))),color=legend),fill="white",alpha=0.6, 466 | hjust = 0, fontface = "bold",show.legend = FALSE) 467 | 468 | # emphasize ntile for which annotation is added on X axis 469 | if(min(highlight_ntile_num) == max(highlight_ntile_num) & highlight_ntile_num[1] %% pp$scope$xlabper == 0){ 470 | xbreaks <- seq((1-pp$scope$ntile0)*pp$scope$xlabper,pp$scope$ntiles+pp$scope$ntile0,pp$scope$xlabper) 471 | xfaces <- c(rep("plain",(pp$scope$ntile0+highlight_ntile_num-1)/pp$scope$xlabper), 472 | "bold", 473 | rep("plain",(pp$scope$ntiles+pp$scope$ntile0-highlight_ntile_num)/pp$scope$xlabper)) 474 | xsizes <- c(rep(10,(pp$scope$ntile0+highlight_ntile_num-1)/pp$scope$xlabper), 475 | 12, 476 | rep(10,(pp$scope$ntiles+pp$scope$ntile0-highlight_ntile_num)/pp$scope$xlabper)) 477 | plot <- plot + 478 | ggplot2::theme( 479 | axis.line = ggplot2::element_line(color="black"), 480 | axis.text.x = 
ggplot2::element_text(face=xfaces,size=xsizes))+ 481 | ggplot2::scale_x_continuous(name=get('x_axis_label',get(pp$scope$plottype,pp)), breaks=xbreaks,labels=xbreaks,expand = c(0, 0.02)) 482 | }else{ 483 | xbreaks <- seq((1-pp$scope$ntile0)*pp$scope$xlabper,pp$scope$ntiles+pp$scope$ntile0,pp$scope$xlabper) 484 | xfaces <- rep("plain",(pp$scope$ntiles/pp$scope$xlabper)+pp$scope$ntile0) 485 | xsizes <- rep(10,(pp$scope$ntiles/pp$scope$xlabper)+pp$scope$ntile0) 486 | plot <- plot + 487 | ggplot2::theme( 488 | axis.line = ggplot2::element_line(color="black"), 489 | axis.text.x = ggplot2::element_text(face=xfaces,size=xsizes))+ 490 | ggplot2::scale_x_continuous(name=get('x_axis_label',get(pp$scope$plottype,pp)), breaks=xbreaks,labels=xbreaks,expand = c(0, 0.02))+ 491 | # add value labels for annotated points to X axis 492 | ggplot2::geom_label(data=highlight_input %>% dplyr::filter(ntile %in% highlight_ntile_num & refline==0), 493 | ggplot2::aes(x=highlight_ntile_num,y=-Inf,label = highlight_ntile_num,color=legend),fill="white", 494 | vjust=0.2,fontface = "bold",alpha=0.8,show.legend = FALSE) 495 | } 496 | # make sure value labels for annotated points to X axis aren't clipped 497 | if(utils::packageVersion("ggplot2") >= 3.0) plot <- plot + ggplot2::coord_cartesian(clip = 'off' ) 498 | } 499 | 500 | # annotation text 501 | 502 | annovalues <- highlight_input %>% 503 | dplyr::filter(ntile %in% highlight_ntile_num & refline==0) %>% 504 | dplyr::mutate(xmin=rep(0,pp$scope$nlevels), 505 | xmax=rep(100,pp$scope$nlevels), 506 | ymin=seq(1,pp$scope$nlevels,1), 507 | ymax=seq(2,pp$scope$nlevels+1,1), 508 | # create variables with the values needed for the annotation texts 509 | NTL=highlight_ntile_num, 510 | PCTNTL=sprintf("%1.0f%%",100*highlight_ntile_num/pp$scope$ntiles), 511 | MDL=model_label, 512 | DS=dataset_label, 513 | YVAL=.data$target_class, 514 | VALUE=eval(parse(text=paste0(pp$scope$annolabelfmt,"(plotvalue)"))), 515 | # replace the placeholders for values in the 
annotation text per plot type 516 | annotationtext = 517 | eval(parse(text=paste0("sprintf('",stringr::str_replace_all(pp$scope$annotationtext,'&[A-Z]+','%s'), " ', ", 518 | paste(substr(unlist(stringr:: str_extract_all(pp$scope$annotationtext,'&[A-Z]+')),2,100), 519 | collapse = ', '),')')))) 520 | 521 | message(paste(' ',paste0('Plot annotation for plot: ',pp$scope$plottitle), 522 | paste(paste0('- ',annovalues$annotationtext), collapse = '\n'),' ',' ', sep = '\n')) 523 | 524 | if(highlight_how %in% c('text','plot_text')){ 525 | # create annotation text element to add to grob 526 | annotextplot <- ggplot2::ggplot(annovalues, 527 | ggplot2::aes(label = .data$annotationtext, xmin = .data$xmin, xmax = .data$xmax, ymin = .data$ymin,ymax = .data$ymax,color=.data$legend)) + 528 | ggplot2::geom_rect(fill=NA,color=NA) + 529 | ggplot2::scale_color_manual(values=pp$scope$levelcols)+ 530 | ggfittext::geom_fit_text(place = "center",grow = TRUE,reflow = FALSE) + 531 | ggplot2::theme_minimal() + 532 | ggplot2::theme(legend.position="none", 533 | line =ggplot2::element_blank(), 534 | title=ggplot2::element_blank(), 535 | axis.text=ggplot2::element_blank())+ 536 | ggplot2::scale_y_reverse() 537 | 538 | #remove title from plot 539 | plot <- plot + ggplot2::theme( 540 | plot.title = ggplot2::element_blank(), 541 | plot.subtitle = ggplot2::element_blank()) 542 | 543 | # create title and subtitle elements for grob 544 | 545 | title <- grid::textGrob(pp$scope$plottitle, gp=grid::gpar(fontsize=18)) 546 | subtitle <- grid::textGrob(pp$scope$plotsubtitle, gp=grid::gpar(fontsize=10,fontface="italic",col="black")) 547 | 548 | #add x axis labels when no annotation is applied to plot 549 | if(highlight_how =='text') { 550 | plot <- plot + ggplot2::scale_x_continuous(name=get('x_axis_label',get(pp$scope$plottype,pp)), 551 | breaks=seq(0,pp$scope$ntiles,pp$scope$xlabper), 552 | labels=seq(0,pp$scope$ntiles,pp$scope$xlabper),expand = c(0, 0.02))+ 553 | 
ggplot2::theme(axis.line.x=ggplot2::element_line(),axis.line.y=ggplot2::element_line()) 554 | } 555 | 556 | # create grob layout and add elements to it 557 | lay <- as.matrix(c(1,2,rep(3,20),rep(4,1+pp$scope$nlevels))) 558 | plot <- gridExtra::arrangeGrob(title,subtitle,plot,annotextplot, layout_matrix = lay, 559 | widths = grid::unit(18, "cm"),heights = grid::unit(rep(12/(23+pp$scope$nlevels),23+pp$scope$nlevels), "cm")) 560 | 561 | } 562 | } 563 | return(plot) 564 | } 565 | 566 | 567 | quiet <- function(x) { 568 | sink(tempfile()) 569 | on.exit(sink()) 570 | invisible(force(x)) 571 | } 572 | 573 | 574 | -------------------------------------------------------------------------------- /README.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | output: github_document 3 | --- 4 | 5 | ```{r setup, include = FALSE} 6 | knitr::opts_chunk$set( 7 | collapse = TRUE, 8 | comment = "#>", 9 | fig.path = "man/figures/README-", 10 | out.width = "100%" 11 | ) 12 | ``` 13 | 14 | # modelplotr: Plots to evaluate the business value of predictive models 15 | 16 | The modelplotr package makes it easy to create a number of valuable evaluation plots to assess the business value of a predictive model. Using these plots, it can be shown how implementation of the model will impact business targets like response or return on investment of a campaign. 
17 | 18 | Plots available with modelplotr: 19 | 20 | * cumulative gains 21 | * cumulative lift 22 | * response 23 | * cumulative response 24 | * costs & revenues 25 | * profit 26 | * return on investment 27 | 28 | Some benefits of using modelplotr: 29 | 30 | * *easy to explain* plots to discuss your model with business 31 | * *easy to use* on top of predictive models built with caret, mlr, h2o, keras or otherwise (with or without r) 32 | * supports both *binary and multinomial targets* 33 | * provides *four plotting scopes*: 34 | + comparing models 35 | + comparing datasets 36 | + comparing multiclass target classes 37 | + no comparison (single line) 38 | * *plot annotation*: highlighting specific values and adding explanatory text to guide interpretation 39 | * *plot customization*: all textual elements, line colors 40 | * *save plot* to file on disk 41 | 42 | ## Installation 43 | 44 | You can install `modelplotr` from [GitHub](https://github.com/modelplot/modelplotr) with: 45 | 46 | ```{r, eval = FALSE} 47 | devtools::install_github("modelplot/modelplotr") 48 | ``` 49 | 50 | See this blog for further details and examples of using the package. 51 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | # modelplotr: Plots to evaluate the business value of predictive models 3 | 4 | The modelplotr package makes it easy to create a number of valuable 5 | evaluation plots to assess the business value of a predictive model. 6 | Using these plots, it can be shown how implementation of the model will 7 | impact business targets like response or return on investment of a 8 | campaign. 
9 | 10 | Plots available with modelplotr: 11 | 12 | - cumulative gains 13 | - cumulative lift 14 | - response 15 | - cumulative response 16 | - costs & revenues 17 | - profit 18 | - return on investment 19 | 20 | Some benefits of using modelplotr: 21 | 22 | - *easy to explain* plots to discuss your model with business 23 | - *easy to use* on top of predictive models built with caret, mlr, 24 | h2o, keras or otherwise (with or without r) 25 | - supports both *binary and multinomial targets* 26 | - provides *four plotting scopes*: 27 | - comparing models 28 | - comparing datasets 29 | - comparing multiclass target classes 30 | - no comparison (single line) 31 | - *plot annotation*: highlighting specific values and adding 32 | explanatory text to guide interpretation 33 | - *plot customization*: all textual elements, line colors 34 | - *save plot* to file on disk 35 | 36 | ## Installation 37 | 38 | You can install `modelplotr` from 39 | [GitHub](https://github.com/modelplot/modelplotr) with: 40 | 41 | ``` r 42 | devtools::install_github("modelplot/modelplotr") 43 | ``` 44 | 45 | See this blog for further details and examples of using the package. 
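To make the measures behind the response, cumulative response, cumulative gains and cumulative lift plots concrete, here is a minimal base R sketch that computes them by hand for a binary target. It does not use modelplotr itself: the helper name `ntile_measures` and the toy data are illustrative assumptions. The formulas follow the column definitions documented for `aggregate_over_ntiles()` (pct = pos/tot, cumgain = cumpos/postot, cumlift = (cumpos/cumtot)/pcttot).

``` r
# Hypothetical helper (not part of modelplotr): per-ntile evaluation measures
# for a binary target, assuming there are at least `ntiles` cases.
ntile_measures <- function(y_true, prob, ntiles = 10) {
  # sort cases from highest to lowest model probability; ntile 1 = highest scores
  ord <- order(prob, decreasing = TRUE)
  y <- y_true[ord]
  ntile <- ceiling(seq_along(y) * ntiles / length(y))
  pos <- as.numeric(tapply(y, ntile, sum))     # target class cases per ntile
  tot <- as.numeric(tapply(y, ntile, length))  # all cases per ntile
  postot <- sum(pos)                           # target class cases overall
  pcttot <- postot / sum(tot)                  # overall response
  data.frame(
    ntile   = seq_len(ntiles),
    pct     = pos / tot,                            # response
    cumpct  = cumsum(pos) / cumsum(tot),            # cumulative response
    cumgain = cumsum(pos) / postot,                 # cumulative gains
    cumlift = (cumsum(pos) / cumsum(tot)) / pcttot  # cumulative lift
  )
}

# toy data: 20 cases (1 = target class), probabilities already in descending order
y_true <- c(1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
prob <- seq(0.95, 0, length.out = 20)
measures <- ntile_measures(y_true, prob, ntiles = 2)
print(measures)
```

In this toy case the first ntile (the top half of the scores) contains 5 of the 6 target class cases, so its cumulative gain is 5/6. With modelplotr the same measures are prepared for you and plotted directly.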
46 | -------------------------------------------------------------------------------- /cran-comments.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "cran-comments" 3 | author: "Jurriaan Nagelkerke" 4 | date: "`r Sys.Date()`" 5 | output: github_document 6 | --- 7 | 8 | 9 | ## Adjustments after first submission: 10 | 11 | - removed authors from DESCRIPTION (Authors@R remains) 12 | - removed TensorFlow dependency in vignette example 13 | - downsized examples / dontrun examples for examples that need tensorflow installation/long runtime 14 | - shortened description in DESCRIPTION 15 | - changed print/cat lines into message/warning lines 16 | - replaced writing to user filespace with writing to tempdir() 17 | 18 | 19 | ## R CMD check results 20 | There were no ERRORs, WARNINGs or NOTEs. 21 | 22 | -------------------------------------------------------------------------------- /cran-comments.md: -------------------------------------------------------------------------------- 1 | cran-comments 2 | ================ 3 | Jurriaan Nagelkerke 4 | 2019-04-16 5 | 6 | ## Adjustments after first submission: 7 | 8 | - removed authors from DESCRIPTION (Authors@R remains) 9 | - removed TensorFlow dependency in vignette example 10 | - downsized examples / dontrun examples for examples that need 11 | tensorflow installation/long runtime 12 | - shortened description in DESCRIPTION 13 | - changed print/cat lines into message/warning lines 14 | - replaced writing to user filespace with writing to tempdir() 15 | 16 | ## R CMD check results 17 | 18 | There were no ERRORs, WARNINGs or NOTEs. 
19 | -------------------------------------------------------------------------------- /data/bank_td.rda: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jurrr/modelplotr/9ca4c5dc319eae91c854038f51fb76396fe82371/data/bank_td.rda -------------------------------------------------------------------------------- /man/aggregate_over_ntiles.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/dataprepmodelplots.R 3 | \name{aggregate_over_ntiles} 4 | \alias{aggregate_over_ntiles} 5 | \title{Build a dataframe with aggregated evaluation measures} 6 | \usage{ 7 | aggregate_over_ntiles(prepared_input) 8 | } 9 | \arguments{ 10 | \item{prepared_input}{Dataframe resulting from function \code{\link{prepare_scores_and_ntiles}} or a data frame that meets 11 | requirements as specified in the section below: \bold{When you build input for aggregate_over_ntiles() yourself} .} 12 | } 13 | \value{ 14 | Dataframe object is returned, containing: 15 | \tabular{lll}{ 16 | \bold{column} \tab \bold{type} \tab \bold{definition} \cr 17 | model_label \tab String \tab Name of the model object \cr 18 | dataset_label \tab Factor \tab Datasets to include in the plot as factor levels\cr 19 | target_class\tab String or Integer\tab Target classes to include in the plot\cr 20 | ntile\tab Integer\tab Ntile groups based on model probability for target class\cr 21 | neg\tab Integer\tab Number of cases not belonging to target class in dataset in ntile\cr 22 | pos\tab Integer\tab Number of cases belonging to target class in dataset in ntile\cr 23 | tot\tab Integer\tab Total number of cases in dataset in ntile\cr 24 | pct\tab Decimal \tab Percentage of cases in dataset in ntile that belongs to 25 | target class (pos/tot)\cr 26 | negtot\tab Integer\tab Total number of cases not belonging to target class in dataset\cr 27 | 
postot\tab Integer\tab Total number of cases belonging to target class in dataset\cr 28 | tottot\tab Integer\tab Total number of cases in dataset\cr 29 | pcttot\tab Decimal\tab Percentage of cases in dataset that belong to 30 | target class (postot / tottot)\cr 31 | cumneg\tab Integer\tab Cumulative number of cases not belonging to target class in 32 | dataset from ntile 1 up until ntile\cr 33 | cumpos\tab Integer\tab Cumulative number of cases belonging to target class in 34 | dataset from ntile 1 up until ntile\cr 35 | cumtot\tab Integer\tab Cumulative number of cases in dataset from ntile 1 36 | up until ntile\cr 37 | cumpct\tab Decimal\tab Cumulative percentage of cases belonging to target class in 38 | dataset from ntile 1 up until ntile (cumpos/cumtot)\cr 39 | gain\tab Decimal\tab Gains value for dataset for ntile (pos/postot)\cr 40 | cumgain\tab Decimal\tab Cumulative gains value for dataset for ntile 41 | (cumpos/postot)\cr 42 | gain_ref\tab Decimal\tab Lower reference for gains value for dataset for ntile 43 | (ntile/#ntiles)\cr 44 | gain_opt\tab Decimal\tab Upper reference for gains value for dataset for ntile\cr 45 | lift\tab Decimal\tab Lift value for dataset for ntile (pct/pcttot)\cr 46 | cumlift\tab Decimal\tab Cumulative lift value for dataset for ntile 47 | ((cumpos/cumtot)/pcttot)\cr 48 | cumlift_ref\tab Decimal\tab Reference value for cumulative lift value (constant: 1) 49 | } 50 | } 51 | \description{ 52 | Build a dataframe with aggregated actuals and predictions. 53 | Records in this dataframe represent the unique combinations of models [m], datasets [d], target values [t] and ntiles [n]. 54 | The size of this dataframe therefore is (m*d*t*n) rows and 23 columns. 
\cr\cr \bold{\emph{In most cases, you do not need to use this function 55 | since the \code{\link{plotting_scope}} function will call this function automatically.}} 56 | } 57 | \section{When you build input for aggregate_over_ntiles() yourself}{ 58 | 59 | To make plots with modelplotr, it is not required to use the function prepare_scores_and_ntiles to generate the required input data. 60 | You can create your own dataframe containing actuals and probabilities and ntiles (1st ntile = (1/#ntiles) percent 61 | with highest model probability, last ntile = (1/#ntiles) percent with lowest probability according to model). 62 | In that case, make sure the input dataframe contains the following columns & formats: 63 | \tabular{lll}{ 64 | \bold{column} \tab \bold{type} \tab \bold{definition} \cr 65 | model_label \tab Factor \tab Name of the model object \cr 66 | dataset_label \tab Factor \tab Datasets to include in the plot as factor levels\cr 67 | y_true \tab Factor \tab Target with actual values \cr 68 | prob_[tv1] \tab Decimal \tab Probability according to model for target value 1 \cr 69 | prob_[tv2] \tab Decimal \tab Probability according to model for target value 2 \cr 70 | ... \tab ... \tab ... \cr 71 | prob_[tvn] \tab Decimal \tab Probability according to model for target value n \cr 72 | ntl_[tv1] \tab Integer \tab Ntile based on probability according to model for target value 1 \cr 73 | ntl_[tv2] \tab Integer \tab Ntile based on probability according to model for target value 2 \cr 74 | ... \tab ... \tab ... \cr 75 | ntl_[tvn] \tab Integer \tab Ntile based on probability according to model for target value n 76 | } 77 | See \code{\link{build_input_yourself}} for an example to build the required input yourself. 
78 | } 79 | 80 | \examples{ 81 | \dontrun{ 82 | # load example data (Bank clients with/without a term deposit - see ?bank_td for details) 83 | data("bank_td") 84 | 85 | # prepare data for training model for binomial target has_td and train models 86 | train_index = sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE) 87 | train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 88 | test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 89 | 90 | #train models using mlr... 91 | trainTask <- mlr::makeClassifTask(data = train, target = "has_td") 92 | testTask <- mlr::makeClassifTask(data = test, target = "has_td") 93 | mlr::configureMlr() # this line is needed when using mlr without loading it (mlr::) 94 | task = mlr::makeClassifTask(data = train, target = "has_td") 95 | lrn = mlr::makeLearner("classif.randomForest", predict.type = "prob") 96 | rf = mlr::train(lrn, task) 97 | lrn = mlr::makeLearner("classif.multinom", predict.type = "prob") 98 | mnl = mlr::train(lrn, task) 99 | #... or train models using caret... 100 | # setting caret cross validation, here tuned for speed (not accuracy!) 101 | fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE) 102 | # random forest using ranger package, here tuned for speed (not accuracy!) 103 | rf = caret::train(has_td ~.,data = train, method = "ranger",trControl = fitControl, 104 | tuneGrid = expand.grid(.mtry = 2,.splitrule = "gini",.min.node.size=10)) 105 | # mnl model using glmnet package 106 | mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl) 107 | #... or train models using h2o... 108 | h2o::h2o.init() 109 | h2o::h2o.no_progress() 110 | h2o_train = h2o::as.h2o(train) 111 | h2o_test = h2o::as.h2o(test) 112 | gbm <- h2o::h2o.gbm(y = "has_td", 113 | x = setdiff(colnames(train), "has_td"), 114 | training_frame = h2o_train, 115 | nfolds = 5) 116 | #... 
or train models using keras. 117 | x_train <- as.matrix(train[,-1]); y=train[,1]; y_train <- keras::to_categorical(as.numeric(y)-1); 118 | `\%>\%` <- magrittr::`\%>\%` 119 | nn <- keras::keras_model_sequential() \%>\% 120 | keras::layer_dense(units = 16,kernel_initializer = "uniform",activation = 'relu', 121 | input_shape = NCOL(x_train))\%>\% 122 | keras::layer_dense(units = 16,kernel_initializer = "uniform", activation='relu') \%>\% 123 | keras::layer_dense(units = length(levels(train[,1])),activation='softmax') 124 | nn \%>\% keras::compile(optimizer='rmsprop',loss='categorical_crossentropy',metrics=c('accuracy')) 125 | nn \%>\% keras::fit(x_train,y_train,epochs = 20,batch_size = 1028,verbose=0) 126 | 127 | # preparation steps 128 | scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"), 129 | dataset_labels = list("train data","test data"), 130 | models = list("rf","mnl", "gbm","nn"), 131 | model_labels = list("random forest","multinomial logit", 132 | "gradient boosting machine","artificial neural network"), 133 | target_column="has_td") 134 | aggregated <- aggregate_over_ntiles(prepared_input=scores_and_ntiles) 135 | head(aggregated) 136 | plot_input <- plotting_scope(prepared_input = aggregated) 137 | head(plot_input) 138 | } 139 | } 140 | \seealso{ 141 | \code{\link{modelplotr}} for generic info on the package \code{modelplotr} 142 | 143 | \code{vignette('modelplotr')} 144 | 145 | \code{\link{prepare_scores_and_ntiles}} for details on the function \code{prepare_scores_and_ntiles} 146 | that generates the required input. 147 | 148 | \code{\link{plotting_scope}} for details on the function \code{plotting_scope} that 149 | filters the output of \code{aggregate_over_ntiles} to prepare it for the required evaluation. 150 | 151 | \code{\link{build_input_yourself}} for an example to build the required input yourself.
152 | 153 | \url{https://github.com/modelplot/modelplotr} for details on the package 154 | 155 | \url{https://modelplot.github.io/} for our blog on the value of the model plots 156 | } 157 | -------------------------------------------------------------------------------- /man/bank_td.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/bank_td.R 3 | \docType{data} 4 | \name{bank_td} 5 | \alias{bank_td} 6 | \title{Bank clients that have/have not subscribed a term deposit.} 7 | \format{ 8 | A data frame with 2000 rows and 6 variables: 9 | \describe{ 10 | \item{has_td}{has the client subscribed a term deposit? Values: "term deposit", "no". 11 | This variable is used as the binary target variable in examples for the modelplotr package.} 12 | \item{td_type}{what type of term deposit did the client subscribe? Values: "no.td", "td.type.A", "td.type.B", "td.type.C". 13 | This variable is used as the multinomial target variable in examples for the modelplotr package.} 14 | \item{duration}{last contact duration, in seconds (numeric)} 15 | \item{campaign}{number of contacts performed during this campaign and for this client} 16 | \item{pdays}{number of days that passed by after the client was last contacted from a previous campaign} 17 | \item{previous}{number of contacts performed before this campaign and for this client (numeric)} 18 | \item{euribor3m}{euribor 3 month rate} 19 | } 20 | } 21 | \source{ 22 | This dataset is a subset of the dataset made available by the University of California, Irvine. 23 | The complete dataset is available here: \url{https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank-additional.zip} 24 | } 25 | \usage{ 26 | bank_td 27 | } 28 | \description{ 29 | A dataset containing some customer characteristics for clients of a bank that have/have not subscribed a term deposit. 
30 | } 31 | \keyword{datasets} 32 | -------------------------------------------------------------------------------- /man/build_input_yourself.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/dataprepmodelplots.R 3 | \name{build_input_yourself} 4 | \alias{build_input_yourself} 5 | \title{Example: build required input from a custom model} 6 | \description{ 7 | It's very easy to apply modelplotr 8 | to predictive models that are developed in caret, mlr, h2o or keras. For models that are developed differently, 9 | even those built outside of R, it takes only a bit more work to use modelplotr on top of these models. 10 | In this section we introduce the required format and an example. 11 | } 12 | \section{When you build input for plotting_scope() yourself}{ 13 | 14 | To make plots with modelplotr, it is not required to use the function prepare_scores_and_ntiles to generate the required input data. 15 | You can create your own dataframe containing actuals and probabilities and ntiles (1st ntile = (100/#ntiles) percent 16 | with highest model probability, last ntile = (100/#ntiles) percent with lowest probability according to model). 17 | In that case, make sure the input dataframe contains the following columns & formats: 18 | \tabular{lll}{ 19 | \bold{column} \tab \bold{type} \tab \bold{definition} \cr 20 | model_label \tab Factor \tab Name of the model object \cr 21 | dataset_label \tab Factor \tab Datasets to include in the plot as factor levels\cr 22 | y_true \tab Factor \tab Target with actual values \cr 23 | prob_[tv1] \tab Decimal \tab Probability according to model for target value 1 \cr 24 | prob_[tv2] \tab Decimal \tab Probability according to model for target value 2 \cr 25 | ... \tab ... \tab ...
\cr 26 | prob_[tvn] \tab Decimal \tab Probability according to model for target value n \cr 27 | ntl_[tv1] \tab Integer \tab Ntile based on probability according to model for target value 1 \cr 28 | ntl_[tv2] \tab Integer \tab Ntile based on probability according to model for target value 2 \cr 29 | ... \tab ... \tab ... \cr 30 | ntl_[tvn] \tab Integer \tab Ntile based on probability according to model for target value n 31 | } 32 | } 33 | 34 | \examples{ 35 | # load example data (Bank clients with/without a term deposit - see ?bank_td for details) 36 | data("bank_td") 37 | library(dplyr) 38 | # prepare data for training model for binomial target has_td and train models 39 | train_index = sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE) 40 | train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 41 | test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 42 | 43 | #train logistic regression model with stats package 44 | glm.model <- glm(has_td ~.,family=binomial(link='logit'),data=train) 45 | #score model 46 | prob_no.term.deposit <- stats::predict(glm.model,newdata=train,type='response') 47 | prob_term.deposit <- 1-prob_no.term.deposit 48 | #set number of ntiles 49 | ntiles = 10 50 | # determine cutoffs 51 | cutoffs = c(stats::quantile(prob_term.deposit,probs = seq(0,1,1/ntiles),na.rm = TRUE)) 52 | #calculate ntile values 53 | ntl_term.deposit <- (ntiles+1)-as.numeric(cut(prob_term.deposit,breaks=cutoffs,include.lowest=TRUE)) 54 | ntl_no.term.deposit <- (ntiles+1)-ntl_term.deposit 55 | # create scored data frame 56 | scores_and_ntiles <- train \%>\% 57 | select(has_td) \%>\% 58 | mutate(model_label=factor('logistic regression'), 59 | dataset_label=factor('train data'), 60 | y_true=factor(has_td), 61 | prob_term.deposit = prob_term.deposit, 62 | prob_no.term.deposit = prob_no.term.deposit, 63 | ntl_term.deposit = ntl_term.deposit, 64 | ntl_no.term.deposit = 
ntl_no.term.deposit) \%>\% 65 | select(-has_td) 66 | 67 | # add test data 68 | #score model on test data 69 | prob_no.term.deposit <- stats::predict(glm.model,newdata=test,type='response') 70 | prob_term.deposit <- 1-prob_no.term.deposit 71 | #set number of ntiles 72 | ntiles = 10 73 | # determine cutoffs 74 | cutoffs = c(stats::quantile(prob_term.deposit,probs = seq(0,1,1/ntiles),na.rm = TRUE)) 75 | #calculate ntile values 76 | ntl_term.deposit <- (ntiles+1)-as.numeric(cut(prob_term.deposit,breaks=cutoffs,include.lowest=TRUE)) 77 | ntl_no.term.deposit <- (ntiles+1)-ntl_term.deposit 78 | scores_and_ntiles <- scores_and_ntiles \%>\% 79 | rbind( 80 | test \%>\% 81 | select(has_td) \%>\% 82 | mutate(model_label=factor('logistic regression'), 83 | dataset_label=factor('test data'), 84 | y_true=factor(has_td), 85 | prob_term.deposit = prob_term.deposit, 86 | prob_no.term.deposit = prob_no.term.deposit, 87 | ntl_term.deposit = ntl_term.deposit, 88 | ntl_no.term.deposit = ntl_no.term.deposit) \%>\% 89 | select(-has_td) 90 | ) 91 | 92 | plot_input <- plotting_scope(prepared_input = scores_and_ntiles,scope='compare_datasets') 93 | plot_cumgains() 94 | 95 | } 96 | -------------------------------------------------------------------------------- /man/customize_plot_text.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/parametersmodelplots.R 3 | \name{customize_plot_text} 4 | \alias{customize_plot_text} 5 | \title{Customize textual elements of the plots} 6 | \usage{ 7 | customize_plot_text(plot_input = plot_input) 8 | } 9 | \arguments{ 10 | \item{plot_input}{Dataframe. Dataframe needs to be created with 11 | \code{\link{plotting_scope}} or else meet required input format.} 12 | } 13 | \value{ 14 | List with default values for all textual elements of the plots. 
15 | } 16 | \description{ 17 | Function to overrule the default textual elements in the plots, like title, subtitle, 18 | axis labels and annotation texts when the highlighting parameter \code{highlight_ntile} 19 | is specified. 20 | } 21 | \section{How to customize textual elements of plots}{ 22 | 23 | All textual parts of the plots can be customized, for instance to translate 24 | textual elements to another language or to change the annotation text that is added with the 25 | \code{highlight_ntile} parameter. Once you have created the \code{plot_input} dataframe 26 | using \code{plotting_scope}, you can run this \code{customize_plot_text()} function. 27 | It returns a list, containing all textual elements of the plots, including annotation texts. 28 | For instance, run \cr\cr 29 | \code{my_plot_text <- customize_plot_text(plot_input = plot_input)} \cr\cr 30 | The list contains plot-specific elements (e.g. \code{...$cumgains$...}). \cr 31 | Now, you can change the textual elements by overriding the element(s) you want to customize.
32 | For instance, if you want to change the textual elements of the gains plot to Dutch:\cr\cr 33 | \code{my_plot_text$cumgains$plottitle <- 'Cumulatieve Gains grafiek'}\cr 34 | \code{my_plot_text$cumgains$x_axis_label <- 'Deciel'}\cr 35 | \code{my_plot_text$cumgains$y_axis_label <- 'cumulatieve gains'}\cr 36 | \code{my_plot_text$cumgains$optimal_gains_label <- 'maximale gains'}\cr 37 | \code{my_plot_text$cumgains$minimal_gains_label <- 'minimale gains'}\cr 38 | \code{plot_cumgains(custom_plot_text = my_plot_text)}\cr\cr 39 | To change the annotation text, use the placeholders starting with '&' to dynamically include: 40 | \tabular{ll}{ 41 | \bold{placeholder} \tab \bold{placeholder value}\cr 42 | \code{&NTL} \tab ntile specified with parameter \code{highlight_ntile}.\cr 43 | \code{&PCTNTL} \tab Total percentage of dataset selected up until specified ntile.\cr 44 | \code{&MDL} \tab Selected model label(s).\cr 45 | \code{&DS} \tab Selected dataset label(s).\cr 46 | \code{&YVAL} \tab Selected target class (Y-value).\cr 47 | \code{&VALUE} \tab The plot specific value at specified ntile. 48 | E.g.
Cumulative gains, Cumulative lift, Response, Cumulative response, Profit, ROI or Revenue.\cr 49 | } 50 | For instance, to translate the cumulative lift plot annotation text to Dutch:\cr 51 | \code{my_plot_text$cumlift$annotationtext <- "Door &PCTNTL met de hoogste modelkans volgens model &MDL 52 | in &DS te selecteren is deze selectie van &YVAL observaties &CUMLIFT keer beter dan een random selectie."}\cr 53 | \code{plot_cumlift(highlight_ntile=3,custom_plot_text=my_plot_text)} 54 | } 55 | 56 | \examples{ 57 | # load example data (Bank clients with/without a term deposit - see ?bank_td for details) 58 | data("bank_td") 59 | 60 | # prepare data for training model for binomial target has_td and train models 61 | train_index = sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE) 62 | train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 63 | test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 64 | 65 | #train models using caret... (or use mlr or H2o or keras ... see ?prepare_scores_and_ntiles) 66 | # setting caret cross validation, here tuned for speed (not accuracy!) 67 | fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE) 68 | # random forest using ranger package, here tuned for speed (not accuracy!)
69 | rf = caret::train(has_td ~.,data = train, method = "ranger",trControl = fitControl, 70 | tuneGrid = expand.grid(.mtry = 2,.splitrule = "gini",.min.node.size=10)) 71 | # mnl model using glmnet package 72 | mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl) 73 | 74 | # load modelplotr 75 | library(modelplotr) 76 | 77 | # transform datasets and model objects to input for modelplotr 78 | scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"), 79 | dataset_labels = list("train data","test data"), 80 | models = list("rf","mnl"), 81 | model_labels = list("random forest","multinomial logit"), 82 | target_column="has_td", 83 | ntiles=100) 84 | 85 | # set scope for analysis (default: no comparison) 86 | plot_input <- plotting_scope(prepared_input = scores_and_ntiles) 87 | 88 | # customize all textual elements of plots 89 | mytexts <- customize_plot_text(plot_input = plot_input) 90 | mytexts$cumresponse$plottitle <- 'Expected conversion rate for Campaign XYZ' 91 | mytexts$cumresponse$plotsubtitle <- 'proposed selection: best 15 percentiles according to our model' 92 | mytexts$cumresponse$y_axis_label <- '\% Conversion' 93 | mytexts$cumresponse$x_axis_label <- 'percentiles (percentile = 1\% of customers)' 94 | mytexts$cumresponse$annotationtext <- 95 | "Selecting up until the &NTL percentile with model &MDL has an expected conversion rate of &VALUE" 96 | plot_cumresponse(data=plot_input,custom_plot_text = mytexts,highlight_ntile = 15) 97 | } 98 | \seealso{ 99 | \code{\link{modelplotr}} for generic info on the package \code{modelplotr} 100 | 101 | \code{vignette('modelplotr')} 102 | 103 | \url{https://github.com/modelplot/modelplotr} for details on the package 104 | 105 | \url{https://modelplot.github.io/} for our blog on the value of the model plots 106 | } 107 | -------------------------------------------------------------------------------- /man/modelplotr.Rd: 
-------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/modelplotr.R 3 | \docType{package} 4 | \name{modelplotr} 5 | \alias{modelplotr} 6 | \title{modelplotr: Plots to Evaluate the Business Performance of Predictive Models.} 7 | \description{ 8 | Plots to evaluate the business performance of predictive models in R. 9 | A number of widely used plots to assess the quality of a predictive model from a business perspective 10 | can easily be created. Using these plots, it can be shown how implementation of the model will impact 11 | business targets like response on a campaign or return on investment. It's very easy to apply modelplotr 12 | to predictive models that are developed in caret, mlr, h2o or keras. For other models, even those built 13 | outside of R, an instruction is included. 14 | The modelplotr package provides three categories of important functions: 15 | datapreparation, plot parameterization and plotting. 16 | } 17 | \section{Datapreparation functions}{ 18 | 19 | The datapreparation functions are: 20 | \describe{ 21 | \item{\code{\link{prepare_scores_and_ntiles}}}{Function that builds a dataframe 22 | that contains actuals and predictions on the target variable for each dataset in \code{datasets} and each model in \code{models}. 23 | As inputs, it takes dataframes to score and model objects created with \strong{caret}, \strong{mlr}, \strong{h2o} or \strong{keras}. 24 | Specifically for keras models, built with keras_model_sequential() or with the keras functional API, there is the 25 | \code{\link{prepare_scores_and_ntiles_keras}} function. 26 | To use modelplotr on top of models created otherwise, even models built outside r, see \code{\link{aggregate_over_ntiles}}} 27 | \item{\code{\link{plotting_scope}}}{Function that creates a dataframe in the required format for all 28 | modelplotr plots, relevant to the selected scope of evaluation. 
Each record in this dataframe represents 29 | a unique combination of datasets, models, target classes and ntiles. As an input, plotting_scope can handle 30 | both a dataframe created with \code{aggregate_over_ntiles} as well as a dataframe created with 31 | \code{prepare_scores_and_ntiles} (or with \code{prepare_scores_and_ntiles_keras} or created otherwise, with similar layout). } 32 | \item{\code{\link{aggregate_over_ntiles}}}{Function that aggregates the output of \code{prepare_scores_and_ntiles} 33 | to create a dataframe with aggregated actuals and predictions. Each record in this dataframe represents 34 | a unique combination of datasets, models, target classes and ntiles. In most cases, you do not need to use this function 35 | since the \code{plotting_scope} function will call this function automatically. }} 36 | } 37 | 38 | \section{Parameterization functions}{ 39 | 40 | Most parameterization functions are internal functions. However, one is available for customization: 41 | \describe{ 42 | \item{\code{\link{customize_plot_text}}}{Function that returns a list that contains all textual elements for 43 | all plots that modelplotr can create. By changing the elements in this list - simply by overwriting values - 44 | and then including this list with the \code{custom_plot_text} parameter in plot functions, plot texts can easily be customized 45 | to meet your (language) preferences}} 46 | } 47 | 48 | \section{Plotting functions}{ 49 | 50 | The plotting functions are: 51 | \describe{ 52 | \item{\code{\link{plot_cumgains}}}{Generates the cumulative gains plot.
This plot, often referred to as the gains chart, 53 | helps answer the question: \strong{\emph{When we apply the model and select the best X ntiles, 54 | what percentage of the actual target class observations can we expect to target?}} } 55 | \item{\code{\link{plot_cumlift}}}{Generates the cumulative lift plot. This plot, often referred to as the lift plot or index plot, 56 | helps you answer the question: \strong{\emph{When we apply the model and select the best X ntiles, 57 | how many times better is that than using no model at all?}}} 58 | \item{\code{\link{plot_response}}}{Generates the response plot. It plots the percentage of target class observations 59 | per ntile. It can be used to answer the following business question: \strong{\emph{When we apply 60 | the model and select ntile X, what is the expected percentage of target class observations 61 | in that ntile?}}} 62 | \item{\code{\link{plot_cumresponse}}}{Generates the cumulative response plot. It plots the cumulative percentage of 63 | target class observations up until that ntile. It helps answer the question: 64 | \strong{\emph{When we apply the model and select up until ntile X, what is the expected percentage of 65 | target class observations in the selection? }}} 66 | \item{\code{\link{plot_multiplot}}}{Generates a canvas with all four evaluation plots - cumulative gains, cumulative lift, 67 | response and cumulative response - combined on one canvas} 68 | \item{\code{\link{plot_costsrevs}}}{Generates the Costs & Revenues plot. It plots the cumulative costs and revenues up until that ntile when the model 69 | is used for campaign selection. It can be used to answer the following business question: 70 | \strong{\emph{When we apply the model and select up until ntile X, what are the expected costs and 71 | revenues of the campaign?}}} 72 | \item{\code{\link{plot_profit}}}{Generates the Profit plot. It plots the cumulative profit up until that ntile when the 73 | model is used for campaign selection.
It can be used to answer the following business question: 74 | \strong{\emph{When we apply the model and select up until ntile X, what is the expected profit of the campaign?}}} 75 | \item{\code{\link{plot_roi}}}{Generates the Return on Investment plot. It plots the cumulative revenues as a percentage 76 | of investments up until that ntile when the model is used for campaign selection. It can be used to answer the following 77 | business question: \strong{\emph{When we apply the model and select up until ntile X, what is the expected \% return on 78 | investment of the campaign?}}} 79 | } 80 | } 81 | 82 | \examples{ 83 | \dontrun{ 84 | # load example data (Bank clients with/without a term deposit - see ?bank_td for details) 85 | data("bank_td") 86 | 87 | # prepare data for training model for binomial target has_td and train models 88 | train_index = sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE) 89 | train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 90 | test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 91 | 92 | #train models using caret... (or use mlr or H2o or keras ... see ?prepare_scores_and_ntiles) 93 | # setting caret cross validation, here tuned for speed (not accuracy!) 94 | fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE) 95 | # random forest using ranger package, here tuned for speed (not accuracy!)
96 | rf = caret::train(has_td ~.,data = train, method = "ranger",trControl = fitControl, 97 | tuneGrid = expand.grid(.mtry = 2,.splitrule = "gini",.min.node.size=10)) 98 | # mnl model using glmnet package 99 | mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl) 100 | 101 | # load modelplotr 102 | library(modelplotr) 103 | 104 | # transform datasets and model objects to input for modelplotr 105 | scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"), 106 | dataset_labels = list("train data","test data"), 107 | models = list("rf","mnl"), 108 | model_labels = list("random forest","multinomial logit"), 109 | target_column="has_td", 110 | ntiles=100) 111 | 112 | # set scope for analysis (default: no comparison) 113 | plot_input <- plotting_scope(prepared_input = scores_and_ntiles) 114 | head(plot_input) 115 | 116 | # ALL PLOTS, with defaults 117 | plot_cumgains(data=plot_input) 118 | plot_cumlift(data=plot_input) 119 | plot_response(data=plot_input) 120 | plot_cumresponse(data=plot_input) 121 | plot_multiplot(data=plot_input) 122 | # financial plots - these need some financial parameters 123 | plot_costsrevs(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50) 124 | plot_profit(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50) 125 | plot_roi(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50) 126 | 127 | # CHANGING THE SCOPE OF ANALYSIS 128 | # changing the scope - compare models: 129 | plot_input <- plotting_scope(prepared_input = scores_and_ntiles,scope="compare_models") 130 | plot_cumgains(data=plot_input) 131 | # changing the scope - compare datasets: 132 | plot_input <- plotting_scope(prepared_input = scores_and_ntiles,scope="compare_datasets") 133 | plot_roi(data = plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50) 134 | # changing the scope - compare target classes: 135 | plot_input <- 
plotting_scope(prepared_input = scores_and_ntiles,scope="compare_targetclasses") 136 | plot_response(data=plot_input) 137 | # HIGHLIGHTING OPTIONS 138 | plot_input <- plotting_scope(prepared_input = scores_and_ntiles, 139 | scope = 'compare_datasets',select_model_label = 'random forest') 140 | plot_cumgains(data=plot_input,highlight_ntile=20) 141 | plot_cumlift(data=plot_input,highlight_ntile=20,highlight_how = 'plot') 142 | plot_response(data=plot_input,highlight_ntile=20,highlight_how = 'text') 143 | plot_cumresponse(data=plot_input,highlight_ntile=20,highlight_how = 'plot_text') 144 | plot_costsrevs(data=plot_input,fixed_costs = 1000,variable_costs_per_unit = 10, 145 | profit_per_unit = 50,highlight_ntile='max_roi') 146 | plot_profit(data=plot_input,fixed_costs = 1500,variable_costs_per_unit = 10,profit_per_unit = 50) 147 | plot_roi(data=plot_input,fixed_costs = 1500,variable_costs_per_unit = 10,profit_per_unit = 50) 148 | 149 | # OTHER PLOT CUSTOMIZATIONS 150 | # customize line colors 151 | plot_input <- plotting_scope(prepared_input = scores_and_ntiles,scope = 'compare_models') 152 | plot_cumgains(data=plot_input,custom_line_colors = c('pink','navyblue')) 153 | # customize all textual elements of plots 154 | plot_input <- plotting_scope(prepared_input = scores_and_ntiles) 155 | mytexts <- customize_plot_text(plot_input = plot_input) 156 | mytexts$cumresponse$plottitle <- 'Expected conversion rate for Campaign XYZ' 157 | mytexts$cumresponse$plotsubtitle <- 'proposed selection: best 15 percentiles according to our model' 158 | mytexts$cumresponse$y_axis_label <- '\% Conversion' 159 | mytexts$cumresponse$x_axis_label <- 'percentiles (percentile = 1\% of customers)' 160 | mytexts$cumresponse$annotationtext <- 161 | "Selecting up until the &NTL percentile with model &MDL has an expected conversion rate of &VALUE" 162 | plot_cumresponse(data=plot_input,custom_plot_text = mytexts,highlight_ntile = 15) 163 | } 164 | } 165 | \seealso{ 166 | 
\code{vignette('modelplotr')} 167 | 168 | \url{https://github.com/modelplot/modelplotr} for details on the package 169 | 170 | \url{https://modelplot.github.io/} for our blog posts on using modelplotr 171 | } 172 | \author{ 173 | Jurriaan Nagelkerke [aut, cre] 174 | 175 | Pieter Marcus [aut] 176 | } 177 | -------------------------------------------------------------------------------- /man/plot_costsrevs.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/plottingmodelplots.R 3 | \name{plot_costsrevs} 4 | \alias{plot_costsrevs} 5 | \title{Costs & Revenues plot} 6 | \usage{ 7 | plot_costsrevs( 8 | data = plot_input, 9 | highlight_ntile = "max_profit", 10 | highlight_how = "plot_text", 11 | save_fig = FALSE, 12 | save_fig_filename = NA, 13 | custom_line_colors = NA, 14 | custom_plot_text = NULL, 15 | fixed_costs, 16 | variable_costs_per_unit, 17 | profit_per_unit 18 | ) 19 | } 20 | \arguments{ 21 | \item{data}{Dataframe. Dataframe needs to be created with \code{\link{plotting_scope}} 22 | or else meet required input format.} 23 | 24 | \item{highlight_ntile}{Integer or string ("max_roi" or "max_profit"). Specifying the ntile at which the plot is annotated 25 | and/or performances are highlighted. Default value is \code{max_profit}, highlighting the ntile where the difference between 26 | returns and costs (hence: profits) is greatest.} 27 | 28 | \item{highlight_how}{String. How to annotate the plot. Possible values: "plot_text","plot", "text". 29 | Default is "plot_text", both highlighting the ntile and value on the plot as well as in text below the plot. 30 | "plot" only highlights the plot, but does not add text below the plot explaining the plot at chosen ntile. 31 | "text" adds text below the plot explaining the plot at chosen ntile but does not highlight the plot.} 32 | 33 | \item{save_fig}{Logical. Save plot to file? Default = FALSE.
When set to TRUE, saved plot is optimized for 36x24cm.} 34 | 35 | \item{save_fig_filename}{String. Filename of saved plot. By default, the plot is saved as {tempdir()}/{plotname}.png.} 36 | 37 | \item{custom_line_colors}{Vector of Strings. Specifying colors for the lines in the plot. 38 | When not specified, colors from the RColorBrewer palette "Set1" are used.} 39 | 40 | \item{custom_plot_text}{List. List with customized textual elements for plot. Create a list with defaults 41 | by using \code{\link{customize_plot_text}} and override default values to customize.} 42 | 43 | \item{fixed_costs}{Numeric. Specifying the fixed costs related to a selection based on the model. 44 | These costs are constant and do not vary with selection size (ntiles).} 45 | 46 | \item{variable_costs_per_unit}{Numeric. Specifying the variable costs per selected unit for a selection based on the model. 47 | These costs vary with selection size (ntiles).} 48 | 49 | \item{profit_per_unit}{Numeric. Specifying the profit per unit in case the selected unit converts / responds positively.} 50 | } 51 | \value{ 52 | gtable, containing 6 grobs. 53 | } 54 | \description{ 55 | Generates the Costs & Revenues plot. It plots the cumulative costs and revenues up until that ntile when the model is used 56 | for campaign selection. It can be used to answer the following business question: \bold{\emph{When we apply the model and 57 | select up until ntile X, what are the expected costs and revenues of the campaign?}} 58 | Extra parameters needed for this plot are: fixed_costs, variable_costs_per_unit and profit_per_unit.
59 | } 60 | \examples{ 61 | # load example data (Bank clients with/without a term deposit - see ?bank_td for details) 62 | data("bank_td") 63 | # prepare data for training model for binomial target has_td and train models 64 | train_index = sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE) 65 | train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 66 | test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 67 | #train models using caret... (or use mlr or H2o or keras ... see ?prepare_scores_and_ntiles) 68 | # setting caret cross validation, here tuned for speed (not accuracy!) 69 | fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE) 70 | # random forest using ranger package, here tuned for speed (not accuracy!) 71 | rf = caret::train(has_td ~.,data = train, method = "ranger",trControl = fitControl, 72 | tuneGrid = expand.grid(.mtry = 2,.splitrule = "gini",.min.node.size=10)) 73 | # mnl model using glmnet package 74 | mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl) 75 | # load modelplotr 76 | library(modelplotr) 77 | # transform datasets and model objects to input for modelplotr 78 | scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"), 79 | dataset_labels = list("train data","test data"), 80 | models = list("rf","mnl"), 81 | model_labels = list("random forest","multinomial logit"), 82 | target_column="has_td", 83 | ntiles=100) 84 | # set scope for analysis (default: no comparison) 85 | plot_input <- plotting_scope(prepared_input = scores_and_ntiles,scope='compare_models') 86 | plot_costsrevs(data=plot_input,fixed_costs=1000,variable_costs_per_unit= 10,profit_per_unit=50) 87 | plot_costsrevs(data=plot_input,fixed_costs=1000,variable_costs_per_unit= 10,profit_per_unit=50, 88 | highlight_ntile=20) 89 | plot_costsrevs(data=plot_input,fixed_costs=1000,variable_costs_per_unit= 
10,profit_per_unit=50, 90 | highlight_ntile='max_roi') 91 | plot_costsrevs(data=plot_input,fixed_costs=1000,variable_costs_per_unit= 10,profit_per_unit=50, 92 | highlight_ntile='max_profit') 93 | } 94 | \seealso{ 95 | \code{\link{modelplotr}} for generic info on the package \code{modelplotr} 96 | 97 | \code{vignette('modelplotr')} 98 | 99 | \code{\link{plotting_scope}} for details on the function \code{plotting_scope} that 100 | transforms a dataframe created with \code{prepare_scores_and_ntiles} or \code{aggregate_over_ntiles} to 101 | a dataframe in the required format for all modelplotr plots. 102 | 103 | \code{\link{aggregate_over_ntiles}} for details on the function \code{aggregate_over_ntiles} that 104 | aggregates the output of \code{prepare_scores_and_ntiles} to create a dataframe with aggregated actuals and predictions. 105 | In most cases, you do not need to use it since the \code{plotting_scope} function will call this function automatically. 106 | 107 | \url{https://github.com/modelplot/modelplotr} for details on the package 108 | 109 | \url{https://modelplot.github.io/} for our blog on the value of the model plots 110 | } 111 | -------------------------------------------------------------------------------- /man/plot_cumgains.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/plottingmodelplots.R 3 | \name{plot_cumgains} 4 | \alias{plot_cumgains} 5 | \title{Cumulative gains plot} 6 | \usage{ 7 | plot_cumgains( 8 | data = plot_input, 9 | highlight_ntile = NA, 10 | highlight_how = "plot_text", 11 | save_fig = FALSE, 12 | save_fig_filename = NA, 13 | custom_line_colors = NA, 14 | custom_plot_text = NULL 15 | ) 16 | } 17 | \arguments{ 18 | \item{data}{Dataframe. Dataframe needs to be created with \code{\link{plotting_scope}} 19 | or else meet required input format.} 20 | 21 | \item{highlight_ntile}{Integer.
Specifying the ntile at which the plot is annotated 22 | and/or performances are highlighted.} 23 | 24 | \item{highlight_how}{String. How to annotate the plot. Possible values: "plot_text","plot", "text". 25 | Default is "plot_text", both highlighting the ntile and value on the plot as well as in text below the plot. 26 | "plot" only highligths the plot, but does not add text below the plot explaining the plot at chosen ntile. 27 | "text" adds text below the plot explaining the plot at chosen ntile but does not highlight the plot.} 28 | 29 | \item{save_fig}{Logical. Save plot to file? Default = FALSE. When set to TRUE, saved plots are optimized for 18x12cm.} 30 | 31 | \item{save_fig_filename}{String. Filename of saved plot. Default the plot is saved as {tempdir()}/{plotname}.png.} 32 | 33 | \item{custom_line_colors}{Vector of Strings. Specifying colors for the lines in the plot. 34 | When not specified, colors from the RColorBrewer palet "Set1" are used.} 35 | 36 | \item{custom_plot_text}{List. List with customized textual elements for plot. Create a list with defaults 37 | by using \code{\link{customize_plot_text}} and override default values to customize.} 38 | } 39 | \value{ 40 | ggplot object. Cumulative gains plot. 41 | } 42 | \description{ 43 | Generates the cumulative gains plot. 
This plot, often referred to as the gains chart, 44 | helps answer the question: \bold{\emph{When we apply the model and select the best X ntiles, 45 | what percentage of the actual target class observations can we expect to target?}} 46 | } 47 | \examples{ 48 | # load example data (Bank clients with/without a term deposit - see ?bank_td for details) 49 | data("bank_td") 50 | # prepare data for training model for binomial target has_td and train models 51 | train_index = sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE) 52 | train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 53 | test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 54 | #train models using caret... (or use mlr or H2o or keras ... see ?prepare_scores_and_ntiles) 55 | # setting caret cross validation, here tuned for speed (not accuracy!) 56 | fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE) 57 | # random forest using ranger package, here tuned for speed (not accuracy!) 
58 | rf = caret::train(has_td ~.,data = train, method = "ranger",trControl = fitControl, 59 | tuneGrid = expand.grid(.mtry = 2,.splitrule = "gini",.min.node.size=10)) 60 | # mnl model using glmnet package 61 | mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl) 62 | # load modelplotr 63 | library(modelplotr) 64 | # transform datasets and model objects to input for modelplotr 65 | scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"), 66 | dataset_labels = list("train data","test data"), 67 | models = list("rf","mnl"), 68 | model_labels = list("random forest","multinomial logit"), 69 | target_column="has_td", 70 | ntiles=100) 71 | plot_input <- plotting_scope(prepared_input = scores_and_ntiles,scope="compare_models") 72 | plot_cumgains(data=plot_input) 73 | plot_cumgains(data=plot_input,custom_line_colors=c("orange","purple")) 74 | plot_cumgains(data=plot_input,highlight_ntile=20) 75 | } 76 | \seealso{ 77 | \code{\link{modelplotr}} for generic info on the package \code{modelplotr} 78 | 79 | \code{vignette('modelplotr')} 80 | 81 | \code{\link{plotting_scope}} for details on the function \code{plotting_scope} that 82 | transforms a dataframe created with \code{prepare_scores_and_ntiles} or \code{aggregate_over_ntiles} to 83 | a dataframe in the required format for all modelplotr plots. 84 | 85 | \code{\link{aggregate_over_ntiles}} for details on the function \code{aggregate_over_ntiles} that 86 | aggregates the output of \code{prepare_scores_and_ntiles} to create a dataframe with aggregated actuals and predictions. 87 | In most cases, you do not need to use it since the \code{plotting_scope} function will call this function automatically. 
88 | 89 | \url{https://github.com/modelplot/modelplotr} for details on the package 90 | 91 | \url{https://modelplot.github.io/} for our blog on the value of the model plots 92 | } 93 | -------------------------------------------------------------------------------- /man/plot_cumlift.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/plottingmodelplots.R 3 | \name{plot_cumlift} 4 | \alias{plot_cumlift} 5 | \title{Cumulative Lift plot} 6 | \usage{ 7 | plot_cumlift( 8 | data = plot_input, 9 | highlight_ntile = NA, 10 | highlight_how = "plot_text", 11 | save_fig = FALSE, 12 | save_fig_filename = NA, 13 | custom_line_colors = NA, 14 | custom_plot_text = NULL 15 | ) 16 | } 17 | \arguments{ 18 | \item{data}{Dataframe. Dataframe needs to be created with \code{\link{plotting_scope}} 19 | or else meet required input format.} 20 | 21 | \item{highlight_ntile}{Integer. Specifying the ntile at which the plot is annotated 22 | and/or performances are highlighted.} 23 | 24 | \item{highlight_how}{String. How to annotate the plot. Possible values: "plot_text","plot", "text". 25 | Default is "plot_text", both highlighting the ntile and value on the plot as well as in text below the plot. 26 | "plot" only highlights the plot, but does not add text below the plot explaining the plot at chosen ntile. 27 | "text" adds text below the plot explaining the plot at chosen ntile but does not highlight the plot.} 28 | 29 | \item{save_fig}{Logical. Save plot to file? Default = FALSE. When set to TRUE, saved plots are optimized for 18x12cm.} 30 | 31 | \item{save_fig_filename}{String. Filename of saved plot. By default the plot is saved as {tempdir()}/{plotname}.png.} 32 | 33 | \item{custom_line_colors}{Vector of Strings. Specifying colors for the lines in the plot. 34 | When not specified, colors from the RColorBrewer palette "Set1" are used.} 35 | 36 | \item{custom_plot_text}{List. 
List with customized textual elements for plot. Create a list with defaults 37 | by using \code{\link{customize_plot_text}} and override default values to customize.} 38 | } 39 | \value{ 40 | ggplot object. Lift plot. 41 | } 42 | \description{ 43 | Generates the cumulative lift plot. This plot, often referred to as the lift plot or index plot, 44 | helps you answer the question: When we apply the model and select the best X ntiles, 45 | how many times better is that than using no model at all? 46 | } 47 | \examples{ 48 | # load example data (Bank clients with/without a term deposit - see ?bank_td for details) 49 | data("bank_td") 50 | # prepare data for training model for binomial target has_td and train models 51 | train_index = sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE) 52 | train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 53 | test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 54 | #train models using caret... (or use mlr or H2o or keras ... see ?prepare_scores_and_ntiles) 55 | # setting caret cross validation, here tuned for speed (not accuracy!) 56 | fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE) 57 | # random forest using ranger package, here tuned for speed (not accuracy!) 
58 | rf = caret::train(has_td ~.,data = train, method = "ranger",trControl = fitControl, 59 | tuneGrid = expand.grid(.mtry = 2,.splitrule = "gini",.min.node.size=10)) 60 | # mnl model using glmnet package 61 | mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl) 62 | # load modelplotr 63 | library(modelplotr) 64 | # transform datasets and model objects to input for modelplotr 65 | scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"), 66 | dataset_labels = list("train data","test data"), 67 | models = list("rf","mnl"), 68 | model_labels = list("random forest","multinomial logit"), 69 | target_column="has_td", 70 | ntiles=100) 71 | plot_input <- plotting_scope(prepared_input = scores_and_ntiles,scope="compare_datasets") 72 | plot_cumlift(data=plot_input) 73 | plot_cumlift(data=plot_input,custom_line_colors=c("orange","purple")) 74 | plot_cumlift(data=plot_input,highlight_ntile=2) 75 | } 76 | \seealso{ 77 | \code{\link{modelplotr}} for generic info on the package \code{modelplotr} 78 | 79 | \code{vignette('modelplotr')} 80 | 81 | \code{\link{plotting_scope}} for details on the function \code{plotting_scope} that 82 | transforms a dataframe created with \code{prepare_scores_and_ntiles} or \code{aggregate_over_ntiles} to 83 | a dataframe in the required format for all modelplotr plots. 84 | 85 | \code{\link{aggregate_over_ntiles}} for details on the function \code{aggregate_over_ntiles} that 86 | aggregates the output of \code{prepare_scores_and_ntiles} to create a dataframe with aggregated actuals and predictions. 87 | In most cases, you do not need to use it since the \code{plotting_scope} function will call this function automatically. 
88 | 89 | \url{https://github.com/modelplot/modelplotr} for details on the package 90 | 91 | \url{https://modelplot.github.io/} for our blog on the value of the model plots 92 | } 93 | -------------------------------------------------------------------------------- /man/plot_cumresponse.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/plottingmodelplots.R 3 | \name{plot_cumresponse} 4 | \alias{plot_cumresponse} 5 | \title{Cumulative Response plot} 6 | \usage{ 7 | plot_cumresponse( 8 | data = plot_input, 9 | highlight_ntile = NA, 10 | highlight_how = "plot_text", 11 | save_fig = FALSE, 12 | save_fig_filename = NA, 13 | custom_line_colors = NA, 14 | custom_plot_text = NULL 15 | ) 16 | } 17 | \arguments{ 18 | \item{data}{Dataframe. Dataframe needs to be created with \code{\link{plotting_scope}} 19 | or else meet required input format.} 20 | 21 | \item{highlight_ntile}{Integer. Specifying the ntile at which the plot is annotated 22 | and/or performances are highlighted.} 23 | 24 | \item{highlight_how}{String. How to annotate the plot. Possible values: "plot_text","plot", "text". 25 | Default is "plot_text", both highlighting the ntile and value on the plot as well as in text below the plot. 26 | "plot" only highlights the plot, but does not add text below the plot explaining the plot at chosen ntile. 27 | "text" adds text below the plot explaining the plot at chosen ntile but does not highlight the plot.} 28 | 29 | \item{save_fig}{Logical. Save plot to file? Default = FALSE. When set to TRUE, saved plots are optimized for 18x12cm.} 30 | 31 | \item{save_fig_filename}{String. Filename of saved plot. By default the plot is saved as {tempdir()}/{plotname}.png.} 32 | 33 | \item{custom_line_colors}{Vector of Strings. Specifying colors for the lines in the plot. 
34 | When not specified, colors from the RColorBrewer palette "Set1" are used.} 35 | 36 | \item{custom_plot_text}{List. List with customized textual elements for plot. Create a list with defaults 37 | by using \code{\link{customize_plot_text}} and override default values to customize.} 38 | } 39 | \value{ 40 | ggplot object. Cumulative Response plot. 41 | } 42 | \description{ 43 | Generates the cumulative response plot. It plots the cumulative percentage of 44 | target class observations up until that ntile. It helps answer the question: 45 | When we apply the model and select up until ntile X, what is the expected percentage of 46 | target class observations in the selection? 47 | } 48 | \examples{ 49 | # load example data (Bank clients with/without a term deposit - see ?bank_td for details) 50 | data("bank_td") 51 | # prepare data for training model for binomial target has_td and train models 52 | train_index = sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE) 53 | train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 54 | test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 55 | #train models using caret... (or use mlr or H2o or keras ... see ?prepare_scores_and_ntiles) 56 | # setting caret cross validation, here tuned for speed (not accuracy!) 57 | fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE) 58 | # random forest using ranger package, here tuned for speed (not accuracy!) 
59 | rf = caret::train(has_td ~.,data = train, method = "ranger",trControl = fitControl, 60 | tuneGrid = expand.grid(.mtry = 2,.splitrule = "gini",.min.node.size=10)) 61 | # mnl model using glmnet package 62 | mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl) 63 | # load modelplotr 64 | library(modelplotr) 65 | # transform datasets and model objects to input for modelplotr 66 | scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"), 67 | dataset_labels = list("train data","test data"), 68 | models = list("rf","mnl"), 69 | model_labels = list("random forest","multinomial logit"), 70 | target_column="has_td", 71 | ntiles=20) 72 | plot_input <- plotting_scope(prepared_input = scores_and_ntiles) 73 | plot_cumresponse(data=plot_input) 74 | plot_cumresponse(data=plot_input,custom_line_colors="pink") 75 | plot_cumresponse(data=plot_input,highlight_ntile=5) 76 | } 77 | \seealso{ 78 | \code{\link{modelplotr}} for generic info on the package \code{modelplotr} 79 | 80 | \code{vignette('modelplotr')} 81 | 82 | \code{\link{plotting_scope}} for details on the function \code{plotting_scope} that 83 | transforms a dataframe created with \code{prepare_scores_and_ntiles} or \code{aggregate_over_ntiles} to 84 | a dataframe in the required format for all modelplotr plots. 85 | 86 | \code{\link{aggregate_over_ntiles}} for details on the function \code{aggregate_over_ntiles} that 87 | aggregates the output of \code{prepare_scores_and_ntiles} to create a dataframe with aggregated actuals and predictions. 88 | In most cases, you do not need to use it since the \code{plotting_scope} function will call this function automatically. 
89 | 90 | \url{https://github.com/modelplot/modelplotr} for details on the package 91 | 92 | \url{https://modelplot.github.io/} for our blog on the value of the model plots 93 | } 94 | -------------------------------------------------------------------------------- /man/plot_multiplot.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/plottingmodelplots.R 3 | \name{plot_multiplot} 4 | \alias{plot_multiplot} 5 | \title{Create plot with all four evaluation plots} 6 | \usage{ 7 | plot_multiplot( 8 | data = plot_input, 9 | save_fig = FALSE, 10 | save_fig_filename = NA, 11 | custom_line_colors = NA, 12 | highlight_ntile = NA, 13 | custom_plot_text = NULL 14 | ) 15 | } 16 | \arguments{ 17 | \item{data}{Dataframe. Dataframe needs to be created with \code{\link{plotting_scope}} 18 | or else meet required input format.} 19 | 20 | \item{save_fig}{Logical. Save plot to file? Default = FALSE. When set to TRUE, saved plot is optimized for 36x24cm.} 21 | 22 | \item{save_fig_filename}{String. Filename of saved plot. By default the plot is saved as {tempdir()}/{plotname}.png.} 23 | 24 | \item{custom_line_colors}{Vector of Strings. Specifying colors for the lines in the plot. 25 | When not specified, colors from the RColorBrewer palette "Set1" are used.} 26 | 27 | \item{highlight_ntile}{Integer. Specifying the ntile at which the plot is annotated 28 | and/or performances are highlighted.} 29 | 30 | \item{custom_plot_text}{List. List with customized textual elements for plot. Create a list with defaults 31 | by using \code{\link{customize_plot_text}} and override default values to customize.} 32 | } 33 | \value{ 34 | gtable, containing 6 grobs. 
35 | } 36 | \description{ 37 | Generates a layout containing a number of graphical elements, including title, subtitle and the four 38 | model evaluation plots: cumulative gains plot, lift plot, response plot and cumulative response plot. 39 | } 40 | \examples{ 41 | # load example data (Bank clients with/without a term deposit - see ?bank_td for details) 42 | data("bank_td") 43 | # prepare data for training model for binomial target has_td and train models 44 | train_index = sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE) 45 | train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 46 | test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 47 | #train models using caret... (or use mlr or H2o or keras ... see ?prepare_scores_and_ntiles) 48 | # setting caret cross validation, here tuned for speed (not accuracy!) 49 | fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE) 50 | # random forest using ranger package, here tuned for speed (not accuracy!) 
51 | rf = caret::train(has_td ~.,data = train, method = "ranger",trControl = fitControl, 52 | tuneGrid = expand.grid(.mtry = 2,.splitrule = "gini",.min.node.size=10)) 53 | # mnl model using glmnet package 54 | mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl) 55 | # load modelplotr 56 | library(modelplotr) 57 | # transform datasets and model objects to input for modelplotr 58 | scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"), 59 | dataset_labels = list("train data","test data"), 60 | models = list("rf","mnl"), 61 | model_labels = list("random forest","multinomial logit"), 62 | target_column="has_td", 63 | ntiles=10) 64 | plot_input <- plotting_scope(prepared_input = scores_and_ntiles) 65 | plot_multiplot(data=plot_input) 66 | plot_multiplot(data=plot_input,highlight_ntile = 2) 67 | } 68 | \seealso{ 69 | \code{\link{modelplotr}} for generic info on the package \code{modelplotr} 70 | 71 | \code{vignette('modelplotr')} 72 | 73 | \code{\link{plotting_scope}} for details on the function \code{plotting_scope} that 74 | transforms a dataframe created with \code{prepare_scores_and_ntiles} or \code{aggregate_over_ntiles} to 75 | a dataframe in the required format for all modelplotr plots. 76 | 77 | \code{\link{aggregate_over_ntiles}} for details on the function \code{aggregate_over_ntiles} that 78 | aggregates the output of \code{prepare_scores_and_ntiles} to create a dataframe with aggregated actuals and predictions. 79 | In most cases, you do not need to use it since the \code{plotting_scope} function will call this function automatically. 
80 | 81 | \url{https://github.com/modelplot/modelplotr} for details on the package 82 | 83 | \url{https://modelplot.github.io/} for our blog on the value of the model plots 84 | } 85 | -------------------------------------------------------------------------------- /man/plot_profit.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/plottingmodelplots.R 3 | \name{plot_profit} 4 | \alias{plot_profit} 5 | \title{Profit plot} 6 | \usage{ 7 | plot_profit( 8 | data = plot_input, 9 | highlight_ntile = "max_profit", 10 | highlight_how = "plot_text", 11 | save_fig = FALSE, 12 | save_fig_filename = NA, 13 | custom_line_colors = NA, 14 | custom_plot_text = NULL, 15 | fixed_costs, 16 | variable_costs_per_unit, 17 | profit_per_unit 18 | ) 19 | } 20 | \arguments{ 21 | \item{data}{Dataframe. Dataframe needs to be created with \code{\link{plotting_scope}} 22 | or else meet required input format.} 23 | 24 | \item{highlight_ntile}{Integer or string ("max_roi" or "max_profit"). Specifying the ntile at which the plot is annotated 25 | and/or performances are highlighted. Default value is \code{max_profit}, highlighting the ntile where profit is highest.} 26 | 27 | \item{highlight_how}{String. How to annotate the plot. Possible values: "plot_text","plot", "text". 28 | Default is "plot_text", both highlighting the ntile and value on the plot as well as in text below the plot. 29 | "plot" only highlights the plot, but does not add text below the plot explaining the plot at chosen ntile. 30 | "text" adds text below the plot explaining the plot at chosen ntile but does not highlight the plot.} 31 | 32 | \item{save_fig}{Logical. Save plot to file? Default = FALSE. When set to TRUE, saved plot is optimized for 36x24cm.} 33 | 34 | \item{save_fig_filename}{String. Filename of saved plot. 
By default the plot is saved as {tempdir()}/{plotname}.png.} 35 | 36 | \item{custom_line_colors}{Vector of Strings. Specifying colors for the lines in the plot. 37 | When not specified, colors from the RColorBrewer palette "Set1" are used.} 38 | 39 | \item{custom_plot_text}{List. List with customized textual elements for plot. Create a list with defaults 40 | by using \code{\link{customize_plot_text}} and override default values to customize.} 41 | 42 | \item{fixed_costs}{Numeric. Specifying the fixed costs related to a selection based on the model. 43 | These costs are constant and do not vary with selection size (ntiles).} 44 | 45 | \item{variable_costs_per_unit}{Numeric. Specifying the variable costs per selected unit for a selection based on the model. 46 | These costs vary with selection size (ntiles).} 47 | 48 | \item{profit_per_unit}{Numeric. Specifying the profit per unit in case the selected unit converts / responds positively.} 49 | } 50 | \value{ 51 | gtable, containing 6 grobs. 52 | } 53 | \description{ 54 | Generates the Profit plot. It plots the cumulative profit up until that ntile when the model is used for campaign selection. 55 | It can be used to answer the following business question: \bold{\emph{When we apply the model and select up until ntile X, 56 | what is the expected profit of the campaign?}} 57 | Extra parameters needed for this plot are: fixed_costs, variable_costs_per_unit and profit_per_unit. 58 | } 59 | \examples{ 60 | # load example data (Bank clients with/without a term deposit - see ?bank_td for details) 61 | data("bank_td") 62 | # prepare data for training model for binomial target has_td and train models 63 | train_index = sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE) 64 | train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 65 | test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 66 | #train models using caret... 
(or use mlr or H2o or keras ... see ?prepare_scores_and_ntiles) 67 | # setting caret cross validation, here tuned for speed (not accuracy!) 68 | fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE) 69 | # random forest using ranger package, here tuned for speed (not accuracy!) 70 | rf = caret::train(has_td ~.,data = train, method = "ranger",trControl = fitControl, 71 | tuneGrid = expand.grid(.mtry = 2,.splitrule = "gini",.min.node.size=10)) 72 | # mnl model using glmnet package 73 | mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl) 74 | # load modelplotr 75 | library(modelplotr) 76 | # transform datasets and model objects to input for modelplotr 77 | scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"), 78 | dataset_labels = list("train data","test data"), 79 | models = list("rf","mnl"), 80 | model_labels = list("random forest","multinomial logit"), 81 | target_column="has_td", 82 | ntiles=100) 83 | # set scope for analysis (default: no comparison) 84 | plot_input <- plotting_scope(prepared_input = scores_and_ntiles,scope='compare_models') 85 | plot_profit(data=plot_input,fixed_costs=1000,variable_costs_per_unit= 10,profit_per_unit=50) 86 | plot_profit(data=plot_input,fixed_costs=1000,variable_costs_per_unit= 10,profit_per_unit=50, 87 | highlight_ntile=20) 88 | plot_profit(data=plot_input,fixed_costs=1000,variable_costs_per_unit= 10,profit_per_unit=50, 89 | highlight_ntile='max_roi') 90 | } 91 | \seealso{ 92 | \code{\link{modelplotr}} for generic info on the package \code{modelplotr} 93 | 94 | \code{vignette('modelplotr')} 95 | 96 | \code{\link{plotting_scope}} for details on the function \code{plotting_scope} that 97 | transforms a dataframe created with \code{prepare_scores_and_ntiles} or \code{aggregate_over_ntiles} to 98 | a dataframe in the required format for all modelplotr plots. 
99 | 100 | \code{\link{aggregate_over_ntiles}} for details on the function \code{aggregate_over_ntiles} that 101 | aggregates the output of \code{prepare_scores_and_ntiles} to create a dataframe with aggregated actuals and predictions. 102 | In most cases, you do not need to use it since the \code{plotting_scope} function will call this function automatically. 103 | 104 | \url{https://github.com/modelplot/modelplotr} for details on the package 105 | 106 | \url{https://modelplot.github.io/} for our blog on the value of the model plots 107 | } 108 | -------------------------------------------------------------------------------- /man/plot_response.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/plottingmodelplots.R 3 | \name{plot_response} 4 | \alias{plot_response} 5 | \title{Response plot} 6 | \usage{ 7 | plot_response( 8 | data = plot_input, 9 | highlight_ntile = NA, 10 | highlight_how = "plot_text", 11 | save_fig = FALSE, 12 | save_fig_filename = NA, 13 | custom_line_colors = NA, 14 | custom_plot_text = NULL 15 | ) 16 | } 17 | \arguments{ 18 | \item{data}{Dataframe. Dataframe needs to be created with \code{\link{plotting_scope}} 19 | or else meet required input format.} 20 | 21 | \item{highlight_ntile}{Integer. Specifying the ntile at which the plot is annotated 22 | and/or performances are highlighted.} 23 | 24 | \item{highlight_how}{String. How to annotate the plot. Possible values: "plot_text","plot", "text". 25 | Default is "plot_text", both highlighting the ntile and value on the plot as well as in text below the plot. 26 | "plot" only highlights the plot, but does not add text below the plot explaining the plot at chosen ntile. 27 | "text" adds text below the plot explaining the plot at chosen ntile but does not highlight the plot.} 28 | 29 | \item{save_fig}{Logical. Save plot to file? Default = FALSE. 
When set to TRUE, saved plots are optimized for 18x12cm.} 30 | 31 | \item{save_fig_filename}{String. Filename of saved plot. By default the plot is saved as {tempdir()}/{plotname}.png.} 32 | 33 | \item{custom_line_colors}{Vector of Strings. Specifying colors for the lines in the plot. 34 | When not specified, colors from the RColorBrewer palette "Set1" are used.} 35 | 36 | \item{custom_plot_text}{List. List with customized textual elements for plot. Create a list with defaults 37 | by using \code{\link{customize_plot_text}} and override default values to customize.} 38 | } 39 | \value{ 40 | ggplot object. Response plot. 41 | } 42 | \description{ 43 | Generates the response plot. It plots the percentage of target class observations 44 | per ntile. It can be used to answer the following business question: When we apply 45 | the model and select ntile X, what is the expected percentage of target class observations 46 | in that ntile? 47 | } 48 | \examples{ 49 | # load example data (Bank clients with/without a term deposit - see ?bank_td for details) 50 | data("bank_td") 51 | # prepare data for training model for binomial target has_td and train models 52 | train_index = sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE) 53 | train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 54 | test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 55 | #train models using caret... (or use mlr or H2o or keras ... see ?prepare_scores_and_ntiles) 56 | # setting caret cross validation, here tuned for speed (not accuracy!) 57 | fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE) 58 | # random forest using ranger package, here tuned for speed (not accuracy!) 
59 | rf = caret::train(has_td ~.,data = train, method = "ranger",trControl = fitControl, 60 | tuneGrid = expand.grid(.mtry = 2,.splitrule = "gini",.min.node.size=10)) 61 | # mnl model using glmnet package 62 | mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl) 63 | # load modelplotr 64 | library(modelplotr) 65 | # transform datasets and model objects to input for modelplotr 66 | scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"), 67 | dataset_labels = list("train data","test data"), 68 | models = list("rf","mnl"), 69 | model_labels = list("random forest","multinomial logit"), 70 | target_column="has_td", 71 | ntiles=100) 72 | plot_input <- plotting_scope(prepared_input = scores_and_ntiles) 73 | plot_response(data=plot_input) 74 | plot_response(data=plot_input,custom_line_colors=RColorBrewer::brewer.pal(3,"Dark2")) 75 | plot_response(data=plot_input,highlight_ntile=2) 76 | } 77 | \seealso{ 78 | \code{\link{modelplotr}} for generic info on the package \code{modelplotr} 79 | 80 | \code{vignette('modelplotr')} 81 | 82 | \code{\link{plotting_scope}} for details on the function \code{plotting_scope} that 83 | transforms a dataframe created with \code{prepare_scores_and_ntiles} or \code{aggregate_over_ntiles} to 84 | a dataframe in the required format for all modelplotr plots. 85 | 86 | \code{\link{aggregate_over_ntiles}} for details on the function \code{aggregate_over_ntiles} that 87 | aggregates the output of \code{prepare_scores_and_ntiles} to create a dataframe with aggregated actuals and predictions. 88 | In most cases, you do not need to use it since the \code{plotting_scope} function will call this function automatically. 
89 | 90 | \url{https://github.com/modelplot/modelplotr} for details on the package 91 | 92 | \url{https://modelplot.github.io/} for our blog on the value of the model plots 93 | } 94 | -------------------------------------------------------------------------------- /man/plot_roi.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/plottingmodelplots.R 3 | \name{plot_roi} 4 | \alias{plot_roi} 5 | \title{ROI plot} 6 | \usage{ 7 | plot_roi( 8 | data = plot_input, 9 | highlight_ntile = "max_roi", 10 | highlight_how = "plot_text", 11 | save_fig = FALSE, 12 | save_fig_filename = NA, 13 | custom_line_colors = NA, 14 | custom_plot_text = NULL, 15 | fixed_costs, 16 | variable_costs_per_unit, 17 | profit_per_unit 18 | ) 19 | } 20 | \arguments{ 21 | \item{data}{Dataframe. Dataframe needs to be created with \code{\link{plotting_scope}} 22 | or else meet required input format.} 23 | 24 | \item{highlight_ntile}{Integer or string ("max_roi" or "max_profit"). Specifying the ntile at which the plot is annotated 25 | and/or performances are highlighted. Default value is \code{max_roi}, highlighting the ntile where roi is highest.} 26 | 27 | \item{highlight_how}{String. How to annotate the plot. Possible values: "plot_text","plot", "text". 28 | Default is "plot_text", both highlighting the ntile and value on the plot as well as in text below the plot. 29 | "plot" only highlights the plot, but does not add text below the plot explaining the plot at chosen ntile. 30 | "text" adds text below the plot explaining the plot at chosen ntile but does not highlight the plot.} 31 | 32 | \item{save_fig}{Logical. Save plot to file? Default = FALSE. When set to TRUE, saved plot is optimized for 36x24cm.} 33 | 34 | \item{save_fig_filename}{String. Filename of saved plot. By default the plot is saved as {tempdir()}/{plotname}.png.} 35 | 36 | \item{custom_line_colors}{Vector of Strings. 
Specifying colors for the lines in the plot. 37 | When not specified, colors from the RColorBrewer palette "Set1" are used.} 38 | 39 | \item{custom_plot_text}{List. List with customized textual elements for plot. Create a list with defaults 40 | by using \code{\link{customize_plot_text}} and override default values to customize.} 41 | 42 | \item{fixed_costs}{Numeric. Specifying the fixed costs related to a selection based on the model. 43 | These costs are constant and do not vary with selection size (ntiles).} 44 | 45 | \item{variable_costs_per_unit}{Numeric. Specifying the variable costs per selected unit for a selection based on the model. 46 | These costs vary with selection size (ntiles).} 47 | 48 | \item{profit_per_unit}{Numeric. Specifying the profit per unit in case the selected unit converts / responds positively.} 49 | } 50 | \value{ 51 | gtable, containing 6 grobs. 52 | 53 | } 54 | \description{ 55 | Generates the Return on Investment plot. It plots the cumulative revenues as a percentage of investments 56 | up until that ntile when the model is used for campaign selection. It can be used to answer the following 57 | business question: \bold{\emph{When we apply the model and select up until ntile X, what is the expected \% 58 | return on investment of the campaign?}} Extra parameters needed for this plot are: 59 | fixed_costs, variable_costs_per_unit and profit_per_unit. 
60 | } 61 | \examples{ 62 | # load example data (Bank clients with/without a term deposit - see ?bank_td for details) 63 | data("bank_td") 64 | # prepare data for training model for binomial target has_td and train models 65 | train_index = sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE) 66 | train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 67 | test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 68 | #train models using caret... (or use mlr or H2o or keras ... see ?prepare_scores_and_ntiles) 69 | # setting caret cross validation, here tuned for speed (not accuracy!) 70 | fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE) 71 | # random forest using ranger package, here tuned for speed (not accuracy!) 72 | rf = caret::train(has_td ~.,data = train, method = "ranger",trControl = fitControl, 73 | tuneGrid = expand.grid(.mtry = 2,.splitrule = "gini",.min.node.size=10)) 74 | # mnl model using glmnet package 75 | mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl) 76 | # load modelplotr 77 | library(modelplotr) 78 | # transform datasets and model objects to input for modelplotr 79 | scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"), 80 | dataset_labels = list("train data","test data"), 81 | models = list("rf","mnl"), 82 | model_labels = list("random forest","multinomial logit"), 83 | target_column="has_td", 84 | ntiles=100) 85 | # set scope for analysis (default: no comparison) 86 | plot_input <- plotting_scope(prepared_input = scores_and_ntiles) 87 | plot_roi(data=plot_input,fixed_costs=1000,variable_costs_per_unit= 10,profit_per_unit=50) 88 | plot_roi(data=plot_input,fixed_costs=1000,variable_costs_per_unit= 10,profit_per_unit=50, 89 | highlight_ntile=20) 90 | plot_roi(data=plot_input,fixed_costs=1000,variable_costs_per_unit= 10,profit_per_unit=50, 91 | 
highlight_ntile="max_profit") 92 | } 93 | \seealso{ 94 | \code{\link{modelplotr}} for generic info on the package \code{modelplotr} 95 | 96 | \code{vignette('modelplotr')} 97 | 98 | \code{\link{plotting_scope}} for details on the function \code{plotting_scope} that 99 | transforms a dataframe created with \code{prepare_scores_and_ntiles} or \code{aggregate_over_ntiles} to 100 | a dataframe in the required format for all modelplotr plots. 101 | 102 | \code{\link{aggregate_over_ntiles}} for details on the function \code{aggregate_over_ntiles} that 103 | aggregates the output of \code{prepare_scores_and_ntiles} to create a dataframe with aggregated actuals and predictions. 104 | In most cases, you do not need to use it since the \code{plotting_scope} function will call this function automatically. 105 | 106 | \url{https://github.com/modelplot/modelplotr} for details on the package 107 | 108 | \url{https://modelplot.github.io/} for our blog on the value of the model plots 109 | } 110 | -------------------------------------------------------------------------------- /man/plotting_scope.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/dataprepmodelplots.R 3 | \name{plotting_scope} 4 | \alias{plotting_scope} 5 | \title{Build dataframe with formatted input for all plots.} 6 | \usage{ 7 | plotting_scope( 8 | prepared_input, 9 | scope = "no_comparison", 10 | select_model_label = NA, 11 | select_dataset_label = NA, 12 | select_targetclass = NA, 13 | select_smallest_targetclass = TRUE 14 | ) 15 | } 16 | \arguments{ 17 | \item{prepared_input}{Dataframe. 
Dataframe created with \code{\link{prepare_scores_and_ntiles}} or dataframe created with 18 | \code{\link{aggregate_over_ntiles}} or a dataframe that is created otherwise with similar layout as the output of these functions 19 | (see ?prepare_scores_and_ntiles and ?aggregate_over_ntiles for layout details).} 20 | 21 | \item{scope}{String. Evaluation type of interest. Possible values: 22 | "compare_models", "compare_datasets", "compare_targetclasses", "no_comparison". 23 | Default is "no_comparison".} 24 | 25 | \item{select_model_label}{String. Selected model when scope is "compare_datasets" or "compare_targetclasses" or "no_comparison". 26 | Needs to be identical to model descriptions as specified in model_labels (or models when model_labels is not specified). 27 | When scope is "compare_models", select_model_label can be used to take a subset of available models.} 28 | 29 | \item{select_dataset_label}{String. Selected dataset when scope is "compare_models" or "compare_targetclasses" or "no_comparison". 30 | Needs to be identical to dataset descriptions as specified in dataset_labels (or datasets when dataset_labels is not 31 | specified). When scope is "compare_datasets", select_dataset_label can be used to take a subset of available datasets.} 32 | 33 | \item{select_targetclass}{String. Selected target value when scope is "compare_models" or "compare_datasets" or "no_comparison". 34 | Default is smallest value when select_smallest_targetclass=TRUE, otherwise first alphabetical value. 35 | When scope is "compare_targetclasses", select_targetclass can be used to take a subset of available target classes.} 36 | 37 | \item{select_smallest_targetclass}{Boolean. Select the target value with the smallest number of cases in dataset as group of 38 | interest. Default is TRUE, hence the target value with the fewest observations is selected.} 39 | } 40 | \value{ 41 | Dataframe \code{plot_input} is a subset of \code{ntiles_aggregate}. 
42 | } 43 | \description{ 44 | Build a dataframe in the required format for all modelplotr plots, relevant to the selected scope of evaluation. 45 | Each record in this dataframe represents a unique combination of datasets, models, target classes and ntiles. 46 | As an input, plotting_scope can handle both a dataframe created with \code{aggregate_over_ntiles} as well as a dataframe 47 | created with \code{prepare_scores_and_ntiles} (or created otherwise with similar layout). 48 | There are four perspectives: 49 | \describe{ 50 | \item{"no_comparison" (default)}{In this perspective, you're interested in the performance of one model on one dataset 51 | for one target class. Therefore, only one line is plotted in the plots. 52 | The parameters \code{select_model_label}, \code{select_dataset_label} and \code{select_targetclass} determine which group is 53 | plotted. When not specified, the first alphabetic model, the first alphabetic dataset and 54 | the smallest (when \code{select_smallest_targetclass=TRUE}) or first alphabetic target value are selected.} 55 | \item{"compare_models"}{In this perspective, you're interested in how well different models perform in comparison to 56 | each other on the same dataset and for the same target value. This results in a comparison between models available 57 | in ntiles_aggregate$model_label for a selected dataset (default: first alphabetic dataset) and for a selected target value 58 | (default: smallest (when \code{select_smallest_targetclass=TRUE}) or first alphabetic target value).} 59 | \item{"compare_datasets"}{In this perspective, you're interested in how well a single model performs on different datasets 60 | for the same target value. 
This results in a comparison between datasets available in 61 | ntiles_aggregate$dataset_label for a selected model (default: first alphabetic model) and for a selected target value (default: 62 | smallest (when \code{select_smallest_targetclass=TRUE}) or first alphabetic target value).} 63 | \item{"compare_targetclasses"}{In this perspective, you're interested in how well a model performs for different target 64 | values on a specific dataset. This results in a comparison between target classes available in ntiles_aggregate$target_class for 65 | a selected model (default: first alphabetic model) and for a selected dataset (default: first alphabetic dataset).}} 66 | } 67 | \section{When you build input for plotting_scope() yourself}{ 68 | 69 | To make plots with modelplotr, it is not required to use the function prepare_scores_and_ntiles to generate the required input data. 70 | You can create your own dataframe containing actuals and probabilities and ntiles (1st ntile = the (100/#ntiles) percent 71 | with highest model probability, last ntile = the (100/#ntiles) percent with lowest probability according to model). 72 | In that case, make sure the input dataframe contains the following columns & formats: 73 | \tabular{lll}{ 74 | \bold{column} \tab \bold{type} \tab \bold{definition} \cr 75 | model_label \tab Factor \tab Name of the model object \cr 76 | dataset_label \tab Factor \tab Datasets to include in the plot as factor levels\cr 77 | y_true \tab Factor \tab Target with actual values \cr 78 | prob_[tv1] \tab Decimal \tab Probability according to model for target value 1 \cr 79 | prob_[tv2] \tab Decimal \tab Probability according to model for target value 2 \cr 80 | ... \tab ... \tab ... 
\cr 81 | prob_[tvn] \tab Decimal \tab Probability according to model for target value n \cr 82 | ntl_[tv1] \tab Integer \tab Ntile based on probability according to model for target value 1 \cr 83 | ntl_[tv2] \tab Integer \tab Ntile based on probability according to model for target value 2 \cr 84 | ... \tab ... \tab ... \cr 85 | ntl_[tvn] \tab Integer \tab Ntile based on probability according to model for target value n 86 | } 87 | See \link{build_input_yourself} for an example to build the required input yourself. 88 | } 89 | 90 | \examples{ 91 | \dontrun{ 92 | # load example data (Bank clients with/without a term deposit - see ?bank_td for details) 93 | data("bank_td") 94 | 95 | # prepare data for training model for binomial target has_td and train models 96 | train_index = sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE) 97 | train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 98 | test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 99 | 100 | #train models using mlr... 101 | trainTask <- mlr::makeClassifTask(data = train, target = "has_td") 102 | testTask <- mlr::makeClassifTask(data = test, target = "has_td") 103 | mlr::configureMlr() # this line is needed when using mlr without loading it (mlr::) 104 | task = mlr::makeClassifTask(data = train, target = "has_td") 105 | lrn = mlr::makeLearner("classif.randomForest", predict.type = "prob") 106 | rf = mlr::train(lrn, task) 107 | lrn = mlr::makeLearner("classif.multinom", predict.type = "prob") 108 | mnl = mlr::train(lrn, task) 109 | #... or train models using caret... 110 | # setting caret cross validation, here tuned for speed (not accuracy!) 111 | fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE) 112 | # random forest using ranger package, here tuned for speed (not accuracy!) 
113 | rf = caret::train(has_td ~.,data = train, method = "ranger",trControl = fitControl, 114 | tuneGrid = expand.grid(.mtry = 2,.splitrule = "gini",.min.node.size=10)) 115 | # mnl model using glmnet package 116 | mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl) 117 | #... or train models using h2o... 118 | h2o::h2o.init() 119 | h2o::h2o.no_progress() 120 | h2o_train = h2o::as.h2o(train) 121 | h2o_test = h2o::as.h2o(test) 122 | gbm <- h2o::h2o.gbm(y = "has_td", 123 | x = setdiff(colnames(train), "has_td"), 124 | training_frame = h2o_train, 125 | nfolds = 5) 126 | #... or train models using keras. 127 | x_train <- as.matrix(train[,-1]); y=train[,1]; y_train <- keras::to_categorical(as.numeric(y)-1) 128 | `\%>\%` <- magrittr::`\%>\%` 129 | nn <- keras::keras_model_sequential() \%>\% 130 | keras::layer_dense(units = 16,kernel_initializer = "uniform",activation = 'relu', 131 | input_shape = NCOL(x_train))\%>\% 132 | keras::layer_dense(units=16,kernel_initializer="uniform",activation='relu') \%>\% 133 | keras::layer_dense(units=length(levels(train[,1])),activation='softmax') 134 | nn \%>\% keras::compile(optimizer='rmsprop',loss='categorical_crossentropy',metrics=c('accuracy')) 135 | nn \%>\% keras::fit(x_train,y_train,epochs = 20,batch_size = 1028,verbose=0) 136 | 137 | # preparation steps 138 | scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"), 139 | dataset_labels = list("train data","test data"), 140 | models = list("rf","mnl", "gbm","nn"), 141 | model_labels = list("random forest","multinomial logit", 142 | "gradient boosting machine","artificial neural network"), 143 | target_column="has_td") 144 | plot_input <- plotting_scope(prepared_input = scores_and_ntiles) 145 | plot_cumgains(data = plot_input) 146 | plot_cumlift(data = plot_input) 147 | plot_response(data = plot_input) 148 | plot_cumresponse(data = plot_input) 149 | plot_multiplot(data = plot_input) 150 | 
plot_costsrevs(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50) 151 | plot_profit(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50) 152 | plot_roi(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50) 153 | } 154 | } 155 | \seealso{ 156 | \code{\link{modelplotr}} for generic info on the package \code{modelplotr} 157 | 158 | \code{vignette('modelplotr')} 159 | 160 | \code{\link{aggregate_over_ntiles}} for details on the function \code{aggregate_over_ntiles} that 161 | generates the required input. 162 | 163 | \code{\link{prepare_scores_and_ntiles}} for details on the function \code{prepare_scores_and_ntiles} 164 | that generates the required input. 165 | 166 | \code{\link{build_input_yourself}} for an example to build the required input yourself. 167 | 168 | 169 | \url{https://github.com/modelplot/modelplotr} for details on the package 170 | 171 | \url{https://modelplot.github.io/} for our blog on the value of the model plots 172 | } 173 | -------------------------------------------------------------------------------- /man/prepare_scores_and_ntiles.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/dataprepmodelplots.R 3 | \name{prepare_scores_and_ntiles} 4 | \alias{prepare_scores_and_ntiles} 5 | \title{Build a dataframe containing Actuals, Probabilities and Ntiles} 6 | \usage{ 7 | prepare_scores_and_ntiles( 8 | datasets, 9 | dataset_labels, 10 | models, 11 | model_labels, 12 | target_column, 13 | ntiles = 10 14 | ) 15 | } 16 | \arguments{ 17 | \item{datasets}{List of Strings. A list of the names of the dataframe 18 | objects to include in model evaluation. 
All dataframes need to contain a 19 | target variable and feature variables.} 20 | 21 | \item{dataset_labels}{List of Strings. A list of labels for the datasets, shown in plots. 22 | When dataset_labels is not specified, the names from \code{datasets} are used.} 23 | 24 | \item{models}{List of Strings. List of the names of the model objects, containing parameters to 25 | apply models to datasets. To use this function, model objects need to be generated 26 | by the mlr package or the caret package or the h2o package or the keras package. 27 | Modelplotr automatically detects whether the model is built using mlr or caret or h2o or keras.} 28 | 29 | \item{model_labels}{List of Strings. Labels for the models, shown in plots. 30 | When model_labels is not specified, the names from \code{models} are used.} 31 | 32 | \item{target_column}{String. Name of the target variable in datasets. Target 33 | can be either binary or multinomial. Continuous targets are not supported.} 34 | 35 | \item{ntiles}{Integer. Number of ntiles. The ntile parameter represents the specified number 36 | of equally sized buckets the observations in each dataset are grouped into. 37 | By default, observations are grouped in 10 equally sized buckets, often referred to as deciles.} 38 | } 39 | \value{ 40 | Dataframe. A dataframe is built, based on the \code{datasets} 41 | and \code{models} specified. It contains the dataset name, actuals on the \code{target_column}, 42 | the predicted probabilities for each target class (i.e. each unique target value) and attribution to 43 | ntiles in the dataset for each target class. 
44 | } 45 | \description{ 46 | Build dataframe object that contains actuals and predictions on 47 | the target variable for each dataset in \code{datasets} and each model in \code{models} 48 | } 49 | \section{When you build scores_and_ntiles yourself}{ 50 | 51 | To make plots with modelplotr, it is not required to use this function to generate input for function \code{plotting_scope}. 52 | You can create your own dataframe containing actuals and predictions and ntiles. 53 | See \code{\link{build_input_yourself}} for an example to build the required input for \code{\link{plotting_scope}} 54 | or \code{\link{aggregate_over_ntiles}} yourself, within R or even outside of R. 55 | } 56 | 57 | \examples{ 58 | \dontrun{ 59 | # load example data (Bank clients with/without a term deposit - see ?bank_td for details) 60 | data("bank_td") 61 | 62 | # prepare data for training model for binomial target has_td and train models 63 | train_index = sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE) 64 | train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 65 | test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 66 | 67 | #train models using mlr... 68 | trainTask <- mlr::makeClassifTask(data = train, target = "has_td") 69 | testTask <- mlr::makeClassifTask(data = test, target = "has_td") 70 | mlr::configureMlr() # this line is needed when using mlr without loading it (mlr::) 71 | task = mlr::makeClassifTask(data = train, target = "has_td") 72 | lrn = mlr::makeLearner("classif.randomForest", predict.type = "prob") 73 | rf = mlr::train(lrn, task) 74 | lrn = mlr::makeLearner("classif.multinom", predict.type = "prob") 75 | mnl = mlr::train(lrn, task) 76 | #... or train models using caret... 77 | # setting caret cross validation, here tuned for speed (not accuracy!) 
78 | fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE) 79 | # random forest using ranger package, here tuned for speed (not accuracy!) 80 | rf = caret::train(has_td ~.,data = train, method = "ranger",trControl = fitControl, 81 | tuneGrid = expand.grid(.mtry = 2,.splitrule = "gini",.min.node.size=10)) 82 | # mnl model using glmnet package 83 | mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl) 84 | #... or train models using h2o... 85 | h2o::h2o.init() 86 | h2o::h2o.no_progress() 87 | h2o_train = h2o::as.h2o(train) 88 | h2o_test = h2o::as.h2o(test) 89 | gbm <- h2o::h2o.gbm(y = "has_td", 90 | x = setdiff(colnames(train), "has_td"), 91 | training_frame = h2o_train, 92 | nfolds = 5) 93 | #... or train models using keras. 94 | x_train <- as.matrix(train[,-1]); y=train[,1]; y_train <- keras::to_categorical(as.numeric(y)-1); 95 | `\%>\%` <- magrittr::`\%>\%` 96 | nn <- keras::keras_model_sequential() \%>\% 97 | keras::layer_dense(units = 16,kernel_initializer = "uniform",activation = 'relu', 98 | input_shape = NCOL(x_train))\%>\% 99 | keras::layer_dense(units = 16,kernel_initializer = "uniform", activation='relu') \%>\% 100 | keras::layer_dense(units = length(levels(train[,1])),activation='softmax') 101 | nn \%>\% keras::compile(optimizer='rmsprop',loss='categorical_crossentropy',metrics=c('accuracy')) 102 | nn \%>\% keras::fit(x_train,y_train,epochs = 20,batch_size = 1028,verbose=0) 103 | 104 | # preparation steps 105 | scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"), 106 | dataset_labels = list("train data","test data"), 107 | models = list("rf","mnl", "gbm","nn"), 108 | model_labels = list("random forest","multinomial logit", 109 | "gradient boosting machine","artificial neural network"), 110 | target_column="has_td") 111 | plot_input <- plotting_scope(prepared_input = scores_and_ntiles) 112 | plot_cumgains(data = plot_input) 113 | plot_cumlift(data = plot_input) 114 | 
plot_response(data = plot_input) 115 | plot_cumresponse(data = plot_input) 116 | plot_multiplot(data = plot_input) 117 | plot_costsrevs(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50) 118 | plot_profit(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50) 119 | plot_roi(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50) 120 | } 121 | } 122 | \seealso{ 123 | \code{\link{modelplotr}} for generic info on the package \code{modelplotr} 124 | 125 | \code{vignette('modelplotr')} 126 | 127 | \code{\link{plotting_scope}} for details on the function \code{plotting_scope} that 128 | transforms a dataframe created with \code{prepare_scores_and_ntiles} or \code{aggregate_over_ntiles} to 129 | a dataframe in the required format for all modelplotr plots. 130 | 131 | \code{\link{aggregate_over_ntiles}} for details on the function \code{aggregate_over_ntiles} that 132 | aggregates the output of \code{prepare_scores_and_ntiles} to create a dataframe with aggregated actuals and predictions. 133 | In most cases, you do not need to use it since the \code{plotting_scope} function will call this function automatically. 
134 | 135 | \url{https://github.com/modelplot/modelplotr} for details on the package 136 | 137 | \url{https://modelplot.github.io/} for our blog on the value of the model plots 138 | } 139 | -------------------------------------------------------------------------------- /man/prepare_scores_and_ntiles_keras.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/dataprepmodelplots.R 3 | \name{prepare_scores_and_ntiles_keras} 4 | \alias{prepare_scores_and_ntiles_keras} 5 | \title{Build a dataframe containing Actuals, Probabilities and Ntiles for keras models} 6 | \usage{ 7 | prepare_scores_and_ntiles_keras( 8 | inputlists, 9 | inputlist_labels, 10 | outputlists, 11 | select_output_index = 1, 12 | models, 13 | model_labels, 14 | targetclass_labels, 15 | ntiles = 10 16 | ) 17 | } 18 | \arguments{ 19 | \item{inputlists}{List of Strings. A list of list names, referring to the input list 20 | objects to include in model evaluation.} 21 | 22 | \item{inputlist_labels}{List of Strings. A list of labels for the inputlists, shown in plots. 23 | When inputlist_labels is not specified, the names from \code{inputlists} are used.} 24 | 25 | \item{outputlists}{List of Strings. A list of list names, referring to the output list 26 | objects to include in model evaluation.} 27 | 28 | \item{select_output_index}{Integer. The index of the output of \code{outputlists} to evaluate and show 29 | in plots. Only relevant for multi-output models, default index value for multi-output models: 1.} 30 | 31 | \item{models}{List of Strings. List of the names of the keras model objects, containing parameters to 32 | apply models to datasets. To use this function, model objects need to be generated 33 | by the keras package. 
Both models created with \code{keras_model_sequential()} as well as models 34 | created with the keras functional API are supported by modelplotr.} 35 | 36 | \item{model_labels}{List of Strings. Labels for the models, shown in plots. 37 | When model_labels is not specified, the names from \code{models} are used.} 38 | 39 | \item{targetclass_labels}{List of Strings. A list of names to use in plots for the target class values 40 | for the selected output. If not specified, the model output column indices are used. 41 | Specify the labels in the same order as the model output columns.} 42 | 43 | \item{ntiles}{Integer. Number of ntiles. The ntile parameter represents the specified number 44 | of equally sized buckets the observations in each dataset are grouped into. 45 | By default, observations are grouped in 10 equally sized buckets, often referred to as deciles.} 46 | } 47 | \value{ 48 | Dataframe. A dataframe is built, based on the \code{inputlists}, \code{outputlists} 49 | and \code{models} specified. It contains the dataset name, actuals on the selected output, 50 | the predicted probabilities for each target class (i.e. each unique target value) and attribution to 51 | ntiles in the dataset for each target class. 52 | } 53 | \description{ 54 | Build dataframe object that contains actuals and predictions on the target variable 55 | for each input list in \code{inputlists} and each (sequential/functional API) keras model in \code{models} 56 | } 57 | \section{When you build scores_and_ntiles yourself}{ 58 | 59 | To make plots with modelplotr, it is not required to use this function to generate input for function \code{plotting_scope}. 60 | You can create your own dataframe containing actuals and predictions and ntiles. 61 | See \code{\link{build_input_yourself}} for an example to build the required input for \code{\link{plotting_scope}} 62 | or \code{\link{aggregate_over_ntiles}} yourself, within R or even outside of R. 
63 | } 64 | 65 | \examples{ 66 | \dontrun{ 67 | # load example data (Bank clients with/without a term deposit - see ?bank_td for details) 68 | data("bank_td") 69 | 70 | # prepare data for training model for binomial target has_td and train models 71 | train_index = sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE) 72 | train = bank_td[train_index,] 73 | test = bank_td[-train_index,] 74 | 75 | train_seq = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 76 | test_seq = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 77 | 78 | 79 | #train keras models using keras_model_sequential() . 80 | x_train <- as.matrix(train[,-c(1:2)]); y_train <- 2-as.numeric(train[,1]); 81 | input_train = list(x_train); output_train = list(y_train) 82 | x_test <- as.matrix(test[,-c(1:2)]); y_test <- 2-as.numeric(test[,1]); 83 | input_test = list(x_test); output_test = list(y_test) 84 | 85 | `\%>\%` <- magrittr::`\%>\%` 86 | nn_seq <- keras::keras_model_sequential() \%>\% 87 | keras::layer_dense(units = 16,kernel_initializer = "uniform",activation = 'relu', 88 | input_shape = NCOL(x_train))\%>\% 89 | keras::layer_dense(units = 16,kernel_initializer = "uniform", activation='relu') \%>\% 90 | keras::layer_dense(units = 1,activation='sigmoid') 91 | nn_seq \%>\% keras::compile(optimizer='rmsprop',loss='binary_crossentropy',metrics=c('accuracy')) 92 | nn_seq \%>\% keras::fit(input_train,output_train,epochs = 20,batch_size = 1028,verbose=0) 93 | 94 | scores_and_ntiles <- prepare_scores_and_ntiles_keras(inputlists = list("input_train","input_test"), 95 | inputlist_labels = list("train data","test data"), 96 | models = list("nn_seq"), 97 | model_labels = list("keras sequential model"), 98 | outputlists = list("output_train","output_test"), 99 | select_output_index = 1, 100 | targetclass_labels = list("no.term.deposit","term.deposit"), 101 | ntiles = 10) 102 | 103 | plot_input <- 
plotting_scope(prepared_input = scores_and_ntiles,scope = "compare_datasets") 104 | plot_cumgains(data = plot_input) 105 | plot_cumlift(data = plot_input) 106 | plot_response(data = plot_input) 107 | plot_cumresponse(data = plot_input) 108 | plot_multiplot(data = plot_input) 109 | 110 | 111 | #... or train keras models using keras functional api (multi-input / multi-output is supported). 112 | x1_train <- as.matrix(train[,c(3:4)]); y1_train <- as.numeric(train[,1])-1; 113 | x2_train <- as.matrix(train[,c(5:7)]); y2_train <- keras::to_categorical(as.numeric(train[,2])-1, 114 | num_classes = 4); 115 | input_train = list(x1_train,x2_train); output_train = list(y1_train,y2_train) 116 | x1_test <- as.matrix(test[,c(3:4)]); y1_test <- as.numeric(test[,1])-1; 117 | x2_test <- as.matrix(test[,c(5:7)]); y2_test <- keras::to_categorical(as.numeric(test[,2])-1, 118 | num_classes = 4); 119 | input_test = list(x1_test,x2_test); output_test = list(y1_test,y2_test) 120 | 121 | x1_input <- keras::layer_input(shape = NCOL(x1_train)) 122 | x2_input <- keras::layer_input(shape = NCOL(x2_train)) 123 | concatenated <- keras::layer_concatenate(list(x1_input, x2_input)) \%>\% 124 | keras::layer_dense(units = 16,kernel_initializer = "uniform", activation='relu') \%>\% 125 | keras::layer_dense(units = 16,kernel_initializer = "uniform", activation='relu') 126 | y1_output <- concatenated \%>\% keras::layer_dense(1, activation = "sigmoid", name = "has_td") 127 | y2_output <- concatenated \%>\% keras::layer_dense(4, activation = "softmax", name = "td_type") 128 | nn_api <- keras::keras_model(list(x1_input,x2_input), list(y1_output,y2_output)) 129 | nn_api \%>\% keras::compile(optimizer = "rmsprop", 130 | loss = c("binary_crossentropy","categorical_crossentropy")) 131 | nn_api \%>\% keras::fit(list(x1_train, x2_train),list(y1_train, y2_train),20,batch_size = 1028) 132 | 133 | scores_and_ntiles <- prepare_scores_and_ntiles_keras(inputlists = list("input_train","input_test"), 134 | 
inputlist_labels = list("train data","test data"), 135 | models = list("nn_api"), 136 | model_labels = list("keras api model"), 137 | outputlists = list("output_train","output_test"), 138 | select_output_index = 2, 139 | targetclass_labels = list('no.td','td.type.A','td.type.B','td.type.C'), 140 | ntiles = 100) 141 | plot_input <- plotting_scope(prepared_input=scores_and_ntiles,scope="compare_targetclasses") 142 | plot_cumgains(data = plot_input) 143 | plot_cumlift(data = plot_input) 144 | plot_response(data = plot_input) 145 | plot_cumresponse(data = plot_input) 146 | plot_multiplot(data = plot_input) 147 | } 148 | } 149 | \seealso{ 150 | \code{\link{modelplotr}} for generic info on the package \code{modelplotr} 151 | 152 | \code{vignette('modelplotr')} 153 | 154 | \code{\link{plotting_scope}} for details on the function \code{plotting_scope} that 155 | transforms a dataframe created with \code{prepare_scores_and_ntiles} or \code{aggregate_over_ntiles} to 156 | a dataframe in the required format for all modelplotr plots. 157 | 158 | \code{\link{aggregate_over_ntiles}} for details on the function \code{aggregate_over_ntiles} that 159 | aggregates the output of \code{prepare_scores_and_ntiles} to create a dataframe with aggregated actuals and predictions. 160 | In most cases, you do not need to use it since the \code{plotting_scope} function will call this function automatically. 
161 | 162 | \url{https://github.com/modelplot/modelplotr} for details on the package 163 | 164 | \url{https://modelplot.github.io/} for our blog on the value of the model plots 165 | } 166 | -------------------------------------------------------------------------------- /modelplotr.Rcheck/00check.log: -------------------------------------------------------------------------------- 1 | * using log directory 'C:/TEMP/modelplotr/modelplotr.Rcheck' 2 | * using R version 3.5.2 (2018-12-20) 3 | * using platform: x86_64-w64-mingw32 (64-bit) 4 | * using session charset: ISO8859-1 5 | * checking for file 'modelplotr/DESCRIPTION' ... ERROR 6 | Required fields missing or empty: 7 | 'Author' 'Maintainer' 8 | * DONE 9 | Status: 1 ERROR 10 | -------------------------------------------------------------------------------- /modelplotr.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | 15 | AutoAppendNewline: Yes 16 | StripTrailingWhitespace: Yes 17 | 18 | BuildType: Package 19 | PackageUseDevtools: Yes 20 | PackageInstallArgs: --no-multiarch --with-keep.source 21 | PackageRoxygenize: rd,collate,namespace,vignette 22 | -------------------------------------------------------------------------------- /tests/testthat.R: -------------------------------------------------------------------------------- 1 | library(testthat) 2 | library(modelplotr) 3 | 4 | #test_check("modelplotr") 5 | -------------------------------------------------------------------------------- /tests/testthat/test-prepare_scores_and_deciles.R: -------------------------------------------------------------------------------- 1 | context("prepare_scores_and_deciles") 2 | 3 | library(magrittr) 4 | library(testthat) 5 | # 
Generate some sample data ----------------------------------------------- 6 | 7 | data(iris) 8 | 9 | # add some noise to iris to prevent perfect models 10 | addNoise <- function(x) round(rnorm(n=100,mean=mean(x),sd=sd(x)),1) 11 | 12 | iris_addnoise <- as.data.frame(lapply(iris[1:4], addNoise)) 13 | 14 | iris_addnoise$Species <- sample(unique(iris$Species),100,replace=TRUE) 15 | 16 | iris <- rbind(iris,iris_addnoise) 17 | 18 | train_index = sample(seq(1, nrow(iris)),size = 0.7*nrow(iris), replace = F ) 19 | train = iris[train_index,] 20 | 21 | test = iris[-train_index,] 22 | 23 | 24 | # Expected answers -------------------------------------------------------- 25 | 26 | expected_cols <- c("model_label", "dataset_label", "y_true", 27 | "prob_setosa", "prob_versicolor", "prob_virginica", 28 | "dcl_setosa", "dcl_versicolor", "dcl_virginica") 29 | 30 | expected_col_types <- c("factor", "factor", "factor", "numeric", 31 | "numeric", "numeric", "numeric", "numeric", 32 | "numeric") 33 | 34 | 35 | 36 | 37 | # H2o tests --------------------------------------------------------------- 38 | 39 | # Load H2o and initialise ----------------------------------------------- 40 | library(h2o) 41 | 42 | h2o.init() 43 | h2o.no_progress() 44 | 45 | h2o_train <- train %>% 46 | as.h2o() 47 | 48 | # Train a GBM ------------------------------------------------------------- 49 | 50 | h2o_model <- h2o.gbm(y = "Species", 51 | x = setdiff(colnames(train), "Species"), 52 | training_frame = h2o_train, 53 | nfolds = 5) 54 | 55 | 56 | df <- prepare_scores_and_deciles(datasets=list("train","test"), 57 | dataset_labels = list("train data","test data"), 58 | models = list("h2o_model"), 59 | model_labels = list("h2o gbm"), 60 | target_column="Species") 61 | 62 | test_that("h2o models are properly formatted for aggregate_over_deciles", { 63 | df <- get("scores_and_deciles") 64 | col_names <- colnames(df) 65 | col_types <- map_chr(df, ~ class(.)) 66 | 67 | expect_equal(colnames(df), expected_cols) 68 | 
expect_equal(unname(col_types), expected_col_types) 69 | }) 70 | 71 | 72 | # Test MLR support 73 | trainTask <- mlr::makeClassifTask(data = train, target = "Species") 74 | testTask <- mlr::makeClassifTask(data = test, target = "Species") 75 | 76 | # Test mlr support -------------------------------------------------------- 77 | 78 | mlr::configureMlr() # this line is needed when using mlr without loading it (mlr::) 79 | 80 | task <- mlr::makeClassifTask(data = train, target = "Species") 81 | lrn <- mlr::makeLearner("classif.randomForest", predict.type = "prob") 82 | rf <- mlr::train(lrn, task) 83 | 84 | 85 | prepare_scores_and_deciles(datasets=list("train","test"), 86 | dataset_labels = list("train data","test data"), 87 | models = list("rf"), 88 | model_labels = list("random forest"), 89 | target_column="Species") 90 | 91 | test_that("mlr models are properly formatted for aggregate_over_deciles", { 92 | 93 | df <- get("scores_and_deciles") 94 | col_names <- colnames(df) 95 | col_types <- map_chr(df, ~ class(.)) 96 | 97 | expect_equal(colnames(df), expected_cols) 98 | expect_equal(unname(col_types), expected_col_types) 99 | }) 100 | 101 | 102 | 103 | # Test caret support ------------------------------------------------------ 104 | 105 | rf <- caret::train(Species ~.,data = train, method = "rf") 106 | 107 | prepare_scores_and_deciles(datasets=list("train","test"), 108 | dataset_labels = list("train data","test data"), 109 | models = list("rf"), 110 | model_labels = list("random forest"), 111 | target_column="Species") 112 | 113 | test_that("caret models are properly formatted for aggregate_over_deciles", { 114 | 115 | df <- get("scores_and_deciles") 116 | col_names <- colnames(df) 117 | col_types <- map_chr(df, ~ class(.)) 118 | 119 | expect_equal(colnames(df), expected_cols) 120 | expect_equal(unname(col_types), expected_col_types) 121 | }) 122 | 123 | -------------------------------------------------------------------------------- /vignettes/.gitignore: 
-------------------------------------------------------------------------------- 1 | *.html 2 | *.R 3 | -------------------------------------------------------------------------------- /vignettes/cumgains.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jurrr/modelplotr/9ca4c5dc319eae91c854038f51fb76396fe82371/vignettes/cumgains.png -------------------------------------------------------------------------------- /vignettes/modelplotr.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "modelplotr: Plots to evaluate the business value of predictive models" 3 | author: "Jurriaan Nagelkerke" 4 | date: "`r Sys.Date()`" 5 | output: rmarkdown::html_vignette 6 | vignette: > 7 | %\VignetteIndexEntry{modelplotr} 8 | %\VignetteEngine{knitr::rmarkdown} 9 | %\VignetteEncoding{UTF-8} 10 | --- 11 | 12 | ```{r setup, include = FALSE} 13 | knitr::opts_chunk$set( 14 | collapse = TRUE, 15 | comment = "#>" 16 | ) 17 | library(modelplotr) 18 | library(kableExtra) 19 | ``` 20 | 21 | 22 | > Why ROC curves are a bad idea to explain your model to business people 23 | 24 | The modelplotr package makes it easy to create a number of valuable evaluation plots to assess the business value of a predictive model. Using these plots, it can be shown how implementation of the model will impact business targets like response or return on investment of a campaign. 25 | 26 | 27 | ## Why use modelplotr 28 | 29 | ```{r pressure, echo=FALSE, fig.cap="Cartoon ROC plot", out.width = '100%'} 30 | knitr::include_graphics("https://modelplot.github.io/img/cartoonrocplot.jpg") 31 | ``` 32 | 33 | > ‘...And as we can see clearly on this ROC plot, the sensitivity of the model at the value of 0.2 on one minus the specificity is quite high! Right?…’. 
34 | 35 | If your fellow business colleagues didn’t already wander away during your presentation about your fantastic predictive model, it will definitely push them over the edge when you start talking like this. Why? Because the ROC curve is not easy to explain quickly and is difficult to translate into answers to the business questions your audience has. And these business questions were the reason you’ve built a model in the first place! 36 | 37 | What business questions? We build models for all kinds of supervised classification problems, such as predictive models to select the best records in a dataset, which can be customers, leads, items, events... For instance: You want to know which of your active customers have the highest probability to churn; you need to select those prospects that are most likely to respond to an offer; you have to identify transactions that have a high risk to be fraudulent. During your presentation, your audience is therefore mainly focused on answering questions like: Does your model enable us to select our target audience? How much better will we be doing, using your model? What will the expected response on our campaign be? What is the optimal selection size? Modelplotr helps you answer these business questions. 38 | 39 | During our model building efforts, we should already be focused on verifying how well the model performs. Often, we do so by training the model parameters on a selection or subset of records and testing the performance on a holdout set or external validation set. We look at a set of performance measures like the ROC curve and the AUC value. These plots and statistics are very helpful to check during model building and model optimization whether your model is under- or overfitting and what set of parameters performs best on test data. However, these statistics are not that valuable in assessing the *business value* of the model you developed.
40 | 41 | One reason that the ROC curve is not that useful in explaining the business value of your model is that it’s quite hard to explain the interpretation of ‘area under the curve’, ‘specificity’ or ‘sensitivity’ to business people. Another important reason that these statistics and plots are useless in your business meetings is that they don’t help in determining how to apply your predictive model: What percentage of records should we select based on the model? Should we select only the best 10% of cases? Or should we stop at 30%? Or go on until we have selected 70%?... This is something you want to decide together with your business colleague to best match the business plans and campaign targets they have to meet. The four plots - the cumulative gains, cumulative lift, response and cumulative response - we are about to introduce are in our view the best ones for that cause. On top of that, we also introduce three plots that enable plotting the financial impact of using your model in a campaign: The costs & revenues plot, the profit plot and the return on investment plot. In the end, talking about the financial consequences of implementing a model is often the most effective approach when discussing the value of your model with business colleagues. 42 | 43 | 44 | ## The plots in modelplotr 45 | 46 | Before getting into the details of how to use modelplotr, let's first introduce the plots you can create with it in more detail. We begin by explaining what is on the x axis of all these plots. Next, we introduce the different measures that are plotted on the y axis of these plots.
47 | 48 | ### All plots: explaining what's on the canvas 49 | 50 | Although each plot sheds light on the business value of your model from a different angle, they all use the same data: 51 | 52 | * Predicted probability for the target class 53 | * Equally sized groups based on this predicted probability, named ntiles 54 | * Actual number of observed target class observations in these groups (ntiles) 55 | 56 | Regarding the ntiles: It’s common practice to split the data to score into 10 equally large groups and call these groups deciles. Observations that belong to the top-10% with highest model probability in a set, are in decile 1 of that set; the next group of 10% with high model probability are decile 2 and finally the 10% observations with the lowest model probability on the target class belong to decile 10. 57 | 58 | *Notice that modelplotr does support that you specify the number of equally sized groups with the parameter **ntiles**. Hence, **ntiles=100** results in 100 equally sized groups with in the first group the 1% with the highest model probability and in group 100 the 1% with the lowest model probability. These groups are often referred to as percentiles; modelplotr will also label them as such. Any value between 4 and 100 can be specified for **ntiles**. For illustration purposes, we will use deciles, hence the default of **ntiles=10** * 59 | 60 | Each of the plots in modelplotr places the deciles on the x axis and another measure on the y axis. The deciles are plotted from left to right so the observations with the highest model probability are on the left side of the plot. This results in plots like this: 61 | 62 | ```{r decileplot, echo=FALSE, out.width = '100%'} 63 | knitr::include_graphics("https://modelplot.github.io/img/decileplot.png") 64 | ``` 65 | 66 | Now that it’s clear what is on the horizontal axis of each of the plots, we can go into more detail on the metrics for each plot on the vertical axis. 
For each plot, there's a brief explanation of what insight you gain with the plot from a business perspective. 67 | 68 | ### Cumulative gains plot 69 | 70 | The cumulative gains plot - often named ‘gains plot’ - helps you answer the question: 71 | 72 | > When we apply the model and select the best X deciles, what % of the actual target class observations can we expect to target? 73 | 74 | Hence, the cumulative gains plot visualises the percentage of the target class members you have selected if you would decide to select up until decile X. This is a very important business question, because in most cases, you want to use a predictive model to target a subset of observations - customers, prospects, cases,... - instead of targeting all cases. And since we won't build perfect models all the time, we will miss some potential. 75 | 76 | So, we'll have to accept we will lose some. What percentage of the actual target class members you do select with your model at a given decile, that’s what the cumulative gains plot tells you. The plot comes with two reference lines to tell you how good/bad your model is doing: The random model line and the wizard model line. The random model line tells you what proportion of the actual target class you would expect to select when no model is used at all. This diagonal line runs from the origin (with 0% of cases, you can only have 0% of the actual target class members) to the upper right corner (with 100% of the cases, you have 100% of the target class members). It’s the rock bottom of how your model can perform; if you are close to this, your model is not much better than a coin flip. The wizard model is the upper bound of what your model can do. It starts in the origin and rises as steeply as possible towards 100%.
If less than 10% of all cases belong to the target category, this means that it rises steeply from the origin to the value of decile 1 and cumulative gains of 100% and remains there for all other deciles as it is a cumulative measure. Your model will always move between these two reference lines - closer to a wizard is always better - and looks like this: 77 | 78 | ```{r cumgainsplot, echo=FALSE, out.width = '100%'} 79 | knitr::include_graphics("https://modelplot.github.io/img/cumgainsplot.png") 80 | ``` 81 | 82 | 83 | 84 | ### Cumulative lift plot 85 | 86 | The cumulative lift plot, often referred to as lift plot or index plot, helps you answer the question: 87 | 88 | > When we apply the model and select the best X deciles, how many times better is that than using no model at all? 89 | 90 | The lift plot helps you in explaining how much better selecting based on your model is compared to taking random selections instead. Especially when models are not yet used within a certain organisation or domain, this really helps business understand what selecting based on models can do for them. 91 | 92 | The lift plot only has one reference line: the ‘random model’. With a random model we mean that each observation gets a random number and all cases are divided into deciles based on these random numbers. The % of actual target category observations in each decile would be equal to the overall % of actual target category observations in the total set. Since the lift is calculated as the ratio of these two numbers, we get a horizontal line at the value of 1. Your model should however be able to do better, resulting in a high ratio for decile 1. How high the lift can get, depends on the quality of your model, but also on the % of target class observations in the data: If 50% of your data belongs to the target class of interest, a perfect model would 'only' do twice as good (lift: 2) as a random selection.
With a smaller target class value, say 10%, the model can potentially be 10 times better (lift: 10) than a random selection. Therefore, no general guideline of a 'good' lift can be specified. Towards decile 10, since the plot is cumulative, with 100% of cases, we have the whole set again and therefore the cumulative lift will always end up at a value of 1. It looks like this: 93 | 94 | ```{r cumliftplot, echo=FALSE, out.width = '100%'} 95 | knitr::include_graphics("https://modelplot.github.io/img/cumliftplot.png") 96 | ``` 97 | 98 | 99 | 100 | ### Response plot 101 | 102 | One of the easiest to explain evaluation plots is the response plot. It simply plots the percentage of target class observations per decile. It can be used to answer the following business question: 103 | 104 | > When we apply the model and select decile X, what is the expected % of target class observations in that decile? 105 | 106 | The plot has one reference line: The % of target class cases in the total set. It looks like this: 107 | 108 | ```{r responseplot, echo=FALSE, out.width = '100%'} 109 | knitr::include_graphics("https://modelplot.github.io/img/responseplot.png") 110 | ``` 111 | 112 | 113 | A good model starts with a high response value in the first decile(s) and suddenly drops quickly towards 0 for later deciles. This indicates good differentiation between target class members - getting high model scores - and all other cases. An interesting point in the plot is the location where your model’s line intersects the random model line. From that decile onwards, the % of target class cases is lower than a random selection of cases would hold. 114 | 115 | 116 | ### Cumulative response plot 117 | 118 | Finally, one of the most used plots: The cumulative response plot. It answers the question burning on each business rep's lips: 119 | 120 | > When we apply the model and select up until decile X, what is the expected % of target class observations in the selection? 
121 | 122 | The reference line in this plot is the same as in the response plot: the % of target class cases in the total set. 123 | 124 | ```{r cumresponseplot, echo=FALSE, out.width = '100%'} 125 | knitr::include_graphics("https://modelplot.github.io/img/cumresponseplot.png") 126 | ``` 127 | 128 | 129 | Whereas the response plot crosses the reference line, in the cumulative response plot it never crosses it but ends up at the same point for decile 10: Selecting all cases up until decile 10 is the same as selecting all cases, hence the % of target class cases will be exactly the same. This plot is most often used to decide - together with business colleagues - up until what decile to select for a campaign. 130 | 131 | 132 | 133 | **To plot the financial implications of implementing a predictive model, modelplotr provides three additional plots: the Costs & revenues plot, the Profit plot and the ROI plot. ** 134 | 135 | ### Costs & Revenues plot 136 | 137 | The costs & revenues plot plots both the cumulative revenues and the cumulative costs (investments) up until that decile when the model is used for campaign selection. It can be used to answer the following business question: 138 | 139 | > When we apply the model and select up until decile X, what are the expected revenues and investments of the campaign? 140 | 141 | The plot includes both costs and revenues lines. The costs are the cumulative costs of selecting up until a given decile and consist of both fixed costs and variable costs. The fixed costs for campaigns often include costs to create the campaign and other costs that do not vary with the size of the campaign selection. The variable costs do depend on the selection size, resulting in a linearly increasing line. The revenues take into account the expected response % - as plotted in the cumulative response plot - as well as the expected revenue per response.
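To make the arithmetic behind these lines concrete, here is a minimal sketch in base R of how cumulative costs and revenues per decile could be computed by hand. All numbers and variable names (`fixed_costs`, `variable_costs_per_unit`, `revenue_per_response`, the per-decile counts) are hypothetical values chosen for illustration; they are not taken from the package or its example data, and this is not necessarily how modelplotr computes them internally.

```r
# Hypothetical campaign inputs (illustrative values only)
fixed_costs             <- 1000  # costs that do not depend on selection size
variable_costs_per_unit <- 10    # cost of contacting one selected case
revenue_per_response    <- 50    # revenue earned per responding case

# Hypothetical per-decile data: number of cases and number of actual responders,
# with decile 1 holding the cases with the highest model probability
decile     <- 1:10
cases      <- rep(100, 10)
responders <- c(40, 25, 15, 8, 5, 3, 2, 1, 1, 0)

# Cumulative selection size and responders when selecting up until each decile
cum_cases      <- cumsum(cases)
cum_responders <- cumsum(responders)

# Costs: fixed part plus a part that grows linearly with the selection size
cum_costs <- fixed_costs + variable_costs_per_unit * cum_cases

# Revenues: cumulative responders times the revenue per response
# (equivalently: cumulative response % * selection size * revenue per response)
cum_revenues <- revenue_per_response * cum_responders

# Profit is the gap between the two lines
cum_profit <- cum_revenues - cum_costs

campaign <- data.frame(decile, cum_costs, cum_revenues, cum_profit)
print(campaign)

# Decile up to which selecting cases gives the highest cumulative profit
best_decile <- campaign$decile[which.max(campaign$cum_profit)]
```

With these made-up inputs the costs line rises by the same amount for every decile, while the revenues line flattens out as later deciles contain fewer responders, which is the pattern described above.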
142 | 143 | ```{r costsrevsplot, echo=FALSE, out.width = '100%'} 144 | knitr::include_graphics("https://modelplot.github.io/img/costsrevsplot.png") 145 | ``` 146 | 147 | The campaign is profitable in the plot area where revenues exceed costs. The optimal profit might be difficult to see quickly, since the reference line is a diagonal. Therefore, to evaluate profitability, the next plot - the profit plot - is more suitable. 148 | 149 | 150 | ### Profit plot 151 | 152 | The profit plot visualizes the cumulative profit up until that decile when the model is used for campaign selection. It can be used to answer the following business question: 153 | 154 | > When we apply the model and select up until decile X, what is the expected profit of the campaign? 155 | 156 | ```{r profitplot, echo=FALSE, out.width = '100%'} 157 | knitr::include_graphics("https://modelplot.github.io/img/profitplot.png") 158 | ``` 159 | 160 | From this plot, it can be quickly spotted with what selection size the campaign profit is maximized. However, this does not mean that this is the best option from an investment point of view. It might be that, taking into consideration the investment that is needed for the profit, another decile is preferred. Therefore, the ROI plot is needed as well. 161 | 162 | 163 | 164 | ### Return on investment plot 165 | 166 | The Return on Investment plot plots the cumulative revenues as a percentage of investments up until that decile when the model is used for campaign selection. It can be used to answer the following business question: 167 | 168 | > When we apply the model and select up until decile X, what is the expected % return on investment of the campaign? 169 | 170 | ```{r roiplot, echo=FALSE, out.width = '100%'} 171 | knitr::include_graphics("https://modelplot.github.io/img/roiplot.png") 172 | ``` 173 | 174 | From this plot, the selection size with the optimal return on investment for the campaign is easily identified.
Do note that the decile at which the campaign profit is maximized is not necessarily the same as the decile where the campaign ROI is maximized. It can be the case that a bigger selection (higher decile) results in a higher profit; however, this selection needs a larger investment, impacting the ROI negatively. 175 | 176 | 177 | 178 | ## Data preparation steps 179 | 180 | To be able to use the plots presented above, you need to create a dataframe that serves as an input for all these plots. We've included three functions in modelplotr to create this input dataframe easily and quickly. Especially when you've trained your models using **caret**, **mlr**, **h2o** or **keras** the process is super simple. In this case, you only need two function calls to prepare your data for plotting. In case you use another package to train your models or when you've trained your models outside of R, you can still use modelplotr, you just need one extra step. Further on we'll guide you how to prepare the input in that case. 181 | 182 | ### Some example data 183 | 184 | To show how modelplotr works, we've included some test data in the package. This dataset is a subset of the dataset made available by the University of California, Irvine. The complete dataset is available here: https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank-additional.zip. Let's load the data: 185 | 186 | ```{r loaddata, echo=TRUE} 187 | # load example data (Bank clients that have/have not subscribed a term deposit - see ?bank_td for details) 188 | data("bank_td") 189 | 190 | str(bank_td) 191 | ``` 192 | 193 | A brief introduction to the data: This dataset contains 7 variables, including 2 potential target variables and 5 features to predict these targets. The binary target is **has_td**; it indicates whether the client has subscribed for a term deposit ('term.deposit') or not ('no.term.deposit').
The multinomial target is **td_type** which has four possible values: 'no.td', 'td.type.A', 'td.type.B' and 'td.type.C'. This target is included to show that modelplotr also works to plot predictive models with multiclass targets. The five features are a subset of all features available in the actual source, just to have some predictors available to build some example models to evaluate with modelplotr. For details on the data, see ?bank_td. 194 | 195 | Now, let's train some models. To illustrate that you can use models trained using caret, mlr, h2o and keras, we'll train some models first, one for each package, and include these models in our input for the plots. This way, we can easily compare the models in the plots. Also, we use a train set and a test set for each model, enabling us to also compare between datasets. More on modelplotr's options to compare models, datasets and target classes in the next section on **Plotting scopes**. 196 | 197 | ```{r trainmodels, echo=TRUE} 198 | # prepare data for training model for binomial target has_td and train models 199 | train_index = base::sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE) 200 | train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 201 | test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')] 202 | 203 | #train models using mlr... 204 | trainTask <- mlr::makeClassifTask(data = train, target = "has_td") 205 | testTask <- mlr::makeClassifTask(data = test, target = "has_td") 206 | mlr::configureMlr() # this line is needed when using mlr without loading it (mlr::) 207 | task = mlr::makeClassifTask(data = train, target = "has_td") 208 | lrn = mlr::makeLearner("classif.randomForest", predict.type = "prob") 209 | rf = mlr::train(lrn, task) 210 | 211 | #... or train models using caret... 212 | # setting caret cross validation, here tuned for speed (not accuracy!)
213 | fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE) 214 | # mnl model using glmnet package 215 | mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl) 216 | ``` 217 | 218 | Modelplotr can be used with models built with h2o and keras just as easily. Want to see this in action? Run the code below and add these models to the function **prepare_scores_and_ntiles()** in the next step. 219 | ``` {r h2o_keras, echo=TRUE, eval=FALSE} 220 | #.. or train models using h2o... [NOT RUN] 221 | h2o::h2o.init() 222 | h2o::h2o.no_progress() 223 | h2o_train = h2o::as.h2o(train) 224 | h2o_test = h2o::as.h2o(test) 225 | gbm <- h2o::h2o.gbm(y = "has_td", 226 | x = setdiff(colnames(train), "has_td"), 227 | training_frame = h2o_train, 228 | nfolds = 5) 229 | 230 | #.. or train models using keras... [NOT RUN] 231 | x_train <- as.matrix(train[,-1]); y=train[,1]; y_train <- keras::to_categorical(as.numeric(y)-1); `%>%` <- magrittr::`%>%` 232 | nn <- keras::keras_model_sequential() %>% 233 | keras::layer_dense(units = 16,kernel_initializer = "uniform", activation='relu',input_shape = NCOL(x_train)) %>% 234 | keras::layer_dense(units = 16,kernel_initializer = "uniform", activation='relu') %>% 235 | keras::layer_dense(units = length(levels(train[,1])),activation='softmax') 236 | nn %>% keras::compile(optimizer = 'rmsprop',loss = 'categorical_crossentropy',metrics = c('accuracy')) 237 | nn %>% keras::fit(x_train,y_train,epochs = 20,batch_size = 1028,verbose=0) 238 | 239 | ``` 240 | 241 | Now that we have some datasets and some trained models, we can start using modelplotr to prepare the data for plotting: 242 | 243 | ### prepare_scores_and_ntiles() 244 | 245 | This function builds a dataframe object that contains actuals and predictions on the target variable for each dataset in datasets and each model in models. 
It contains the dataset name, actuals on the target, the predicted probabilities for each class of the target and attribution to ntiles in the dataset for each class of the target. 246 | 247 | The function prepare_scores_and_ntiles() has 6 parameters, of which 3 are required: 248 | 249 | ```{r psn_params, echo=FALSE} 250 | # prepare data 251 | text_tbl <- data.frame( 252 | Parameter = c('datasets *','dataset_labels' , 'models *','model_labels','target_column *','ntiles'), 253 | `Type and Description` = c( 254 | 'List of Strings. A list of the names of the dataframe objects to include in model evaluation. All dataframes need to contain target variable and feature variables.', 255 | 'List of Strings. A list of labels for the datasets to use in plots. When dataset_labels is not specified, the names from datasets are used.', 256 | 'List of Strings. Names of the model objects containing parameters to apply models to data. To use this function, model objects need to be generated by the mlr package or by the caret package or by the h2o package. Modelplotr automatically detects whether the model is built using mlr or caret or h2o.', 257 | 'List of Strings. Labels for the models to use in plots. When model_labels is not specified, the names from models are used.', 258 | 'String. Name of the target variable in datasets. Target can be either binary or multinomial. Continuous targets are not supported.', 259 | 'Integer. Number of ntiles. The ntile parameter represents the specified number of equally sized buckets the observations in each dataset are grouped into. By default, observations are grouped in 10 equally sized buckets, often referred to as deciles.'
260 | ) 261 | ) 262 | 263 | kable(text_tbl) %>% 264 | kableExtra::kable_styling(full_width = T,font_size = 10) %>% 265 | kableExtra::row_spec(c(1,3,5),italic = T) 266 | 267 | 268 | ``` 269 | 270 | 271 | Now, let's use the *prepare_scores_and_ntiles()* function on our models to evaluate the business performance on both the train and the test data: 272 | 273 | ```{r prepdata, echo=TRUE} 274 | 275 | # prepare data (for h2o/keras: add "gbm" and "nn" to models and nice labels to model_labels params) 276 | 277 | scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"), 278 | dataset_labels = list("train data","test data"), 279 | models = list("rf","mnl"), 280 | model_labels = list("random forest","multinomial logit"), 281 | target_column="has_td", 282 | ntiles = 100) 283 | 284 | ``` 285 | 286 | This is what the resulting dataframe looks like (first 5 rows): 287 | 288 | ```{r df_sd,echo=FALSE} 289 | scores_and_ntiles %>% 290 | head(5)%>% 291 | kable(row.names = FALSE) %>% 292 | kableExtra::kable_styling(font_size = 10,full_width = FALSE,position="left") 293 | ``` 294 | 295 | ### plotting_scope() 296 | 297 | This function builds a dataframe in the required format for all modelplotr plots, relevant to the selected scope of evaluation. Each record in this dataframe represents a unique combination of datasets, models, target classes and ntiles. 298 | 299 | As an input, plotting_scope can handle a dataframe created with prepare_scores_and_ntiles (see above) or created otherwise with similar layout. Also, you can provide a dataframe created with aggregate_over_ntiles(). See the section below for more info on the function aggregate_over_ntiles. 300 | 301 | Aside from the input, the most important parameter is **scope=**.
There are four perspectives you can take to plot with modelplotr: 302 | 303 | ```{r scopes, echo=FALSE} 304 | # prepare data 305 | text_tbl <- data.frame( 306 | Scope = c('"no_comparison" (default)','"compare_models"' , '"compare_datasets"','"compare_targetclasses"'), 307 | Description = c( 308 | "In this perspective, you're interested in the performance of one model on one dataset for one target class. Therefore, only one line is plotted in the plots. The parameters select_model_label, select_dataset_label and select_targetclass determine which group is plotted. When not specified, the first alphabetic model, the first alphabetic dataset and the smallest (when select_smallest_targetclass=TRUE) or first alphabetic target value are selected.", 309 | "In this perspective, you're interested in how well different models perform in comparison to each other on the same dataset and for the same target value. This results in a comparison between models available in ntiles_aggregate\\$model_label for a selected dataset (default: first alphabetic dataset) and for a selected target value (default: smallest (when select_smallest_targetclass=TRUE) or first alphabetic target value).", 310 | "In this perspective, you're interested in how well a specific model performs on different datasets for the same target value. This results in a comparison between datasets available in ntiles_aggregate\\$dataset_label for a selected model (default: first alphabetic model) and for a selected target value (default: smallest (when select_smallest_targetclass=TRUE) or first alphabetic target value).", 311 | "In this perspective, you're interested in how well a model performs for different target values on a specific dataset. This results in a comparison between target classes available in ntiles_aggregate\\$target_class for a selected model (default: first alphabetic model) and for a selected dataset (default: first alphabetic dataset)." 
312 | ) 313 | ) 314 | 315 | kable(text_tbl) %>% 316 | kableExtra::kable_styling(full_width = T,font_size = 10) %>% 317 | kableExtra::row_spec(1,italic = T) 318 | 319 | 320 | ``` 321 | 322 | Other parameters let you select a subset of models/datasets/target classes you want to include in your plot, see ?plotting_scope for details. 323 | 324 | ```{r ps_params, echo=FALSE} 325 | # prepare data 326 | text_tbl <- data.frame( 327 | Parameter = c('prepared_input *','scope' , 'select_model_label','select_dataset_label','select_targetclass','select_smallest_targetclass'), 328 | `Type and Description` = c( 329 | 'Dataframe. Dataframe created with prepare_scores_and_ntiles or dataframe created with aggregate_over_ntiles or a dataframe that is created otherwise with similar layout as the output of these functions (see ?prepare_scores_and_ntiles and ?aggregate_over_ntiles for layout details)', 330 | 'String. Evaluation type of interest. Possible values: "compare_models","compare_datasets", "compare_targetclasses","no_comparison". Default is NA, equivalent to "no_comparison".', 331 | 'String. Selected model when scope is "compare_datasets" or "compare_targetclasses" or "no_comparison". Needs to be identical to model descriptions as specified in model_labels (or models when model_labels is not specified). When scope is "compare_models", select_model_label can be used to take a subset of available models.', 332 | 'String. Selected dataset when scope is compare_models or compare_targetclasses or no_comparison. Needs to be identical to dataset descriptions as specified in dataset_labels (or datasets when dataset_labels is not specified). When scope is "compare_datasets", select_dataset_label can be used to take a subset of available datasets.', 333 | 'String. Selected target value when scope is compare_models or compare_datasets or no_comparison. Default is smallest value when select_smallest_targetclass=TRUE, otherwise first alphabetical value. 
When scope is "compare_targetclasses", select_targetclass can be used to take a subset of available target classes.', 334 | 'Boolean. Select the target value with the smallest number of cases in dataset as group of interest. Default is TRUE, hence the target value with the fewest observations is selected' 335 | ) 336 | ) 337 | 338 | kable(text_tbl) %>% 339 | kableExtra::kable_styling(full_width = T,font_size = 10) %>% 340 | kableExtra::row_spec(c(1),italic = T) 341 | 342 | 343 | ``` 344 | 345 | Now, let's use plotting_scope to generate the input for all plots: 346 | 347 | ```{r pi,echo=TRUE} 348 | #generate input data frame for all plots in modelplotr 349 | plot_input <- plotting_scope(prepared_input = scores_and_ntiles) 350 | ``` 351 | 352 | Since we only provided the input data and not the scope, the default scope (no comparison) is used. To adjust, specify **scope=** and/or set the other parameters to customize the models/datasets/target classes you want to include in your plots. 353 | 354 | ### Custom preparation of modelplotr input 355 | 356 | Maybe you prefer to prepare the input for modelplotr differently, for instance when your models are not created with mlr, caret, h2o or keras, when your models are created outside of R, or when you already have the scored data available. 
In these cases, you have two extra options to prepare the input for **plotting_scope**, as visualised below: 357 | 358 | ```{r modelplotr_process, echo=FALSE, out.width = '100%'} 359 | knitr::include_graphics("https://modelplot.github.io/img/modelplotr_process.png") 360 | ``` 361 | 362 | #### Option 2: Prepare input for aggregate_over_ntiles() 363 | 364 | With option 2, you prepare your own dataframe containing actuals, probabilities and ntiles (1st ntile = the (1/#ntiles) share of cases with the highest model probability, last ntile = the (1/#ntiles) share with the lowest probability according to the model). In that case, make sure the input dataframe contains the following columns and formats: 365 | 366 | 367 | ```{r custinput_2, echo=FALSE, out.width = '100%'} 368 | # prepare data 369 | text_tbl <- data.frame( 370 | column = c('model_label','dataset_label','y_true','prob_[tv1]','prob_[tv2]','...', 371 | 'prob_[tvn]','ntl_[tv1]','ntl_[tv2]','...','ntl_[tvn]'), 372 | type = c('Factor','Factor','Factor','Decimal','Decimal','...','Decimal','Integer','Integer','...','Integer'), 373 | definition = c('Name of the model object','Datasets to include in the plot as factor levels','Target with actual values', 374 | 'Probability according to model for target value 1','Probability according to model for target value 2','...', 375 | 'Probability according to model for target value n','Ntile based on probability according to model for target value 1', 376 | 'Ntile based on probability according to model for target value 2','...','Ntile based on probability according to model for target value n') 377 | ) 378 | 379 | kable(text_tbl) %>% 380 | kableExtra::row_spec(c(1),italic = T) %>% 381 | kableExtra::kable_styling(font_size = 10,bootstrap_options = "basic", 382 | latex_options = "basic", full_width = TRUE, position = "center") 383 | 384 | 385 | ``` 386 | 387 | Once you have this data frame prepared, you can use it as an input for **plotting_scope()**, as it aggregates the input automatically. 
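
As a rough illustration of option 2, the sketch below builds such a data frame in base R. Everything in it is an assumption made for the example: the data are simulated, the target classes `no.td` and `td` are hypothetical, and the helper `assign_ntiles()` is not part of modelplotr. It only shows one way to put the highest probabilities in ntile 1 with equally sized buckets.

```r
# Simulated (hypothetical) scored data: 1000 cases, binary target
set.seed(42)
n <- 1000
prob_td <- runif(n)
y_true <- factor(ifelse(runif(n) < prob_td, "td", "no.td"))

# Hypothetical helper, not a modelplotr function:
# ntile 1 holds the cases with the highest probability, the last ntile the lowest
assign_ntiles <- function(prob, ntiles = 10) {
  rnk <- rank(-prob, ties.method = "first")  # rank 1 = highest probability
  as.integer(ceiling(rnk * ntiles / length(prob)))
}

# Data frame with the columns listed in the table above
custom_input <- data.frame(
  model_label   = factor("my model"),
  dataset_label = factor("my data"),
  y_true        = y_true,
  prob_no.td    = 1 - prob_td,
  prob_td       = prob_td,
  ntl_no.td     = assign_ntiles(1 - prob_td),
  ntl_td        = assign_ntiles(prob_td)
)
```

A data frame like `custom_input` could then be passed on as `prepared_input`, provided the column names and types match the table above.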
If you prefer, you can aggregate it first yourself using **aggregate_over_ntiles()**. 388 | 389 | #### Option 3: Prepare input for plotting_scope() 390 | 391 | A third option is to build an aggregated data frame yourself as an input for **plotting_scope()**. This does require some extra preparation that is not needed when using option 2, but this can be the better option when you don't want to move around actual and predicted scores on individual cases, due to size or maybe privacy/confidentiality. In this case, make sure the data frame you create, exactly matches the definitions below: 392 | 393 | ```{r custinput_3, echo=FALSE, out.width = '100%'} 394 | # prepare data 395 | text_tbl <- data.frame( 396 | column = c('model_label','dataset_label','target_class','ntile','neg','pos','tot','pct','negtot','postot','tottot','pcttot', 397 | 'cumneg','cumpos','cumtot','cumpct','gain','cumgain','gain_ref','gain_opt','lift','cumlift','cumlift_ref'), 398 | type = c('String','Factor','String or Integer','Integer','Integer','Integer','Integer','Decimal','Integer','Integer','Integer', 399 | 'Decimal','Integer','Integer','Integer','Integer','Decimal','Decimal','Decimal','Decimal','Decimal','Decimal','Decimal'), 400 | definition = c('Name of the model object','Datasets to include in the plot as factor levels','Target classes to include in the plot','Ntile groups based on model probability for target class','Number of cases not belonging to target class in dataset in ntile','Number of cases belonging to target class in dataset in ntile','Total number of cases in dataset in ntile', 401 | 'Percentage of cases in dataset in ntile that belongs to target class (pos/tot)','Total number of cases not belonging to target class in dataset','Total number of cases belonging to target class in dataset','Total number of cases in dataset', 402 | 'Percentage of cases in dataset that belongs to target class (postot / tottot)','Cumulative number of cases not belonging to target class in dataset from 
ntile 1 up until ntile','Cumulative number of cases belonging to target class in dataset from ntile 1 up until ntile','Cumulative number of cases in dataset from ntile 1 up until ntile','Cumulative percentage of cases belonging to target class in dataset from ntile 1 up until ntile (cumpos/cumtot)','Gains value for dataset for ntile (pos/postot)', 403 | 'Cumulative gains value for dataset for ntile (cumpos/postot)','Lower reference for gains value for dataset for ntile (ntile/#ntiles)','Upper reference for gains value for dataset for ntile','Lift value for dataset for ntile (pct/pcttot)', 404 | 'Cumulative lift value for dataset for ntile ((cumpos/cumtot)/pcttot)','Reference value for Cumulative lift value (constant: 1)') 405 | ) 406 | 407 | kable(text_tbl) %>% 408 | #kable_styling(full_width = T,font_size = 10) %>% 409 | kableExtra::kable_styling(font_size = 10,full_width = FALSE,position="left") %>% 410 | kableExtra::row_spec(c(1),italic = T) 411 | 412 | 413 | ``` 414 | 415 | Once you have this data frame prepared, you can use it as an input for **plotting_scope()**. 416 | 417 | 418 | ## Plotting 419 | 420 | Now that we have data that are well prepared for all plots, plotting is quite easy. 
For instance, the cumulative gains plot: 421 | 422 | ```{r plot_cg,echo=TRUE, fig.width=7.2,fig.height=5} 423 | plot_cumgains(data = plot_input) 424 | ``` 425 | 426 | Creating the other non-financial plots is just as easy: 427 | 428 | ```{r plot_cl_r_cr,echo=TRUE, fig.width=7.2,fig.height=5,eval=FALSE} 429 | #Cumulative lift 430 | plot_cumlift(data = plot_input) 431 | 432 | #Response plot 433 | plot_response(data = plot_input) 434 | 435 | #Cumulative response plot 436 | plot_cumresponse(data = plot_input) 437 | ``` 438 | 439 | The cumulative lift plot, cumulative gains plot, response plot and cumulative response plot can be combined on one canvas: 440 | 441 | ```{r decrease_ntile, echo=FALSE} 442 | # prepare data 443 | scores_and_ntiles2 <- prepare_scores_and_ntiles(datasets=list("train","test"), 444 | dataset_labels = list("train data","test data"), 445 | models = list("rf","mnl"), 446 | model_labels = list("random forest","multinomial logit"), 447 | target_column="has_td", 448 | ntiles = 10) 449 | plot_input <- plotting_scope(prepared_input = scores_and_ntiles2) 450 | ``` 451 | 452 | ```{r plot_multi,echo=TRUE, fig.width=7.2,fig.height=5} 453 | plot_multiplot(data = plot_input) 454 | ``` 455 | 456 | ```{r increase_ntile, echo=FALSE} 457 | plot_input <- plotting_scope(prepared_input = scores_and_ntiles) 458 | ``` 459 | 460 | There is a lot you can customize for the plots in modelplotr: all textual elements, line colors, highlighting and annotating the plots at a specific ntile. All these options are discussed further on. 461 | 462 | For financial plots, three extra parameters need to be provided: 463 | - 464 | ```{r fin_params, echo=FALSE} 465 | # prepare data 466 | text_tbl <- data.frame( 467 | Parameter = c('fixed_costs','variable_costs_per_unit','profit_per_unit'), 468 | `Type and Description` = c( 469 | 'Numeric. Specifying the fixed costs related to a selection based on the model. 
These costs are constant and do not vary with selection size (ntiles).', 470 | 'Numeric. Specifying the variable costs per selected unit for a selection based on the model. These costs vary with selection size (ntiles).', 471 | 'Numeric. Specifying the profit per unit in case the selected unit converts / responds positively.' 472 | ) 473 | ) 474 | 475 | kable(text_tbl) %>% 476 | kableExtra::kable_styling(full_width = T,font_size = 10) %>% 477 | kableExtra::row_spec(c(1),italic = T) 478 | 479 | 480 | ``` 481 | 482 | With these extra parameters, all three financial plots can be plotted, for instance the ROI plot: 483 | ```{r plot_roi,echo=TRUE, fig.width=7.2,fig.height=5} 484 | plot_roi(data = plot_input,fixed_costs = 1000,variable_costs_per_unit = 10,profit_per_unit = 50) 485 | ``` 486 | 487 | By default, in the ROI plot the ntile is highlighted where return on investment is highest. In the profit plot and costs & revenues plot, the ntile where the profit is highest is highlighted by default. This can be changed, see the section on highlighting or ?plot_roi / ?plot_profit / ?plot_costsrevs for details. As an example: 488 | 489 | ```{r plot_costrev_profit,echo=TRUE, fig.width=7.2,fig.height=5} 490 | 491 | #Costs & Revenues plot, highlighted at max roi instead of max profit 492 | plot_costsrevs(data = plot_input,fixed_costs = 1000,variable_costs_per_unit = 10,profit_per_unit = 50,highlight_ntile = "max_roi") 493 | 494 | #Profit plot , highlighted at custom ntile instead of at max profit 495 | plot_profit(data = plot_input,fixed_costs = 1000,variable_costs_per_unit = 10,profit_per_unit = 50,highlight_ntile = 5) 496 | 497 | ``` 498 | 499 | 500 | 501 | ## Highlighting and customizing plots 502 | 503 | The look and feel of plots can be customized in a number of ways. In the next sections all customizations are presented. 504 | 505 | ### highlighting 506 | 507 | To highlight a specific decile (or ntile), this can be done with the parameter **highlight_ntile=**. 
508 | 509 | ```{r plot_cgh,echo=TRUE, fig.width=7.2,fig.height=5} 510 | plot_cumgains(data = plot_input,highlight_ntile = 20) 511 | ``` 512 | 513 | For financial plots (plot_costsrevs, plot_profit and plot_roi), the highlighting is added automatically, highlighting the optimum. If you want to highlight at another ntile value, this can easily be done by setting the parameter (e.g. highlight_ntile = 20). 514 | 515 | With parameter **highlight_how** you can specify how to annotate the plot. Possible values: "plot_text", "plot", "text". Default is "plot_text", which highlights the ntile and value on the plot and also adds text below the plot. "plot" only highlights the plot, but does not add text below the plot explaining the plot at the chosen ntile. "text" adds text below the plot explaining the plot at the chosen ntile but does not highlight the plot. 516 | 517 | ```{r plot_crhh,echo=TRUE, fig.width=7.2,fig.height=5} 518 | plot_cumresponse(data = plot_input,highlight_ntile = 20,highlight_how = 'plot') 519 | ``` 520 | 521 | 522 | ### Customizing textual elements 523 | 524 | All textual elements in the plots can be customized. To achieve this, first you have to create a list object with all default values for all textual elements. Modelplotr has a special function to build this list. Once the list is created, you can easily explore the defaults for all plots and change them as you like. 
525 | 526 | ```{r customtext,echo=TRUE} 527 | my_text <- customize_plot_text(plot_input=plot_input) 528 | 529 | #explore default values for the cumulative response plot: 530 | my_text$cumresponse 531 | 532 | #translate to Dutch 533 | my_text$cumresponse$plottitle <- 'Cumulatieve Respons grafiek' 534 | my_text$cumresponse$x_axis_label <- 'percentiel' 535 | my_text$cumresponse$y_axis_label <- '% respons (cumulatief)' 536 | my_text$cumresponse$response_refline_label <- 'respons in totale dataset' 537 | my_text$cumresponse$annotationtext <- "Selecteren we percentiel 1 t/m &NTL volgens model &MDL in dataset &DS dan is het %% &YVAL gevallen in de selectie &VALUE." 538 | 539 | 540 | ``` 541 | 542 | As you can see, in the annotationtext you can take advantage of some placeholders starting with **&**. For details on available placeholders, see **?customize_plot_text**. 543 | 544 | Now, you can include the altered list in your plot function to use the custom plot element texts: 545 | 546 | ```{r plotcustomtext,echo=TRUE, fig.width=7.2,fig.height=5} 547 | plot_cumresponse(data = plot_input,highlight_ntile = 20,custom_plot_text = my_text) 548 | ``` 549 | 550 | 551 | 552 | ### Customizing colors 553 | 554 | The colors of the value lines in all plots can be changed by setting the parameter **custom_line_colors=** to a vector of strings, specifying colors for the lines in the plot. Color names, color codes and RColorBrewer palettes can all be used. The vector is automatically cropped / expanded to match the required length. When not specified, colors from the RColorBrewer palette "Set1" are used. 
555 | 556 | ```{r plotcustomcolor,echo=TRUE, fig.width=7.2,fig.height=5} 557 | # set scope to compare models, to have several lines in the plots 558 | plot_input <- plotting_scope(prepared_input = scores_and_ntiles,scope = 'compare_models') 559 | 560 | #customize plot line colors with RColorBrewer (brewer.pal() returns at least 3 colors, so take the first 2) 561 | plot_cumgains(data = plot_input,custom_line_colors = RColorBrewer::brewer.pal(3,'Accent')[1:2]) 562 | 563 | #customize plot line colors with color names / hexadecimal codes 564 | plot_cumlift(data = plot_input,custom_line_colors = c('deepskyblue2','#FF0000')) 565 | 566 | ``` 567 | 568 | 569 | ### Saving plots 570 | 571 | Saving plots can be done by setting the parameter **save_fig = TRUE** and/or by providing a filename for the plot (**save_fig_filename = **). The plot name can include the path to the location where to save the plot (e.g. 'C:/TEMP/myplotname.png'). When no (location and) file name is specified, the plot is saved to a temporary directory (**tempdir()**) with the plot type as its name. 572 | 573 | ```{r saveplot,echo=TRUE,eval=FALSE, fig.width=7.2,fig.height=5} 574 | 575 | # save plot with defaults 576 | plot_cumgains(data = plot_input,save_fig = TRUE) 577 | 578 | # save plot with custom filename 579 | plot_cumlift(data = plot_input,save_fig_filename = 'plot123.png') 580 | 581 | # save plot with custom location 582 | plot_cumresponse(data = plot_input,save_fig_filename = 'D:\\') 583 | 584 | # save plot with custom location and filename 585 | plot_cumresponse(data = plot_input,save_fig_filename = 'D:\\plot123.png') 586 | 587 | ``` 588 | 589 | 590 | ### modelplotr & Multinomial targets 591 | 592 | Modelplotr can also be used for a multinomial target. To illustrate this, let's build a model on the multinomial target in the data and plot the cumulative response chart. All functionality is the same; see above for details on how to customize the plots. 
593 | 594 | ```{r multinom, echo=TRUE, fig.width=7.2,fig.height=5} 595 | 596 | # prepare data for training model for multinomial target td_type and train models 597 | train_index = base::sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE) 598 | train = bank_td[train_index,c('td_type','duration','campaign','pdays','previous','euribor3m')] 599 | test = bank_td[-train_index,c('td_type','duration','campaign','pdays','previous','euribor3m')] 600 | 601 | # train a model 602 | # setting caret cross validation, here tuned for speed (not accuracy!) 603 | fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE) 604 | # mnl model using glmnet package 605 | mnl = caret::train(td_type ~.,data = train, method = "glmnet",trControl = fitControl) 606 | 607 | # prepare data 608 | scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"), 609 | dataset_labels = list("train data","test data"), 610 | models = list("mnl"), 611 | model_labels = list("multinomial logit"), 612 | target_column="td_type", 613 | ntiles = 100) 614 | 615 | #generate input data frame for all plots, set scope at comparing target classes, leave out the 'no.td' class 616 | plot_input <- plotting_scope(prepared_input = scores_and_ntiles,scope = 'compare_targetclasses', 617 | select_targetclass = c('td.type.A','td.type.B','td.type.C' )) 618 | 619 | #plot 620 | plot_cumresponse(data = plot_input) 621 | 622 | ``` 623 | 624 | -------------------------------------------------------------------------------- /vignettes/plot123.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jurrr/modelplotr/9ca4c5dc319eae91c854038f51fb76396fe82371/vignettes/plot123.png --------------------------------------------------------------------------------