├── .gitignore ├── DESCRIPTION ├── LICENSE ├── NAMESPACE ├── R ├── compute_moments.R ├── construct_priors.R ├── create_dists.R ├── get_data_dists.R ├── helpers.R ├── metrics.R ├── plotting.R ├── run_simulation.R ├── simulate_data.R ├── simulation_study.R ├── stopping_functions.R └── update_priors.R ├── README.md ├── inst └── abayes │ ├── app.R │ ├── dependencies.R │ ├── how_to.md │ ├── server_funs.R │ └── ui_elements.R ├── man ├── ab_arguments.Rd ├── approx_solver.Rd ├── b_gt_a.Rd ├── bernoulli_dist.Rd ├── beta_cdf.Rd ├── beta_dist.Rd ├── calc_beta_dist.Rd ├── calc_gamma_dist.Rd ├── calc_normal_gamma_dist.Rd ├── create_empty_dt.Rd ├── expected_loss_b.Rd ├── gamma_cdf.Rd ├── gamma_dist.Rd ├── get_actual_loss.Rd ├── get_data_dists.Rd ├── get_losses.Rd ├── get_metrics.Rd ├── get_supported_beta.Rd ├── get_supported_gamma.Rd ├── get_supported_normal_gamma.Rd ├── investigate_simulations.Rd ├── normal_dist.Rd ├── normal_gamma_dist.Rd ├── plot_beta.Rd ├── plot_gamma.Rd ├── plot_normal.Rd ├── plot_relative_gain.Rd ├── poisson_dist.Rd ├── sim_effect_size.Rd ├── simulate_ab_test.Rd ├── simulate_data.Rd ├── single_loss_stop.Rd ├── update_prior.Rd ├── update_priors.Rd ├── validate_dt.Rd └── validate_inputs.Rd ├── shiny_app_screen_shot.png ├── tests ├── testthat.R └── testthat │ ├── test-construct_priors.R │ ├── test-create_dists.R │ ├── test-get_data_dists.R │ ├── test-metrics.R │ ├── test-run_simulation.R │ ├── test-simulation_study.R │ ├── test-stopping_functions.R │ └── test-update_priors.R └── vignettes └── abayes_vignette.Rmd /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | .Rhistory 3 | inst/abayes/rsconnect 4 | -------------------------------------------------------------------------------- /DESCRIPTION: -------------------------------------------------------------------------------- 1 | Package: abayes 2 | Type: Package 3 | Title: Bayesian A/B Testing 4 | Version: 0.0.1 5 | Authors@R: c( 6 | person("Michael", "Frasco", email = "mfrasco@convoy.com", role = c("aut", "cre")), 7 | person("Convoy Inc.", role = c("cph"))) 8 | Maintainer: Michael Frasco 9 | Description: 10 | Simulate Bayesian A/B testing algorithms and evaluate the results of A/B 11 | tests with a shiny app. 12 | Depends: 13 | R (>= 3.3.0) 14 | Imports: 15 | data.table, 16 | ggplot2, 17 | gridExtra, 18 | parallel, 19 | purrr, 20 | stats 21 | Suggests: 22 | knitr, 23 | testthat, 24 | rmarkdown 25 | License: Apache License 2.0 | file LICENSE 26 | LazyData: TRUE 27 | RoxygenNote: 6.0.1 28 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License, Version 2.0 2 | =========================== 3 | 4 | Copyright 2018 Convoy, Inc. 5 | 6 | Licensed under the Apache License, Version 2.0 (the "License"); 7 | you may not use this file except in compliance with the License. 8 | You may obtain a copy of the License at 9 | 10 | http://www.apache.org/licenses/LICENSE-2.0 11 | 12 | Unless required by applicable law or agreed to in writing, software 13 | distributed under the License is distributed on an "AS IS" BASIS, 14 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | See the License for the specific language governing permissions and 16 | limitations under the License. -------------------------------------------------------------------------------- /NAMESPACE: -------------------------------------------------------------------------------- 1 | # Generated by roxygen2: do not edit by hand 2 | 3 | S3method(b_gt_a,beta_dist) 4 | S3method(b_gt_a,gamma_dist) 5 | S3method(b_gt_a,normal_gamma_dist) 6 | S3method(compute_moments,beta_dist) 7 | S3method(compute_moments,gamma_dist) 8 | S3method(compute_moments,normal_gamma_dist) 9 | S3method(expected_loss_b,beta_dist) 10 | S3method(expected_loss_b,gamma_dist) 11 | S3method(expected_loss_b,normal_gamma_dist) 12 | S3method(get_data_dists,beta_dist) 13 | S3method(get_data_dists,gamma_dist) 14 | S3method(get_data_dists,normal_gamma_dist) 15 | S3method(simulate_data,bernoulli_dist) 16 | S3method(simulate_data,beta_dist) 17 | S3method(simulate_data,gamma_dist) 18 | S3method(simulate_data,normal_dist) 19 | S3method(simulate_data,normal_gamma_dist) 20 | S3method(simulate_data,poisson_dist) 21 | S3method(update_prior,beta_dist) 22 | S3method(update_prior,gamma_dist) 23 | S3method(update_prior,normal_gamma_dist) 24 | export(approx_solver) 25 | export(b_gt_a) 26 | export(bernoulli_dist) 27 | export(beta_cdf) 28 | export(beta_dist) 29 | export(calc_beta_dist) 30 | export(calc_gamma_dist) 31 | export(calc_normal_gamma_dist) 32 | export(compute_moments) 33 | export(create_empty_dt) 34 | export(expected_loss_b) 35 | export(gamma_cdf) 36 | export(gamma_dist) 37 | export(get_actual_loss) 38 | export(get_data_dists) 39 | export(get_losses) 40 | export(get_metrics) 41 | export(get_supported_beta) 42 | export(get_supported_gamma) 43 | export(get_supported_normal_gamma) 44 | export(investigate_simulations) 45 | export(normal_dist) 46 | export(normal_gamma_dist) 47 | export(plot_beta) 48 | export(plot_gamma) 49 | export(plot_normal) 50 | export(plot_relative_gain) 51 | export(poisson_dist) 52 | export(sim_effect_size) 53 | export(simulate_ab_test) 54 | export(simulate_data) 55 | export(single_loss_stop) 56 | export(update_prior) 57 | export(update_priors) 58 | export(validate_dt) 59 | export(validate_inputs) 60 | importFrom(data.table,as.data.table) 61 | importFrom(data.table,between) 62 | importFrom(data.table,data.table) 63 | importFrom(data.table,rbindlist) 64 | importFrom(data.table,set) 65 | importFrom(data.table,setnames) 66 | importFrom(ggplot2,geom_density) 67 | importFrom(ggplot2,ggplot) 68 | importFrom(ggplot2,ggtitle) 69 | importFrom(ggplot2,scale_color_manual) 70 | importFrom(ggplot2,scale_fill_manual) 71 | importFrom(ggplot2,theme) 72 | importFrom(ggplot2,xlab) 73 | importFrom(ggplot2,ylab) 74 | importFrom(gridExtra,arrangeGrob) 75 | importFrom(gridExtra,grid.arrange) 76 | importFrom(parallel,clusterApplyLB) 77 | importFrom(parallel,clusterExport) 78 | importFrom(parallel,makeCluster) 79 | importFrom(parallel,stopCluster) 80 | importFrom(purrr,map) 81 | importFrom(purrr,map_dbl) 82 | importFrom(stats,pbeta) 83 | importFrom(stats,pgamma) 84 | importFrom(stats,pnorm) 85 | importFrom(stats,quantile) 86 | importFrom(stats,rbeta) 87 | importFrom(stats,rbinom) 88 | importFrom(stats,rgamma) 89 | importFrom(stats,rnorm) 90 | importFrom(stats,rpois) 91 | importFrom(stats,uniroot) 92 | -------------------------------------------------------------------------------- /R/compute_moments.R: -------------------------------------------------------------------------------- 1 | # these functions compute the mean and standard deviation of distribution objects 2 | 3 | #' @export 4 | compute_moments.beta_dist <- function(dist) { 5 | a <- dist[['alpha']]; b <- dist[['beta']] 6 | return(list(mu = a / (a + b), sigma = sqrt(a * b / (a + b) ^ 2 / (a + b + 1)))) 7 | } 8 | 9 | #' @export 10 | compute_moments.normal_gamma_dist <- function(dist) { 11 | mu <- dist[['mu']]; lambda <- dist[['lambda']] 12 | a <- dist[['alpha']]; b <- dist[['beta']] 13 | return(list(x = list(mu = mu, sigma = sqrt(b / lambda / (a - 1))), tau = list(mu = a / b, sigma = sqrt(a / b ^ 2)))) 14 | } 15 | 16 | #' @export 17 | compute_moments.gamma_dist <- function(dist) { 18 | a <- dist[['alpha']]; b <- dist[['beta']] 19 | return(list(mu = a / b, sigma = sqrt(a / b ^ 2))) 20 | } 21 | 22 | #' @title Simulate Data According to Some Distribution 23 | #' @name simulate_data 24 | #' @description Simulate a vector of data from a given distribution object. 25 | #' @export 26 | #' @param dist An object of class \code{'beta_dist'}, \code{'normal_gamma_dist'}, 27 | #' \code{'gamma_dist'} that specifies the parameters of some distribution 28 | #' @return A list 29 | compute_moments <- function(dist) { 30 | UseMethod('compute_moments') 31 | } -------------------------------------------------------------------------------- /R/construct_priors.R: -------------------------------------------------------------------------------- 1 | #' @title Calculate Parameters For Beta Distribution 2 | #' @name calc_beta_dist 3 | #' @description Calculate the parameters for a beta distribution parameterized 4 | #' by the expected value and standard deviation. 5 | #' @export 6 | #' @param mu The expected value of x (see details) 7 | #' @param sigma The standard deviation of x (see details) 8 | #' @details If x ~ Beta(alpha, beta), then \code{mu} = E[x] and \code{sigma^2} = Var(x) 9 | #' @return An object of class \code{beta_dist} 10 | calc_beta_dist <- function(mu, sigma) { 11 | alpha <- ((1 - mu) * mu ^ 2 - mu * sigma ^ 2) / sigma ^ 2 12 | beta <- alpha / mu - alpha 13 | return(beta_dist('alpha' = alpha, 'beta' = beta)) 14 | } 15 | 16 | #' @title CDF of Parameterized Beta Distribution 17 | #' @name beta_cdf 18 | #' @description Calculate the area to the left or right of \code{bound} for 19 | #' a beta distribution parameterized by the expected value and 20 | #' standard deviation. 21 | #' @export 22 | #' @importFrom stats pbeta 23 | #' @inheritParams calc_beta_dist 24 | #' @param bound The quantile of the distribution of interest 25 | #' @param lower.tail A logical that indicates whether to find the area of the upper or lower tail. 26 | #' Default is \code{TRUE}. 27 | #' @details If x ~ Beta(A, B), the \code{mu} = E[x] and \code{sigma^2} = Var(x) 28 | #' @return The area under the desired tail. 29 | beta_cdf <- function(mu, sigma, bound, lower.tail = TRUE) { 30 | beta_dist <- calc_beta_dist(mu, sigma) 31 | return(stats::pbeta(bound, beta_dist[['alpha']], beta_dist[['beta']], lower.tail = lower.tail)) 32 | } 33 | 34 | #' @title Calculate Parameters For Normal Gamma Distribution 35 | #' @name calc_normal_gamma_dist 36 | #' @description Calculate the parameters for a normal gamma distribution parameterized 37 | #' by the expected value and standard deviation. 38 | #' @export 39 | #' @param mu The expected value of x (see details) 40 | #' @param tau The expected value of T (see details) 41 | #' @param sigma_mu The standard deviation of x (see details) 42 | #' @param sigma_tau The standard deviation of T (see details) 43 | #' @details If (x, T) ~ NormalGamma(mu, lambda, alpha, beta), then \code{mu} = E[x], 44 | #' \code{tau} = E[T], \code{sigma_mu^2} = Var(x), and \code{sigma_tau^2} = Var(T) 45 | #' @return An object of class \code{normal_gamma_dist} 46 | calc_normal_gamma_dist <- function(mu, tau, sigma_mu, sigma_tau) { 47 | beta <- tau / (sigma_tau ^ 2) 48 | alpha <- beta * tau 49 | lambda <- beta / (sigma_mu ^ 2 * (alpha - 1)) 50 | return(normal_gamma_dist('mu' = mu, 'lambda' = lambda, 'alpha' = alpha, 'beta' = beta)) 51 | } 52 | 53 | #' @title Calculate Parameters For Gamma Distribution 54 | #' @name calc_gamma_dist 55 | #' @description This function determines the parameters for a gamma distribution 56 | #' from the desired mean 57 | #' @export 58 | #' @inheritParams calc_beta_dist 59 | #' @details If x ~ Gamma(alpha, beta), then \code{mu} = E[x] and \code{sigma^2} = Var(x) 60 | #' @return A named list 61 | calc_gamma_dist <- function(mu, sigma) { 62 | beta <- mu / (sigma ^ 2) 63 | alpha <- beta * mu 64 | return(gamma_dist('alpha' = alpha, 'beta' = beta)) 65 | } 66 | 67 | #' @title CDF of Parameterized Gamma Distribution 68 | #' @name gamma_cdf 69 | #' @description Calculate the area to the left or right of \code{bound} for 70 | #' a gamma distribution parameterized by the expected value and 71 | #' standard deviation. 72 | #' @export 73 | #' @importFrom stats pgamma 74 | #' @inheritParams calc_beta_dist 75 | #' @inheritParams beta_cdf 76 | #' @details If x ~ Gamma(alpha, beta), then \code{mu} = E[x] and \code{sigma^2} = Var(x) 77 | #' @return The area under the desired tail. 78 | gamma_cdf <- function(mu, sigma, bound, lower.tail = TRUE) { 79 | gamma_dist <- calc_gamma_dist(mu, sigma) 80 | return(stats::pgamma(bound, gamma_dist[['alpha']], gamma_dist[['beta']], lower.tail = lower.tail)) 81 | } 82 | 83 | #' @title find_percentile 84 | 85 | #' @title Approximate Distribution Solver 86 | #' @name approx_solver 87 | #' @description Determine the standard deviation of a distribution that will provide 88 | #' the desired amount of support at a certain bound. 89 | #' @export 90 | #' @inheritParams gamma_cdf 91 | #' @importFrom data.table between 92 | #' @param bound The desired maximum of the distribution 93 | #' @param desired_support The amount of probability mass to be more extreme than \code{bound} 94 | #' @param cdf_fun A CDF 95 | #' @param lower_sigma A guess of the lower bound on the standard deviation 96 | #' @param upper_sigma A guess of the upper bound on the standard deviation 97 | #' @param tolerance Return \code{NULL}, if we cannot find a value that gives support 98 | #' within \code{tolerance} of \code{desired_support} 99 | #' @return The optimal standard deviation 100 | approx_solver <- function(mu 101 | , bound 102 | , desired_support 103 | , cdf_fun 104 | , lower_sigma 105 | , upper_sigma 106 | , lower.tail = TRUE 107 | , tolerance = 0.01 108 | ) { 109 | 110 | # get a reasonable range of lower and upper values 111 | sigma_range <- c(lower_sigma, upper_sigma) 112 | prob_range <- cdf_fun(mu = mu, sigma = sigma_range, bound = bound, lower.tail = lower.tail) 113 | 114 | if (!data.table::between(desired_support, prob_range[1], prob_range[2])) { 115 | # lower and upper do not contain desired support 116 | # return(approx_solver(expected, bound, desired_support, cdf_fun, lower.tail 117 | # , lower_multipler * 0.5, upper_multiplier * 2, tolerance)) 118 | return(NULL) 119 | } 120 | 121 | candidates <- seq(sigma_range[1], sigma_range[2], length.out = 1000) 122 | probs <- cdf_fun(mu = mu, sigma = candidates, bound = bound, lower.tail = lower.tail) 123 | best <- which.min(abs(probs - desired_support)) 124 | if (abs(probs[best] - desired_support) > tolerance) { 125 | return(NULL) 126 | } else { 127 | return(candidates[best]) 128 | } 129 | } 130 | 131 | #' @title Get Beta Distribution With Desired Support 132 | #' @name get_supported_beta 133 | #' @description Determines the parameters for a beta distribution given an expected value 134 | #' and desired support at some bound. 135 | #' @inheritParams calc_beta_dist 136 | #' @inheritParams approx_solver 137 | #' @details If x ~ Beta(alpha, beta), then \code{mu} = E[x] and \code{sigma^2} = Var(x) 138 | #' @export 139 | #' @return An object of class \code{beta_dist} 140 | get_supported_beta <- function(mu, bound, desired_support = 0.05) { 141 | if (bound <= mu) { 142 | return('The maximum possible rate must be greater than expected rate.') 143 | } 144 | 145 | if (!all(data.table::between(c(mu, bound), 0, 1, incbounds = FALSE))) { 146 | return('Expected and Maximum must be between 0 and 1.') 147 | } 148 | 149 | # beta distribution has a maximum standard deviation of mu - mu ^ 2 150 | sigma_diff <- min(sqrt(mu - mu ^ 2), bound - mu) 151 | 152 | sigma <- approx_solver(mu = mu 153 | , bound = bound 154 | , desired_support = desired_support 155 | , cdf_fun = beta_cdf 156 | , lower_sigma = 0.1 * sigma_diff 157 | , upper_sigma = 0.99 * sigma_diff 158 | , lower.tail = FALSE 159 | , tolerance = 0.1) 160 | 161 | if (is.null(sigma)) { 162 | return('Please choose less extreme expected and maximum rates') 163 | } else { 164 | return(calc_beta_dist(mu = mu, sigma = sigma)) 165 | } 166 | } 167 | 168 | #' @title Get Normal Gamma Distribution With Desired Support 169 | #' @name get_supported_normal_gamma 170 | #' @description Determines the parameters for a normal gamma distribution given the expected values 171 | #' of the distribution and bounds on their maximum values 172 | #' @importFrom stats pnorm rgamma rnorm uniroot 173 | #' @export 174 | #' @inheritParams calc_normal_gamma_dist 175 | #' @inheritParams get_supported_beta 176 | #' @param bound_mu An upper bound on the likely value of x (see details) 177 | #' @param sigma The standard deviation of x (see details) 178 | #' @param bound_sigma The upper bound on the likely value of T (see details) 179 | #' @details If (x, T) ~ NormalGamma(mu, lambda, alpha, beta), then \code{mu} = E[x], 180 | #' \code{tau} = E[T], \code{sigma_mu^2} = Var(x), and \code{sigma_tau^2} = Var(T) 181 | #' @return An object of class \code{normal_gamma_dist} 182 | get_supported_normal_gamma <- function(mu, bound_mu, sigma, bound_sigma, desired_support = 0.05) { 183 | if ((bound_mu <= mu) || (bound_sigma <= sigma) || (bound_sigma <= 0)) { 184 | return('The maximum possible average must be greater than the expected average.') 185 | } 186 | 187 | # first, we need to solve for alpha and beta, then we can use that to solve for lambda 188 | tau <- 1 / sigma ^ 2 189 | bound_tau <- 1 / bound_sigma ^ 2 190 | 191 | gamma_dist <- get_supported_gamma(mu = tau 192 | , bound = bound_tau 193 | , desired_support = desired_support 194 | , lower.tail = TRUE) 195 | if (!inherits(gamma_dist, 'gamma_dist')) { 196 | return('Please choose less extreme difference between expected and maximum.') 197 | } 198 | 199 | sigma_tau <- sqrt(gamma_dist[['alpha']] / gamma_dist[['beta']] ^ 2) 200 | 201 | n_samps <- 1e4 202 | sample_tau <- stats::rgamma(n = n_samps, shape = gamma_dist[['alpha']], rate = gamma_dist[['beta']]) 203 | 204 | f <- function(sigma_mu) { 205 | ng_dist <- calc_normal_gamma_dist(mu, tau, sigma_mu, sigma_tau) 206 | sample_mu <- stats::rnorm(n = n_samps, mean = ng_dist[['mu']], sd = sqrt(1 / (ng_dist[['lambda']] * sample_tau))) 207 | return(quantile(sample_mu, 1 - desired_support) - bound_mu) 208 | } 209 | 210 | sigma_mu <- tryCatch(stats::uniroot(f, interval = c(0.2, 1) * (bound_mu - mu))$root 211 | , error = function(e) NULL) 212 | if (is.null(sigma_mu)) { 213 | return('Choose a less extreme difference between the maximum possible average and expected average.') 214 | } 215 | ng_dist <- calc_normal_gamma_dist(mu, tau, sigma_mu, sigma_tau) 216 | 217 | mu_tail <- mean(stats::pnorm(bound_mu, ng_dist[['mu']], sqrt(1 / (ng_dist[['lambda']] * sample_tau)), lower.tail = FALSE)) 218 | if (abs(mu_tail - desired_support) > 0.1) { 219 | return('Choose a less extreme difference between the maximum possible average and expected average.') 220 | } 221 | 222 | return(ng_dist) 223 | } 224 | 225 | #' @title Get parameters of gamma distribution from expected and maximum rate 226 | #' @name get_supported_gamma 227 | #' @description This function determines the parameters for a gamma distribution 228 | #' @inheritParams calc_gamma_dist 229 | #' @inheritParams get_supported_beta 230 | #' @param lower.tail A boolean that indicates whether we want to get the support on the upper or lower tail 231 | #' @export 232 | #' @details If x ~ Gamma(alpha, beta), then \code{mu} = E[x] and \code{sigma^2} = Var(x) 233 | #' @return A \code{gamma_dist} object 234 | get_supported_gamma <- function(mu, bound, desired_support = 0.05, lower.tail = FALSE) { 235 | if (!lower.tail) { 236 | if ((bound <= mu) || (bound <= 0)) { 237 | return('The maximum must be greater than the expected and positive.') 238 | } 239 | sigma_diff <- bound - mu 240 | } else { 241 | sigma_diff <- mu - bound 242 | } 243 | 244 | sigma <- approx_solver(mu = mu 245 | , bound = bound 246 | , desired_support = desired_support 247 | , cdf_fun = gamma_cdf 248 | , lower_sigma = sigma_diff * 0.1 249 | , upper_sigma = sigma_diff * 0.99 250 | , lower.tail = lower.tail 251 | , tolerance = 0.1) 252 | 253 | if (is.null(sigma)) { 254 | return('Please choose less extreme difference between expected and maximum.') 255 | } else { 256 | return(calc_gamma_dist(mu, sigma)) 257 | } 258 | } 259 | 260 | -------------------------------------------------------------------------------- /R/create_dists.R: -------------------------------------------------------------------------------- 1 | #' @title Beta Distribution 2 | #' @name beta_dist 3 | #' @description Create a beta distribution object, used as the prior for metrics that 4 | #' have a bernoulli distribution. 5 | #' @export 6 | #' @param alpha The non-negative alpha parameter of the beta distribution 7 | #' @param beta The non-negative beta parameter of the beta distribution 8 | #' @return An object of class \code{'beta'}. 9 | beta_dist <- function(alpha, beta) { 10 | if (!is.numeric(c(alpha, beta))) { 11 | stop('alpha and beta must be numeric') 12 | } 13 | if (alpha < 0 || beta < 0) { 14 | stop('alpha and beta must be non-negative') 15 | } 16 | x <- list('alpha' = alpha, 'beta' = beta) 17 | class(x) <- 'beta_dist' 18 | return(x) 19 | } 20 | 21 | #' @title Normal-Gamma Distribution 22 | #' @name normal_gamma_dist 23 | #' @description Create a normal_gamma distribution object, used as the prior for 24 | #' metrics that have a normal distribution. 25 | #' @export 26 | #' @param mu The mean of the normal-gamma distribution 27 | #' @param lambda The non-negative lambda parameter of the normal-gamma distribution 28 | #' @param alpha The non-negative alpha parameter of the normal-gamma distribution 29 | #' @param beta The non-negative beta parameter of the normal-gamma distribution 30 | #' @return A normal_unknown dist object 31 | normal_gamma_dist <- function(mu, lambda, alpha, beta) { 32 | if (!is.numeric(c(mu, lambda, alpha, beta))) { 33 | stop('mu, lambda, alpha, and beta must be numeric') 34 | } 35 | if (lambda < 0 || alpha < 0 || beta < 0) { 36 | stop('lambda, alpha, and beta must be non-negative') 37 | } 38 | x <- list('mu' = mu, 'lambda' = lambda, 'alpha' = alpha, 'beta' = beta) 39 | class(x) <- 'normal_gamma_dist' 40 | return(x) 41 | } 42 | 43 | #' @title Gamma Distribution 44 | #' @name gamma_dist 45 | #' @description Create a gamma distribution object, used as the prior for 46 | #' metrics that have a poisson distribution. 47 | #' @export 48 | #' @param alpha The non-negative alpha (shape) parameter of the gamma distribution 49 | #' @param beta The non-negative beta (rate) parameter of the gamma distribution 50 | #' @return A normal_unknown dist object 51 | gamma_dist <- function(alpha, beta) { 52 | if (!is.numeric(c(alpha, beta))) { 53 | stop('alpha and beta must be numeric') 54 | } 55 | if (alpha < 0 || beta < 0) { 56 | stop('alpha and beta must be non-negative') 57 | } 58 | x <- list('alpha' = alpha, 'beta' = beta) 59 | class(x) <- 'gamma_dist' 60 | return(x) 61 | } 62 | 63 | #' @title Bernoulli Distribution 64 | #' @name bernoulli_dist 65 | #' @description Create a bernoulli distribution object, used as the data generating 66 | #' distribution in tests where the metric is a probability of an 67 | #' event happening. 68 | #' @export 69 | #' @param rate The rate of the bernoulli distribution (between 0 and 1) 70 | #' @return A \code{bernoulli_dist} object 71 | bernoulli_dist <- function(rate) { 72 | if (!is.numeric(rate)) { 73 | stop('rate must be numeric') 74 | } 75 | if (rate < 0 || rate > 1) { 76 | stop('rate must be between 0 and 1') 77 | } 78 | x <- list('rate' = rate) 79 | class(x) <- 'bernoulli_dist' 80 | return(x) 81 | } 82 | 83 | #' @title Normal Distribution 84 | #' @name normal_dist 85 | #' @description Create a normal distribution object, used as the data generating 86 | #' distribution in tests where the metric is a continuous value. 87 | #' @export 88 | #' @param mu The mean of the normal distribution 89 | #' @param sigma The non-negative standard deviation of the normal distribution 90 | #' @return A \code{normal_dist} object 91 | normal_dist <- function(mu, sigma) { 92 | if (!is.numeric(c(mu, sigma))) { 93 | stop('mu and sigma must be numeric') 94 | } 95 | if (sigma < 0) { 96 | stop('sigma must be non-negative') 97 | } 98 | x <- list('mu' = mu, 'sigma' = sigma) 99 | class(x) <- 'normal_dist' 100 | return(x) 101 | } 102 | 103 | #' @title Poisson Distribution 104 | #' @name poisson_dist 105 | #' @description Create a poisson distribution object, used as the data generating 106 | #' distribution in tests where the metric is a count of an event happening. 107 | #' @export 108 | #' @param rate The non-negative mean of the poisson distribution 109 | #' @return A \code{poisson_dist} object 110 | poisson_dist <- function(rate) { 111 | if (!is.numeric(rate)) { 112 | stop('rate must be numeric') 113 | } 114 | if (rate < 0) { 115 | stop('rate must be non-negative') 116 | } 117 | x <- list('rate' = rate) 118 | class(x) <- 'poisson_dist' 119 | return(x) 120 | } 121 | -------------------------------------------------------------------------------- /R/get_data_dists.R: -------------------------------------------------------------------------------- 1 | #' @export 2 | #' @importFrom stats rbeta 3 | get_data_dists.beta_dist <- function(dist, n, num_variants, ...) { 4 | rates <- stats::rbeta(n = n * num_variants, dist[['alpha']], dist[['beta']]) 5 | return(purrr::map2(rates[1:n], rates[(n + 1):length(rates)] 6 | , function(x, y) list(a = bernoulli_dist(rate = x) 7 | , b = bernoulli_dist(rate = y)))) 8 | } 9 | 10 | #' @export 11 | #' @importFrom stats rgamma rnorm 12 | get_data_dists.normal_gamma_dist <- function(dist, n, num_variants, ...) { 13 | tau <- stats::rgamma(n = n * num_variants, shape = dist[['alpha']], rate = dist[['beta']]) 14 | mu <- stats::rnorm(n = n * num_variants, mean = dist[['mu']], sd = sqrt(1 / (dist[['lambda']] * tau))) 15 | joint <- purrr::map2(mu, tau, c) 16 | return(purrr::map2(joint[1:n], joint[(n + 1):length(joint)] 17 | , function(x, y) list(a = normal_dist(mu = x[1], sigma = sqrt(1 / x[2])) 18 | , b = normal_dist(mu = y[1], sigma = sqrt(1 / y[2]))))) 19 | } 20 | 21 | #' @export 22 | #' @importFrom stats rgamma 23 | get_data_dists.gamma_dist <- function(dist, n, num_variants, ...) { 24 | rates <- stats::rgamma(n = n * num_variants, dist[['alpha']], dist[['beta']]) 25 | return(purrr::map2(rates[1:n], rates[(n + 1):length(rates)] 26 | , function(x, y) list(a = poisson_dist(rate = x) 27 | , b = poisson_dist(rate = y)))) 28 | } 29 | 30 | #' @title Sample Data Distributions 31 | #' @name get_data_dists 32 | #' @description Given a prior distribution, sample many data generating distributions 33 | #' @param dist A prior distribution 34 | #' @param n The number of distributions to sample 35 | #' @param num_variants The number of variants to use 36 | #' @param ... Arguments to be used by other methods 37 | #' @export 38 | #' @return A list of data generating distributions for multiple variants 39 | get_data_dists <- function(dist, n, num_variants, ...) { 40 | UseMethod('get_data_dists') 41 | } 42 | -------------------------------------------------------------------------------- /R/helpers.R: -------------------------------------------------------------------------------- 1 | #' @title Bayesian A/B Arguments 2 | #' @name ab_arguments 3 | #' @description Provide documentation for arguments commonly used in this package 4 | #' @param data_dists A named list of distribution objects. This list specifies the 5 | #' distributions that are used to generate data in simulations. 6 | #' Currently, the list must only have elements named \code{'a'} 7 | #' and \code{'b'}. See Details for more information. 8 | #' @param priors A named list of distribution objects. This list specifies the 9 | #' distributions that are used as priors when estimating some parameter 10 | #' from the data generating distribution. Currently, the list must only 11 | #' have elements named \code{'a'} and \code{'b'}. See Details for more information. 12 | #' @param loss_threshold A positive number that identifies a bound for the expected 13 | #' loss for each variant. Once the expected loss is below this 14 | #' bound, the experiment is concluded. 15 | #' @param obs_per_round A positive number that represents how many observations, across both variants, 16 | #' are generated before we update the prior distributions and 17 | #' evaluate the expected loss. This number must be divisible by 18 | #' the number of variants used. Default is \code{1000}. 19 | #' @param max_rounds A positive integer that specifies the maximum number of 20 | #' times that we will evaluate the expected loss on both experiments. 21 | #' Default is \code{100}. 22 | #' @param sim_batch_size A positive integer that specifies how much data is simulated 23 | #' when evaluating the expected loss for variants that do 24 | #' not have an analytic solution (i.e. normal data). 25 | NULL 26 | 27 | 28 | #' @title Create an Empty Data Table 29 | #' @name create_empty_dt 30 | #' @description Create an empty data table with a specified number of rows and columns. 31 | #' You can also choose the value that fills the data table. 32 | #' @export 33 | #' @importFrom data.table as.data.table setnames 34 | #' @param num_rows A positive integer that specifies the number of rows for the data table 35 | #' @param column_names A character vector that specifies the number and names of the columns. 36 | #' @param fill_value Default is -1. Specifies what value should fill the data table 37 | #' @return A data.table 38 | create_empty_dt <- function(num_rows, column_names, fill_value = -1) { 39 | dt <- data.table::as.data.table(matrix(fill_value, nrow = num_rows, ncol = length(column_names))) 40 | data.table::setnames(dt, column_names) 41 | return(dt) 42 | } 43 | -------------------------------------------------------------------------------- /R/metrics.R: -------------------------------------------------------------------------------- 1 | use_exact_beta <- function(dist_a, dist_b, cutoff = 1000) { 2 | is_small_beta <- function(dist, cutoff) { 3 | return(dist[['alpha']] + dist[['beta']] < cutoff) 4 | } 5 | return(is_small_beta(dist_a, cutoff) || is_small_beta(dist_b, cutoff)) 6 | } 7 | 8 | use_exact_gamma <- function(dist_a, dist_b, cutoff = 1000) { 9 | is_small_gamma <- function(dist, cutoff) { 10 | return(dist[['alpha']] < cutoff) 11 | } 12 | return(is_small_gamma(dist_a, cutoff) || is_small_gamma(dist_b, cutoff)) 13 | } 14 | 15 | #' @export 16 | #' @importFrom stats pnorm 17 | b_gt_a.beta_dist <- function(dist_a, dist_b, theta_a, theta_b, exact = NULL, ...) { 18 | alpha_a <- dist_a[['alpha']] 19 | beta_a <- dist_a[['beta']] 20 | alpha_b <- dist_b[['alpha']] 21 | beta_b <- dist_b[['beta']] 22 | 23 | if (is.null(exact)) { 24 | exact <- use_exact_beta(dist_a, dist_b, ...) 25 | } 26 | 27 | if (exact) { 28 | iter <- seq(0, round(alpha_b) - 1) 29 | alpha_a <- round(alpha_a) 30 | w <- lbeta(alpha_a + iter, beta_a + beta_b) 31 | x <- log(beta_b + iter) 32 | y <- lbeta(1 + iter, beta_b) 33 | z <- lbeta(alpha_a, beta_a) 34 | return(sum(exp(w - x - y - z))) 35 | } else { 36 | mu_a <- alpha_a / (alpha_a + beta_a) 37 | mu_b <- alpha_b / (alpha_b + beta_b) 38 | var_a <- alpha_a * beta_a / ((alpha_a + beta_a) ^ 2 * (alpha_a + beta_a + 1)) 39 | var_b <- alpha_b * beta_b / ((alpha_b + beta_b) ^ 2 * (alpha_b + beta_b + 1)) 40 | return(stats::pnorm(0, mu_b - mu_a, sqrt(var_a + var_b), lower.tail = FALSE)) 41 | } 42 | } 43 | 44 | #' @export 45 | #' @importFrom stats pnorm 46 | b_gt_a.normal_gamma_dist <- function(dist_a, dist_b, theta_a, theta_b, exact = NULL, ...) { 47 | return(mean(theta_b > theta_a)) 48 | } 49 | 50 | #' @export 51 | #' @importFrom stats pnorm 52 | b_gt_a.gamma_dist <- function(dist_a, dist_b, theta_a, theta_b, exact = NULL, ...) { 53 | alpha_a <- dist_a[['alpha']] 54 | beta_a <- dist_a[['beta']] 55 | alpha_b <- dist_b[['alpha']] 56 | beta_b <- dist_b[['beta']] 57 | 58 | if (is.null(exact)) { 59 | exact <- use_exact_gamma(dist_a, dist_b) 60 | } 61 | 62 | if (exact) { 63 | iter <- seq(0, round(alpha_b) - 1) 64 | alpha_a <- round(alpha_a) 65 | x1 <- iter * log(beta_b) 66 | x2 <- alpha_a * log(beta_a) 67 | x3 <- log(iter + alpha_a) 68 | x4 <- lbeta(iter + 1, alpha_a) 69 | x5 <- (iter + alpha_a) * log(beta_a + beta_b) 70 | return(sum(exp(x1 + x2 - (x3 + x4 + x5)))) 71 | } else { 72 | mu_a <- alpha_a / beta_a 73 | mu_b <- alpha_b / beta_b 74 | var_a <- alpha_a / beta_a ^ 2 75 | var_b <- alpha_b / beta_b ^ 2 76 | return(stats::pnorm(0, mu_b - mu_a, sqrt(var_a + var_b), lower.tail = FALSE)) 77 | } 78 | } 79 | 80 | #' @title Probability Variant B is Greater Than Variant A 81 | #' @name b_gt_a 82 | #' @description Given two distributions, find the probability that the expected 83 | #' value of variant B is greater than the expected value of variant A 84 | #' @param dist_a Some distribution object (see examples) 85 | #' @param dist_b Some distribution object (see examples) 86 | #' @param theta_a A vector of simulated values from \code{dist_a} 87 | #' @param theta_b A vector of simulated values from \code{dist_b} 88 | #' @param exact A boolean that indicates whether the calculation should be approximated 89 | #' using the normal distribution. Default is \code{NULL}, which means 90 | #' that it will use the normal distribution if there is sufficient data. 91 | #' @param ... Arguments to be passed onto other methods 92 | #' @export 93 | #' @return A numeric value 94 | b_gt_a <- function(dist_a, dist_b, theta_a, theta_b, exact = NULL, ...) { 95 | UseMethod('b_gt_a') 96 | } 97 | 98 | #' @export 99 | expected_loss_b.beta_dist <- function(dist_a 100 | , dist_b 101 | , theta_a 102 | , theta_b 103 | , method = c('absolute', 'percent') 104 | , ... 105 | ) { 106 | method <- match.arg(method) 107 | 108 | alpha_a <- dist_a[['alpha']] 109 | beta_a <- dist_a[['beta']] 110 | alpha_b <- dist_b[['alpha']] 111 | beta_b <- dist_b[['beta']] 112 | 113 | if (method == 'absolute') { 114 | x1 <- lbeta(alpha_a + 1, beta_a) 115 | y1 <- log(1 - b_gt_a(beta_dist(alpha_a + 1, beta_a), dist_b, ...)) 116 | z1 <- lbeta(alpha_a, beta_a) 117 | 118 | x2 <- lbeta(alpha_b + 1, beta_b) 119 | y2 <- log(1 - b_gt_a(dist_a, beta_dist(alpha_b + 1, beta_b), ...)) 120 | z2 <- lbeta(alpha_b, beta_b) 121 | 122 | return(exp(x1 + y1 - z1) - exp(x2 + y2 - z2)) 123 | } else { 124 | prob_1 <- 1 - b_gt_a(dist_a, dist_b, ...) 125 | 126 | x <- lbeta(alpha_a - 1, beta_a) 127 | y <- lbeta(alpha_a, beta_a) 128 | z <- lbeta(alpha_b + 1, beta_b) 129 | w <- lbeta(alpha_b, beta_b) 130 | prob_2 <- log(1 - b_gt_a(beta_dist(alpha_a - 1, beta_a) 131 | , beta_dist(alpha_b + 1, beta_b), ...)) 132 | return(prob_1 - exp(x - y + z - w + prob_2)) 133 | } 134 | } 135 | 136 | #' @export 137 | expected_loss_b.normal_gamma_dist <- function(dist_a 138 | , dist_b 139 | , theta_a 140 | , theta_b 141 | , method = c('absolute', 'percent') 142 | , ... 143 | ) { 144 | return(mean(pmax(theta_a - theta_b, 0))) 145 | } 146 | 147 | #' @export 148 | expected_loss_b.gamma_dist <- function(dist_a 149 | , dist_b 150 | , theta_a 151 | , theta_b 152 | , method = c('absolute', 'percent') 153 | , ... 154 | ) { 155 | alpha_a <- dist_a[['alpha']] 156 | beta_a <- dist_a[['beta']] 157 | alpha_b <- dist_b[['alpha']] 158 | beta_b <- dist_b[['beta']] 159 | 160 | x1 <- lgamma(alpha_a + 1) 161 | y1 <- log(1 - b_gt_a(gamma_dist(alpha_a + 1, beta_a), dist_b)) 162 | z1 <- log(beta_a) 163 | w1 <- lgamma(alpha_a) 164 | 165 | x2 <- lgamma(alpha_b + 1) 166 | y2 <- log(1 - b_gt_a(dist_a, gamma_dist(alpha_b + 1, beta_b))) 167 | z2 <- log(beta_b) 168 | w2 <- lgamma(alpha_b) 169 | return(exp(x1 + y1 - z1 - w1) - exp(x2 + y2 - z2 - w2)) 170 | } 171 | 172 | #' @title Expected Loss of Choosing Variant B 173 | #' @name expected_loss_b 174 | #' @description This function calculates the expected loss of choosing variant B 175 | #' @export 176 | #' @param dist_a The distribution object for variant A 177 | #' @param dist_b The distribution object for variant B 178 | #' @param theta_a A vector of simulated values from \code{dist_a} 179 | #' @param theta_b A vector of simulated values from \code{dist_b} 180 | #' @param method One of \code{'absolute'} or \code{'percent'} that indicates 181 | #' whether the loss function takes the absolute difference 182 | #' or the percent difference between \code{theta_a} and \code{theta_b} 183 | #' @param ... Arguments to be passed onto other methods 184 | #' @return A numeric 185 | expected_loss_b <- function(dist_a 186 | , dist_b 187 | , theta_a 188 | , theta_b 189 | , method = c('absolute', 'percent') 190 | , ... 191 | ) { 192 | UseMethod('expected_loss_b') 193 | } 194 | 195 | #' @title Expected Losses For Variants 196 | #' @name get_losses 197 | #' @description Evaluate the expected loss for each variant 198 | #' @export 199 | #' @param posteriors A list of distribution objects that identify the posterior distributions 200 | #' for each variant 201 | #' @inheritParams ab_arguments 202 | #' @inheritParams expected_loss_b 203 | #' @importFrom purrr map 204 | #' @inheritParams ab_arguments 205 | #' @return A list of expected losses for each variant 206 | get_losses <- function(posteriors, sim_batch_size, method = c('absolute', 'percent')) { 207 | dist_name <- class(posteriors[['a']]) 208 | dist_a <- posteriors[['a']] 209 | dist_b <- posteriors[['b']] 210 | if (dist_name %in% c('normal_gamma_dist')) { 211 | thetas <- purrr::map(posteriors, function(dist) simulate_data(dist = dist, n = sim_batch_size)) 212 | } else { 213 | thetas <- NULL 214 | } 215 | 216 | loss_a <- expected_loss_b(dist_a = dist_b, dist_b = dist_a 217 | , theta_a = thetas[['b']], theta_b = thetas[['a']] 218 | , method = method) 219 | loss_b <- expected_loss_b(dist_a = dist_a, dist_b = dist_b 220 | , theta_a = thetas[['a']], theta_b = thetas[['b']] 221 | , method = method) 222 | 223 | return(list(a = loss_a, b = loss_b)) 224 | } 225 | 226 | #' @title Get Bayesian A/B Testing Metrics 227 | #' @name get_metrics 228 | #' @description Calculate various metrics 229 | #' @export 230 | #' @importFrom purrr map 231 | #' @param posteriors A list of distribution objects 232 | #' @param sim_batch_size How many observations of data to simulate 233 | #' @param method What type of loss to calculate? 234 | #' @return A named list with the metrics 235 | get_metrics <- function(posteriors, sim_batch_size, method = c('absolute', 'percent')) { 236 | dist_a <- posteriors[['a']] 237 | dist_b <- posteriors[['b']] 238 | 239 | # simulate data for effect size credible interval and normal gamma 240 | thetas <- purrr::map(posteriors, function(dist) simulate_data(dist = dist, n = sim_batch_size)) 241 | 242 | # probability that one metric is greater than another 243 | prob_b_gt_a <- b_gt_a(dist_a = dist_a 244 | , dist_b = dist_b 245 | , theta_a = thetas[['a']] 246 | , theta_b = thetas[['b']]) 247 | 248 | # risk for each variant 249 | loss_a <- expected_loss_b(dist_a = dist_b 250 | , dist_b = dist_a 251 | , theta_a = thetas[['b']] 252 | , theta_b = thetas[['a']] 253 | , method = method) 254 | loss_b <- expected_loss_b(dist_a = dist_a 255 | , dist_b = dist_b 256 | , theta_a = thetas[['a']] 257 | , theta_b = thetas[['b']] 258 | , method = method) 259 | 260 | # effect size for variant b 261 | effect <- sim_effect_size(theta_a = thetas[['a']], theta_b = thetas[['b']]) 262 | 263 | return(list('loss_a' = loss_a, 'loss_b' = loss_b, 'prob_b_gt_a' = prob_b_gt_a 264 | , 'effect_lower' = effect[['lower']], 'effect_expected' = effect[['avg']] 265 | , 'effect_upper' = effect[['upper']])) 266 | } 267 | 268 | #' @title Actual Loss of an Experiment 269 | #' @name get_actual_loss 270 | #' @description Once the experiment has concluded, measure the loss of the decision 271 | #' @inheritParams ab_arguments 272 | #' @param selected_variant A string that identifies the winning variant. 273 | #' @importFrom purrr map 274 | #' @export 275 | #' @return Numeric 276 | get_actual_loss <- function(data_dists, selected_variant) { 277 | dist_name <- class(data_dists[['a']]) 278 | metric_mapping <- list(bernoulli_dist = 'rate' 279 | , normal_dist = 'mu' 280 | , poisson_dist = 'rate') 281 | truth <- purrr::map(data_dists, metric_mapping[[dist_name]]) 282 | if (selected_variant == 'a') { 283 | return(pmax(truth[['b']] - truth[['a']], 0)) 284 | } else { 285 | return(pmax(truth[['a']] - truth[['b']], 0)) 286 | } 287 | } 288 | 289 | #' @title Simulate the effect size of variant B - variant A 290 | #' @name sim_effect_size 291 | #' @description This function simulates the effect size (B - A) 292 | #' @export 293 | #' @importFrom stats quantile 294 | #' @param theta_a A vector of draws from the posterior of a 295 | #' @param theta_b A vector of draws from the posterior of b 296 | #' @return A list containing the 2.5%, mean, and 97.5% of the effect size 297 | sim_effect_size <- function(theta_a, theta_b) { 298 | diffs <- theta_b - theta_a 299 | credible_interval <- stats::quantile(diffs, c(0.025, 0.975)) 300 | return(list('lower' = credible_interval['2.5%'] 301 | , 'avg' = mean(diffs) 302 | , 'upper' = credible_interval['97.5%'])) 303 | } 304 | 305 | 306 | 307 | 308 | -------------------------------------------------------------------------------- /R/plotting.R: -------------------------------------------------------------------------------- 1 | #' @title Plot Beta Distributions 2 | #' @name plot_beta 3 | #' @description This function plots the densities of multiple beta distributions 4 | #' @export 5 | #' @importFrom data.table between data.table rbindlist 6 | #' @importFrom stats rbeta 7 | #' @importFrom ggplot2 ggplot geom_density ggtitle xlab ylab scale_color_manual scale_fill_manual theme 8 | #' @param betas A list of lists of beta distributions 9 | #' @param title The title of the plot 10 | #' @param xlab The title of the x axis 11 | #' @param ylab The title of the y axis 12 | #' @param color The color for the plot 13 | #' @param support_level The desired amount of area between the lower and upper bounds. Default is \code{0.99}. 14 | #' 15 | #' @return NULL. A plot is generated 16 | plot_beta <- function(betas 17 | , title = 'Beta Distribution' 18 | , xlab = 'Rate that the Event Occurs' 19 | , ylab = 'Density of that Rate' 20 | , color = '#f65335' 21 | , support_level = 0.99 22 | ) { 23 | 24 | n_samp <- 1e5 25 | beta_dt <- NULL 26 | 27 | for (i in seq_along(betas)) { 28 | x <- names(betas)[i] 29 | var_name <- paste('variant', x) 30 | 31 | # generate some data 32 | beta_vec <- stats::rbeta(n_samp, betas[[x]][['alpha']], betas[[x]][['beta']]) 33 | 34 | # remove the lower and upper extremes of the data 35 | beta_vec <- beta_vec[data.table::between(beta_vec 36 | , quantile(beta_vec, (1 - support_level) / 2) 37 | , quantile(beta_vec, support_level + (1 - support_level) / 2))] 38 | 39 | beta_dt <- data.table::rbindlist(list(beta_dt, data.table::data.table('variant' = rep(var_name, length(beta_vec)) 40 | , 'betas' = beta_vec))) 41 | } 42 | 43 | if (length(betas) > 1) { 44 | col_vals <- c('darkred', 'darkblue') 45 | } else { 46 | col_vals <- color 47 | } 48 | 49 | rate_plot <- ggplot(beta_dt, aes(x = betas, colour = variant, fill =variant)) + 50 | geom_density(size = 1, alpha = 0.1) + 51 | ggtitle(title) + xlab(xlab) + ylab(ylab) + 52 | scale_color_manual(values = col_vals) + 53 | scale_fill_manual(values = col_vals) + 54 | theme(plot.title = element_text(hjust = 0.5, size = 22) 55 | , axis.title = element_text(size = 18) 56 | , axis.text = element_text(size = 14) 57 | , legend.title = element_blank() 58 | , legend.text = element_text(size = 16) 59 | , legend.position = if(length(betas) == 2) 'bottom' else 'none') 60 | return(rate_plot) 61 | } 62 | 63 | #' @title Plot Normal Distributions 64 | #' @name plot_normal 65 | #' @description This function allows you to visualize the densities of a normal distribution 66 | #' @export 67 | #' @importFrom stats rgamma rnorm 68 | #' @importFrom data.table between data.table rbindlist 69 | #' @importFrom gridExtra grid.arrange arrangeGrob 70 | #' @param normals A list of lists of normal distributions 71 | #' @inheritParams plot_beta 72 | #' @return NULL. A plot is generated 73 | plot_normal <- function(normals 74 | , title = 'Normal Distribution' 75 | , color = '#f65335' 76 | , support_level = 0.99 77 | ) { 78 | n_samp <- 1e5 79 | 80 | sd_dt <- NULL 81 | mu_dt <- NULL 82 | 83 | # sample some data 84 | for (i in seq_along(normals)) { 85 | x <- names(normals)[i] 86 | var_name <- paste('variant', x) 87 | 88 | # generate some data 89 | sd_vec <- sqrt(1 / stats::rgamma(n_samp 90 | , normals[[x]][['alpha']] 91 | , normals[[x]][['beta']])) 92 | mu_vec <- stats::rnorm(n_samp 93 | , normals[[x]][['mu']] 94 | , sqrt(1 / normals[[x]][['lambda']]) * sd_vec) 95 | 96 | # remove the lower and upper extremes of the data 97 | sd_vec <- sd_vec[data.table::between(sd_vec 98 | , quantile(sd_vec, (1 - support_level) / 2) 99 | , quantile(sd_vec, support_level + (1 - support_level) / 2))] 100 | n_sd <- length(sd_vec) 101 | mu_vec <- mu_vec[data.table::between(mu_vec 102 | , quantile(mu_vec, (1 - support_level) / 2) 103 | , quantile(mu_vec, support_level + (1 - support_level) / 2))] 104 | n_mu <- length(mu_vec) 105 | 106 | sd_dt <- data.table::rbindlist(list(sd_dt, data.table::data.table('variant' = rep(var_name, n_sd) 107 | , 'sds' = sd_vec))) 108 | mu_dt <- data.table::rbindlist(list(mu_dt, data.table::data.table('variant' = rep(var_name, n_mu) 109 | , 'mus' = mu_vec))) 110 | } 111 | 112 | if (length(normals) > 1) { 113 | col_vals <- c('darkred', 'darkblue') 114 | } else { 115 | col_vals <- color 116 | } 117 | 118 | mu_plot <- ggplot(mu_dt, aes(x = mus, colour = variant, fill = variant)) + 119 | geom_density(size = 1, alpha = 0.1) + 120 | ggtitle("What We Believe About The Mean") + 121 | xlab('Mean') + 122 | ylab('Probability') + 123 | scale_color_manual(values = col_vals) + 124 | scale_fill_manual(values = col_vals) + 125 | theme(plot.title = element_text(hjust = 0.5, size = 22) 126 | , axis.title = element_text(size = 18) 127 | , axis.text = element_text(size = 14) 128 | , legend.title = element_blank() 129 | , legend.text = element_text(size = 16) 130 | , legend.position = 'none') 131 | 132 | sd_plot <- ggplot(sd_dt, aes(x = sds, colour = variant, fill = variant)) + 133 | geom_density(size = 1, alpha = 0.1) + 134 | ggtitle("What We Believe About The Standard Deviation") + 135 | xlab('Standard Deviation') + 136 | ylab('Probability') + 137 | scale_color_manual(values = col_vals) + 138 | scale_fill_manual(values = col_vals) + 139 | theme(plot.title = element_text(hjust = 0.5, size = 22) 140 | , axis.title = element_text(size = 18) 141 | , axis.text = element_text(size = 14) 142 | , legend.title = element_blank() 143 | , legend.text = element_text(size = 16) 144 | , legend.position = if(length(normals) == 2) 'bottom' else 'none') 145 | 146 | if (length(normals) == 2) { 147 | g_legend <- function(a.gplot) { 148 | tmp <- ggplot_gtable(ggplot_build(a.gplot)) 149 | leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box") 150 | legend <- tmp$grobs[[leg]] 151 | return(legend) 152 | } 153 | 154 | my_legend <- g_legend(sd_plot) 155 | 156 | results_plot <- gridExtra::grid.arrange(gridExtra::arrangeGrob(mu_plot + theme(legend.position="none"), 157 | sd_plot + theme(legend.position="none"), 158 | nrow = 2) 159 | , my_legend, nrow = 2, heights=c(10, 1)) 160 | } else { 161 | results_plot <- gridExtra::grid.arrange(mu_plot, sd_plot, ncol = 1, heights=c(1, 1)) 162 | } 163 | 164 | return(NULL) 165 | } 166 | 167 | #' @title Plot Gamma Distributions 168 | #' @name plot_gamma 169 | #' @description This function plots the densities of multiple gamma distributions 170 | #' @export 171 | #' @importFrom purrr map map_dbl 172 | #' @param gammas A list of lists of gamma distributions 173 | #' @inheritParams plot_beta 174 | #' @return NULL. A plot is generated 175 | plot_gamma <- function(gammas 176 | , title = 'Density of Gamma Distribution' 177 | , color = '#f65335' 178 | , support_level = 0.99 179 | ) { 180 | 181 | n_samp <- 1e5 182 | gamma_dt <- NULL 183 | 184 | for (i in seq_along(gammas)) { 185 | x <- names(gammas)[i] 186 | var_name <- paste('variant', x) 187 | 188 | # generate some data 189 | gamma_vec <- rgamma(n_samp, gammas[[x]][['alpha']], gammas[[x]][['beta']]) 190 | 191 | # remove the lower and upper extremes of the data 192 | gamma_vec <- gamma_vec[data.table::between(gamma_vec 193 | , quantile(gamma_vec, (1 - support_level) / 2) 194 | , quantile(gamma_vec, support_level + (1 - support_level) / 2))] 195 | 196 | gamma_dt <- data.table::rbindlist(list(gamma_dt, data.table::data.table('variant' = rep(var_name, length(gamma_vec)) 197 | , 'gammas' = gamma_vec))) 198 | } 199 | 200 | if (length(gammas) > 1) { 201 | col_vals <- c('darkred', 'darkblue') 202 | } else { 203 | col_vals <- color 204 | } 205 | 206 | lambda_plot <- ggplot(gamma_dt, aes(x = gammas, colour = variant, fill = variant)) + 207 | geom_density(size = 1, alpha = 0.1) + 208 | ggtitle(title) + 209 | xlab('Expected Amount of Times Event Occurrs') + 210 | ylab('Probability of That Rate') + 211 | scale_color_manual(values = col_vals) + 212 | scale_fill_manual(values = col_vals) + 213 | theme(plot.title = element_text(hjust = 0.5, size = 22) 214 | , axis.title = element_text(size = 18) 215 | , axis.text = element_text(size = 14) 216 | , legend.title = element_blank() 217 | , legend.text = element_text(size = 16) 218 | , legend.position = if(length(gammas) == 2) 'bottom' else 'none') 219 | return(lambda_plot) 220 | } 221 | 222 | #' @title Plot Relative Gain 223 | #' @name plot_relative_gain 224 | #' @description Plot the cumulative density of the ratio of the metric under variant B to the metric under variant A 225 | #' @importFrom purrr map 226 | #' @export 227 | #' @param dists A list of distribution objects, with elements named \code{'a'} and \code{'b'} 228 | #' @param sim_batch_size The number of objects to simulate 229 | #' @return A plot 230 | plot_relative_gain <- function(dists, sim_batch_size = 1e5, title = 'Cumulative Density of B / A') { 231 | thetas <- purrr::map(dists, function(dist) simulate_data(dist = dist, n = sim_batch_size)) 232 | ratios <- thetas[['b']] / thetas[['a']] 233 | df <- data.frame(x = ratios) 234 | ecdf_plot <- ggplot(df, aes(x, colour = '#f65335')) + stat_ecdf(size = 1.5) + 235 | ggtitle(title) + xlab('Relative Gain (B / A)') + ylab('Cumulative Density') + 236 | theme(plot.title = element_text(hjust = 0.5, size = 22) 237 | , axis.title = element_text(size = 18) 238 | , axis.text = element_text(size = 14) 239 | , legend.position = 'none') 240 | return(ecdf_plot) 241 | } 242 | -------------------------------------------------------------------------------- /R/run_simulation.R: -------------------------------------------------------------------------------- 1 | #' @title Simulate a Bayesian A/B Test 2 | #' @name simulate_ab_test 3 | #' @description Given true data generating distributions and prior distributions for 4 | #' variants A and B, simulate the data, calculate the necessary statistics 5 | #' and declare one of the tests a winner. 6 | #' @inheritParams ab_arguments 7 | #' @export 8 | #' @importFrom purrr map 9 | #' @importFrom data.table set 10 | #' @details In order to create \code{data_dists} and \code{priors}, you need to use the following 11 | #' for data generating distributions: \code{\link{bernoulli_dist}}, 12 | #' \code{\link{normal_dist}}, \code{\link{poisson_dist}} and the following for prior 13 | #' distributions \code{\link{beta_dist}}, \code{\link{normal_gamma_dist}}, 14 | #' \code{\link{beta_dist}}, 15 | #' @seealso \code{\link{bernoulli_dist}} \code{\link{normal_dist}} \code{\link{poisson_dist}} 16 | #' \code{\link{beta_dist}} \code{\link{normal_gamma_dist}} \code{\link{beta_dist}} 17 | #' @return A list that contains the name of the winning variant, the number of observations used, 18 | #' the loss of the decision, whether the test finished, the metrics from each round, and 19 | #' the raw data. 20 | simulate_ab_test <- function(data_dists 21 | , priors 22 | , loss_threshold 23 | , obs_per_round = 1000 24 | , max_rounds = 100 25 | , sim_batch_size = 1e5 26 | ) { 27 | if (any(c(loss_threshold, obs_per_round, max_rounds, sim_batch_size) <= 0)) { 28 | stop('loss_threshold, obs_per_round, max_rounds, and sim_batch_size must be positive') 29 | } 30 | validate_inputs(data_dists, priors, obs_per_round) 31 | variants <- names(data_dists) 32 | num_variants <- length(data_dists) 33 | obs_per_variant <- obs_per_round / num_variants 34 | 35 | evidence_dt <- create_empty_dt(num_rows = max_rounds * obs_per_variant, column_names = variants) 36 | loss_dt <- create_empty_dt(num_rows = max_rounds, column_names = variants) 37 | best_variant <- NULL 38 | 39 | for (round_num in 1:max_rounds) { 40 | 41 | # simulate new set of data 42 | which_rows <- (1 + (round_num - 1) * obs_per_variant):(round_num * obs_per_variant) 43 | new_evidence <- purrr::map(data_dists, function(dist) simulate_data(dist = dist, n = obs_per_variant)) 44 | data.table::set(evidence_dt, i = which_rows, j = names(evidence_dt), value = new_evidence) 45 | 46 | # update priors with new evidence 47 | posteriors <- update_priors(priors = priors 48 | , evidence_dt = evidence_dt[1:max(which_rows)]) 49 | 50 | # evaluate metrics 51 | losses <- get_losses(posteriors = posteriors 52 | , sim_batch_size = sim_batch_size) 53 | 54 | data.table::set(loss_dt, i = round_num, j = names(loss_dt), value = losses) 55 | 56 | # determine if we can stop the experiment 57 | decision <- single_loss_stop(losses = unlist(losses) 58 | , loss_threshold = loss_threshold) 59 | if (decision[['stop']]) { 60 | best_variant <- decision[['variant']] 61 | break 62 | } 63 | } 64 | test_finished <- as.numeric(!is.null(best_variant)) 65 | if (test_finished) { 66 | actual_loss <- get_actual_loss(data_dists = data_dists 67 | , selected_variant = best_variant) 68 | } else { 69 | actual_loss <- NA_real_ 70 | } 71 | 72 | 73 | out <- list(best_variant = best_variant 74 | , num_obs = obs_per_round * round_num 75 | , actual_loss = actual_loss 76 | , test_finished = test_finished 77 | , loss_dt = loss_dt[1:round_num] 78 | , evidence_dt = evidence_dt[1:(round_num * obs_per_variant)]) 79 | return(out) 80 | } 81 | 82 | #' @title Validate Inputs to Bayesian A/B Simulation 83 | #' @name validate_inputs 84 | #' @description Check the validity of the parameters being used in the simulation. 85 | #' @export 86 | #' @inheritParams ab_arguments 87 | #' @return NULL if all of the tests pass. Else, it will fail loudly. 88 | validate_inputs <- function(data_dists 89 | , priors 90 | , obs_per_round 91 | ) { 92 | # check the number and names of the variants 93 | num_variants <- length(data_dists) 94 | if (num_variants != 2) { 95 | stop('Currently, only A/B tests with two variants are supported') 96 | } 97 | true_names <- names(data_dists) 98 | prior_names <- names(priors) 99 | if (is.null(true_names) || !identical(true_names, prior_names) || !identical(true_names, letters[1:num_variants])) { 100 | stop('data_dists and priors must be named lists. And the elements must be named "a" and "b"') 101 | } 102 | 103 | supported_data_dists <- c('bernoulli_dist', 'normal_dist', 'poisson_dist') 104 | supported_priors <- c('beta_dist', 'normal_gamma_dist', 'gamma_dist') 105 | data_a <- class(data_dists[['a']]) 106 | data_b <- class(data_dists[['b']]) 107 | prior_a <- class(priors[['a']]) 108 | prior_b <- class(priors[['b']]) 109 | 110 | if (!identical(data_a, data_b) | !(data_a %in% supported_data_dists)) { 111 | stop(paste('All data distributions need to be identical. Supported:' 112 | , paste(supported_data_dists, collapse = ', '))) 113 | } 114 | 115 | if (!identical(prior_a, prior_b) | !(prior_a %in% supported_priors)) { 116 | stop(paste('All prior distributions need to be identical. Supported:' 117 | , paste(supported_priors, collapse = ', '))) 118 | } 119 | 120 | # check that the number of obs to sample each round is divisible by the number of variants 121 | if (obs_per_round %% num_variants != 0) { 122 | stop('Number of obs must be divisible by number of variants') 123 | } 124 | 125 | return(NULL) 126 | } 127 | 128 | -------------------------------------------------------------------------------- /R/simulate_data.R: -------------------------------------------------------------------------------- 1 | #' @export 2 | #' @importFrom stats rbeta 3 | simulate_data.beta_dist <- function(dist, n, ...) { 4 | return(stats::rbeta(n = n, dist[['alpha']], dist[['beta']])) 5 | } 6 | 7 | #' @export 8 | #' @importFrom stats rgamma rnorm 9 | simulate_data.normal_gamma_dist <- function(dist, n, ...) { 10 | tau <- stats::rgamma(n = n, shape = dist[['alpha']], rate = dist[['beta']]) 11 | return(stats::rnorm(n = n, mean = dist[['mu']], sd = sqrt(1 / (dist[['lambda']] * tau)))) 12 | } 13 | 14 | #' @export 15 | #' @importFrom stats rgamma 16 | simulate_data.gamma_dist <- function(dist, n, ...) { 17 | return(stats::rgamma(n = n, shape = dist[['alpha']], rate = dist[['beta']])) 18 | } 19 | 20 | #' @export 21 | #' @importFrom stats rbinom 22 | simulate_data.bernoulli_dist <- function(dist, n, ...) { 23 | return(as.numeric(stats::rbinom(n, size = 1, prob = dist[['rate']]))) 24 | } 25 | 26 | #' @export 27 | #' @importFrom stats rnorm 28 | simulate_data.normal_dist <- function(dist, n, ...) { 29 | return(stats::rnorm(n, mean = dist[['mu']], sd = dist[['sigma']])) 30 | } 31 | 32 | #' @export 33 | #' @importFrom stats rpois 34 | simulate_data.poisson_dist <- function(dist, n, ...) { 35 | return(as.numeric(stats::rpois(n, lambda = dist[['rate']]))) 36 | } 37 | 38 | #' @title Simulate Data According to Some Distribution 39 | #' @name simulate_data 40 | #' @description Simulate a vector of data from a given distribution object. 41 | #' @export 42 | #' @param dist An object of class \code{'beta_dist'}, \code{'normal_gamma_dist'}, 43 | #' \code{'gamma_dist'}, \code{'bernouilli_dist'}, \code{'normal_dist'}, 44 | #' or \code{'poisson_dist'} that specifies the parameters of some distribution 45 | #' @param n How many data points to simulate 46 | #' @param ... Arguments to be passed onto other methods 47 | #' @return A vector of simulated data. 48 | simulate_data <- function(dist, n, ...) { 49 | UseMethod('simulate_data') 50 | } 51 | 52 | -------------------------------------------------------------------------------- /R/simulation_study.R: -------------------------------------------------------------------------------- 1 | #' @title Investigate Properties of Bayesian A/B Testing 2 | #' @name investigate_simulations 3 | #' @description Run multiple simulations of an A/B test in order to evaluate various properties 4 | #' @export 5 | #' @inheritParams ab_arguments 6 | #' @param num_sims A positive integer that specifies how many simulations to perform. 7 | #' @param sampling_distribution An list of distribution objects that specifies how the data generating 8 | #' distributions should be created. 9 | #' @param num_cores How many cores to use in the parallelization of tests. Default is \code{NULL}, which 10 | #' means no parallelization. 11 | #' @importFrom data.table setnames set rbindlist 12 | #' @importFrom purrr map_dbl 13 | #' @importFrom parallel makeCluster clusterExport clusterApplyLB stopCluster 14 | #' @return A list containing two data.tables: one with summary statistics for each simulation 15 | #' and one with the averages over all of the simulations. 16 | investigate_simulations <- function(num_sims 17 | , priors 18 | , loss_threshold 19 | , data_dists = NULL 20 | , sampling_distribution = NULL 21 | , obs_per_round = 1000 22 | , max_rounds = 100 23 | , sim_batch_size = 1e5 24 | , num_cores = NULL 25 | ) { 26 | 27 | if (!is.null(sampling_distribution)) { 28 | distributions <- get_data_dists(dist = sampling_distribution 29 | , n = num_sims 30 | , num_variants = 2) 31 | } else { 32 | distributions <- rep(list(data_dists), num_sims) 33 | } 34 | 35 | if (!is.null(num_cores)) { 36 | cluster <- parallel::makeCluster(num_cores) 37 | parallel::clusterExport(cluster 38 | , varlist = c('simulate_ab_test', 'priors' 39 | , 'loss_threshold', 'obs_per_round' 40 | , 'max_rounds', 'sim_batch_size') 41 | , envir = environment()) 42 | result_list <- parallel::clusterApplyLB( 43 | cluster 44 | , x = distributions 45 | , fun = function(dist) { 46 | result <- simulate_ab_test(data_dists = dist 47 | , priors = priors 48 | , loss_threshold = loss_threshold 49 | , obs_per_round = obs_per_round 50 | , max_rounds = max_rounds 51 | , sim_batch_size = sim_batch_size) 52 | return(list(num_obs_seen = result[['num_obs']] 53 | , test_finished = result[['test_finished']] 54 | , bad_decision = as.numeric(result[['actual_loss']] > 0) 55 | , loss = result[['actual_loss']])) 56 | }) 57 | parallel::stopCluster(cluster) 58 | sim_dt <- data.table::rbindlist(result_list) 59 | } else { 60 | sim_columns <- c('num_obs_seen', 'test_finished', 'bad_decision', 'loss') 61 | sim_dt <- create_empty_dt(num_rows = num_sims, column_names = sim_columns) 62 | for (i in 1:num_sims) { 63 | result <- simulate_ab_test(data_dists = distributions[[i]] 64 | , priors = priors 65 | , loss_threshold = loss_threshold 66 | , obs_per_round = obs_per_round 67 | , max_rounds = max_rounds 68 | , sim_batch_size = sim_batch_size) 69 | result_list <- list(result[['num_obs']] 70 | , result[['test_finished']] 71 | , as.numeric(result[['actual_loss']] > 0) 72 | , result[['actual_loss']]) 73 | 74 | data.table::set(sim_dt, i = i, j = names(sim_dt), value = result_list) 75 | } 76 | } 77 | summary_dt <- sim_dt[, .(avg_obs_seen = mean(num_obs_seen) 78 | , p80_obs_seen = quantile(num_obs_seen, 0.8) 79 | , max_obs_seen = max(num_obs_seen) 80 | , test_finish_rate = mean(test_finished) 81 | , wrong_decision_rate = mean(bad_decision) 82 | , avg_loss = mean(loss))] 83 | 84 | return(list(sim_dt = sim_dt, summary_dt = summary_dt)) 85 | } 86 | -------------------------------------------------------------------------------- /R/stopping_functions.R: -------------------------------------------------------------------------------- 1 | #' @title Decide whether to stop the experiment using the last loss values 2 | #' @name single_loss_stop 3 | #' @description This function decides whether or not to stop the experiment and 4 | #' declare one of the variants as the winner. 5 | #' @param losses A named vector of the losses for each variant. 6 | #' @param loss_threshold Choose a variant once the loss goes beneath this value. 7 | #' @export 8 | #' @return A list containg two named elements: \code{'stop'} (a boolean) and 9 | #' \code{'winner'}, which is \code{NULL} if \code{'stop'} is \code{FALSE} 10 | #' else it is the name of the winning variant 11 | single_loss_stop <- function(losses, loss_threshold) { 12 | if (any(losses < loss_threshold)) { 13 | best_variant <- names(losses)[which.min(losses)] 14 | return(list(stop = TRUE, variant = best_variant)) 15 | } else { 16 | return(list(stop = FALSE, variant = NULL)) 17 | } 18 | } -------------------------------------------------------------------------------- /R/update_priors.R: -------------------------------------------------------------------------------- 1 | #' @export 2 | update_prior.beta_dist <- function(prior, evidence = NULL, stats = NULL, ...) { 3 | if (!is.null(evidence)) { 4 | if (!is.numeric(evidence) || any(!(evidence %in% c(0, 1)))) { 5 | stop('data passed to update_prior for beta distribution must be a binary vector') 6 | } 7 | num_success <- sum(evidence == 1) 8 | num_failure <- sum(evidence == 0) 9 | } else { 10 | if (is.null(stats)) { 11 | stop('If evidence is NULL, stats cannot be NULL') 12 | } 13 | num_success <- stats[['num_obs']] * stats[['observed_rate']] 14 | num_failure <- stats[['num_obs']] - num_success 15 | } 16 | return(beta_dist(alpha = prior[['alpha']] + num_success 17 | , beta = prior[['beta']] + num_failure)) 18 | } 19 | 20 | #' @export 21 | update_prior.normal_gamma_dist <- function(prior, evidence = NULL, stats = NULL, ...) { 22 | if (!is.null(evidence)) { 23 | if (!is.numeric(evidence)) { 24 | stop('evidence must be a numeric vector') 25 | } 26 | num_obs <- length(evidence) 27 | avg <- mean(evidence) 28 | sse <- sum((evidence - avg) ^ 2) 29 | } else { 30 | if (is.null(stats)) { 31 | stop('If evidence is NULL, stats cannot be NULL') 32 | } 33 | num_obs <- stats[['num_obs']] 34 | avg <- stats[['avg']] 35 | std_dev <- stats[['std_dev']] 36 | sse <- (std_dev ^ 2) * (num_obs - 1) 37 | } 38 | 39 | beta_term <- (prior[['lambda']] * num_obs * (avg - prior[['mu']]) ^ 2) / (prior[['lambda']] + num_obs) 40 | return(normal_gamma_dist(mu = (prior[['lambda']] * prior[['mu']] + num_obs * avg) / (prior[['lambda']] + num_obs) 41 | , lambda = prior[['lambda']] + num_obs 42 | , alpha = prior[['alpha']] + num_obs / 2 43 | , beta = prior[['beta']] + (sse + beta_term) / 2)) 44 | } 45 | 46 | #' @export 47 | update_prior.gamma_dist <- function(prior, evidence = NULL, stats = NULL, ...) { 48 | if (!is.null(evidence)) { 49 | if (any(evidence < 0)) { 50 | stop('evidence must be a positive integer vector') 51 | } 52 | num_sessions <- length(evidence) 53 | observed_count <- sum(evidence) 54 | } else { 55 | if (is.null(stats)) { 56 | stop('If evidence is NULL, stats cannot be NULL') 57 | } 58 | num_sessions <- stats[['num_sessions']] 59 | observed_count <- stats[['observed_count']] 60 | } 61 | return(gamma_dist(alpha = prior[['alpha']] + observed_count 62 | , beta = prior[['beta']] + num_sessions)) 63 | } 64 | 65 | #' @title Update Prior Parameters 66 | #' @name update_prior 67 | #' @description Use observed data to update parameters of prior distribution 68 | #' @export 69 | #' @importFrom purrr map 70 | #' @param prior An object of class \code{'beta_dist'}, \code{'normal_gamma_dist'} 71 | #' , or \code{'gamma_dist'} that specifies the parameters of some distribution 72 | #' @param evidence A numeric vector that contains observed data. Default is \code{NULL}, 73 | #' will override \code{stats} if specified. 74 | #' @param stats An object of class \code{'beta_stats'}, \code{'normal_gamma_stats'}, or 75 | #' \code{'gamma_stats'} that contains sufficient statistics for the 76 | #' update. Default is \code{NULL}, will be ignored if \code{evidence} 77 | #' is specified. 78 | #' @param ... Arguments to be passed onto other methods 79 | update_prior <- function(prior, evidence = NULL, stats = NULL, ...) { 80 | UseMethod('update_prior') 81 | } 82 | 83 | #' @title Update the hyper-parameters of prior distributions 84 | #' @name update_priors 85 | #' @description This function updates the hyper-parameters of our prior distributions 86 | #' @inheritParams ab_arguments 87 | #' @param evidence_dt A data.table containing the raw data generated by each variant. 88 | #' Column names must be identical to variant names in \code{priors} 89 | #' @param stats_dt A data.table containing statistics about the raw data generated by each 90 | #' variant. 91 | #' @export 92 | #' @importFrom purrr map 93 | #' @return A list of distribution objects that represented the updated prior distributions 94 | update_priors <- function(priors, evidence_dt = NULL, stats_dt = NULL) { 95 | variants <- names(priors) 96 | 97 | if (!is.null(evidence_dt)) { 98 | validate_dt(evidence_dt, expected_cols = variants) 99 | posteriors <- purrr::map(variants, function(x) { 100 | update_prior(prior = priors[[x]] 101 | , evidence = evidence_dt[[x]]) 102 | }) 103 | } else { 104 | validate_stats_dt(stats_dt, class(priors[['a']])) 105 | posteriors <- purrr::map(variants, function(x) { 106 | update_prior(prior = priors[[x]] 107 | , stats = as.list(stats_dt[variant == x, !'variant'])) 108 | }) 109 | } 110 | 111 | names(posteriors) <- names(priors) 112 | return(posteriors) 113 | } 114 | 115 | #' @title Check That Data Table Matches Expectations 116 | #' @name validate_dt 117 | #' @description Use this function to assert certain properties of a data.table 118 | #' @export 119 | #' @param dt A data.table 120 | #' @param expected_cols A vector of column names that \code{dt} must have 121 | #' @return \code{NULL} if successful, else a fatal error 122 | validate_dt <- function(dt, expected_cols) { 123 | if (!inherits(dt, 'data.table')) { 124 | stop('dt must be a data.table') 125 | } 126 | 127 | cols <- names(dt) 128 | missing_cols <- setdiff(expected_cols, cols) 129 | if (length(missing_cols) > 0) { 130 | stop(paste('missing columns:', paste(missing_cols, collapse = ', '))) 131 | } 132 | return(NULL) 133 | } 134 | 135 | validate_stats_dt <- function(dt, dist_name) { 136 | if (!inherits(dt, 'data.table')) { 137 | stop('stats_dt must be a data.table') 138 | } 139 | 140 | cols <- names(dt) 141 | if (identical(dist_name, 'beta_dist')) { 142 | expected <- c('variant', 'num_obs', 'observed_rate') 143 | } else if (identical(dist_name, 'normal_gamma_dist')) { 144 | expected <- c('variant', 'num_obs', 'avg', 'std_dev') 145 | } else if (identical(dist_name, 'gamma_dist')) { 146 | expected <- c('variant', 'num_sessions', 'observed_count') 147 | } else { 148 | stop('Unsupported prior distribution type') 149 | } 150 | if (!identical(sort(cols), sort(expected))) { 151 | stop(paste0('bad column names in stats_dt provided to update_priors. expected:' 152 | , paste(expected, collapse = ', '), 'received:' 153 | , paste(cols, collapse = ', '))) 154 | } 155 | } 156 | 157 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # abayes 2 | 3 | ## What is this package? 4 | 5 | abayes is an R package that implements Bayesian methods for A/B testing. You can use this package to perform simulations of A/B tests in order to evaluate the Bayesian methods. You can also use the functions in this package to evaluate the results of a particular A/B test or use [this shiny app](https://convoyds.shinyapps.io/abayes/). 6 | 7 | ### Other Packages on Bayesian A/B Testing 8 | 9 | I wrote most of this code before investigating the existence of open source alternatives. However, it is important to recognize them now. In order to evaluate A/B tests, a popular option is to use the [bayesAB](https://github.com/FrankPortman/bayesAB) package created by Frank Portman. In order to run simulations, David Robinson wrote code to perform simulations for [this blog post](http://varianceexplained.org/r/bayesian-ab-testing/). abayes lies somewhere in between these two packages. bayesAB can handle more complicated experiment settings and performs inferences faster, but is not designed for simulating data to analyze the performance of Bayesian methods. David's code is only designed for evaluating tests of binomial data and makes stronger assumptions about the simulation procedure than this package. 10 | 11 | ## Why did I write this package? 12 | 13 | As a data scientist at Convoy, I investigated the performance of Bayesian A/B testing compared to Convoy's existing experimentation framework. After initial simulations showed that the speed and accuracy of Bayesian methods were promising, I wrote more simulation code and expanded the set of functions to support various experimental settings. As Bayesian A/B testing began to be used more and more throughout Convoy, I put all of the functions into a package and built [a shiny app](https://convoyds.shinyapps.io/abayes/) on top of that package. This allowed my co-workers to use my functions easily. 14 | 15 | ## Why did I open source this package? 16 | 17 | As part of the blog post about Bayesian A/B testing, I included a visualization of the guarantees about controlling the expected loss. In the spirit of full transparency, I wanted to provide the script and the code that produced the visualization. 18 | 19 | ## How to install 20 | 21 | This package is not on CRAN, and there is no plan on submitting it to CRAN. In order to install this package, you can use `devtools::install_github('convoyinc/abayes')` or you can clone the package locally and use `devtools::install_local('abayes')`. 22 | 23 | ## What can this package do? 24 | 25 | For a detailed demonstration of the simulation capabilities, please see [the vignette](https://github.com/convoyinc/abayes/blob/master/vignettes/abayes_vignette.Rmd) included in this repo. There, I provide an example of the simulations that demonstrate the guarantees that A/B testing makes. 26 | 27 | However, this package can also be used to evaluate the results of a single A/B test. Below is some code that demonstrates how to use the package for that purpose. 28 | 29 | ``` r 30 | library(abayes) 31 | library(purrr) 32 | library(data.table) 33 | n <- 1000 34 | prior_dist <- list(a = beta_dist(alpha = 1, beta = 1), b = beta_dist(alpha = 1, beta = 1)) 35 | data_dist <- list(a = bernoulli_dist(rate = 0.1), b = bernoulli_dist(rate = 0.12)) 36 | evidence_dt <- data.table::as.data.table(purrr::map(data_dist, function(x) simulate_data(x, n))) 37 | posterior_dist <- update_priors(prior_dist, evidence_dt) 38 | 39 | b_gt_a(posterior_dist[['a']], posterior_dist[['b']]) 40 | expected_loss_b(posterior_dist[['a']], posterior_dist[['b']]) 41 | plot_beta(posterior_dist) 42 | ``` 43 | 44 | ## We welcome other open source contributions! 45 | 46 | There are many ways that we can make this package better. If you have any ideas, please fork this repo and submit a pull request. Thanks for your interest in Bayesian A/B testing. 47 | -------------------------------------------------------------------------------- /inst/abayes/app.R: -------------------------------------------------------------------------------- 1 | library(shiny) 2 | library(data.table) 3 | library(purrr) 4 | library(ggplot2) 5 | library(gridExtra) 6 | library(markdown) 7 | library(abayes) 8 | 9 | source('server_funs.R') 10 | source('ui_elements.R') 11 | 12 | ui <- navbarPage('Bayesian A/B Testing' 13 | , tabPanel('Basic', 14 | sidebarLayout( 15 | sidebarPanel( 16 | 17 | h3('Choose Your Metric', align = 'center', style="color:#f65335") 18 | , radioButtons(inputId = 'dist_name_basic', label = 'Which option describes your metric?' 19 | , choices = dist_name_choices) 20 | 21 | , h3('Enter What You Believe Before The Experiment', align = 'center', style="color:#f65335") 22 | 23 | , conditionalPanel('input.dist_name_basic == "binomial"' 24 | , choose_binomial_rate(variant = NULL, id_type = 'basic') 25 | , choose_binomial_bound(variant = NULL, id_type = 'basic') 26 | , br() 27 | , h3('Enter The Results of Your Experiment', align = 'center', style="color:#f65335") 28 | , h4('Variant A', align = 'center') 29 | , set_binomial_obs(variant = 'a', id_type = 'basic') 30 | , set_binomial_rate(variant = 'a', id_type = 'basic') 31 | , h4('Variant B', align = 'center') 32 | , set_binomial_obs(variant = 'b', id_type = 'basic') 33 | , set_binomial_rate(variant = 'b', id_type = 'basic') 34 | , br() 35 | , h3('Choose Your Loss Function', align = 'center', style="color:#f65335") 36 | , radioButtons(inputId = 'loss_type_basic', label = 'Choose between the absolute loss or the percent loss' 37 | , choices = list('absolute' = 'absolute', 'percent' = 'percent')) 38 | ) 39 | 40 | , conditionalPanel('input.dist_name_basic == "normal"' 41 | , choose_normal_mean(variant = NULL, id_type = 'basic') 42 | , choose_normal_mean_bound(variant = NULL, id_type = 'basic') 43 | , choose_normal_sd(variant = NULL, id_type = 'basic') 44 | , choose_normal_sd_bound(variant = NULL, id_type = 'basic') 45 | , br() 46 | , h3('Enter The Results of Your Experiment', align = 'center', style="color:#f65335") 47 | , h4('Variant A', align = 'center') 48 | , set_normal_count(variant = 'a', id_type = 'basic') 49 | , set_normal_mean(variant = 'a', id_type = 'basic') 50 | , set_normal_sd(variant = 'a', id_type = 'basic') 51 | , h4('Variant B', align = 'center') 52 | , set_normal_count(variant = 'b', id_type = 'basic') 53 | , set_normal_mean(variant = 'b', id_type = 'basic') 54 | , set_normal_sd(variant = 'b', id_type = 'basic') 55 | ) 56 | 57 | , conditionalPanel('input.dist_name_basic == "poisson"' 58 | , choose_poisson_lambda(variant = NULL, id_type = 'basic') 59 | , choose_poisson_max(variant = NULL, id_type = 'basic') 60 | , br() 61 | , h3('Enter The Results of Your Experiment', align = 'center', style="color:#f65335") 62 | , h4('Variant A', align = 'center') 63 | , set_poisson_num(variant = 'a', id_type = 'basic') 64 | , set_poisson_total(variant = 'a', id_type = 'basic') 65 | , h4('Variant B', align = 'center') 66 | , set_poisson_num(variant = 'b', id_type = 'basic') 67 | , set_poisson_total(variant = 'b', id_type = 'basic') 68 | ) 69 | ), 70 | 71 | # Main panel for displaying outputs ---- 72 | mainPanel( 73 | 74 | h1('Before The Experiment', align = 'center', style="color:#f65335") 75 | , plotOutput(outputId = "plot_prior_basic") 76 | , textOutput(outputId = 'prior_description_basic') 77 | , h1('After Observing Some Data', align = 'center', style="color:#f65335") 78 | , conditionalPanel('(input.dist_name_basic == "binomial" && (input.obs_a_basic != 0 || input.obs_b_basic != 0)) 79 | || (input.dist_name_basic == "normal" && (input.count_a_basic != 0 || input.count_b_basic != 0)) 80 | || (input.dist_name_basic == "poisson" && (input.num_a_basic != 0 || input.num_b_basic != 0))' 81 | , tableOutput(outputId = 'metrics_basic') 82 | , plotOutput(outputId = "plot_posterior_basic") 83 | ) 84 | , conditionalPanel('(input.dist_name_basic == "binomial" && input.obs_a_basic == 0 && input.obs_b_basic == 0) 85 | || (input.dist_name_basic == "normal" && input.count_a_basic == 0 && input.count_b_basic == 0) 86 | || (input.dist_name_basic == "poisson" && input.num_a_basic == 0 && input.num_b_basic == 0)' 87 | , br() 88 | , br() 89 | , h4(paste('In order to determine which variant is better, you need to enter' 90 | , 'the results of your experiment. Use the panel on the left.') 91 | , style="color:black") 92 | ) 93 | ) 94 | ) 95 | ) 96 | , tabPanel('Advanced', 97 | sidebarLayout( 98 | sidebarPanel( 99 | 100 | h3('Choose Your Metric', align = 'center', style="color:#f65335") 101 | , radioButtons(inputId = 'dist_name_adv', label = 'Which option describes your metric?' 102 | , choices = dist_name_choices) 103 | 104 | , h3('Enter What You Believe Before The Experiment', align = 'center', style="color:#f65335") 105 | 106 | , conditionalPanel('input.dist_name_adv == "binomial"' 107 | , h4('Variant A', align = 'center') 108 | , choose_binomial_rate(variant = 'a', id_type = 'adv') 109 | , choose_binomial_bound(variant = 'a', id_type = 'adv') 110 | , h4('Variant B', align = 'center') 111 | , choose_binomial_rate(variant = 'b', id_type = 'adv') 112 | , choose_binomial_bound(variant = 'b', id_type = 'adv') 113 | , br() 114 | , h3('Enter The Results of Your Experiment', align = 'center', style="color:#f65335") 115 | , h4('Variant A', align = 'center') 116 | , set_binomial_obs(variant = 'a', id_type = 'adv') 117 | , set_binomial_rate(variant = 'a', id_type = 'adv') 118 | , h4('Variant B', align = 'center') 119 | , set_binomial_obs(variant = 'b', id_type = 'adv') 120 | , set_binomial_rate(variant = 'b', id_type = 'adv') 121 | , br() 122 | , h3('Choose Your Loss Function', align = 'center', style="color:#f65335") 123 | , radioButtons(inputId = 'loss_type_adv', label = 'Choose between the absolute loss or the percent loss' 124 | , choices = list('absolute' = 'absolute', 'percent' = 'percent')) 125 | ) 126 | 127 | , conditionalPanel('input.dist_name_adv == "normal"' 128 | , h4('Variant A', align = 'center') 129 | , choose_normal_mean(variant = 'a', id_type = 'adv') 130 | , choose_normal_mean_bound(variant = 'a', id_type = 'adv') 131 | , choose_normal_sd(variant = 'a', id_type = 'adv') 132 | , choose_normal_sd_bound(variant = 'a', id_type = 'adv') 133 | , h4('Variant B', align = 'center') 134 | , choose_normal_mean(variant = 'b', id_type = 'adv') 135 | , choose_normal_mean_bound(variant = 'b', id_type = 'adv') 136 | , choose_normal_sd(variant = 'b', id_type = 'adv') 137 | , choose_normal_sd_bound(variant = 'b', id_type = 'adv') 138 | , br() 139 | , h3('Enter The Results of Your Experiment', align = 'center', style="color:#f65335") 140 | , h4('Variant A', align = 'center') 141 | , set_normal_count(variant = 'a', id_type = 'adv') 142 | , set_normal_mean(variant = 'a', id_type = 'adv') 143 | , set_normal_sd(variant = 'a', id_type = 'adv') 144 | , h4('Variant B', align = 'center') 145 | , set_normal_count(variant = 'b', id_type = 'adv') 146 | , set_normal_mean(variant = 'b', id_type = 'adv') 147 | , set_normal_sd(variant = 'b', id_type = 'adv') 148 | ) 149 | 150 | , conditionalPanel('input.dist_name_adv == "poisson"' 151 | , h4('Variant A', align = 'center') 152 | , choose_poisson_lambda(variant = 'a', id_type = 'adv') 153 | , choose_poisson_max(variant = 'a', id_type = 'adv') 154 | , h4('Variant B', align = 'center') 155 | , choose_poisson_lambda(variant = 'b', id_type = 'adv') 156 | , choose_poisson_max(variant = 'b', id_type = 'adv') 157 | , br() 158 | , h3('Enter The Results of Your Experiment', align = 'center', style="color:#f65335") 159 | , h4('Variant A', align = 'center') 160 | , set_poisson_num(variant = 'a', id_type = 'adv') 161 | , set_poisson_total(variant = 'a', id_type = 'adv') 162 | , h4('Variant B', align = 'center') 163 | , set_poisson_num(variant = 'b', id_type = 'adv') 164 | , set_poisson_total(variant = 'b', id_type = 'adv') 165 | ) 166 | ), 167 | 168 | # Main panel for displaying outputs ---- 169 | mainPanel( 170 | 171 | h1('Before The Experiment', align = 'center', style="color:#f65335") 172 | , plotOutput(outputId = "plot_prior_adv") 173 | , textOutput(outputId = 'prior_description_adv') 174 | , h1('After Observing Some Data', align = 'center', style="color:#f65335") 175 | , conditionalPanel('(input.dist_name_adv == "binomial" && (input.obs_a_adv != 0 || input.obs_b_adv != 0)) 176 | || (input.dist_name_adv == "normal" && (input.count_a_adv != 0 || input.count_b_adv != 0)) 177 | || (input.dist_name_adv == "poisson" && (input.num_a_adv != 0 || input.num_b_adv != 0))' 178 | , tableOutput(outputId = 'metrics_adv') 179 | , plotOutput(outputId = "plot_posterior_adv") 180 | , textOutput(outputId = 'posterior_description_adv') 181 | # , plotOutput(outputId = 'relative_gain_adv') 182 | ) 183 | , conditionalPanel('(input.dist_name_adv == "binomial" && input.obs_a_adv == 0 && input.obs_b_adv == 0) 184 | || (input.dist_name_adv == "normal" && input.count_a_adv == 0 && input.count_b_adv == 0) 185 | || (input.dist_name_adv == "poisson" && input.num_a_adv == 0 && input.num_b_adv == 0)' 186 | , br() 187 | , br() 188 | , h4(paste('In order to determine which variant is better, you need to enter' 189 | , 'the results of your experiment. Use the panel on the left.') 190 | , style="color:black") 191 | ) 192 | ) 193 | ) 194 | ) 195 | , tabPanel('How To Use', 196 | fluidRow(column(12, includeMarkdown('how_to.md'))) 197 | ) 198 | ) 199 | 200 | # generate the outputs 201 | server <- function(input, output) { 202 | 203 | prior_dist_basic <- reactive({ 204 | prior_dist <- create_prior_dist(dist_name = input$dist_name_basic 205 | , expected_rate = input$expected_rate_basic 206 | , upper_bound_rate = input$upper_bound_basic 207 | , expected_mean = input$expected_mean_basic 208 | , upper_bound_mean = input$upper_bound_mean_basic 209 | , expected_sd = input$expected_sd_basic 210 | , upper_bound_sd = input$upper_bound_sd_basic 211 | , expected_lambda = input$expected_lambda_basic 212 | , upper_bound_lambda = input$max_lambda_basic 213 | ) 214 | prior_dist 215 | }) 216 | 217 | prior_dist_adv <- reactive({ 218 | prior_dist_a <- create_prior_dist(dist_name = input$dist_name_adv 219 | , expected_rate = input$expected_rate_a_adv 220 | , upper_bound_rate = input$upper_bound_a_adv 221 | , expected_mean = input$expected_mean_a_adv 222 | , upper_bound_mean = input$upper_bound_mean_a_adv 223 | , expected_sd = input$expected_sd_a_adv 224 | , upper_bound_sd = input$upper_bound_sd_a_adv 225 | , expected_lambda = input$expected_lambda_a_adv 226 | , upper_bound_lambda = input$max_lambda_a_adv 227 | ) 228 | prior_dist_b <- create_prior_dist(dist_name = input$dist_name_adv 229 | , expected_rate = input$expected_rate_b_adv 230 | , upper_bound_rate = input$upper_bound_b_adv 231 | , expected_mean = input$expected_mean_b_adv 232 | , upper_bound_mean = input$upper_bound_mean_b_adv 233 | , expected_sd = input$expected_sd_b_adv 234 | , upper_bound_sd = input$upper_bound_sd_b_adv 235 | , expected_lambda = input$expected_lambda_b_adv 236 | , upper_bound_lambda = input$max_lambda_b_adv 237 | ) 238 | if (is.character(prior_dist_a)) { 239 | paste('Variant A:', prior_dist_a) 240 | } else if (is.character(prior_dist_b)) { 241 | paste('Variant B:', prior_dist_b) 242 | } else { 243 | list('a' = prior_dist_a, 'b' = prior_dist_b) 244 | } 245 | }) 246 | 247 | results_dt_basic <- reactive({ 248 | results_dt <- create_results_dt(dist_name = input$dist_name_basic 249 | , obs_a = input$obs_a_basic, obs_b = input$obs_b_basic 250 | , rate_a = input$rate_a_basic, rate_b = input$rate_b_basic 251 | , count_a = input$count_a_basic, count_b = input$count_b_basic 252 | , mean_a = input$mean_a_basic, mean_b = input$mean_b_basic 253 | , sd_a = input$sd_a_basic, sd_b = input$sd_b_basic 254 | , num_a = input$num_a_basic, num_b = input$num_b_basic 255 | , total_a = input$total_a_basic, total_b = input$total_b_basic) 256 | results_dt 257 | }) 258 | 259 | results_dt_adv <- reactive({ 260 | results_dt <- create_results_dt(dist_name = input$dist_name_adv 261 | , obs_a = input$obs_a_adv, obs_b = input$obs_b_adv 262 | , rate_a = input$rate_a_adv, rate_b = input$rate_b_adv 263 | , count_a = input$count_a_adv, count_b = input$count_b_adv 264 | , mean_a = input$mean_a_adv, mean_b = input$mean_b_adv 265 | , sd_a = input$sd_a_adv, sd_b = input$sd_b_adv 266 | , num_a = input$num_a_adv, num_b = input$num_b_adv 267 | , total_a = input$total_a_adv, total_b = input$total_b_adv) 268 | results_dt 269 | }) 270 | 271 | posterior_basic <- reactive({ 272 | prior_dist <- prior_dist_basic() 273 | results_dt <- results_dt_basic() 274 | 275 | if (is.character(prior_dist)) { 276 | return(prior_dist) 277 | } 278 | 279 | if (is.character(results_dt)) { 280 | return(results_dt) 281 | } 282 | priors <- list('a' = prior_dist, 'b' = prior_dist) 283 | update_priors(priors = priors, stats_dt = results_dt) 284 | }) 285 | 286 | posterior_adv <- reactive({ 287 | prior_dist <- prior_dist_adv() 288 | results_dt <- results_dt_adv() 289 | 290 | if (is.character(prior_dist)) { 291 | return(prior_dist) 292 | } 293 | 294 | if (is.character(results_dt)) { 295 | return(results_dt) 296 | } 297 | update_priors(priors = prior_dist, stats_dt = results_dt) 298 | }) 299 | 300 | metrics_dt_basic <- reactive({ 301 | posteriors <- posterior_basic() 302 | if (is.character(posteriors)) { 303 | return(data.table::data.table()) 304 | } 305 | 306 | create_metric_dt(posteriors = posteriors, method = input$loss_type_basic) 307 | }) 308 | 309 | metrics_dt_adv <- reactive({ 310 | posteriors <- posterior_adv() 311 | if (is.character(posteriors)) { 312 | return(data.table::data.table()) 313 | } 314 | create_metric_dt(posteriors = posteriors, method = input$loss_type_adv) 315 | }) 316 | 317 | # prior plot 318 | output$plot_prior_basic <- renderPlot({ 319 | prior_dist <- prior_dist_basic() 320 | if (!is.character(prior_dist)) { 321 | prior_dist <- list('a' = prior_dist) 322 | } 323 | plot_dists(dists = prior_dist, input$dist_name_basic) 324 | }) 325 | 326 | output$plot_prior_adv <- renderPlot({ 327 | plot_dists(dists = prior_dist_adv(), input$dist_name_adv) 328 | }) 329 | 330 | # prior description 331 | output$prior_description_basic <- renderText({ 332 | create_description(dist_name = input$dist_name_basic, dist = prior_dist_basic()) 333 | }) 334 | 335 | output$prior_description_adv <- renderText({ 336 | prior_dist <- prior_dist_adv() 337 | if (is.character(prior_dist)) { 338 | return('') 339 | } 340 | 341 | a <- create_description(dist_name = input$dist_name_adv, dist = prior_dist[['a']]) 342 | b <- create_description(dist_name = input$dist_name_adv, dist = prior_dist[['b']]) 343 | return(paste0('Variant A: ', a, '. Variant B: ', b)) 344 | }) 345 | 346 | # posterior plot 347 | output$plot_posterior_basic <- renderPlot({ 348 | plot_dists(dists = posterior_basic(), input$dist_name_basic) 349 | }) 350 | 351 | output$plot_posterior_adv <- renderPlot({ 352 | plot_dists(dists = posterior_adv(), input$dist_name_adv) 353 | }) 354 | 355 | output$metrics_basic <- renderTable({ 356 | metrics_dt_basic() 357 | }, digits = 7, striped = TRUE, hover = FALSE, bordered = FALSE 358 | , spacing = 'm', width = '90%', size = '40', align = 'c') 359 | 360 | output$metrics_adv <- renderTable({ 361 | metrics_dt_adv() 362 | }, digits = 7, striped = TRUE, hover = FALSE, bordered = FALSE 363 | , spacing = 'm', width = '90%', size = '40', align = 'c') 364 | 365 | output$posterior_description_adv <- renderText({ 366 | posterior <- posterior_adv() 367 | if (is.character(posterior)) { 368 | return('') 369 | } 370 | 371 | a <- create_description(dist_name = input$dist_name_adv, dist = posterior[['a']]) 372 | b <- create_description(dist_name = input$dist_name_adv, dist = posterior[['b']]) 373 | return(paste0('Variant A: ', a, '. Variant B: ', b)) 374 | }) 375 | 376 | # output$relative_gain_adv <- renderPlot({ 377 | # dists <- posterior_adv() 378 | # plot_relative_gain(dists = dists) 379 | # }) 380 | 381 | } 382 | 383 | shinyApp(ui = ui, server = server) 384 | 385 | 386 | 387 | 388 | -------------------------------------------------------------------------------- /inst/abayes/dependencies.R: -------------------------------------------------------------------------------- 1 | require(abayes) 2 | require(data.table) 3 | require(ggplot2) 4 | require(gridExtra) 5 | require(markdown) 6 | require(purrr) 7 | require(stats) -------------------------------------------------------------------------------- /inst/abayes/how_to.md: -------------------------------------------------------------------------------- 1 | # Bayesian A/B Testing 2 | 3 | A brief explanation of Bayesian A/B testing and how to use this shiny app. 4 | -------------------------------------------------------------------------------- /inst/abayes/server_funs.R: -------------------------------------------------------------------------------- 1 | library(data.table) 2 | 3 | MAX_VAL <- 1e6 4 | MAX_OBS <- 1e8 5 | 6 | create_prior_dist <- function(dist_name 7 | , expected_rate 8 | , upper_bound_rate 9 | , expected_mean 10 | , upper_bound_mean 11 | , expected_sd 12 | , upper_bound_sd 13 | , expected_lambda 14 | , upper_bound_lambda 15 | ) { 16 | if (dist_name == 'binomial') { 17 | in_vals <- c(expected_rate, upper_bound_rate) 18 | if (any(is.na(in_vals))) { 19 | return('Please enter data') 20 | } 21 | if (!all(data.table::between(in_vals, 0, 1))) { 22 | return('Rates must be between 0 and 1') 23 | } 24 | return(get_supported_beta(mu = expected_rate, bound = upper_bound_rate, desired_support = 0.05)) 25 | } else if (dist_name == 'normal') { 26 | means <- c(expected_mean, upper_bound_mean) 27 | sds <- c(expected_sd, upper_bound_sd) 28 | if (any(is.na(c(means, sds)))) { 29 | return('Please enter data') 30 | } 31 | if (!all(data.table::between(means, -MAX_VAL, MAX_VAL))) { 32 | return('Average must be less extreme') 33 | } 34 | if (!all(data.table::between(sds, 0, MAX_VAL, incbounds = FALSE))) { 35 | return('Standard deviations must be less extreme') 36 | } 37 | return(get_supported_normal_gamma(mu = expected_mean, bound_mu = upper_bound_mean 38 | , sigma = expected_sd, bound_sigma = upper_bound_sd 39 | , desired_support = 0.05)) 40 | } else { 41 | in_vals <- c(expected_lambda, upper_bound_lambda) 42 | if (any(is.na(in_vals))) { 43 | return('Please enter data') 44 | } 45 | if (!all(data.table::between(in_vals, 0, MAX_VAL))) { 46 | return('Values must be less extreme') 47 | } 48 | return(get_supported_gamma(mu = expected_lambda, bound = upper_bound_lambda, desired_support = 0.05)) 49 | } 50 | } 51 | 52 | create_results_dt <- function(dist_name 53 | , obs_a, obs_b, rate_a, rate_b 54 | , count_a, count_b, mean_a, mean_b, sd_a, sd_b 55 | , num_a, num_b, total_a, total_b 56 | ) { 57 | if (dist_name == 'binomial') { 58 | obs <- c(obs_a, obs_b) 59 | rates <- c(rate_a, rate_b) 60 | if (any(is.na(c(obs, rates)))) { 61 | return('Please enter data') 62 | } 63 | if (!all(data.table::between(obs, 0, MAX_OBS))) { 64 | return('Please use less extreme number of observations') 65 | } 66 | if (!all(data.table::between(rates, 0, 1))) { 67 | return('Rates must be between 0 and 1') 68 | } 69 | return(data.table::data.table('variant' = c('a', 'b') 70 | , 'num_obs' = obs 71 | , 'observed_rate' = rates)) 72 | } else if (dist_name == 'normal') { 73 | counts <- c(count_a, count_b) 74 | means <- c(mean_a, mean_b) 75 | sds <- c(sd_a, sd_b) 76 | if (any(is.na(c(counts, means, sds)))) { 77 | return('Please enter data') 78 | } 79 | if (!all(data.table::between(counts, 0, MAX_OBS))) { 80 | return('Please use less extreme number of observations') 81 | } 82 | if (!all(data.table::between(means, -MAX_VAL, MAX_VAL))) { 83 | return('Average must be less extreme') 84 | } 85 | if (!all(data.table::between(sds, 0, MAX_VAL, incbounds=FALSE))) { 86 | return('Standard deviations must be less extreme') 87 | } 88 | return(data.table::data.table('variant' = c('a', 'b') 89 | , 'num_obs' = counts 90 | , 'avg' = means 91 | , 'std_dev' = sds)) 92 | } else { 93 | nums <- c(num_a, num_b) 94 | totals <- c(total_a, total_b) 95 | if (any(is.na(c(nums, totals)))) { 96 | return('Please enter data') 97 | } 98 | if (!all(data.table::between(nums, 0, MAX_VAL))) { 99 | return('Please use less extreme number of sessions') 100 | } 101 | if (!all(data.table::between(totals, 0, MAX_VAL))) { 102 | return('Please use less extreme number of observations') 103 | } 104 | return(data.table::data.table('variant' = c('a', 'b') 105 | , 'num_sessions' = c(num_a, num_b) 106 | , 'observed_count' = c(total_a, total_b))) 107 | } 108 | } 109 | 110 | create_description <- function(dist_name, dist) { 111 | if (is.character(dist)) { 112 | return('') 113 | } else { 114 | if (dist_name == 'binomial') { 115 | moments <- abayes::compute_moments(dist) 116 | mu <- moments[['mu']]; sigma <- moments[['sigma']] 117 | return(paste0('The distribution is a beta distribution with parameters: alpha = ' 118 | , round(dist[['alpha']], 2), ', beta = ', round(dist[['beta']], 2) 119 | , '. Mean: ', signif(mu, 3), ' and Standard Deviation: ', signif(sigma, 3))) 120 | } else if (dist_name == 'normal') { 121 | moments <- abayes::compute_moments(dist) 122 | x_mu <- moments[['x']][['mu']]; x_sigma <- moments[['x']][['sigma']] 123 | tau_mu <- moments[['tau']][['mu']]; tau_sigma <- moments[['tau']][['sigma']] 124 | return(paste0('The distribution is a normal gamma distribution with parameters: mu = ' 125 | , round(dist[['mu']], 2), ', lambda = ', round(dist[['lambda']], 2) 126 | , ', alpha = ', round(dist[['alpha']], 2), ', beta = ', round(dist[['beta']], 2) 127 | , '. Mean: ', signif(x_mu, 3), ' and Standard Deviation: ', signif(x_sigma, 3))) 128 | } else { 129 | moments <- abayes::compute_moments(dist) 130 | mu <- moments[['mu']]; sigma <- moments[['sigma']] 131 | return(paste0('The distribution is a gamma distribution with parameters: alpha = ' 132 | , round(dist[['alpha']], 2), ', beta = ', round(dist[['beta']], 2) 133 | , '. Mean: ', signif(mu, 3), ' and Standard Deviation: ', signif(sigma, 3))) 134 | } 135 | } 136 | } 137 | 138 | create_metric_dt <- function(posteriors, method) { 139 | 140 | metrics <- abayes::get_metrics(posteriors = posteriors, sim_batch_size = 1e5, method = method) 141 | dt <- data.table::data.table(x = c('A', 'B') 142 | , y = as.character(signif(c(metrics[['loss_a']], metrics[['loss_b']]), 3)) 143 | , z = as.character(signif(c(1 - metrics[['prob_b_gt_a']], metrics[['prob_b_gt_a']]), 3)) 144 | , w = c('-', as.character(signif(metrics[['effect_lower']], 3))) 145 | , a = c('-', as.character(signif(metrics[['effect_expected']], 3))) 146 | , b = c('-', as.character(signif(metrics[['effect_upper']], 3))) 147 | ) 148 | data.table::setnames(dt, c('Variant', 'Risk Of Choosing Variant', 'Prob Variant is Larger' 149 | , '95% CI Lower Bound', 'Expected Effect Size (B - A)', '95% CI Upper Bound')) 150 | return(dt) 151 | } 152 | 153 | plot_dists <- function(dists, dist_name) { 154 | if (is.character(dists)) { 155 | df <- data.frame() 156 | return(ggplot(df) + geom_point() + xlim(0, 1) + ylim(0, 1) + 157 | annotate('text', x = 0.5, y = 0.5, label = dists, size = 5)) 158 | } else { 159 | if (dist_name == 'binomial') { 160 | return(plot_beta(betas = dists, title = 'What We Believe About the Rate')) 161 | } else if (dist_name == 'normal') { 162 | return(plot_normal(normals = dists)) 163 | } else { 164 | return(plot_gamma(gammas = dists, title = 'What We Believe About the Expected Count')) 165 | } 166 | } 167 | } 168 | 169 | -------------------------------------------------------------------------------- /inst/abayes/ui_elements.R: -------------------------------------------------------------------------------- 1 | dist_name_choices <- list('Rate: Your metric is the frequency that an event happens.' = 'binomial' 2 | , 'Quantity: Your metric is a normally distributed, continuous quantity.' = 'normal' 3 | , 'Count: Your metric is the number of times an event happens.' = 'poisson') 4 | 5 | get_input_id <- function(id_string, variant, id_type) { 6 | if (is.null(variant)) { 7 | return(paste(id_string, id_type, sep = '_')) 8 | } else { 9 | return(paste(id_string, variant, id_type, sep = '_')) 10 | } 11 | } 12 | 13 | choose_binomial_rate <- function(variant, id_type) { 14 | return(numericInput(inputId = get_input_id("expected_rate", variant, id_type) 15 | , label = "Expected Rate of the Event:" 16 | , value = 0.1, min = 0, max = 1, step = 0.01)) 17 | } 18 | 19 | choose_binomial_bound <- function(variant, id_type) { 20 | return(numericInput(inputId = get_input_id("upper_bound", variant, id_type) 21 | , label = "Maximum Possible Rate:" 22 | , value = 0.2, min = 0, max = 1, step = 0.01)) 23 | } 24 | 25 | set_binomial_obs <- function(variant, id_type) { 26 | return(numericInput(inputId = get_input_id("obs", variant, id_type) 27 | , label = "Number of Observations:" 28 | , value = 0, min = 0, max = 1e8, step = 1)) 29 | } 30 | 31 | set_binomial_rate <- function(variant, id_type) { 32 | return(numericInput(inputId = get_input_id("rate", variant, id_type) 33 | , label = "Observed Frequency of the Event:" 34 | , value = 0.1, min = 0, max = 1, step = 0.01)) 35 | } 36 | 37 | choose_normal_mean <- function(variant, id_type) { 38 | return(numericInput(inputId = get_input_id('expected_mean', variant, id_type) 39 | , label = "Expected Average:" 40 | , value = 0, min = -1e6, max = 1e6, step = 1)) 41 | } 42 | 43 | choose_normal_mean_bound <- function(variant, id_type) { 44 | return(numericInput(inputId = get_input_id("upper_bound_mean", variant, id_type) 45 | , label = "Maximum Possible Average:" 46 | , value = 1, min = -1e6, max = 1e6, step = 1)) 47 | } 48 | 49 | choose_normal_sd <- function(variant, id_type) { 50 | return(numericInput(inputId = get_input_id("expected_sd", variant, id_type) 51 | , label = "Expected Standard Deviation:" 52 | , value = 1, min = 0, max = 1e6, step = 1)) 53 | } 54 | 55 | choose_normal_sd_bound <- function(variant, id_type) { 56 | return(numericInput(inputId = get_input_id("upper_bound_sd", variant, id_type) 57 | , label = "Maximum Possible Standard Deviation:" 58 | , value = 2, min = 0, max = 1e6, step = 1)) 59 | } 60 | 61 | set_normal_count <- function(variant, id_type) { 62 | return(numericInput(inputId = get_input_id("count", variant, id_type) 63 | , label = "Number of Observations:" 64 | , value = 0, min = 0, max = 1e8, step = 1)) 65 | } 66 | 67 | set_normal_mean <- function(variant, id_type) { 68 | return(numericInput(inputId = get_input_id("mean", variant, id_type) 69 | , label = "Observed Average:" 70 | , value = 0, min = -1e6, max = 1e6, step = 1)) 71 | } 72 | 73 | set_normal_sd <- function(variant, id_type) { 74 | return(numericInput(inputId = get_input_id("sd", variant, id_type) 75 | , label = "Observed Standard Deviation:" 76 | , value = 1, min = 0, max = 1e6, step = 1)) 77 | } 78 | 79 | choose_poisson_lambda <- function(variant, id_type) { 80 | return(numericInput(inputId = get_input_id("expected_lambda", variant, id_type) 81 | , label = "Expected Number of Events in One Session:" 82 | , value = 1, min = 0, max = 1e3, step = 1)) 83 | } 84 | 85 | choose_poisson_max <- function(variant, id_type) { 86 | return(numericInput(inputId = get_input_id("max_lambda", variant, id_type) 87 | , label = "Maximum Possible Events in One Session:" 88 | , value = 3, min = 0, max = 1e3, step = 1)) 89 | } 90 | 91 | set_poisson_num <- function(variant, id_type) { 92 | return(numericInput(inputId = get_input_id("num", variant, id_type) 93 | , label = "Number of Sessions:" 94 | , value = 0, min = 0, max = 1e5, step = 1)) 95 | } 96 | 97 | set_poisson_total <- function(variant, id_type) { 98 | return(numericInput(inputId = get_input_id("total", variant, id_type) 99 | , label = "Observed Number of Events Across All Sessions:" 100 | , value = 0, min = 0, max = 1e8, step = 1)) 101 | } 102 | -------------------------------------------------------------------------------- /man/ab_arguments.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/helpers.R 3 | \name{ab_arguments} 4 | \alias{ab_arguments} 5 | \title{Bayesian A/B Arguments} 6 | \arguments{ 7 | \item{data_dists}{A named list of distribution objects. This list specifies the 8 | distributions that are used to generate data in simulations. 9 | Currently, the list must only have elements named \code{'a'} 10 | and \code{'b'}. See Details for more information.} 11 | 12 | \item{priors}{A named list of distribution objects. This list specifies the 13 | distributions that are used as priors when estimating some parameter 14 | from the data generating distribution. Currently, the list must only 15 | have elements named \code{'a'} and \code{'b'}. See Details for more information.} 16 | 17 | \item{loss_threshold}{A positive number that identifies a bound for the expected 18 | loss for each variant. Once the expected loss is below this 19 | bound, the experiment is concluded.} 20 | 21 | \item{obs_per_round}{A positive number that represents how many observations, across both variants, 22 | are generated before we update the prior distributions and 23 | evaluate the expected loss. This number must be divisible by 24 | the number of variants used. Default is \code{1000}.} 25 | 26 | \item{max_rounds}{A positive integer that specifies the maximum number of 27 | times that we will evaluate the expected loss on both experiments. 28 | Default is \code{100}.} 29 | 30 | \item{sim_batch_size}{A positive integer that specifies how much data is simulated 31 | when evaluating the expected loss for variants that do 32 | not have an analytic solution (i.e. normal data).} 33 | } 34 | \description{ 35 | Provide documentation for arguments commonly used in this package 36 | } 37 | -------------------------------------------------------------------------------- /man/approx_solver.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/construct_priors.R 3 | \name{approx_solver} 4 | \alias{approx_solver} 5 | \title{find_percentile} 6 | \usage{ 7 | approx_solver(mu, bound, desired_support, cdf_fun, lower_sigma, upper_sigma, 8 | lower.tail = TRUE, tolerance = 0.01) 9 | } 10 | \arguments{ 11 | \item{mu}{The expected value of x (see details)} 12 | 13 | \item{bound}{The desired maximum of the distribution} 14 | 15 | \item{desired_support}{The amount of probability mass to be more extreme than \code{bound}} 16 | 17 | \item{cdf_fun}{A CDF} 18 | 19 | \item{lower_sigma}{A guess of the lower bound on the standard deviation} 20 | 21 | \item{upper_sigma}{A guess of the upper bound on the standard deviation} 22 | 23 | \item{lower.tail}{A logical that indicates whether to find the area of the upper or lower tail. 24 | Default is \code{TRUE}.} 25 | 26 | \item{tolerance}{Return \code{NULL}, if we cannot find a value that gives support 27 | within \code{tolerance} of \code{desired_support}} 28 | } 29 | \value{ 30 | The optimal standard deviation 31 | } 32 | \description{ 33 | Determine the standard deviation of a distribution that will provide 34 | the desired amount of support at a certain bound. 35 | } 36 | -------------------------------------------------------------------------------- /man/b_gt_a.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/metrics.R 3 | \name{b_gt_a} 4 | \alias{b_gt_a} 5 | \title{Probability Variant B is Greater Than Variant A} 6 | \usage{ 7 | b_gt_a(dist_a, dist_b, theta_a, theta_b, exact = NULL, ...) 8 | } 9 | \arguments{ 10 | \item{dist_a}{Some distribution object (see examples)} 11 | 12 | \item{dist_b}{Some distribution object (see examples)} 13 | 14 | \item{theta_a}{A vector of simulated values from \code{dist_a}} 15 | 16 | \item{theta_b}{A vector of simulated values from \code{dist_b}} 17 | 18 | \item{exact}{A boolean that indicates whether the calculation should be approximated 19 | using the normal distribution. Default is \code{NULL}, which means 20 | that it will use the normal distribution if there is sufficient data.} 21 | 22 | \item{...}{Arguments to be passed onto other methods} 23 | } 24 | \value{ 25 | A numeric value 26 | } 27 | \description{ 28 | Given two distributions, find the probability that the expected 29 | value of variant B is greater than the expected value of variant A 30 | } 31 | -------------------------------------------------------------------------------- /man/bernoulli_dist.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/create_dists.R 3 | \name{bernoulli_dist} 4 | \alias{bernoulli_dist} 5 | \title{Bernoulli Distribution} 6 | \usage{ 7 | bernoulli_dist(rate) 8 | } 9 | \arguments{ 10 | \item{rate}{The rate of the bernoulli distribution (between 0 and 1)} 11 | } 12 | \value{ 13 | A \code{bernoulli_dist} object 14 | } 15 | \description{ 16 | Create a bernoulli distribution object, used as the data generating 17 | distribution in tests where the metric is a probability of an 18 | event happening. 19 | } 20 | -------------------------------------------------------------------------------- /man/beta_cdf.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/construct_priors.R 3 | \name{beta_cdf} 4 | \alias{beta_cdf} 5 | \title{CDF of Parameterized Beta Distribution} 6 | \usage{ 7 | beta_cdf(mu, sigma, bound, lower.tail = TRUE) 8 | } 9 | \arguments{ 10 | \item{mu}{The expected value of x (see details)} 11 | 12 | \item{sigma}{The standard deviation of x (see details)} 13 | 14 | \item{bound}{The quantile of the distribution of interest} 15 | 16 | \item{lower.tail}{A logical that indicates whether to find the area of the upper or lower tail. 17 | Default is \code{TRUE}.} 18 | } 19 | \value{ 20 | The area under the desired tail. 21 | } 22 | \description{ 23 | Calculate the area to the left or right of \code{bound} for 24 | a beta distribution parameterized by the expected value and 25 | standard deviation. 26 | } 27 | \details{ 28 | If x ~ Beta(A, B), the \code{mu} = E[x] and \code{sigma^2} = Var(x) 29 | } 30 | -------------------------------------------------------------------------------- /man/beta_dist.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/create_dists.R 3 | \name{beta_dist} 4 | \alias{beta_dist} 5 | \title{Beta Distribution} 6 | \usage{ 7 | beta_dist(alpha, beta) 8 | } 9 | \arguments{ 10 | \item{alpha}{The non-negative alpha parameter of the beta distribution} 11 | 12 | \item{beta}{The non-negative beta parameter of the beta distribution} 13 | } 14 | \value{ 15 | An object of class \code{'beta'}. 16 | } 17 | \description{ 18 | Create a beta distribution object, used as the prior for metrics that 19 | have a bernoulli distribution. 20 | } 21 | -------------------------------------------------------------------------------- /man/calc_beta_dist.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/construct_priors.R 3 | \name{calc_beta_dist} 4 | \alias{calc_beta_dist} 5 | \title{Calculate Parameters For Beta Distribution} 6 | \usage{ 7 | calc_beta_dist(mu, sigma) 8 | } 9 | \arguments{ 10 | \item{mu}{The expected value of x (see details)} 11 | 12 | \item{sigma}{The standard deviation of x (see details)} 13 | } 14 | \value{ 15 | An object of class \code{beta_dist} 16 | } 17 | \description{ 18 | Calculate the parameters for a beta distribution parameterized 19 | by the expected value and standard deviation. 20 | } 21 | \details{ 22 | If x ~ Beta(alpha, beta), then \code{mu} = E[x] and \code{sigma^2} = Var(x) 23 | } 24 | -------------------------------------------------------------------------------- /man/calc_gamma_dist.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/construct_priors.R 3 | \name{calc_gamma_dist} 4 | \alias{calc_gamma_dist} 5 | \title{Calculate Parameters For Gamma Distribution} 6 | \usage{ 7 | calc_gamma_dist(mu, sigma) 8 | } 9 | \arguments{ 10 | \item{mu}{The expected value of x (see details)} 11 | 12 | \item{sigma}{The standard deviation of x (see details)} 13 | } 14 | \value{ 15 | A named list 16 | } 17 | \description{ 18 | This function determines the parameters for a gamma distribution 19 | from the desired mean 20 | } 21 | \details{ 22 | If x ~ Gamma(alpha, beta), then \code{mu} = E[x] and \code{sigma^2} = Var(x) 23 | } 24 | -------------------------------------------------------------------------------- /man/calc_normal_gamma_dist.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/construct_priors.R 3 | \name{calc_normal_gamma_dist} 4 | \alias{calc_normal_gamma_dist} 5 | \title{Calculate Parameters For Normal Gamma Distribution} 6 | \usage{ 7 | calc_normal_gamma_dist(mu, tau, sigma_mu, sigma_tau) 8 | } 9 | \arguments{ 10 | \item{mu}{The expected value of x (see details)} 11 | 12 | \item{tau}{The expected value of T (see details)} 13 | 14 | \item{sigma_mu}{The standard deviation of x (see details)} 15 | 16 | \item{sigma_tau}{The standard deviation of T (see details)} 17 | } 18 | \value{ 19 | An object of class \code{normal_gamma_dist} 20 | } 21 | \description{ 22 | Calculate the parameters for a normal gamma distribution parameterized 23 | by the expected value and standard deviation. 24 | } 25 | \details{ 26 | If (x, T) ~ NormalGamma(mu, lambda, alpha, beta), then \code{mu} = E[x], 27 | \code{tau} = E[T], \code{sigma_mu^2} = Var(x), and \code{sigma_tau^2} = Var(T) 28 | } 29 | -------------------------------------------------------------------------------- /man/create_empty_dt.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/helpers.R 3 | \name{create_empty_dt} 4 | \alias{create_empty_dt} 5 | \title{Create an Empty Data Table} 6 | \usage{ 7 | create_empty_dt(num_rows, column_names, fill_value = -1) 8 | } 9 | \arguments{ 10 | \item{num_rows}{A positive integer that specifies the number of rows for the data table} 11 | 12 | \item{column_names}{A character vector that specifies the number and names of the columns.} 13 | 14 | \item{fill_value}{Default is -1. Specifies what value should fill the data table} 15 | } 16 | \value{ 17 | A data.table 18 | } 19 | \description{ 20 | Create an empty data table with a specified number of rows and columns. 21 | You can also choose the value that fills the data table. 22 | } 23 | -------------------------------------------------------------------------------- /man/expected_loss_b.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/metrics.R 3 | \name{expected_loss_b} 4 | \alias{expected_loss_b} 5 | \title{Expected Loss of Choosing Variant B} 6 | \usage{ 7 | expected_loss_b(dist_a, dist_b, theta_a, theta_b, method = c("absolute", 8 | "percent"), ...) 9 | } 10 | \arguments{ 11 | \item{dist_a}{The distribution object for variant A} 12 | 13 | \item{dist_b}{The distribution object for variant B} 14 | 15 | \item{theta_a}{A vector of simulated values from \code{dist_a}} 16 | 17 | \item{theta_b}{A vector of simulated values from \code{dist_b}} 18 | 19 | \item{method}{One of \code{'absolute'} or \code{'percent'} that indicates 20 | whether the loss function takes the absolute difference 21 | or the percent difference between \code{theta_a} and \code{theta_b}} 22 | 23 | \item{...}{Arguments to be passed onto other methods} 24 | } 25 | \value{ 26 | A numeric 27 | } 28 | \description{ 29 | This function calculates the expected loss of choosing variant B 30 | } 31 | -------------------------------------------------------------------------------- /man/gamma_cdf.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/construct_priors.R 3 | \name{gamma_cdf} 4 | \alias{gamma_cdf} 5 | \title{CDF of Parameterized Gamma Distribution} 6 | \usage{ 7 | gamma_cdf(mu, sigma, bound, lower.tail = TRUE) 8 | } 9 | \arguments{ 10 | \item{mu}{The expected value of x (see details)} 11 | 12 | \item{sigma}{The standard deviation of x (see details)} 13 | 14 | \item{bound}{The quantile of the distribution of interest} 15 | 16 | \item{lower.tail}{A logical that indicates whether to find the area of the upper or lower tail. 17 | Default is \code{TRUE}.} 18 | } 19 | \value{ 20 | The area under the desired tail. 21 | } 22 | \description{ 23 | Calculate the area to the left or right of \code{bound} for 24 | a gamma distribution parameterized by the expected value and 25 | standard deviation. 26 | } 27 | \details{ 28 | If x ~ Gamma(alpha, beta), then \code{mu} = E[x] and \code{sigma^2} = Var(x) 29 | } 30 | -------------------------------------------------------------------------------- /man/gamma_dist.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/create_dists.R 3 | \name{gamma_dist} 4 | \alias{gamma_dist} 5 | \title{Gamma Distribution} 6 | \usage{ 7 | gamma_dist(alpha, beta) 8 | } 9 | \arguments{ 10 | \item{alpha}{The non-negative alpha (shape) parameter of the gamma distribution} 11 | 12 | \item{beta}{The non-negative beta (rate) parameter of the gamma distribution} 13 | } 14 | \value{ 15 | A normal_unknown dist object 16 | } 17 | \description{ 18 | Create a gamma distribution object, used as the prior for 19 | metrics that have a poisson distribution. 20 | } 21 | -------------------------------------------------------------------------------- /man/get_actual_loss.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/metrics.R 3 | \name{get_actual_loss} 4 | \alias{get_actual_loss} 5 | \title{Actual Loss of an Experiment} 6 | \usage{ 7 | get_actual_loss(data_dists, selected_variant) 8 | } 9 | \arguments{ 10 | \item{data_dists}{A named list of distribution objects. This list specifies the 11 | distributions that are used to generate data in simulations. 12 | Currently, the list must only have elements named \code{'a'} 13 | and \code{'b'}. See Details for more information.} 14 | 15 | \item{selected_variant}{A string that identifies the winning variant.} 16 | } 17 | \value{ 18 | Numeric 19 | } 20 | \description{ 21 | Once the experiment has concluded, measure the loss of the decision 22 | } 23 | -------------------------------------------------------------------------------- /man/get_data_dists.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/get_data_dists.R 3 | \name{get_data_dists} 4 | \alias{get_data_dists} 5 | \title{Sample Data Distributions} 6 | \usage{ 7 | get_data_dists(dist, n, num_variants, ...) 8 | } 9 | \arguments{ 10 | \item{dist}{A prior distribution} 11 | 12 | \item{n}{The number of distributions to sample} 13 | 14 | \item{num_variants}{The number of variants to use} 15 | 16 | \item{...}{Arguments to be used by other methods} 17 | } 18 | \value{ 19 | A list of data generating distributions for multiple variants 20 | } 21 | \description{ 22 | Given a prior distribution, sample many data generating distributions 23 | } 24 | -------------------------------------------------------------------------------- /man/get_losses.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/metrics.R 3 | \name{get_losses} 4 | \alias{get_losses} 5 | \title{Expected Losses For Variants} 6 | \usage{ 7 | get_losses(posteriors, sim_batch_size, method = c("absolute", "percent")) 8 | } 9 | \arguments{ 10 | \item{posteriors}{A list of distribution objects that identify the posterior distributions 11 | for each variant} 12 | 13 | \item{sim_batch_size}{A positive integer that specifies how much data is simulated 14 | when evaluating the expected loss for variants that do 15 | not have an analytic solution (i.e. normal data).} 16 | 17 | \item{method}{One of \code{'absolute'} or \code{'percent'} that indicates 18 | whether the loss function takes the absolute difference 19 | or the percent difference between \code{theta_a} and \code{theta_b}} 20 | } 21 | \value{ 22 | A list of expected losses for each variant 23 | } 24 | \description{ 25 | Evaluate the expected loss for each variant 26 | } 27 | -------------------------------------------------------------------------------- /man/get_metrics.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/metrics.R 3 | \name{get_metrics} 4 | \alias{get_metrics} 5 | \title{Get Bayesian A/B Testing Metrics} 6 | \usage{ 7 | get_metrics(posteriors, sim_batch_size, method = c("absolute", "percent")) 8 | } 9 | \arguments{ 10 | \item{posteriors}{A list of distribution objects} 11 | 12 | \item{sim_batch_size}{How many observations of data to simulate} 13 | 14 | \item{method}{What type of loss to calculate?} 15 | } 16 | \value{ 17 | A named list with the metrics 18 | } 19 | \description{ 20 | Calculate various metrics 21 | } 22 | -------------------------------------------------------------------------------- /man/get_supported_beta.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/construct_priors.R 3 | \name{get_supported_beta} 4 | \alias{get_supported_beta} 5 | \title{Get Beta Distribution With Desired Support} 6 | \usage{ 7 | get_supported_beta(mu, bound, desired_support = 0.05) 8 | } 9 | \arguments{ 10 | \item{mu}{The expected value of x (see details)} 11 | 12 | \item{bound}{The desired maximum of the distribution} 13 | 14 | \item{desired_support}{The amount of probability mass to be more extreme than \code{bound}} 15 | } 16 | \value{ 17 | An object of class \code{beta_dist} 18 | } 19 | \description{ 20 | Determines the parameters for a beta distribution given an expected value 21 | and desired support at some bound. 22 | } 23 | \details{ 24 | If x ~ Beta(alpha, beta), then \code{mu} = E[x] and \code{sigma^2} = Var(x) 25 | } 26 | -------------------------------------------------------------------------------- /man/get_supported_gamma.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/construct_priors.R 3 | \name{get_supported_gamma} 4 | \alias{get_supported_gamma} 5 | \title{Get parameters of gamma distribution from expected and maximum rate} 6 | \usage{ 7 | get_supported_gamma(mu, bound, desired_support = 0.05, lower.tail = FALSE) 8 | } 9 | \arguments{ 10 | \item{mu}{The expected value of x (see details)} 11 | 12 | \item{bound}{The desired maximum of the distribution} 13 | 14 | \item{desired_support}{The amount of probability mass to be more extreme than \code{bound}} 15 | 16 | \item{lower.tail}{A boolean that indicates whether we want to get the support on the upper or lower tail} 17 | } 18 | \value{ 19 | A \code{gamma_dist} object 20 | } 21 | \description{ 22 | This function determines the parameters for a gamma distribution 23 | } 24 | \details{ 25 | If x ~ Gamma(alpha, beta), then \code{mu} = E[x] and \code{sigma^2} = Var(x) 26 | } 27 | -------------------------------------------------------------------------------- /man/get_supported_normal_gamma.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/construct_priors.R 3 | \name{get_supported_normal_gamma} 4 | \alias{get_supported_normal_gamma} 5 | \title{Get Normal Gamma Distribution With Desired Support} 6 | \usage{ 7 | get_supported_normal_gamma(mu, bound_mu, sigma, bound_sigma, 8 | desired_support = 0.05) 9 | } 10 | \arguments{ 11 | \item{mu}{The expected value of x (see details)} 12 | 13 | \item{bound_mu}{An upper bound on the likely value of x (see details)} 14 | 15 | \item{sigma}{The standard deviation of x (see details)} 16 | 17 | \item{bound_sigma}{The upper bound on the likely value of T (see details)} 18 | 19 | \item{desired_support}{The amount of probability mass to be more extreme than \code{bound}} 20 | } 21 | \value{ 22 | An object of class \code{normal_gamma_dist} 23 | } 24 | \description{ 25 | Determines the parameters for a normal gamma distribution given the expected values 26 | of the distribution and bounds on their maximum values 27 | } 28 | \details{ 29 | If (x, T) ~ NormalGamma(mu, lambda, alpha, beta), then \code{mu} = E[x], 30 | \code{tau} = E[T], \code{sigma_mu^2} = Var(x), and \code{sigma_tau^2} = Var(T) 31 | } 32 | -------------------------------------------------------------------------------- /man/investigate_simulations.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/simulation_study.R 3 | \name{investigate_simulations} 4 | \alias{investigate_simulations} 5 | \title{Investigate Properties of Bayesian A/B Testing} 6 | \usage{ 7 | investigate_simulations(num_sims, priors, loss_threshold, data_dists = NULL, 8 | sampling_distribution = NULL, obs_per_round = 1000, max_rounds = 100, 9 | sim_batch_size = 1e+05, num_cores = NULL) 10 | } 11 | \arguments{ 12 | \item{num_sims}{A positive integer that specifies how many simulations to perform.} 13 | 14 | \item{priors}{A named list of distribution objects. This list specifies the 15 | distributions that are used as priors when estimating some parameter 16 | from the data generating distribution. Currently, the list must only 17 | have elements named \code{'a'} and \code{'b'}. See Details for more information.} 18 | 19 | \item{loss_threshold}{A positive number that identifies a bound for the expected 20 | loss for each variant. Once the expected loss is below this 21 | bound, the experiment is concluded.} 22 | 23 | \item{data_dists}{A named list of distribution objects. This list specifies the 24 | distributions that are used to generate data in simulations. 25 | Currently, the list must only have elements named \code{'a'} 26 | and \code{'b'}. See Details for more information.} 27 | 28 | \item{sampling_distribution}{An list of distribution objects that specifies how the data generating 29 | distributions should be created.} 30 | 31 | \item{obs_per_round}{A positive number that represents how many observations, across both variants, 32 | are generated before we update the prior distributions and 33 | evaluate the expected loss. This number must be divisible by 34 | the number of variants used. Default is \code{1000}.} 35 | 36 | \item{max_rounds}{A positive integer that specifies the maximum number of 37 | times that we will evaluate the expected loss on both experiments. 38 | Default is \code{100}.} 39 | 40 | \item{sim_batch_size}{A positive integer that specifies how much data is simulated 41 | when evaluating the expected loss for variants that do 42 | not have an analytic solution (i.e. normal data).} 43 | 44 | \item{num_cores}{How many cores to use in the parallelization of tests. Default is \code{NULL}, which 45 | means no parallelization.} 46 | } 47 | \value{ 48 | A list containing two data.tables: one with summary statistics for each simulation 49 | and one with the averages over all of the simulations. 50 | } 51 | \description{ 52 | Run multiple simulations of an A/B test in order to evaluate various properties 53 | } 54 | -------------------------------------------------------------------------------- /man/normal_dist.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/create_dists.R 3 | \name{normal_dist} 4 | \alias{normal_dist} 5 | \title{Normal Distribution} 6 | \usage{ 7 | normal_dist(mu, sigma) 8 | } 9 | \arguments{ 10 | \item{mu}{The mean of the normal distribution} 11 | 12 | \item{sigma}{The non-negative standard deviation of the normal distribution} 13 | } 14 | \value{ 15 | A \code{normal_dist} object 16 | } 17 | \description{ 18 | Create a normal distribution object, used as the data generating 19 | distribution in tests where the metric is a continuous value. 20 | } 21 | -------------------------------------------------------------------------------- /man/normal_gamma_dist.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/create_dists.R 3 | \name{normal_gamma_dist} 4 | \alias{normal_gamma_dist} 5 | \title{Normal-Gamma Distribution} 6 | \usage{ 7 | normal_gamma_dist(mu, lambda, alpha, beta) 8 | } 9 | \arguments{ 10 | \item{mu}{The mean of the normal-gamma distribution} 11 | 12 | \item{lambda}{The non-negative lambda parameter of the normal-gamma distribution} 13 | 14 | \item{alpha}{The non-negative alpha parameter of the normal-gamma distribution} 15 | 16 | \item{beta}{The non-negative beta parameter of the normal-gamma distribution} 17 | } 18 | \value{ 19 | A normal_unknown dist object 20 | } 21 | \description{ 22 | Create a normal_gamma distribution object, used as the prior for 23 | metrics that have a normal distribution. 24 | } 25 | -------------------------------------------------------------------------------- /man/plot_beta.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/plotting.R 3 | \name{plot_beta} 4 | \alias{plot_beta} 5 | \title{Plot Beta Distributions} 6 | \usage{ 7 | plot_beta(betas, title = "Beta Distribution", 8 | xlab = "Rate that the Event Occurs", ylab = "Density of that Rate", 9 | color = "#f65335", support_level = 0.99) 10 | } 11 | \arguments{ 12 | \item{betas}{A list of lists of beta distributions} 13 | 14 | \item{title}{The title of the plot} 15 | 16 | \item{xlab}{The title of the x axis} 17 | 18 | \item{ylab}{The title of the y axis} 19 | 20 | \item{color}{The color for the plot} 21 | 22 | \item{support_level}{The desired amount of area between the lower and upper bounds. Default is \code{0.99}.} 23 | } 24 | \value{ 25 | NULL. A plot is generated 26 | } 27 | \description{ 28 | This function plots the densities of multiple beta distributions 29 | } 30 | -------------------------------------------------------------------------------- /man/plot_gamma.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/plotting.R 3 | \name{plot_gamma} 4 | \alias{plot_gamma} 5 | \title{Plot Gamma Distributions} 6 | \usage{ 7 | plot_gamma(gammas, title = "Density of Gamma Distribution", 8 | color = "#f65335", support_level = 0.99) 9 | } 10 | \arguments{ 11 | \item{gammas}{A list of lists of gamma distributions} 12 | 13 | \item{title}{The title of the plot} 14 | 15 | \item{color}{The color for the plot} 16 | 17 | \item{support_level}{The desired amount of area between the lower and upper bounds. Default is \code{0.99}.} 18 | } 19 | \value{ 20 | NULL. A plot is generated 21 | } 22 | \description{ 23 | This function plots the densities of multiple gamma distributions 24 | } 25 | -------------------------------------------------------------------------------- /man/plot_normal.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/plotting.R 3 | \name{plot_normal} 4 | \alias{plot_normal} 5 | \title{Plot Normal Distributions} 6 | \usage{ 7 | plot_normal(normals, title = "Normal Distribution", color = "#f65335", 8 | support_level = 0.99) 9 | } 10 | \arguments{ 11 | \item{normals}{A list of lists of normal distributions} 12 | 13 | \item{title}{The title of the plot} 14 | 15 | \item{color}{The color for the plot} 16 | 17 | \item{support_level}{The desired amount of area between the lower and upper bounds. Default is \code{0.99}.} 18 | } 19 | \value{ 20 | NULL. A plot is generated 21 | } 22 | \description{ 23 | This function allows you to visualize the densities of a normal distribution 24 | } 25 | -------------------------------------------------------------------------------- /man/plot_relative_gain.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/plotting.R 3 | \name{plot_relative_gain} 4 | \alias{plot_relative_gain} 5 | \title{Plot Relative Gain} 6 | \usage{ 7 | plot_relative_gain(dists, sim_batch_size = 1e+05, 8 | title = "Cumulative Density of B / A") 9 | } 10 | \arguments{ 11 | \item{dists}{A list of distribution objects, with elements named \code{'a'} and \code{'b'}} 12 | 13 | \item{sim_batch_size}{The number of objects to simulate} 14 | } 15 | \value{ 16 | A plot 17 | } 18 | \description{ 19 | Plot the cumulative density of the ratio of the metric under variant B to the metric under variant A 20 | } 21 | -------------------------------------------------------------------------------- /man/poisson_dist.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/create_dists.R 3 | \name{poisson_dist} 4 | \alias{poisson_dist} 5 | \title{Poisson Distribution} 6 | \usage{ 7 | poisson_dist(rate) 8 | } 9 | \arguments{ 10 | \item{rate}{The non-negative mean of the poisson distribution} 11 | } 12 | \value{ 13 | A \code{poisson_dist} object 14 | } 15 | \description{ 16 | Create a poisson distribution object, used as the data generating 17 | distribution in tests where the metric is a count of an event happening. 18 | } 19 | -------------------------------------------------------------------------------- /man/sim_effect_size.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/metrics.R 3 | \name{sim_effect_size} 4 | \alias{sim_effect_size} 5 | \title{Simulate the effect size of variant B - variant A} 6 | \usage{ 7 | sim_effect_size(theta_a, theta_b) 8 | } 9 | \arguments{ 10 | \item{theta_a}{A vector of draws from the posterior of a} 11 | 12 | \item{theta_b}{A vector of draws from the posterior of b} 13 | } 14 | \value{ 15 | A list containing the 2.5%, mean, and 97.5% of the effect size 16 | } 17 | \description{ 18 | This function simulates the effect size (B - A) 19 | } 20 | -------------------------------------------------------------------------------- /man/simulate_ab_test.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/run_simulation.R 3 | \name{simulate_ab_test} 4 | \alias{simulate_ab_test} 5 | \title{Simulate a Bayesian A/B Test} 6 | \usage{ 7 | simulate_ab_test(data_dists, priors, loss_threshold, obs_per_round = 1000, 8 | max_rounds = 100, sim_batch_size = 1e+05) 9 | } 10 | \arguments{ 11 | \item{data_dists}{A named list of distribution objects. This list specifies the 12 | distributions that are used to generate data in simulations. 13 | Currently, the list must only have elements named \code{'a'} 14 | and \code{'b'}. See Details for more information.} 15 | 16 | \item{priors}{A named list of distribution objects. This list specifies the 17 | distributions that are used as priors when estimating some parameter 18 | from the data generating distribution. Currently, the list must only 19 | have elements named \code{'a'} and \code{'b'}. See Details for more information.} 20 | 21 | \item{loss_threshold}{A positive number that identifies a bound for the expected 22 | loss for each variant. Once the expected loss is below this 23 | bound, the experiment is concluded.} 24 | 25 | \item{obs_per_round}{A positive number that represents how many observations, across both variants, 26 | are generated before we update the prior distributions and 27 | evaluate the expected loss. This number must be divisible by 28 | the number of variants used. Default is \code{1000}.} 29 | 30 | \item{max_rounds}{A positive integer that specifies the maximum number of 31 | times that we will evaluate the expected loss on both experiments. 32 | Default is \code{100}.} 33 | 34 | \item{sim_batch_size}{A positive integer that specifies how much data is simulated 35 | when evaluating the expected loss for variants that do 36 | not have an analytic solution (i.e. normal data).} 37 | } 38 | \value{ 39 | A list that contains the name of the winning variant, the number of observations used, 40 | the loss of the decision, whether the test finished, the metrics from each round, and 41 | the raw data. 42 | } 43 | \description{ 44 | Given true data generating distributions and prior distributions for 45 | variants A and B, simulate the data, calculate the necessary statistics 46 | and declare one of the tests a winner. 47 | } 48 | \details{ 49 | In order to create \code{data_dists} and \code{priors}, you need to use the following 50 | for data generating distributions: \code{\link{bernoulli_dist}}, 51 | \code{\link{normal_dist}}, \code{\link{poisson_dist}} and the following for prior 52 | distributions \code{\link{beta_dist}}, \code{\link{normal_gamma_dist}}, 53 | \code{\link{beta_dist}}, 54 | } 55 | \seealso{ 56 | \code{\link{bernoulli_dist}} \code{\link{normal_dist}} \code{\link{poisson_dist}} 57 | \code{\link{beta_dist}} \code{\link{normal_gamma_dist}} \code{\link{beta_dist}} 58 | } 59 | -------------------------------------------------------------------------------- /man/simulate_data.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/compute_moments.R, R/simulate_data.R 3 | \name{simulate_data} 4 | \alias{simulate_data} 5 | \alias{compute_moments} 6 | \alias{simulate_data} 7 | \title{Simulate Data According to Some Distribution} 8 | \usage{ 9 | compute_moments(dist) 10 | 11 | simulate_data(dist, n, ...) 12 | } 13 | \arguments{ 14 | \item{dist}{An object of class \code{'beta_dist'}, \code{'normal_gamma_dist'}, 15 | \code{'gamma_dist'} that specifies the parameters of some distribution} 16 | 17 | \item{n}{How many data points to simulate} 18 | 19 | \item{...}{Arguments to be passed onto other methods} 20 | 21 | \item{dist}{An object of class \code{'beta_dist'}, \code{'normal_gamma_dist'}, 22 | \code{'gamma_dist'}, \code{'bernouilli_dist'}, \code{'normal_dist'}, 23 | or \code{'poisson_dist'} that specifies the parameters of some distribution} 24 | } 25 | \value{ 26 | A list 27 | 28 | A vector of simulated data. 29 | } 30 | \description{ 31 | Simulate a vector of data from a given distribution object. 32 | 33 | Simulate a vector of data from a given distribution object. 34 | } 35 | -------------------------------------------------------------------------------- /man/single_loss_stop.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/stopping_functions.R 3 | \name{single_loss_stop} 4 | \alias{single_loss_stop} 5 | \title{Decide whether to stop the experiment using the last loss values} 6 | \usage{ 7 | single_loss_stop(losses, loss_threshold) 8 | } 9 | \arguments{ 10 | \item{losses}{A named vector of the losses for each variant.} 11 | 12 | \item{loss_threshold}{Choose a variant once the loss goes beneath this value.} 13 | } 14 | \value{ 15 | A list containg two named elements: \code{'stop'} (a boolean) and 16 | \code{'winner'}, which is \code{NULL} if \code{'stop'} is \code{FALSE} 17 | else it is the name of the winning variant 18 | } 19 | \description{ 20 | This function decides whether or not to stop the experiment and 21 | declare one of the variants as the winner. 22 | } 23 | -------------------------------------------------------------------------------- /man/update_prior.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/update_priors.R 3 | \name{update_prior} 4 | \alias{update_prior} 5 | \title{Update Prior Parameters} 6 | \usage{ 7 | update_prior(prior, evidence = NULL, stats = NULL, ...) 8 | } 9 | \arguments{ 10 | \item{prior}{An object of class \code{'beta_dist'}, \code{'normal_gamma_dist'} 11 | , or \code{'gamma_dist'} that specifies the parameters of some distribution} 12 | 13 | \item{evidence}{A numeric vector that contains observed data. Default is \code{NULL}, 14 | will override \code{stats} if specified.} 15 | 16 | \item{stats}{An object of class \code{'beta_stats'}, \code{'normal_gamma_stats'}, or 17 | \code{'gamma_stats'} that contains sufficient statistics for the 18 | update. Default is \code{NULL}, will be ignored if \code{evidence} 19 | is specified.} 20 | 21 | \item{...}{Arguments to be passed onto other methods} 22 | } 23 | \description{ 24 | Use observed data to update parameters of prior distribution 25 | } 26 | -------------------------------------------------------------------------------- /man/update_priors.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/update_priors.R 3 | \name{update_priors} 4 | \alias{update_priors} 5 | \title{Update the hyper-parameters of prior distributions} 6 | \usage{ 7 | update_priors(priors, evidence_dt = NULL, stats_dt = NULL) 8 | } 9 | \arguments{ 10 | \item{priors}{A named list of distribution objects. This list specifies the 11 | distributions that are used as priors when estimating some parameter 12 | from the data generating distribution. Currently, the list must only 13 | have elements named \code{'a'} and \code{'b'}. See Details for more information.} 14 | 15 | \item{evidence_dt}{A data.table containing the raw data generated by each variant. 16 | Column names must be identical to variant names in \code{priors}} 17 | 18 | \item{stats_dt}{A data.table containing statistics about the raw data generated by each 19 | variant.} 20 | } 21 | \value{ 22 | A list of distribution objects that represented the updated prior distributions 23 | } 24 | \description{ 25 | This function updates the hyper-parameters of our prior distributions 26 | } 27 | -------------------------------------------------------------------------------- /man/validate_dt.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/update_priors.R 3 | \name{validate_dt} 4 | \alias{validate_dt} 5 | \title{Check That Data Table Matches Expectations} 6 | \usage{ 7 | validate_dt(dt, expected_cols) 8 | } 9 | \arguments{ 10 | \item{dt}{A data.table} 11 | 12 | \item{expected_cols}{A vector of column names that \code{dt} must have} 13 | } 14 | \value{ 15 | \code{NULL} if successful, else a fatal error 16 | } 17 | \description{ 18 | Use this function to assert certain properties of a data.table 19 | } 20 | -------------------------------------------------------------------------------- /man/validate_inputs.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/run_simulation.R 3 | \name{validate_inputs} 4 | \alias{validate_inputs} 5 | \title{Validate Inputs to Bayesian A/B Simulation} 6 | \usage{ 7 | validate_inputs(data_dists, priors, obs_per_round) 8 | } 9 | \arguments{ 10 | \item{data_dists}{A named list of distribution objects. This list specifies the 11 | distributions that are used to generate data in simulations. 12 | Currently, the list must only have elements named \code{'a'} 13 | and \code{'b'}. See Details for more information.} 14 | 15 | \item{priors}{A named list of distribution objects. This list specifies the 16 | distributions that are used as priors when estimating some parameter 17 | from the data generating distribution. Currently, the list must only 18 | have elements named \code{'a'} and \code{'b'}. See Details for more information.} 19 | 20 | \item{obs_per_round}{A positive number that represents how many observations, across both variants, 21 | are generated before we update the prior distributions and 22 | evaluate the expected loss. This number must be divisible by 23 | the number of variants used. Default is \code{1000}.} 24 | } 25 | \value{ 26 | NULL if all of the tests pass. Else, it will fail loudly. 27 | } 28 | \description{ 29 | Check the validity of the parameters being used in the simulation. 30 | } 31 | -------------------------------------------------------------------------------- /shiny_app_screen_shot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/convoyinc/abayes/6e74f8288ad84cf58ffea58dd396a5d2f8ff10e8/shiny_app_screen_shot.png -------------------------------------------------------------------------------- /tests/testthat.R: -------------------------------------------------------------------------------- 1 | Sys.setenv('R_TESTS' = '') 2 | 3 | library(abayes) 4 | testthat::test_check('abayes') -------------------------------------------------------------------------------- /tests/testthat/test-construct_priors.R: -------------------------------------------------------------------------------- 1 | context('Testing functions that construct priors') 2 | 3 | test_that('calc_beta_dist can get a beta distribution from mu and sigma', { 4 | mu <- 0.5 5 | sigma <- 0.01 6 | dist <- calc_beta_dist(mu = mu, sigma = sigma) 7 | expected <- beta_dist(alpha = 1249.5, beta = 1249.5) 8 | expect_identical(dist, expected) 9 | }) 10 | 11 | test_that('beta_cdf calculates the correct integral', { 12 | mu <- 0.5 13 | sigma <- 0.01 14 | bound <- 0.51 15 | lower.tail <- FALSE 16 | area <- beta_cdf(mu = mu, sigma = sigma, bound = bound, lower.tail = lower.tail) 17 | expected <- pbeta(0.51, 1249.5, 1249.5, lower.tail = FALSE) 18 | expect_identical(area, expected) 19 | }) 20 | 21 | test_that('calc_normal_gamma_dist can get a normal gamma distribution', { 22 | mu <- 10 23 | tau <- 2 24 | sigma_mu <- 2 25 | sigma_tau <- 0.5 26 | dist <- calc_normal_gamma_dist(mu = mu, tau = tau, sigma_mu = sigma_mu, sigma_tau = sigma_tau) 27 | expected <- normal_gamma_dist(mu = 10, lambda = 2/15, alpha = 16, beta = 8) 28 | expect_identical(dist, expected) 29 | }) 30 | 31 | test_that('calc_gamma_dist can get a gamma distribution from expected value and standard deviation', { 32 | mu <- 10 33 | sigma <- 5 34 | dist <- calc_gamma_dist(mu = mu, sigma = sigma) 35 | expected <- gamma_dist(alpha = 4, beta = 0.4) 36 | expect_identical(dist, expected) 37 | }) 38 | 39 | test_that('gamma_cdf calculates the correct integral', { 40 | mu <- 3 41 | sigma <- 2 42 | bound <- 4 43 | lower.tail <- FALSE 44 | area <- gamma_cdf(mu = mu, sigma = sigma, bound = bound, lower.tail = lower.tail) 45 | expected <- pgamma(4, 2.25, 0.75, lower.tail = FALSE) 46 | expect_identical(area, expected) 47 | }) 48 | 49 | test_that('get_supported_beta returns correct distribution', { 50 | run_test <- function(mu, bound, desired_support) { 51 | beta_dist <- get_supported_beta(mu = mu, bound = bound, desired_support = desired_support) 52 | expect_true(inherits(beta_dist, 'beta_dist')) 53 | actual_support <- pbeta(bound, beta_dist[['alpha']], beta_dist[['beta']], lower.tail = FALSE) 54 | expect_true(abs(actual_support - desired_support) < 0.01) 55 | } 56 | 57 | run_test(mu = 0.1, bound = 0.2, desired_support = 0.05) 58 | run_test(mu = 0.01, bound = 0.011, desired_support = 0.05) 59 | run_test(mu = 0.3, bound = 0.5, desired_support = 0.05) 60 | }) 61 | 62 | test_that('get_supported_normal_gamma returns correct distribution', { 63 | run_test <- function(mu, bound_mu, sigma, bound_sigma, desired_support) { 64 | ng_dist <- get_supported_normal_gamma(mu = mu, bound_mu = bound_mu 65 | , sigma = sigma, bound_sigma = bound_sigma 66 | , desired_support = desired_support) 67 | expect_true(inherits(ng_dist, 'normal_gamma_dist')) 68 | sample_tau <- rgamma(n = 1e4, shape = ng_dist[['alpha']], rate = ng_dist[['beta']]) 69 | actual_support <- mean(pnorm(bound_mu 70 | , ng_dist[['mu']] 71 | , sqrt(1 / (ng_dist[['lambda']] * sample_tau)) 72 | , lower.tail = FALSE)) 73 | expect_true(abs(actual_support - desired_support) < 0.01) 74 | } 75 | 76 | run_test(mu = 0, bound_mu = 2, sigma = 2, bound_sigma = 4, desired_support = 0.05) 77 | run_test(mu = 10, bound_mu = 25, sigma = 100, bound_sigma = 150, desired_support = 0.05) 78 | }) 79 | 80 | test_that('get_supported_gamma returns correct distribution', { 81 | run_test <- function(mu, bound, desired_support) { 82 | gamma_dist <- get_supported_gamma(mu = mu, bound = bound, desired_support = desired_support) 83 | expect_true(inherits(gamma_dist, 'gamma_dist')) 84 | actual_support <- pgamma(bound, gamma_dist[['alpha']], gamma_dist[['beta']], lower.tail = FALSE) 85 | expect_true(abs(actual_support - desired_support) < 0.01) 86 | } 87 | 88 | run_test(mu = 10, bound = 30, desired_support = 0.05) 89 | run_test(mu = 25, bound = 30, desired_support = 0.05) 90 | }) 91 | 92 | -------------------------------------------------------------------------------- /tests/testthat/test-create_dists.R: -------------------------------------------------------------------------------- 1 | context('Testing functions that create distribution objects') 2 | 3 | test_that('beta_dist fails on bad inputs', { 4 | expect_error(beta_dist(alpha = 'a', beta = 0), regexp = 'must be numeric') 5 | expect_error(beta_dist(alpha = -1, beta = 0), regexp = 'must be non-negative') 6 | }) 7 | 8 | test_that('beta_dist creates a valid object', { 9 | dist <- beta_dist(alpha = 1, beta = 2) 10 | expect_equal(class(dist), 'beta_dist') 11 | expect_equal(names(dist), c('alpha', 'beta')) 12 | expect_equal(dist[['alpha']], 1) 13 | expect_equal(dist[['beta']], 2) 14 | }) 15 | 16 | test_that('normal_gamma_dist fails on bad inputs', { 17 | expect_error(normal_gamma_dist(mu = 'a', lambda = 0, alpha = 1, beta = 2) 18 | , regexp = 'must be numeric') 19 | expect_error(normal_gamma_dist(mu = -1, lambda = -1, alpha = 1, beta = 2) 20 | , regexp = 'must be non-negative') 21 | }) 22 | 23 | test_that('normal_gamma_dist creates a valid object', { 24 | dist <- normal_gamma_dist(mu = 0, lambda = 3, alpha = 1, beta = 2) 25 | expect_equal(class(dist), 'normal_gamma_dist') 26 | expect_equal(names(dist), c('mu', 'lambda', 'alpha', 'beta')) 27 | expect_equal(dist[['mu']], 0) 28 | expect_equal(dist[['lambda']], 3) 29 | expect_equal(dist[['alpha']], 1) 30 | expect_equal(dist[['beta']], 2) 31 | }) 32 | 33 | test_that('gamma_dist fails on bad inputs', { 34 | expect_error(gamma_dist(alpha = 'a', beta = 0), regexp = 'must be numeric') 35 | expect_error(gamma_dist(alpha = -1, beta = 0), regexp = 'must be non-negative') 36 | }) 37 | 38 | test_that('gamma_dist creates a valid object', { 39 | dist <- gamma_dist(alpha = 1, beta = 2) 40 | expect_equal(class(dist), 'gamma_dist') 41 | expect_equal(names(dist), c('alpha', 'beta')) 42 | expect_equal(dist[['alpha']], 1) 43 | expect_equal(dist[['beta']], 2) 44 | }) 45 | 46 | test_that('bernoulli_dist fails on bad inputs', { 47 | expect_error(bernoulli_dist(rate = 'a'), regexp = 'must be numeric') 48 | expect_error(bernoulli_dist(rate = 1.5), regexp = 'must be between 0 and 1') 49 | }) 50 | 51 | test_that('bernoulli_dist creates a valid object', { 52 | dist <- bernoulli_dist(rate = 0.5) 53 | expect_equal(class(dist), 'bernoulli_dist') 54 | expect_equal(names(dist), 'rate') 55 | expect_equal(dist[['rate']], 0.5) 56 | }) 57 | 58 | test_that('normal_dist fails on bad inputs', { 59 | expect_error(normal_dist(mu = 'a', sigma = 1), regexp = 'must be numeric') 60 | expect_error(normal_dist(mu = 1.5, sigma = -1), regexp = 'must be non-negative') 61 | }) 62 | 63 | test_that('normal_dist creates a valid object', { 64 | dist <- normal_dist(mu = 0, sigma = 1) 65 | expect_equal(class(dist), 'normal_dist') 66 | expect_equal(names(dist), c('mu', 'sigma')) 67 | expect_equal(dist[['mu']], 0) 68 | expect_equal(dist[['sigma']], 1) 69 | }) 70 | 71 | test_that('poisson_dist fails on bad inputs', { 72 | expect_error(poisson_dist(rate = 'a'), regexp = 'must be numeric') 73 | expect_error(poisson_dist(rate = -1), regexp = 'must be non-negative') 74 | }) 75 | 76 | test_that('poisson_dist creates a valid object', { 77 | dist <- poisson_dist(rate = 5) 78 | expect_equal(class(dist), 'poisson_dist') 79 | expect_equal(names(dist), 'rate') 80 | expect_equal(dist[['rate']], 5) 81 | }) 82 | 83 | -------------------------------------------------------------------------------- /tests/testthat/test-get_data_dists.R: -------------------------------------------------------------------------------- 1 | context('Test ability to generate data distributions') 2 | 3 | test_that('get_data_dists can create a list of distributions', { 4 | set.seed(123) 5 | sampling_distribution <- beta_dist(alpha = 1, beta = 1) 6 | distributions <- get_data_dists(sampling_distribution, n = 3, num_variants = 2) 7 | set.seed(123) 8 | rates <- rbeta(6, 1, 1) 9 | expect_equal(distributions[[1]][['b']][['rate']], rates[4]) 10 | 11 | set.seed(123) 12 | sampling_distribution <- normal_gamma_dist(mu = 0, lambda = 1, alpha = 1, beta = 1) 13 | distributions <- get_data_dists(sampling_distribution, n = 3, num_variants = 2) 14 | set.seed(123) 15 | tau <- rgamma(6, 1, 1) 16 | mu <- rnorm(6, 0, sqrt(1 / tau)) 17 | expect_equal(distributions[[2]][['a']][['mu']], mu[2]) 18 | expect_equal(distributions[[3]][['b']][['sigma']], 1 / sqrt(tau[6])) 19 | 20 | set.seed(123) 21 | sampling_distribution <- gamma_dist(alpha = 1, beta = 1) 22 | distributions <- get_data_dists(sampling_distribution, n = 3, num_variants = 2) 23 | set.seed(123) 24 | rates <- rgamma(6, 1, 1) 25 | expect_equal(distributions[[1]][['b']][['rate']], rates[4]) 26 | }) -------------------------------------------------------------------------------- /tests/testthat/test-metrics.R: -------------------------------------------------------------------------------- 1 | context('Testing functions that calculate metrics on the posterior distribution work') 2 | 3 | test_that('b_gt_a and expected_loss_b work correctly for beta distribution', { 4 | dist_a <- beta_dist(alpha = 3, beta = 5) 5 | dist_b <- beta_dist(alpha = 4, beta = 4) 6 | prob <- b_gt_a(dist_a = dist_a, dist_b = dist_b) 7 | loss <- expected_loss_b(dist_a = dist_a, dist_b = dist_b) 8 | theta_a <- simulate_data(dist_a, 1e7) 9 | theta_b <- simulate_data(dist_b, 1e7) 10 | sim_prob <- mean(theta_b > theta_a) 11 | sim_loss <- mean(pmax(theta_a - theta_b, 0)) 12 | expect_true(abs(prob - sim_prob) < 0.01) 13 | expect_true(abs(loss - sim_loss) < 0.01) 14 | 15 | dist_a <- beta_dist(alpha = 3300, beta = 5000) 16 | dist_b <- beta_dist(alpha = 3350, beta = 5000) 17 | prob <- b_gt_a(dist_a = dist_a, dist_b = dist_b) 18 | loss <- expected_loss_b(dist_a = dist_a, dist_b = dist_b) 19 | theta_a <- simulate_data(dist_a, 1e7) 20 | theta_b <- simulate_data(dist_b, 1e7) 21 | sim_prob <- mean(theta_b > theta_a) 22 | sim_loss <- mean(pmax(theta_a - theta_b, 0)) 23 | expect_true(abs(prob - sim_prob) < 0.01) 24 | expect_true(abs(loss - sim_loss) < 0.01) 25 | }) 26 | 27 | test_that('b_gt_a and expected_loss_b work correctly for normal gamma distribution', { 28 | set.seed(123) 29 | dist_a <- normal_gamma_dist(mu = 0, lambda = 1, alpha = 2, beta = 3) 30 | dist_b <- normal_gamma_dist(mu = 0, lambda = 3, alpha = 2, beta = 1) 31 | theta_a <- simulate_data(dist_a, 5) 32 | theta_b <- simulate_data(dist_b, 5) 33 | prob <- b_gt_a(dist_a, dist_b, theta_a, theta_b) 34 | loss <- expected_loss_b(dist_a, dist_b, theta_a, theta_b) 35 | expect_equal(prob, 0.6) 36 | expect_true(abs(loss - 0.318) < 0.01) 37 | }) 38 | 39 | test_that('b_gt_a works correctly for gamma distribution', { 40 | dist_a <- gamma_dist(alpha = 3, beta = 5) 41 | dist_b <- gamma_dist(alpha = 4, beta = 4) 42 | prob <- b_gt_a(dist_a = dist_a, dist_b = dist_b) 43 | loss <- expected_loss_b(dist_a = dist_a, dist_b = dist_b) 44 | theta_a <- simulate_data(dist_a, 1e7) 45 | theta_b <- simulate_data(dist_b, 1e7) 46 | sim_prob <- mean(theta_b > theta_a) 47 | sim_loss <- mean(pmax(theta_a - theta_b, 0)) 48 | expect_true(abs(prob - sim_prob) < 0.01) 49 | expect_true(abs(loss - sim_loss) < 0.01) 50 | 51 | dist_a <- gamma_dist(alpha = 3300, beta = 400) 52 | dist_b <- gamma_dist(alpha = 4650, beta = 550) 53 | prob <- b_gt_a(dist_a = dist_a, dist_b = dist_b) 54 | loss <- expected_loss_b(dist_a = dist_a, dist_b = dist_b) 55 | theta_a <- simulate_data(dist_a, 1e6) 56 | theta_b <- simulate_data(dist_b, 1e6) 57 | sim_prob <- mean(theta_b > theta_a) 58 | sim_loss <- mean(pmax(theta_a - theta_b, 0)) 59 | expect_true(abs(prob - sim_prob) < 0.01) 60 | }) 61 | 62 | test_that('get metrics produces correct output', { 63 | dist_a <- gamma_dist(alpha = 3, beta = 5) 64 | dist_b <- gamma_dist(alpha = 4, beta = 4) 65 | metrics <- get_metrics(posteriors = list(a = dist_a, b = dist_b), sim_batch_size = 1e4) 66 | expect_equal(metrics[['prob_b_gt_a']], b_gt_a(dist_a, dist_b)) 67 | }) 68 | 69 | -------------------------------------------------------------------------------- /tests/testthat/test-run_simulation.R: -------------------------------------------------------------------------------- 1 | context('Test that we can run a whole simulation') 2 | 3 | test_that('A simulation is successful with beta', { 4 | 5 | data_dists <- list(a = bernoulli_dist(rate = 0.3), b = bernoulli_dist(rate = 0.8)) 6 | priors <- list(a = beta_dist(alpha = 1, beta = 1), b = beta_dist(alpha = 1, beta = 1)) 7 | 8 | result <- simulate_ab_test(data_dists = data_dists 9 | , priors = priors 10 | , loss_threshold = 1e-4 11 | , obs_per_round = 10 12 | , max_rounds = 100) 13 | expect_equal(result[['best_variant']], 'b') 14 | }) 15 | 16 | test_that('A simulation is successful with normal gamma', { 17 | 18 | data_dists <- list(a = normal_dist(mu = 2, sigma = 1), b = normal_dist(mu = 1, sigma = 2)) 19 | priors <- list(a = normal_gamma_dist(mu = 0, lambda = 1, alpha = 1, beta = 1) 20 | , b = normal_gamma_dist(mu = 0, lambda = 1, alpha = 1, beta = 1)) 21 | 22 | result <- simulate_ab_test(data_dists = data_dists 23 | , priors = priors 24 | , loss_threshold = 1e-4 25 | , obs_per_round = 10 26 | , max_rounds = 100) 27 | expect_equal(result[['best_variant']], 'a') 28 | }) 29 | 30 | test_that('A simulation is successful with gamma', { 31 | 32 | data_dists <- list(a = poisson_dist(rate = 1), b = poisson_dist(rate = 2)) 33 | priors <- list(a = gamma_dist(alpha = 1, beta = 1), b = gamma_dist(alpha = 1, beta = 1)) 34 | 35 | result <- simulate_ab_test(data_dists = data_dists 36 | , priors = priors 37 | , loss_threshold = 1e-4 38 | , obs_per_round = 10 39 | , max_rounds = 100) 40 | expect_equal(result[['best_variant']], 'b') 41 | }) 42 | -------------------------------------------------------------------------------- /tests/testthat/test-simulation_study.R: -------------------------------------------------------------------------------- 1 | context('Testing the simulation study') 2 | 3 | test_that('Test that frequentist simulation works', { 4 | data_dists <- list('a' = bernoulli_dist(0.4), 'b' = bernoulli_dist(0.5)) 5 | priors <- list('a' = beta_dist(1, 1), 'b' = beta_dist(1, 1)) 6 | result <- investigate_simulations(num_sims = 10 7 | , data_dists = data_dists 8 | , priors = priors 9 | , loss_threshold = 5e-4 10 | , obs_per_round = 10) 11 | expect_identical(result[['summary_dt']][['avg_loss']], 0) 12 | 13 | data_dists <- list('a' = normal_dist(0, 1), 'b' = normal_dist(1, 1)) 14 | priors <- list('a' = normal_gamma_dist(0, 1, 1, 1), 'b' = normal_gamma_dist(0, 1, 1, 1)) 15 | result <- investigate_simulations(num_sims = 10 16 | , data_dists = data_dists 17 | , priors = priors 18 | , loss_threshold = 5e-4 19 | , obs_per_round = 10) 20 | expect_identical(result[['summary_dt']][['avg_loss']], 0) 21 | 22 | data_dists <- list('a' = poisson_dist(1), 'b' = poisson_dist(2)) 23 | priors <- list('a' = gamma_dist(1, 1), 'b' = gamma_dist(1, 1)) 24 | result <- investigate_simulations(num_sims = 10 25 | , data_dists = data_dists 26 | , priors = priors 27 | , loss_threshold = 5e-4 28 | , obs_per_round = 10) 29 | expect_identical(result[['summary_dt']][['avg_loss']], 0) 30 | }) -------------------------------------------------------------------------------- /tests/testthat/test-stopping_functions.R: -------------------------------------------------------------------------------- 1 | context('Testing functions that stop the tests') 2 | 3 | test_that('single_loss_stop provides the best variant if there is a winner', { 4 | losses <- c(a = 10, b = 2, c = 30) 5 | out <- single_loss_stop(losses = losses, loss_threshold = 25) 6 | expect_equal(out, list(stop = TRUE, variant = 'b')) 7 | }) 8 | 9 | test_that('single_loss_stop provides NULL if no variant has a small loss', { 10 | losses <- c(a = 10, b = 2, c = 30) 11 | out <- single_loss_stop(losses = losses, loss_threshold = 1) 12 | expect_equal(out, list(stop = FALSE, variant = NULL)) 13 | }) 14 | -------------------------------------------------------------------------------- /tests/testthat/test-update_priors.R: -------------------------------------------------------------------------------- 1 | context('Testing functions that update prior distributions') 2 | 3 | test_that('update_prior.beta_dist fails on bad inputs', { 4 | prior <- beta_dist(alpha = 5, beta = 10) 5 | expect_error(update_prior(prior = prior, evidence = 1:3), regexp = 'must be a binary vector') 6 | expect_error(update_prior(prior = prior), regexp = 'cannot be NULL') 7 | }) 8 | 9 | test_that('update_prior.beta_dist produces proper output', { 10 | prior <- beta_dist(alpha = 5, beta = 10) 11 | post <- update_prior(prior = prior, evidence = c(1, 1, 0, 0)) 12 | expected <- beta_dist(alpha = 7, beta = 12) 13 | expect_equal(post, expected) 14 | post <- update_prior(prior = prior, stats = list('num_obs' = 10, 'observed_rate' = 0.5)) 15 | expected <- beta_dist(alpha = 10, beta = 15) 16 | expect_equal(post, expected) 17 | }) 18 | 19 | test_that('update_prior.normal_gamma_dist fails on bad inputs', { 20 | prior <- normal_gamma_dist(mu = 0, lambda = 1, alpha = 5, beta = 10) 21 | expect_error(update_prior(prior = prior, evidence = c('a', 'b')), regexp = 'must be a numeric vector') 22 | expect_error(update_prior(prior = prior), regexp = 'cannot be NULL') 23 | }) 24 | 25 | test_that('update_prior.normal_gamma_dist produces proper output', { 26 | prior <- normal_gamma_dist(mu = 0, lambda = 1, alpha = 5, beta = 10) 27 | post <- update_prior(prior = prior, evidence = c(1, 2, 3, 4)) 28 | expected <- normal_gamma_dist(mu = 2, lambda = 5, alpha = 7, beta = 15) 29 | expect_equal(post, expected) 30 | post <- update_prior(prior = prior, stats = list('num_obs' = 4, 'avg' = 2.5, 'std_dev' = sd(c(1, 2, 3, 4)))) 31 | expect_equal(post, expected) 32 | }) 33 | 34 | test_that('update_prior.gamma_dist fails on bad inputs', { 35 | prior <- gamma_dist(alpha = 5, beta = 10) 36 | expect_error(update_prior(prior = prior, evidence = -1:3), regexp = 'positive integer vector') 37 | expect_error(update_prior(prior = prior), regexp = 'cannot be NULL') 38 | }) 39 | 40 | test_that('update_prior.gamma_dist produces proper output', { 41 | prior <- gamma_dist(alpha = 5, beta = 10) 42 | post <- update_prior(prior = prior, evidence = 1:4) 43 | expected <- gamma_dist(alpha = 15, beta = 14) 44 | expect_equal(post, expected) 45 | post <- update_prior(prior = prior, stats = list('num_sessions' = 3, 'observed_count' = 30)) 46 | expected <- gamma_dist(alpha = 35, beta = 13) 47 | expect_equal(post, expected) 48 | }) 49 | 50 | test_that('validate_dt fails where it should', { 51 | df <- data.frame(x = 1:4, y = 2:5) 52 | expect_error(validate_dt(df), regexp = 'data.table') 53 | dt <- data.table::data.table(x = 1:4, y = 2:5) 54 | expect_error(validate_dt(dt, c('x', 'y', 'z')), 'z') 55 | expect_null(validate_dt(dt, c('x', 'y'))) 56 | }) 57 | 58 | test_that('update priors can update a list of priors', { 59 | priors <- list('a' = beta_dist(alpha = 1, beta = 1) 60 | , 'b' = beta_dist(alpha = 1, beta = 1)) 61 | evidence_dt <- data.table::data.table(a = c(1, 1, 1, 0, 0, 0) 62 | , b = c(1, 1, 1, 1, 0, 0)) 63 | posteriors <- update_priors(priors = priors, evidence_dt = evidence_dt) 64 | expected <- list('a' = beta_dist(alpha = 4, beta = 4) 65 | , 'b' = beta_dist(alpha = 5, beta = 3)) 66 | expect_equal(posteriors, expected) 67 | stats_dt <- data.table::data.table(variant = c('a', 'b') 68 | , num_obs = c(6, 6) 69 | , observed_rate = c(1 / 2, 2 / 3)) 70 | posteriors <- update_priors(priors = priors, evidence_dt = evidence_dt) 71 | expect_equal(posteriors, expected) 72 | }) 73 | 74 | 75 | -------------------------------------------------------------------------------- /vignettes/abayes_vignette.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "abayes Vignette" 3 | author: "Michael Frasco" 4 | date: "`r Sys.Date()`" 5 | output: rmarkdown::html_vignette 6 | vignette: > 7 | %\VignetteIndexEntry{Vignette Title} 8 | %\VignetteEngine{knitr::rmarkdown} 9 | \usepackage[utf8]{inputenc} 10 | --- 11 | 12 | ## What is this package? 13 | 14 | For background information, please read the README.md at the root of the git repository. The rest of this document assumes that you have read that README. 15 | 16 | ## Scope 17 | 18 | This document supports [this blog post](convoy.com). 19 | 20 | ```{r, message=FALSE} 21 | library(ggplot2) 22 | library(data.table) 23 | library(parallel) 24 | library(purrr) 25 | library(abayes) 26 | ``` 27 | 28 | ## Loss Function 29 | 30 | The loss function that we are using in these simulations is shown below as a function of the difference between $\theta_b$ and $\theta_a$. We lose nothing if $\theta_a$ is larger than $\theta_b$, else we lose the difference. There are many variants that we could apply to this function. We'll stick with this function because it is the simplest. 31 | 32 | ```{r, fig.height=4, fig.width=6.67} 33 | dt <- data.table(theta_diff = seq(-3, 3, length.out = 100)) 34 | dt[, loss_fun := pmax(theta_diff, 0)] 35 | 36 | loss_plot <- ggplot(dt, aes(x = theta_diff, y = loss_fun)) + 37 | geom_line(colour = '#f65335', size = 1.5) + 38 | labs(title = 'Loss Function when Choosing Variant A', aes(colour = '#f65335')) + 39 | xlab(expression(beta*' - '*alpha)) + 40 | ylab('Loss Function') + 41 | theme(plot.title = element_text(hjust = 0.5, size = 22) 42 | , axis.title = element_text(size = 18) 43 | , axis.text = element_text(size = 14)) 44 | loss_plot 45 | ``` 46 | 47 | ## Simulations 48 | 49 | I want to demonstrate how we can control our average loss over a sequence of experiments by ending the experiment once the expected loss for either variant drops below $\epsilon$. This is the guarantee about reliability that Bayesian methods provide. 50 | 51 | I use 20 different values of $\epsilon$ for each setup. In practice, the choice of $\epsilon$ is very context dependent. In a bernoulli experiment with a small expected rate, $\epsilon$ is going to be very small. In a normal experiment with a large variance, $\epsilon$ is going to be much larger. You can also use a loss function that takes the percent difference between $\theta_a$ and $\theta_b$ in order to eliminate some of this variability from experiment to experiment. 52 | 53 | The last lever that controls the settings of the experiment is how frequently we evaluate the stopping condition. We could check the results after every single observation. However, for the sake of experimentation speed, I try to setup the simulations so that we evaluate the stopping condition on average 25 times over the entire experiment. In my head, this corresponding to checking the results once a day for a month. Of course, evaluating the stopping condition fewer times will lead to fewer mistakes. Therefore, one might consider a stopping rule that makes sure that the expected loss has been below $\epsilon$ for a consecutive period of time. 54 | 55 | ### Choosing a Prior 56 | 57 | The prior distribution that I am going to use for this experiment is shown below. 58 | 59 | ```{r, fig.height=4, fig.width=6.67} 60 | prior <- beta_dist(alpha = 70, beta = 7000) 61 | sampling_plot <- plot_beta(betas = list('a' = prior), title = 'Prior Distribution', xlab = 'Rate that Events Occur') 62 | sampling_plot 63 | ``` 64 | 65 | This distribution was chosen because it is centered around $0.01$ and has a reasonable range of values from $0.007$ to $0.013$. We are going to use these values as our prior for both variants and as our sampling distribution for the data generating distribution in each simulation. 66 | 67 | ### Running the Simulation 68 | 69 | ```{r, eval=FALSE} 70 | num_cores <- detectCores() - 1 71 | 72 | num_thresholds <- 20 73 | num_sims <- 250 # doing more simulations would lead to more accurate results 74 | max_rounds <- 10000 75 | 76 | numeric_seed <- as.numeric(paste(charToRaw('ab'), collapse = '')) # 6162 77 | set.seed(numeric_seed) 78 | 79 | priors <- list('a' = prior, 'b' = prior) 80 | thresholds <- 10 ^ seq(-5.5, -3, length.out = num_thresholds) 81 | obs_to_see <- seq(3000, 50, length.out = num_thresholds) 82 | obs_to_see <- ceiling(obs_to_see) - ceiling(obs_to_see) %% 2 83 | results <- vector('list', num_thresholds) 84 | 85 | for (i in 1:num_thresholds) { 86 | print(paste('performing simulation', i)) 87 | results[[i]] <- investigate_simulations(num_sims = num_sims 88 | , priors = priors 89 | , loss_threshold = thresholds[i] 90 | , sampling_distribution = prior 91 | , obs_per_round = obs_to_see[i] 92 | , max_rounds = max_rounds 93 | , num_cores = num_cores) 94 | } 95 | ``` 96 | 97 | ```{r, eval=FALSE} 98 | avg_losses <- purrr::map_dbl(results, function(x) mean(x[['sim_dt']][['loss']], na.rm = TRUE)) 99 | dt <- data.table(log_thresh = thresholds, avg_loss = avg_losses, expected = thresholds) 100 | sim_plot <- ggplot(dt, aes(log_thresh)) + 101 | geom_line(aes(y = avg_loss, colour = 'observed average loss'), size = 1.25) + 102 | geom_line(aes(y = expected, colour = 'expected loss'), size = 1.5) + 103 | labs(title = expression('Observed Loss is Controlled by '*epsilon), aes(colour = '#f65335')) + 104 | xlab(expression(epsilon*': Threshold Used in Simulations')) + 105 | ylab('Simulated Average Loss') + 106 | scale_colour_manual(values=c('black', '#f65335')) + 107 | scale_x_log10() + 108 | scale_y_log10() + 109 | theme(plot.title = element_text(size = 16, hjust = 0.5) 110 | , plot.subtitle = element_text(size = 14, hjust = 0.5) 111 | , axis.text = element_text(size = 12) 112 | , axis.title = element_text(size = 14) 113 | , legend.title = element_blank() 114 | , legend.position = 'bottom' 115 | , legend.text = element_text(size = 12)) 116 | sim_plot 117 | ``` 118 | 119 | 120 | ## Derivations for Bayesian A/B Testing 121 | 122 | ### Calculating Metrics Between Two Variants 123 | 124 | The following derivations were taken from [Evan Miller's website](https://www.evanmiller.org/bayesian-ab-testing.html) and [Chris Stucchio's website](https://www.chrisstucchio.com/blog/2014/bayesian_ab_decision_rule.html). I am re-creating their derivations for my own sake. 125 | 126 | #### Beta Distribution 127 | 128 | ##### Probability Variant B Greater than Variant A 129 | 130 | We aim to provide an analytic solution for calculating $P(p_b > p_a)$ where 131 | 132 | $$ 133 | \begin{eqnarray} 134 | p_a & \sim & Beta(\alpha_a, \beta_a) \\ 135 | p_b & \sim & Beta(\alpha_b, \beta_b) 136 | \end{eqnarray} 137 | $$ 138 | 139 | Here $Beta(., .)$ represents the [Beta distribution](https://en.wikipedia.org/wiki/Beta_distribution). 140 | 141 | ###### Background Information Needed For the Derivation 142 | 143 | Before, we begin the actual derivation, we are going to provide some established facts about the Beta distribution and the [Beta function](https://en.wikipedia.org/wiki/Beta_function). 144 | 145 | First, for any random variable $X$, if $X \sim B(\alpha, \beta)$, then the probability density function (i.e. the probability that $X$ takes on some value $x$) can be written as follows. 146 | 147 | $$ 148 | \begin{eqnarray} 149 | f(x) = P(X = x) & = & \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)} 150 | \end{eqnarray} 151 | $$ 152 | Where $B(., .)$ represents the Beta function. 153 | 154 | Next, the cumulative density function of $X$ (i.e. the probability that $X$ takes on some value less than $x$) is known as the regularized incomplete beta function. 155 | 156 | $$ 157 | \begin{eqnarray} 158 | P(X < x) = I_x(\alpha, \beta) & = & \frac{B(x; \alpha, \beta)}{B(\alpha, \beta)} 159 | \end{eqnarray} 160 | $$ 161 | 162 | Where $B(x; \alpha, \beta)$ is the incomplete beta function and can be expressed as 163 | 164 | $$ 165 | \begin{eqnarray} 166 | B(x; \alpha, \beta) = \int_0^x t^{\alpha - 1} (1 - t)^{\beta - 1} dt 167 | \end{eqnarray} 168 | $$ 169 | 170 | Lastly, we have the [following equalities](https://en.wikipedia.org/wiki/Beta_function#Properties_2) 171 | 172 | $$ 173 | \begin{eqnarray} 174 | I_x(1, \beta) & = & 1 - (1 - x)^{\beta} \\ 175 | I_x(\alpha, \beta) & = & I_x(\alpha - 1, \beta) - \frac{x^{\alpha - 1} (1 - x)^\beta}{(\alpha - 1) B(\alpha - 1, \beta)} 176 | \end{eqnarray} 177 | $$ 178 | 179 | If we apply that second equation recursively and use the first equation as the base case, we obtain the following equation. 180 | 181 | $$ 182 | \begin{eqnarray} 183 | I_x(\alpha, \beta) & = & 1 - (1 - x)^b - \sum_{j=1}^{\alpha - 1} \frac{x^{\alpha - j} (1 - x)^b}{(\alpha - j) B(\alpha - j, \beta)} 184 | \end{eqnarray} 185 | $$ 186 | Now, if change the iterating variable so that $i = \alpha - j$ and let $(1 - x)^b$ represent the case where $i = 0$, then we have the following identities 187 | 188 | $$ 189 | \begin{eqnarray} 190 | x^{\alpha - j} & = & x^i \\ 191 | (\alpha - j) * B(\alpha - 1, \beta) & = & (\beta + i) * B(1 + i, \beta) 192 | \end{eqnarray} 193 | $$ 194 | And this allows us to write 195 | 196 | $$ 197 | \begin{eqnarray} 198 | I_x(\alpha, \beta) = 1 - \sum_{i=0}^{\alpha - 1} \frac{x^i (1 - x)^\beta}{(\beta + i) B(1 + i, \beta)} 199 | \end{eqnarray} 200 | $$ 201 | 202 | ###### The actual derivation 203 | 204 | Given all of the information above, we can write 205 | 206 | $$ 207 | \begin{eqnarray} 208 | P(p_b > p_a) & = & \int_0^1 \int_0^1 1_{p_b > p_a} f(p_a) f(p_b) dp_b dp_a \\ 209 | & = & \int_0^1 \int_{p_a}^1 f(p_a) f(p_b) dp_b dp_a \\ 210 | & = & \int_0^1 f(p_a) \Big[ \int_{p_a}^1 f(p_b) dp_b \Big] dp_a \\ 211 | & = & \int_0^1 f(p_a) \Big[ 1 - I_{p_a}(\alpha_b, \beta_b) \Big] dp_a \\ 212 | & = & \int_0^1 f(p_a) dp_a - \int_0^1 f(p_a) I_{p_a}(\alpha_b, \beta_b) dp_a \\ 213 | & = & 1 - \int_0^1 f(p_a) \Big[ 1 - \sum_{i=0}^{\alpha_b - 1} \frac{p_a^i (1 - p_a)^{\beta_b}}{(\beta_b + i) B(1 + i, \beta_b)} \Big] dp_a \\ 214 | & = & 1 - 1 + \int_0^1 f(p_a) \Big[ \sum_{i=0}^{\alpha_b - 1} \frac{p_a^i (1 - p_a)^{\beta_b}}{(\beta_b + i) B(1 + i, \beta_b)} \Big] dp_a \\ 215 | & = & \sum_{i=0}^{\alpha_b - 1} \int_0^1 f(p_a) \Big[ \frac{p_a^i (1 - p_a)^{\beta_b}}{(\beta_b + i) B(1 + i, \beta_b)} \Big] dp_a \\ 216 | & = & \sum_{i=0}^{\alpha_b - 1} \int_0^1 \frac{p_a^{\alpha_a + i - 1} (1 - p_a)^{\beta_a + \beta_b - 1}}{(\beta_b + i) B(\alpha_a, \beta_a) B(1 + i, \beta_b)} dp_a \\ 217 | \end{eqnarray} 218 | $$ 219 | 220 | Lastly, if we multiple the numerator and denominator by $B(\alpha_a + i, \beta_a + \beta_b)$ and bring all of the terms that do not depend on $p_a$ out of the integral, we get 221 | 222 | $$ 223 | \begin{eqnarray} 224 | P(p_b > p_a) & = & \sum_{i=0}^{\alpha_b - 1} \frac{B(\alpha_a + i, \beta_a + \beta_b)}{(\beta_b + i)B(\alpha_a, \alpha_b)B(1 + i, \beta_b)} \int_0^1 \frac{p_a^{\alpha_a + i - 1}(1 - p_a)^{\beta_a + \beta_b - 1}}{B(\alpha_a + i, \beta_a + \beta_b)} dp_a \\ 225 | & = & \sum_{i=0}^{\alpha_b - 1} \frac{B(\alpha_a + i, \beta_a + \beta_b)}{(\beta_b + i)B(\alpha_a, \alpha_b)B(1 + i, \beta_b)} \equiv h(\alpha_a, \beta_a, \alpha_b, \beta_b) 226 | \end{eqnarray} 227 | $$ 228 | 229 | Thus, we have a closed form expression for the desired probability that only depends on the parameters of the posterior distributions. 230 | 231 | ##### $E[L](A)$ and $E[L](B)$ 232 | 233 | Next, we want to derive the formula for the bayesian A/B testing decision rule. 234 | 235 | We assume that the loss function for each variant is the absolute loss, which is shown below. 236 | 237 | $$ 238 | \begin{eqnarray} 239 | L(p_a, p_b, A) & = & max(p_b - p_a, 0) \\ 240 | L(p_a, p_b, B) & = & max(p_a - p_b, 0) 241 | \end{eqnarray} 242 | $$ 243 | 244 | This derivation depends on the derivation for $P(p_b > p_a) \equiv h(\alpha_a, \beta_a, \alpha_b, \beta_b)$, which is shown above. 245 | 246 | $$ 247 | \begin{eqnarray} 248 | E[L](B) & = & \int_0^1 \int_0^1 max(p_a - p_b, 0) f(p_a) f(p_b) dp_a dp_b \\ 249 | & = & \int_0^1 \int_{p_b}^1 (p_a - p_b) f(p_a) f(p_b) dp_a dp_b \\ 250 | & = & \int_0^1 \int_{p_b}^1 \frac{p_a^{\alpha_a} (1 - p_a)^{\beta_a - 1}}{B(\alpha_a, \beta_a)} \frac{p_b^{\alpha_b - 1} (1 - p_b)^{\beta_b - 1}}{B(\alpha_b, \beta_b)} dp_a dp_b - \int_0^1 \int_{p_b}^1 \frac{p_a^{\alpha_a - 1} (1 - p_a)^{\beta_a - 1}}{B(\alpha_a, \beta_a)} \frac{p_b^{\alpha_b} (1 - p_b)^{\beta_b - 1}}{B(\alpha_b, \beta_b)} dp_a dp_b \\ 251 | \end{eqnarray} 252 | $$ 253 | Now, we can multiply the first term by $\frac{B(\alpha_a + 1, \beta_a)}{B(\alpha_a + 1, \beta_a)}$ and the second term by $\frac{B(\alpha_b + 1, \beta_b)}{B(\alpha_b + 1, \beta_b)}$. 254 | 255 | $$ 256 | \begin{eqnarray} 257 | E[L](B) & = & \frac{B(\alpha_a + 1, \beta_a)}{B(\alpha_a + 1, \beta_a)} \int_0^1 \int_{p_b}^1 \frac{p_a^{\alpha_a} (1 - p_a)^{\beta_a - 1}}{B(\alpha_a, \beta_a)} \frac{p_b^{\alpha_b - 1} (1 - p_b)^{\beta_b - 1}}{B(\alpha_b, \beta_b)} dp_a dp_b - \frac{B(\alpha_b + 1, \beta_b)}{B(\alpha_b + 1, \beta_b)} \int_0^1 \int_{p_b}^1 \frac{p_a^{\alpha_a - 1} (1 - p_a)^{\beta_a - 1}}{B(\alpha_a, \beta_a)} \frac{p_b^{\alpha_b} (1 - p_b)^{\beta_b - 1}}{B(\alpha_b, \beta_b)} dp_a dp_b \\ 258 | & = & \frac{B(\alpha_a + 1, \beta_a)}{B(\alpha_a, \beta_a)} \int_0^1 \int_{p_b}^1 \frac{p_a^{\alpha_a} (1 - p_a)^{\beta_a - 1}}{B(\alpha_a + 1, \beta_a)} \frac{p_b^{\alpha_b - 1} (1 - p_b)^{\beta_b - 1}}{B(\alpha_b, \beta_b)} dp_a dp_b - \frac{B(\alpha_b + 1, \beta_b)}{B(\alpha_b, \beta_b)} \int_0^1 \int_{p_b}^1 \frac{p_a^{\alpha_a - 1} (1 - p_a)^{\beta_a - 1}}{B(\alpha_a, \beta_a)} \frac{p_b^{\alpha_b} (1 - p_b)^{\beta_b - 1}}{B(\alpha_b + 1, \beta_b)} dp_a dp_b \\ 259 | & = & \frac{B(\alpha_a + 1, \beta_a)}{B(\alpha_a, \beta_a)} \big(1 - h(\alpha_a + 1, \beta_a, \alpha_b, \beta_b)\big) - \frac{B(\alpha_b + 1, \beta_b)}{B(\alpha_b, \beta_b)} \big(1 - h(\alpha_a, \beta_a, \alpha_b + 1, \beta_b)\big) 260 | \end{eqnarray} 261 | $$ 262 | 263 | Since the loss function is symmetrical, we can solve for $E[L](A)$ by switching the roles of the two variants in the equation above. 264 | 265 | 266 | ### Constructing Parameterized Distributions From Desired Properties 267 | 268 | #### Beta Distribution 269 | 270 | Suppose that $x \sim Beta(\alpha, \beta)$. 271 | 272 | We know that $E[x] = \mu = \frac{\alpha}{\alpha + \beta}$ and $Var(x) = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$. 273 | 274 | Then, we can write $\alpha + \beta = \frac{\alpha}{\mu}$ and then $\beta = \frac{\alpha}{\mu} - \alpha$. We can use this value to solve for $\alpha$. We have that $\sigma^2 = \frac{\alpha (\frac{\alpha}{\mu} - \alpha)}{(\frac{\alpha}{\mu})^2 (\frac{\alpha}{\mu} + 1)}$. 275 | 276 | Simplifying the above, we have 277 | 278 | $$ 279 | \begin{eqnarray} 280 | \sigma^2 & = & \frac{\alpha^2 (1 - \mu) \frac{1}{\mu}}{\alpha^2 (\alpha + \mu) \frac{1}{\mu^3}} \\ 281 | & = & \frac{(1 - \mu)}{(\alpha + \mu) \frac{1}{\mu^2}} \\ 282 | \end{eqnarray} 283 | $$ 284 | 285 | This implies that $(1 - \mu) \mu^2 = \sigma^2 \alpha + \sigma^2 \mu$, so that we get $\alpha = \frac{(1 - \mu)\mu^2 - \mu \sigma^2}{\sigma^2}$. 286 | 287 | #### Normal Gamma Distribution 288 | 289 | Suppose that $\mu \sim N(\mu_0, \frac{1}{\lambda \tau})$ and that $\tau \sim Gamma(\alpha, \beta)$, then we have that $(\mu, \tau) \sim NormalGamma(\mu_0, \lambda, \alpha, \beta)$. 290 | 291 | We know that $E[\mu] = \mu_0$, $E[\tau] = \tau_0 = \frac{\alpha}{\beta}$, $Var(\mu) = \sigma^2_{\mu} = \frac{\beta}{\lambda (\alpha - 1)}$, and $Var(\tau) = \sigma^2_{\tau} \frac{\alpha}{\beta^2}$. 292 | 293 | Then, we can write $\beta \tau_0 = \alpha$. We can use this to solve for $\beta$. We have that $\sigma^2_{\tau} = \frac{\beta \tau_0}{\beta^2} = \frac{\tau_0}{\beta}$, so that $\beta = \frac{\tau_0}{\sigma^2_{\tau}}$. 294 | 295 | Finally, we can use the values for $\alpha$ and $\beta$ to solve for $\lambda$. We have that $\lambda = \frac{\beta}{\sigma^2_{\mu} (\alpha - 1)}$. 296 | 297 | #### Gamma Distribution 298 | 299 | Suppose that $x \sim Gamma(\alpha, \beta)$. 300 | 301 | We know that $E[x] = \mu = \frac{\alpha}{\beta}$ and that $Var(x) = \sigma^2 = \frac{\alpha}{\beta^2}$. 302 | 303 | Then, we can write $\beta \mu = \alpha$. We can use this to solve for $\beta$. We have that $\sigma^2 = \frac{\beta \mu}{\beta^2} = \frac{\mu}{\beta}$. This allows us to write $\beta = \frac{\mu}{\sigma ^ 2}$. 304 | 305 | Therfore, given $\mu$ and $\sigma$, we can first solve for $\beta$ and then use this to solve for $\alpha$. 306 | 307 | 308 | 309 | --------------------------------------------------------------------------------